timing consistently produces faster circuits for a given confidence level, as .... produce a circuit in such a way that, with at least 90% confidence, the final ...
OPTIMIZING CIRCUITS WITH CONFIDENCE PROBABILITY USING PROBABILISTIC RETIMING S. Tongsima, C. Chantrapornchai, E. H.-M. Sha
Nelson L. Passos
Dept. of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556, USA
Dept. of Computer Science Midwestern State University Wichita Falls, TX 76308, USA
ABSTRACT VLSI circuit manufacturing results in theoretically identical components that actually have varying propagation delays. A “worstcase” or even “average-case” estimation of such delays during the design procedure may be overly pessimistic and will lead to costly and unnecessary redesign cycles. This paper presents a new optimization methodology, called probabilistic retiming, which transforms a circuit based on statistical timing data gathered either from component production histories or from a simulation of the fabrication process. Such circuits are modeled as graphs where each vertex represents a combinational element that has a probabilistic timing characteristic. A polynomial-time algorithm, applicable to such a graph, is developed which retimes a circuit in order to produce a design operating in a specified cycle time within a given confidence level. Experiments show that probabilistic retiming consistently produces faster circuits for a given confidence level, as compared with the traditional retiming algorithm. 1. INTRODUCTION In VLSI design, engineers are normally facing the problem of designing a circuit able to achieve a given time constraint. Nevertheless, the information about the propagation delay of each circuit component is uncertain due to the VLSI fabrication process fluctuations or component input variation [2, 11]. Those constraint variations prevent the designers from accurately estimating the timing value. Under current design environment, the “worst-case” or “average-case” assumption are normally used to approximate the timing of the logical circuit elements. However, the use of such values of the propagation delay in the synthesis process, to guide the design, may not always be correct and, therefore, may result in unnecessary and costly redesign cycles. With this motivation, a good synthesis tool should consider the different possibilities of the final system during the initial design phase, providing the engineer with an architecture able to achieve an expected performance within a given, qualitatively provable, confidence level. Such a tool would provide a more realistic initial design that could dramatically reduce the cost and time spent in development process. This paper presents a polynomial-time circuit transformation algorithm, called probabilistic retiming, which considers the probabilistic nature of the timing characteristic of basic circuit elements. It produces a solution that satisfies the timing constraints within a given confidence level. Furthermore, by considering the uncertain component timing characteristics at the beginning of the design cycle, time consuming redesign iterations can be minimized. As traditional retiming has had a great impact on the optimization PARTIALLY SUPPORTED BY THE NSF (MIP 95-01006) AND THE ROYAL THAI GOVERNMENT SCHOLARSHIP.
problem of various applications, this new technique will have a similar impact on various applications with uncertain timing characteristics. Many researchers implicitly adopted the worst case timing information as one of their design assumptions. In particular, Ishii modified the retiming technique in such a way that it can also handle precharged structures and gated clock signal circuitry [4]. Dey, Potkonjak and Rothweiler proposed a method to enhance retiming by transforming sequential circuits [1]. A large number of applications of retiming were explored, always making the use of the same assumption, such as the work of Shenoy and Brayton [13], Lockyear and Ebeling [10], and Liu et al. [9]. Lower bounds were computed, such as the work by Papaefthymiou, based on the minimum clock period presented in terms of the delay-to-register ratio, which can be found by retiming a circuit without considering its probabilistic behavior [12]. More comprehensive delay models were proposed by Soyata, Friedman and Mulligan [14, 15]. These models do not, however, consider the variance of circuit delays due to the manufacturing process. Recently, Karkowski and Otten introduced a model to handle the imprecise propagation delay of events [5]. In their approach, fuzzy set theory [16] was employed to model imprecise delays, but with only three possible values. Multiple objective linear programming [7] was chosen to reduce the uncertainty of such delays. Nonetheless, this model is restricted to a simple triangular fuzzy distribution and does not consider probability values which are usually easy to obtain from the quality control of the fabrication process. In order to generalize the idea of having imprecise delays, in this paper, propagation delays are assumed to be random variables that are associated with probability distributions. The retiming technique is then extended to optimize circuits under probabilistic environments. In order to formalize this methodology, a circuit is represented as a graph where the vertex set V is a collection of combinational logic components associated with random variables representing propagation delays. Edges in the graph represent the interconnections between components which may route through zero or more registers. The probability distribution of the propagation delay being the value x is denoted by (X = x), where X represents the propagation delay for a component and x is a particular value values [6]. A simple example is presented to show the usefulness of probabilistic retiming. Figure 1(b) illustrates a graph G in which the set of vertices (or nodes) is A; B; C; D; and E . Assume the nodes have the timing characteristics as shown in Figure 1(a). Figure 1(c) shows a graph which considers the execution time of each vertex
P
f
g
based on the worst case analysis of the propagation delays. The maximum propagation delay of all paths that contains no registers for graph G, called the clock period or cycle period, and denoted by (G), is 9. Based on worst case assumptions, traditional retiming gives the “best” circuit shown in Figure 1(d). But due to the variances in execution time, the best retiming obtained using worst-case assumptions may produce a non-optimal circuit, if not all circuits produced need to meet the cycle period constraint. One might wish to retime G in order to obtain (G) = 3. However, such a desired clock period cannot be achieved from the worst case analysis. However, since the propagation delay associated with each component is a random variable, a designer might wish to produce a circuit in such a way that, with at least 90% confidence, the final produced systems could operate at a clock period less than or equal to 3.
A :P (TA = x) B :P (TB = x) C :P (TC = x) D:P (TD = x) E :P (TE = x)
0.1 if x = 1 0.3 if x = 1 0.4 if x = 1 0.3 if x = 1 0.9 if x = 1
0.9 if x = 3 0.7 if x = 2 0.6 if x = 2 0.7 if x = 2 0.1 if x = 4
(a) Timing characteristic
the propagation delay, given by Tv , where Tv is a discrete random variable associated with the set of possible propagation delays of e v the vertex v such that 8x (Tv = x) = 1. A shorthand u represents edge e E from u to v , u; v V , and a path p starting p v: Also, from u and ending at v is denoted by the notation u e e the number of registers of a path p (d(p)) where p = v0 0 v1 1 ek?1 k ? 1 vk is computed by d(p) = i=0 d(ei ). Retiming operations rearrange registers in a circuit so that the behavior of the circuit is preserved while achieving a faster circuit [8]. The optimization goal is to reduce the clock period (G), representing the execution time of the longest path, in terms of propagation delay, (referred to as the critical path) that has all non-zero register edges, i.e., defined by the equations (G) = ek?1 vk , max t(p) : d(p) = 0 where p = v0 e0 v1 e1 k k ? 1 t(p) = i=0 t(vi ), and d(p) = i=0 d(ei ). The retiming of a graph G = V; E; d; t is a transformation function from verZ. The retiming function tices to the set of integers, r : V describes the movement of registers with respect to the vertices so as to transform G into a new graph Gr = V; E; dr ; t where dr represents the number of registers on the edges of Gr . The positive (or negative) value of the retiming function determines the movement of the registers. During retiming the same number of registers are pushed from all incoming (outgoing) edges of a node to all outgoing (incoming) edges. For instance, r(u) = 1 implies that one register is pushed from all incoming edges of node u V , to all outgoing edges of node u. If a register is pushed from all outgoing to all incoming edges of u, then r(u) = 1. The following summarizes some essential properties of the retiming transformation.
2
P
P
P
!
f
!
2
P
P
g
h
i
! !
!
! !
h
i
7!
2
A
D
3 A
D2
C B
C
E
(b) orig
2 B
3 A
A
D2
D
2
2
C E 4
2 B
C B
E 4
(c) wrst
(d) trad
E
(e) prob
Figure 1: An example of 5-node graph Let Y be the random variable representing the maximum of the cumulative propagation delays along any path that has no registers on it. Then, in Figure 1(e), (Y = 2) = 0:00324; (Y = 3) = 0:89676; and (Y > 4) = 0:10. Thus, the retimed graph in Figure 1(e) has satisfied the designer requirements that the probability of the final circuit’s clock period being less than or equal to 3 is equal to 90%. On the other hand, after substituting back those probability distributions to the vertices of the graph in Figure 1(d), the probability of the clock period being less than or equal to 3 for such a graph is only 9%.
P
P
P
2. PRELIMINARIES A circuit that contains functional elements with associated propagation delay probability distributions can be modeled as a probabilistic graph (PG). A probabilistic graph (PG) is a vertex-weighted, edge-weighted, directed graph G = V; E; d; T , where V is the set of vertices representing circuit elements, E is the set of edges representing the data dependencies between vertices, d is a function from E to Z+, the set of positive integers, representing the number of registers on an edge, and Tv is a random variable repV . Each vertex resenting the propagation delay of a node v v V is weighted with a probability distribution function (pdf) of
h
2
i
2
?
Property 2.1 1. r is a legal retiming if dr (e)
0; 8e 2 E .
e 2. 8 u ! v; where u; v 2 V; dr (e) = d(e) + r(u) ? r(v ). p 3. 8 u v; where u; v 2 V; dr (p) = d(p) + r(u) ? r(v). 4. In any directed cycle (l) of G and Gr , dr (l) = d(l) > 0. 3. PROBABILISTIC RETIMING
Recall the definition of a probabilistic graph (PG). Since the propagation delay of each vertex of a PG is a random variable, the traditional notion of a fixed global clock period, (G), for PG G is no longer valid. Therefore, the random variable mrt(G), called the maximum reaching time of graph G, which represents the probabilistic clock period for graph G is introduced. Similarly, mrt(u; v ) represents the probabilistic clock period for the portion of the graph between nodes u and v . The requirements for a probabilistic graph are usually described by an acceptable propagation delay for the final circuit, denoted by c; and a confidence level = 1 , where is the acceptable probability of not achieving the required performance. In this paper, the requirement is expressed as (mrt(G) c) , or (mrt(G) > c) > . The goal of probabilistic retiming is to transform a PG such that the requirement can be satisfied. Since it is complex to efficiently compute a function of dependent random variables, Algorithm 3.1 which computes the mrt(G) assumes the random variables are independent. Fortunately, by using this approach, the algorithm can give an upper bound of the exact solution of the mrt when considering the operation of dependent random variables. In the algorithm, two dummy vertices with
?
P
P
zero propagation delays, vs and vd , are added to the graph. A set of zero-register edges is used to connect vertex vs to all root-nodes, and to connect all leaf-nodes to vertex vd . Thus, the mrt(vs ; vd ) gives the overall maximum reaching time of the graph mrt(G). In order to compute the mrt, only the graph portion that has zero register edges, i.e., a directed acyclic graph (DAG), is considered.
4. EXPERIMENTAL RESULTS
?
Algorithm 3.1 (Maximum reaching time)
h
i
Input : PG G = V; E; d; T Output: mrt(G) = tmpmrt (vs ; vd )
0 = hV0 ; E0 ; d; T i such that V0 = V e+ fvs ; vd g; e 0 = ? f 2 j ( ) 6= 0g + fvs ! v 2 Vr ; u 2 Vl ! vd g 8 2 0 tmp ( ) = 0; Tvs = Tvd = 0; Queue = vs
1 G
2 E E e Ed e 3 u V ; mrt vs ; u 4 while Queue do 5 get u; Queue 6 mrt vs ; u
6 ; = ( ) tmp ( ) = tmpmrt (vs ; u) + Tu !e
7 foreach u v do 8 indegree v indegree v 9 mrt vs ; v mrt vs ; u ; 10 if indegree v then put v; Queue 11 end do 12 end do
( )= ( )?1 tmp ( ) = max(tmp ( ( )=0 (
) tmpmrt (vs ; v)) )
Algorithm 3.1 traverses the DAG portion (G0 ) of the graph in topological order and compute the mrt of each node v with respect to vs . At Line 9, the mrt of all parents of a node are maximized. Line 6 will then add Tv to the tmpmrt of node v producing the final mrt with respect to all paths reaching node v . Finally, after node vd is visited, the mrt(G) contains the final mrt of the graph. Algorithm 3.2 presents the probabilistic retiming algorithm. The algorithm retimes vertices whose probability of propagation delays being greater than c is larger than the acceptable probability value. Lines 7–18 traverse the DAG in a breath-first search manner and update the tmpmrt for each node as in Algorithm 3.1. After updating a vertex, the resulting tmpmrt is tested to see if the , is met. Line 15, then derequirement, (tmpmrt (G) > c) creases the retiming value of any vertex v that violates the requirement unless the vertex has previously been retimed in a current iteration. The algorithm then repeats the above process using the retimed graph obtained from the previous iteration.
P
Algorithm 3.2 (Probabilistic retiming)
h
i
Input: PG G = V; E; d; T ; c, . Output: Retiming function r.
8
2 ( )=0 =1 j j = ( ); = false i 0=h 0 0 0 = V +e fvs ; vd g e2 1 E0 = E ? fe 2 E jd(e) 6= 0g + fvs ! v 2 Vr ; u 2 Vl ! vd g 8u 2 V0 ; tmpmrt (vs ; u) = 0; Queue = vs ; Tvs = Tvd = 0 while Queue 6= ; do get (u; Queue ) tmpmrt (vs ; u) = tmpmrt (vs ; u) + Tu
vertex v V; set r v 1 2 for i to V do 3 Gr Retime G; r mod flag 4 G V ; E ; d; T where V 5 6 7 8 9
!e
10 foreach u v do 11 indegree v indegree v 12 mrt vs ; v mrt vs ; u ; mrt vs ; v 13 if mrt vs ; v > c > 14 then if u has not been retimed at current iteration r u mod flag true 15 then r u 16 if indegree v then put v; Queue 17 end do 18 end do 19 if mod flag false then Report r; break 20 end do
( )= ( )?1 tmp ( ) = max(tmp ( P (tmp ( ) ) ( ) = ( ) ? 1; ( )=0 ( =
) tmp ( )
=
;
))
In each experiment, for a given confidence level = 1 , Algorithm 3.2 is repeatedly applied to search for the best clock period. Table 1 shows the results for traditional retiming using worstcase propagation delay assumptions (column worst) and the probabilistic model with varying confidence levels. The first column in Table 1 lists the tested benchmarks. Column worst in the table presents the optimized clock period obtained from applying traditional retiming using the worst-case propagation delay of each adder (24ns) and each multiplier (30ns). In these experiments, the distributions of the propagation delays for the basic components (adder and multiplier) in these circuits were obtained from [3], and were assumed to fit normal distributions [2]. Columns 4 through 8 show the best clock period c for the confidence levels 0:9 down to 0:5. Notice that for all benchmarks the clock periods with = 0:9 are still smaller than the propagation delays in Column 3. The “%” columns list the feasible clock period reduction percentages with respect to the worst-case method. With = 0:9 all benchmarks have more than an 18% improvement, while with = 0:5 all benchmarks have more than a 26% reduction. Table 2 compares the probabilistic retiming algorithm to the traditional retiming algorithm with average-values used for individual components. In each experiment, the expected value is 16ns for an adder and 20ns for a multiplier [3]. Traditional retiming is applied to these graph, assuming the expected values (Gavg ). The clock period of Gavg (Gavg ) is listed in column 3 of Table 2. E (mrt(Gavg )) in column 4 is an expected value of the maximum reaching time of Gavg , if the pdfs for each component are taken into account. Since the probabilistic nature of each individual component must be considered to calculate the actual propagation delay for the entire graph (column 4), using the average timing value for each component may produce misleading results. This scenario illustrates that using traditional retiming with average-case values produces poor results that are significantly larger than (Gavg ), when the probabilistic nature of each component is included. In order to make a comparison to probabilistic retiming, results for the benchmarks are also computed using Algorithm 3.2 with confidence levels of 0:9 to 0:5. The expected (average) clock periods for these situations are shown in columns 5 to 9. These expected clock periods are consistently smaller than E (mrt(Gavg )). In addition, these values are normally much closer to (Gavg ) than to E (mrt(Gavg )), and are actually less than (Gavg ) in some cases. Hence, the approach of using the expected values for each component is neither a good heuristic in the initial design phase nor does it give any quantitative confidence to the resulting circuits. 5. CONCLUSION VLSI circuit manufacturing results in devices with varying propagation delays. The estimation of such delays during the design procedure may not be totally accurate, leading to extra design cycles. We present a new transformation technique, called probabilistic retiming, which considers varying timing delays, and takes ((cn)2 V 2 E ) time. From experiments, considering realistic cases, this algorithm can effectively optimize circuits while utilizing manufacturing probability information and the designer requirements of a desired clock period c and a confidence level . This methodology is expected to have an impact in many areas,
O
j jj j
num. nodes 8 11 12 15 17 27 34 45 105 170
Benchmark Biquad IIR Diff. Equation 3-stage direct IIR All-pole Lattice th order WDF Volterra th Elliptic All-pole Lattice (uf=2) All-pole Lattice (uf=6) th Elliptic (uf=4)
4 5 5
c worst 78 118 54 157 156 276 330 468 1092 1633
= 0:9
c 60 81 44 120 116 216 240 350 811 1185
% 23 31 19 24 26 22 28 25 26 27
P (mrt(vs ; vd ) c) 1 ? = = 0:8
c 57 77 41 117 112 212 236 346 806 1174
% 26 35 24 25 28 23 29 25 26 28
= 0:7
c 56 76 40 115 109 208 233 343 802 1169
% 28 36 26 27 30 25 30 27 27 29
= 0:6
c 52 73 39 113 108 205 230 340 799 1164
% 33 38 28 28 31 26 31 27 27 29
= 0:5
c 51 72 36 112 106 202 228 338 796 1160
% 35 39 33 29 32 27 31 28 27 29
Table 1: Probabilistic retiming versus worst case traditional retiming
Benchmark Biquad IIR Diff. Equation 3-stage direct IIR All-pole Lattice th order WDF Volterra th Elliptic All-pole Lattice (uf=2) All-pole Lattice (uf=6) th Elliptic (uf=4)
4 5 5
num. 8 11 12 15 17 27 34 45 105 170
(Gavg ) 52 72 36 104 104 200 220 312 728 1100
E (mrt(Gavg )) 70.40 76.05 41.90 114.45 106.73 204.00 233.30 342.17 800.51 1204.50
= 0:9
52.64 73.07 37.70 111.77 106.44 202.44 228.41 338.11 794.02 1160.80
Expected clock period (Algorithm 3.2) : : : 52.30 53.18 51.02 72.50 72.32 71.29 38.36 38.97 38.40 111.40 111.00 110.41 105.98 105.28 104.99 202.00 201.22 200.31 227.59 226.95 226.01 337.62 337.00 336.08 793.39 792.53 791.57 1159.30 1158.17 1156.58
=08
=07
=06
= 0:5
51.64 70.94 36.27 110.04 104.17 198.06 225.19 335.27 790.31 1154.90
Table 2: Probabilistic retiming versus average case analysis
e.g., high-level synthesis, loop scheduling, testing, etc., similar to traditional retiming. 6. REFERENCES [1] S. Dey, M. Potkonjak, and S. G. Rothweiler. Performance optimization of sequential circuits by eliminating retiming bottlenecks. In Proceedings of the 1992 IEEE/ACM International Conference on Computer Aided Design, pages 504– 509, 1992. [2] S. Director, W. Maly, and A. Strojwas. VLSI design for manufacturing: yield enhancement. Kluwer Academic Publishers, 1990. [3] Texas Instruments. The TTL data book, volume 2. Texas Instruments Incorporation, 1985. [4] A. T. Ishii. Retiming gated-clocks and precharged circuit structures. In Proceedings of the 1993 IEEE/ACM International Conference on Computer Aided Design, pages 300– 307, 1993. [5] I. Karkowski and R. H. J. M. Otten. Retiming synchronous circuitry with imprecise delays. In Proceedings of the 32nd Design Automation Conference, pages 322–326, San Francisco, CA, 1995. [6] E. Kreyszig. Advanced Engineering Mathematics, chapter 23. John Wiley and Sons, New York, 6th edition, 1988. [7] Y.-J. Lai and C.-L. Hwang. A new approach to some possibilistic linear programming problems. Fuzzy Sets and Systems, 49:121–133, 1992. [8] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6:5–35, 1991.
[9] L.-T. Liu et al. Performance-driven partitioning using retiming and replication. In Proceedings of the 1993 IEEE/ACM International Conference on Computer Aided Design, pages 296–299, 1993. [10] B. Lockyear and C. Ebeling. The practical application of retiming to the design of high-performance systems. In Proceedings of the 1993 IEEE/ACM International Conference on Computer Aided Design, pages 288–295, 1993. [11] P. Mozumder and A. Strojwas. Statistical control for VLSI fabrication processes. In W. Moore, W. Maly, and A. Strojwas, editors, Yield Modeling and Defect Tolerance in VLSI. Adam Hilger, Bristol and Philadelphia, 1987. [12] M. C. Papaefthymiou. Understanding retiming through maximum average-delay cycles. Mathematical Systems Theory, 27:65–84, 1994. [13] N. Shenoy and R. K. Brayton. Retiming of circuits with single phase transparent latches. In Proceedings of the 1991 International Conference on Computer Design, pages 86–89, 1991. [14] T. Soyata and E. Friedman. Retiming with non-zero clock skew, variable register and interconnect delay. In Proceedings of the 1994 IEEE/ACM International Conference on Computer Aided Design, November 1994. [15] T. Soyata, E. Friedman, and J. Mulligan. Integration of clock skew and register delays into a retiming algorithm. In Proceedings of the International Symposium on Circuits and Systems, pages 1483–1486, May 1993. [16] L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3–28, 1978.