5B-2
Crosstalk-Aware Power Optimization with Multi-Bit Flip-Flops Chih-Cheng Hsu, Yao-Tsung Chang, and Mark Po-Hung Lin Department of Electrical Engineering National Chung Cheng University Chiayi 621, Taiwan {twf1400, protium7322}@gmail.com,
[email protected]
Abstract—Applying multi-bit flip-flops (MBFFs) for clock power reduction in modern nanometer ICs has been becoming a promising lower-power design technique. Many previous works tried to utilize as more MBFFs with larger bit numbers as possible to gain more clock power saving. However, an MBFF with a larger bit number may lead to serious crosstalk due to the close interconnecting wires belonging to different signal nets which are connected to the same MBFF. To address the problem, this paper analyzes, evaluates, and compares the relationship between power consumption and crosstalk when applying MBFFs with different bit numbers. To solve the addressed problem, a novel crosstalk-aware power optimization approach is further proposed to optimize power consumption while satisfying the crosstalk constraint. Experimental results show that the proposed approach is very effective in crosstalk avoidance when applying MBFFs for power optimization. To our best knowledge, this is also the first work in the literature that considers the crosstalk effect for the MBFF application.
I. I NTRODUCTION Power/thermal minimization has been becoming one of the most important objectives in the design of modern system on chips (SOCs) which integrate huge numbers of transistors. High power/thermal dissipation of an SOC may degrade product lifetime and reliability. The power consumption of an SOC arises from dynamic, leakage, and short-circuit power, where the dynamic power is the major power source among the three [14]. The power consumption of the clock network further dominates the dynamic power [15] because of the highest switching rate of the clock signal. To minimize the power consumption of the clock network, many techniques had been proposed, such as buffer sizing [24], [23], clock gating [6], [7], register clustering [5], [18] and banking [9], and replacing 1-bit flip-flops with multi-bit flipflops (MBFFs) [2], [3], [4], [10], [13], [19], [22], [25]. Recent studies have shown the effectiveness of applying MBFFs in saving both power and area [2], [3], [4], [10], [25]. An MBFF consists of two or more 1-bit flip-flops, which share a common clock driver. According to [4], as the process technology advances to 65nm and beyond, the minimum-sized clock driver can still drive several flip-flops. Consequently, replacing 1-bit flip-flips with MBFFs can reduce the power consumption of the clock network and the chip area due to the elimination of redundant clock drivers.
978-1-4673-0772-7/12/$31.00 ©2012 IEEE
A. Previous Work The idea of applying MBFFs was first proposed in [19] to control clock skew and delay during physical synthesis. Kretchmer [13] and Chen et al. [4] introduced a design methodology to infer MBFF cells using existing logic synthesis tools. Most recently, [2], [3], [10], [22], [25] proposed various placement-based optimization techniques, which replace 1-bit flip-flops with MBFFs during placement, to minimize power consumption while satisfying both timing and placement density constraints. In addition to the clock power, timing, and placement density, the interconnecting wirelength, routability, and switching power of signal nets are further considered in [2], [3] and [22], respectively. All these works tried to utilize as more MBFFs with larger bit numbers as possible to gain more clock power saving. None of them considers substantial crosstalk noise among the signal nets which connects to the same MBFF, especially when the bit number of the MBFF is large. B. Our Contributions In this paper, we first evaluate, analyze, and compare the relationship between power consumption and crosstalk when applying MBFFs with different bit numbers. We then address the ineffectiveness of applying MBFFs with larger bit numbers due to the serious crosstalk. To solve the addressed problem, a novel crosstalk-aware power optimization flow and the corresponding algorithms are proposed to minimize power consumption while satisfying the crosstalk, timing, and placement density constraints. Instead of clustering a maximum number of 1-bit flip-flops by searching the maximum clique in the flip-flop intersection graph [2], [10], [22], we present a crosstalk-aware bottom-up clustering algorithm for better crosstalk consideration. A coupling capacitance map is also introduced for better crosstalk estimation throughout the algorithms. Experimental results show that the proposed approach is very effective in crosstalk avoidance when applying MBFFs for power optimization compared with the previous works. To our best knowledge, this is also the first work in the literature that considers the crosstalk effect for the MBFF application. The remainder of this paper is organized as follows. Section II demonstrates the crosstalk effect due to MBFFs. Section III proposes a novel flow and algorithms to solve
431
5B-2 the problem. Section IV reports the experimental results, and finally Section V concludes this paper. II. C ROSSTALK E FFECT DUE TO MBFF S In this section, a proper crosstalk model is first introduced to measure the crosstalk on a net. Based on the crosstalk model, a preliminary experiment is then performed to evaluate the crosstalk effect of applying MBFFs with different bit numbers.
The input of each flow includes a pre-placed design, which are available from [2], as shown in Tables I.In the first flow, we did not perform power optimization with MBFFs, while in the second, third, and forth flows, we implemented the algorithms proposed in [2] to perform power optimization with 2-bit flipflops, 2/4-bit flip-flops, and 2/4/8-bit flip-flops, respectively.
TABLE I T HE BENCHMARK CIRCUITS IN [2].
A. Crosstalk Model To calculate the crosstalk on a victim net, ni , from any other aggressor net, nj , we adopt the crosstalk model which was generally applied to global routing [26] and technology mapping [8]. The total crosstalk, Xni , on ni in worst case is the summation of all crosstalk effects from other nets, which can be obtained from Equation (1), where eij is the crosstalk coefficient, and Cij is the coupling capacitance between ni and nj . Xni =
eij Cij .
(1)
j=i
The crosstalk coefficient, eij ∈ [0, 1], is a real number indicating the crosstalk on ni contributed by one unit of coupling capacitance from nj . The coupling capacitance Cij between two parallel wires can be further calculated by Equation (2), where l is the parallel run length of the wires, d is the distance between the wires, α is a constant indicating the unit capacitance, and β was estimated to be 2 according to [21]. Cij = α
l . dβ
(2)
B. Power and Crosstalk Evaluation of MBFFs Based on the crosstalk model, we present the crosstalk evaluation flows, as seen in Figure 1, to evaluate the crosstalk effect due to MBFFs with different bit numbers.
Cir. c1 c2 c3 c4 c5 c6
# of 1-bit FF 76 366 1464 4378 9150 146400
# of 2-bit FF 22 57 228 751 1425 22800
After obtaining the optimized design with MBFFs in each flow, we further applied the coupling-free routing, which was implemented based on the pattern routing technique [11], [12], to each optimized design. Finally, the power consumption of each design was calculated according to the MBFF library [2], where the normalized power consumption of 1-bit, 2-bit, 4-bit, and 8-bit flip-flop is 1.00, 0.86, 0.78, and 0.75, respectively. The crosstalk of each net was also analyzed based on the aforementioned crosstalk model. The analyzed results of all flows are shown in Table II. Table II lists the names of the benchmark circuits (“Cir.”), normalized flip-flop power consumption (“Normalized FF Power”), and normalized average crosstalk on each net (“Normalized Avg. Xtalk”) of the circuits for the four flows in Figure 1. For each circuit, the flip-flop power consumption and the average crosstalk on each net are normalized with respect to the first flow. We compare the normalized flip-flop power consumption and average crosstalk based on different flows by averaging those of different circuits in each flow as seen in the last row of Table II.
Pre-placed Design & MBFF library (1) Without Power Opt.
(2) Power Opt. (3) Power Opt. w/ 2-bit FFs w/ 2,4-bit FFs
(4) Power Opt. w/ 2,4,8-bit FFs
Optimized Design with MBFFs Coupling-Free Routing Power & Crosstalk Analysis Analyzed Reports of Power & Crosstalk Fig. 1. The crosstalk evaluation flows for power optimization using MBFFs with different bit numbers: (1) Without power optimization, (2) Power optimization with 2-bit flip-flops, (3) with 2/4-bit flip-flops, and (4) with 2/4/8bit flip-flops.
According to the comparisons in Table II, we can draw two curves as seen in Figure 2 to observe the relationship between the flip-flop power consumption and average net crosstalk with respect to the bit numbers of the applied MBFFs. In Figure 2(a), the flip-flop power consumption is significantly reduced when applying 2-bit or 4-bit flip-flops. However, the power reduction rate keeps decreasing when applying MBFFs with increasing bit numbers. In Figure 2(b), the crosstalk of applying 2-bit flip-flops is comparable to that of applying 1-bit flip-flops. Nevertheless, it dramatically increases when applying 4-bit and 8-bit MBFFs. Consequently, applying MBFFs with larger bit numbers may achieve only a little more power saving, but introduce much more serious crosstalk than applying MBFFs with smaller bit numbers. It is essential to mediate the trade-off between power and crosstalk when performing power optimization with MBFFs.
432
5B-2 TABLE II C OMPARISONS OF POWER CONSUMPTION AND CROSSTALK FOR DIFFERENT FLOWS IN F IGURE 1.
Circuit c1 c2 c3 c4 c5 c6 Cmp.
(1) Without Power Opt. Normalized Normalized FF Power Avg. Xtalk 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Normalized Crosstalk
Normalized Power
1 0.95 0.9 0.85
(2) Power Opt. w/ 2-bit FFs Normalized Normalized FF Power Avg. Xtalk 0.911 1.517 0.894 1.102 0.894 0.999 0.896 1.023 0.894 0.985 0.894 0.975 0.897 1.100
(3) Power Opt. w/ 2,4-bit FFs Normalized Normalized FF Power Avg. Xtalk 0.852 4.120 0.831 3.229 0.829 3.338 0.832 3.420 0.829 3.430 0.828 3.094 0.833 3.439
6
(4) Power Opt. w/ 2,4,8-bit FFs Normalized Normalized FF Power Avg. Xtalk 0.835 6.583 0.816 5.302 0.816 5.361 0.818 5.342 0.816 5.520 0.816 5.588 0.819 5.616
Pre-placed Design, MBFF Lib & Design Constraints
5
Crosstalk-Aware Flip-Flop Clustering
4 3 2
Coupling Capacitance Map Generation
Coupling Capacitance Map & Intersection Graph Update
Flip-Flop Intersection Graph Construction
More MBFFs? No Crosstalk-Aware MBFF Placement
1 0.8
0 0
2
4
6
8
0
2
Bit Number
(a)
4
6
8
Bit Number
(b)
Fig. 2. Comparisons of (a) normalized power and (b) normalized Xtalk of applying MBFFs with different bit numbers.
Optimized Design with MBFFs Fig. 3.
III. C ROSSTALK - AWARE P OWER O PTIMIZATION MBFF S
Yes
The flowchart for crosstalk-aware power optimization with MBFFs.
WITH
In addition to minimizing power consumption with MBFFs and the interconnecting wirelength while satisfying placement density and timing slack constraints as presented in [2], we further formulate the crosstalk effect of MBFFs as the coupling capacitance constraint. When replacing several flip-flops with an MBFF, the coupling capacitance on each net should be less than the maximal allowable capacitance of the net such that the signal integrity of the net can be maintained. The flowchart of our algorithms is shown in Figure 3. Given an initial placement of a design, the coupling capacitance map is first generated to indicate the degree of net coupling throughout the chip area. The flip-flop intersection graph is then constructed according to the timing constraint of each net which connects to a flip-flop.Based on the coupling capacitance map and the flip-flop intersection graph, the crosstalkaware flip-flop clustering algorithm is developed to iteratively cluster flip-flops while satisfying timing and crosstalk constraints until there is no MBFF which can be further applied to the design. When a new MBFF is generated, the coupling capacitance map and the flip-flop intersection graph should be updated accordingly. Finally, the MBFF placement algorithm optimizes the locations of all MBFFs. A. Coupling Capacitance Map Generation Similar to the coupling capacitance map in [16], [17] and the noise map in [20], which were proposed for crosstalk-aware
placement, our coupling capacitance map acts as a guide for both crosstalk-aware flip-flop clustering and MBFF placement. We note that a coupling capacitance map is different from a congestion map because optimization with a congestion map may not satisfactorily reduce coupling capacitance [16], [17], [20].
b7
b8
b9
b5
b6
n1
b4
n4
b7
n3
b4
b8
b9
b5
b6
b2
b3
n4 n3
n2
n2 b1
n1
b2
b3
(a)
b1
(b)
Fig. 4. Routing topologies of (a) routability-driven and (b) coupling-free global routing on nine rectangular bins.
Before generating the coupling capacitance map, the chip area is first equally divided into a set of rectangular bins, B = {b1 , b2 , . . . , bn }, or global routing cells, as shown in Figure 4. The coupling-free global routing [11], [12] is then applied to fast generate the routing topology of each net. Figure 4 compares the routing topologies of four nets, n1 , n2 , n3 , and n4 , based on the routability-driven global routing [1] and the coupling-free global routing [11], [12].
433
5B-2 The routability-driven global routing may generate the routing topology in either Figure 4(a) or (b) because the number of nets across each bin boundary is less than or equal to one in both cases, and the total wirelengths of both cases are also the same. However, the coupling-free global routing will only produce the routing topology in Figure 4(b) because it always minimizes the coupling capacitance among different nets. According to the resulting routing topology in Figure 4(b), the coupling capacitance map can be generated by calculating the coupling capacitance of each bin, which is the sum of the coupling capacitances on all wire segments inside the bin. For example, in Figure 4(b), the coupling capacitance of b2 , C b2 , is equal to Cnb22 , which is the capacitance on the horizontal wire segment of n2 inside b2 . Cnb22 is obtained by summing up the coupling capacitances between the parallel wire segments of n2 inside b2 and n3 inside b5 , and those of n2 inside b2 and n1 inside b8 . The coupling capacitance between two parallel wires can be calculated by Equation (2). B. Flip-Flop Intersection Graph When merging several flip-flops with an MBFF, timing constraint, which is transformed to the maximum allowable distance from a pin to its connected flip-flop, should be satisfied. For example, in Figure 5(a),the MBFF should be placed at a common feasible region of all merged flip-flops such that the timing constraints of all its connected nets are satisfied. We construct a flip-flip intersection graph to represent the relationship among the feasible regions of all flip-flops.
v1
f1 p0
f0
p1
e12
f2
v2 e23
f3
v3 (a)
(b)
Fig. 5. (a) The feasible region of the flip-flop, f0 . (b) The feasible regions of the flip-flops, f1 , f2 , and f3 , and the corresponding flip-flop intersection graph.
Figure 5(b) shows the feasible regions of the flip-flops, f1 , f2 , and f3 , and the corresponding flip-flop intersection graph, G(V, E), where each vertex, vi ∈ V , corresponds to the feasible region of a flip-flop, fi . There is an edge, eij ∈ E, between vi and vj , if there exists an intersection between the the feasible regions of fi and fj , and the total bit number of fi and fj is less than or equal to the largest MBFF bit number in the library. C. Crosstalk-aware Flip-Flop Clustering According to Section II, an MBFF with a larger bit number may introduce much more serious crosstalk than that with a smaller bit number. Therefore, we propose a crosstalk-aware
bottom-up clustering algorithm, as illustrated in Algorithm 1, for better crosstalk consideration, instead of clustering a maximum number of 1-bit flip-flops by searching the maximum clique in the flip-flop intersection graph [2], [10], [22]. The input of our algorithm is a flip-flop intersection graph, while the output is a set of non-conflicting flip-flop clusters. The algorithm starts with computing the key values of all edges in E and storing the edges in a heap, H (see Lines 4–7). The key value of an edge, eij , can be calculated by Equation (3), where Weij is the HPWL of all pins connected to fi and fj , Aeij is the area of the intersected feasible regions of fi and fj , and α and β are constant coefficients. Key(eij ) = αWeij − βAeij .
(3)
An edge with a minimum key value indicates that merging the corresponding flip-flips into an MBFF will result in shorter interconnecting wirelength, compared with merging other flipflops. A net with shorter wirelength is less likely to produce long parallel run length among different wires resulting in smaller coupling capacitance and crosstalk. In addition to shorter wirelength, the minimum key value also indicates a larger feasible region for the MBFF, which implies that there are more choices to place the MBFF resulting in minimized crosstalk, which will be further discussed in Section III-D. Algorithm 1 Crosstalk-aware Flip-Flop Clustering Input: A flip-flip intersection graph, G(V, E). Output: A set of non-conflicting flip-flop clusters. 1: E ← φ; // E is a set of clustered edges. 2: V ← φ; // V is a set of clustered vertices. 3: H ← φ; // H is a heap for all edges. 4: for all eij ∈ E do 5: ComputeKey(eij ); 6: H.push(eij ); 7: end for 8: while H = φ do 9: eij ← H.extractM in(); 10: if (vi ∩ V = φ) ∩ (vj ∩ V = φ) then 11: EstimateCrosstalk(eij ); 12: if NoCrosstalkViolation(eij ) then 13: AddFlipFlopCluster(eij ); 14: E = E ∪ eij ; 15: V = V ∪ vi ; 16: V = V ∪ vj ; 17: end if 18: end if 19: end while
After all edges are stored in the heap, H, the edge with the minimum key value will be iteratively extracted from H (see Lines 8–19). During each iteration, we consider the edge, eij , which connects vi and vj corresponding to the feasible regions of fi and fj . If fi and fj have not been clustered, we estimate the crosstalk of each net connecting to fi and fj , after they are merged into an MBFF. If there is no crosstalk constraint violation, a flip-flop cluster containing fi and fj is then added to the set of non-conflicting flip-flop cluster. To estimate the net crosstalk induced by an MBFF, we place the newly generated MBFF at a bin with the least
434
5B-2 coupling capacitance inside the feasible region, and re-route the previously removed nets by performing the coupling-free global routing. Based on the routing results, the crosstalk on each net can be re-calculated by Equations (1) and (2). After all edges in the heap are extracted, G(V, E) will be updated if there is any clustered edge in E , and Algorithm 1 will be executed again based on the new graph. When updating G(V, E), we create a super vertex for each clustered edge, eij in E , and merge its connected vertices, vi and vj , into the super vertex if the total bit number of fi and fj is less than the maximum bit number of the available MBFFs in the library. Once the super vertex is created, if there is a vertex, vk , in G(V, E) which is connected to both vi and vj , a new edge is created to connect the super vertex and vk . After the connection of the super vertex is built up, the merged vertices and all their connected edges are removed. Consequently, a new graph is obtained for the next clustering iteration. D. Crosstalk-aware MBFF Placement When searching the best bin to accommodate an MBFF inside its feasible region, all the previous works [2], [10], [22] only considered the placement density constraint, while minimizing interconnecting wirelength. Figure 6 shows the preferred region of an MBFF for wirelength minimization, which is within the median coordinates of the fan-in and fanout gates of the MBFF. If the preferred region is not inside the feasible region of the MBFF, the MBFF will be placed in a bin which is close to the preferred region. chip area bin fan-in/fan-out gate of an MBFF feasible region of an MBFF preferred region of an MBFF
Fig. 6.
The preferred region of an MBFF for wirelength minimization.
In addition to the placement density constraint, we additionally consider the crosstalk constraint when searching for the best placement bin and grid for an MBFF. For each bin leading to the minimized wirelength, we first estimate the induced crosstalk based on the coupling capacitance map, as seen in Figure 4. If the crosstalk constraint is not violated, we analyze the local pin and wire densities inside and around the best placement bin which results in the shortest wirelength. The MBFF is then placed at a legal position with the minimal the pin and wire densities inside the bin. Based on our crosstalk-aware MBFF placement algorithm, both wirelength and crosstalk can be minimized.
on six industrial circuits with the MBFF library, as seen in Tables I. We compared the numbers of flip-flops, power ratios, wirelength ratios, average crosstalk ratios, and CPU times for four different approaches, including Chang et al.’s approach [2], INTEGRA [10], Wang et al.’s approach [22] and ours, as shown in Tables III and IV. Table III lists the names of the benchmark circuits (“Cir.”), the numbers of (“# of”) 1-bit, 2-bit, and 4-bit flip-flops resulting from the four approaches. Compared with [2] and [22], the results of our approach contain 47–55% less 1-bit flipflops, 3.36–4.02X more 2-bit flip-flops, and 43–45% less 4-bit flip-flops in average. Compared with [10], the results of our approach contain 200X more 1-bit flip-flop, 25X more 2-bit flip-flops, and 54% less 4-bit flip-flops in average. In summary, our approach may adopt much less 4-bit flip-flops, much more 2-bit flip-flops because 4-bit flip-flops may introduce much more serious crosstalk than 1-bit and 2-bit flip-flops. Table IV compares the ratios between the corresponding data before and after applying the proposed algorithms, including the power ratios (“Pwr. Ratio”), the wirelength ratios (“WL Ratio”) based on the HPWL model, and the average crosstalk ratios (“Xtk. Ratio”), and the CPU times (“Time(s)”) for the four approaches. According to the results in Table IV, the power consumption based on our approach is comparable to those based on all the previous work [2], [10], [22]. The interconnecting wirelength based on our approach is 10% shorter than that based on [10] and comparable to those based on [2], [22]. It should be noted that our approach leads to significant crosstalk reduction compared with the previous works. The average net crosstalk based on our approach is 35%, 51%, and 23% less than those based on [2], [10], [22], respectively. The CPU time based on our approach is longer because we additionally performed global routing and constructed/updated the coupling capacitance map for crosstalk minimization and avoidance, which had been known to be one of the the most time-consuming parts in physical synthesis. Consequently, our approach is very effective in crosstalk avoidance for power optimization with MBFFs. V. C ONCLUSIONS In this paper, we have addressed the crosstalk effect when applying MBFFs for clock power saving. We have observed the ineffectiveness of applying MBFFs with larger bit numbers due to serious crosstalk based on the crosstalk evaluation. We have also proposed a new problem formulation and novel algorithms to solve the addressed problem. Experimental results have shown that our approach is very effective in crosstalk avoidance for power optimization with MBFFs. VI. ACKNOWLEDGMENTS
IV. E XPERIMENTAL R ESULTS We implemented our algorithms in the C++ programming language with STL on a 2.66GHz Intel Core i7 PC under the Linux operation system. We empirically tested our approach
This work was partially supported by NSC of Taiwan under Grant No’s. NSC 098-2218-E-194-008-MY3, NSC 100-2220E-194-007-, and NSC 100-2220-E-194-002-.
435
5B-2 TABLE III C OMPARISONS OF THE NUMBERS OF 1- BIT, 2- BIT, AND 4- BIT FLIP - FLOPS FOR FOUR DIFFERENT APPROACHES , [2], [10], [22], AND THIS WORK .
Cir.
c1 c2 c3 c4 c5 c6 Cmp.
Chang et al., ICCAD’10 [2] # of # of # of 1-bit 2-bit 4-bit FFs FFs FFs 8 10 23 24 36 96 84 146 386 242 469 1175 480 920 2420 7320 14780 38780 2.189 0.298 1.763
Jiang et # of 1-bit FFs 0 0 0 2 2 18 0.005
al., ISPD’11 [10] # of # of 2-bit 4-bit FFs FFs 4 28 6 117 14 473 21 1459 33 2983 113 47939 0.040 2.167
Wang et al., ISPD’11 [22] # of # of # of 1-bit 2-bit 4-bit FFs FFs FFs 6 7 25 16 30 101 70 125 400 232 402 1211 484 806 2476 7580 13508 39351 1.874 0.249 1.832
# of 1-bit FFs 4 8 36 124 240 3960 1
This Work # of 2-bit FFs 28 128 514 1600 3208 51238 1
# of 4-bit FFs 15 54 214 639 1336 21391 1
TABLE IV C OMPARISONS OF POWER RATIOS , WIRELENGTH RATIOS , AVERAGE CROSSTALK RATIOS , AND CPU TIMES FOR FOUR DIFFERENT APPROACHES , [2] ( ON I NTEL C ORE I 7 2.66GH Z ), [10] ( ON I NTEL X EON 3.8GH Z ), [22] ( ON I NTEL X EON 2.13GH Z ), AND THIS WORK ( ON I NTEL C ORE I 7 2.66GH Z ).
Cir. c1 c2 c3 c4 c5 c6 Cmp.
Chang et al., ICCAD’10 [2] Pwr. WL Xtk. Time Ratio Ratio Ratio (s) 0.852 0.917 4.120 0.01 0.831 0.947 3.229 0.04 0.829 0.948 3.338 0.10 0.832 0.945 3.420 0.28 0.829 0.949 3.430 0.60 0.828 0.949 3.094 78.92 0.971 1.024 1.537 0.087
Jiang et al., ISPD’11 [10] Pwr. WL Xtk. Time Ratio Ratio Ratio (s) 0.828 0.964 6.146 0.01 0.809 1.020 3.730 0.01 0.808 1.036 4.169 0.01 0.810 1.041 4.349 0.02 0.807 1.048 4.416 0.05 0.807 1.053 4.552 1.11 0.945 1.116 2.019 0.023
Wang et al., ISPD’11 [22] Pwr. WL Xtk. Time Ratio Ratio Ratio (s) 0.844 0.918 3.476 0.01 0.825 0.889 2.666 0.05 0.826 0.885 2.917 0.22 0.829 0.885 2.858 0.72 0.827 0.866 2.656 1.89 0.827 0.882 2.807 36.12 0.967 0.964 1.295 0.094
Pwr. Ratio 0.869 0.855 0.855 0.859 0.856 0.856 1
This WL Ratio 0.920 0.916 0.920 0.919 0.923 0.925 1
Work Xtk. Ratio 3.459 2.181 2.015 2.072 2.016 2.007 1
Time (s) 0.09 0.63 2.74 8.87 18.80 320.42 1
R EFERENCES
[14] A. Krishnamoorthy, “Minimize IC power without sacrificing performance,” EE Times, 2004.
[1] Z. Cao, T. Jing, J. Xiong, Y. Hu, L. He, and X. Hong, “Dprouter: A fast and accurate dynamic-pattern-based global routing algorithm,” Proc. ASPDAC, pp. 256–261, 2007.
[15] D. Liu and C. Svensson, “Power consumption estimation in CMOS VLSI chips,” IEEE J. Solid-State Circuits, vol. 29, no. 6, pp. 663–670, Jun. 1994.
[2] Y.-T. Chang, C.-C. Hsu, M. P.-H. Lin, Y.-W. Tsai, and S.-F. Chen, “Postplacement power optimization with multi-bit flip-flops,” Proc. ICCAD, pp. 218–223, 2010.
[16] J. Lou and W. Chen, “Cross talk driven placement,” Proc. ASPDAC, pp. 73–740, 2003.
[3] Z.-W. Chen and J.-T. Yan, “Routability-driven flip-flop merging process for clock power reduction,” Proc. ICCD, pp. 203–208, 2010. [4] L. Chen, A. Hung, H.-M. Chen, E. Tsai, S.-H. Chen, M.-H. Ku, and C.-C. Chen, “Using multi-bit flip-flop for clock power saving by DesignCompiler,” Proc. SNUG, 2010.
[17] ——, “Crosstalk-aware placement,” IEEE Des. Test. Comput., vol. 21, no. 1, pp. 24 – 32, 2004. [18] Y. Lu, C. N. Sze, X. Hong, Q. Zhou, Y. Cai, L. Huang, and J. Hu, “Navigating registers in placement for clock network minimization,” Proc. DAC, pp. 176–181, 2005.
[5] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, “Power-aware placement,” Proc. DAC, pp. 795–800, 2005.
[19] R. Pokala, R. Feretich, and R. McGuffin, “Physical synthesis for performance optimization,” Proceedings of IEEE International ASIC Conference and Exhibit, pp. 34–37, Sep. 1992.
[6] M. Donno, A. Ivaldi, L. Benini, and E. Macii, “Clock-tree power optimization based on RTL clock-gating,” Proc. DAC, pp. 622–627, 2003.
[20] H. Ren, D. Pan, and P. Villarubia, “True crosstalk aware incremental placement with noise map,” Proc. ICCAD, pp. 402–409, 2004.
[7] F. Emnett and M. Biegel, “Power reduction through RTL clock gating,” Proc. SNUG, 2000.
[21] T. Sakurai and K. Tamaru, “Simple formulas for two- and threedimensional capacitance,” IEEE Trans. Electron Devices, vol. 30, no. 2, pp. 183–185, Feb. 1983.
[8] F.-Y. Fan, H.-M. Chen, and I.-M. Liu, “Technology mapping with crosstalk noise avoidance,” Proc. ASPDAC, pp. 319–324, 2010.
[22] S.-H. Wang, Y.-Y. Liang, T.-Y. Kuo, and W.-K. Mak, “Power-driven flip-flop merging and relocation,” Proc. ISPD, 2011.
[9] W. Hou, D. Liu, and P.-H. Ho, “Automatic register banking for lowpower clock trees,” Proc. ISQED, pp. 647–652, 2009.
[23] K. Wang and M. Marek-Sadowska, “Buffer sizing for clock power minimization subject to general skew constraints,” Proc. DAC, pp. 159– 164, 2004.
[10] I. H.-R. Jiang, C.-L. Chang, Y.-M. Yang, E. Y.-W. Tsai, and L. S.-F. Chen, “INTEGRA: Fast multi-bit flip-flop clustering for clock power saving based on interval graphs,” Proc. ISPD, 2011. [11] R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh, “An exact algorithm for coupling-free routing,” Proc. ISPD, pp. 10–15, 2001. [12] ——, “Pattern routing: Use and theory for increasing predictability and avoiding coupling,” IEEE Trans. Computer-Aided Design, vol. 21, no. 7, pp. 777–790, Jul. 2002. [13] Y. Kretchmer, “Using multibit register inference to save area and power,” EE Times Asia, 2001.
[24] J. G. Xi and W.-M. Dai, “Buffer insertion and sizing under process variations for low power clock distribution,” Proc. DAC, pp. 491–496, 1995. [25] J.-T. Yan and Z.-W. Chen, “Construction of constrained multi-bit flipflops for clock power reduction,” IEEE International Conference on Green Circuits and Systems, pp. 675–678, 2010. [26] H. Zhou and D. Wong, “Global routing with crosstalk constraints,” IEEE Trans. Computer-Aided Design, vol. 18, no. 11, pp. 1683–1688, Nov. 1999.
436