Effective Linear Programming Based Placement Methods

Sherief Reda
CSE Department, University of California, San Diego, La Jolla, CA 92093

Amit Chowdhary
Design Technology Solutions, Intel Corporation, Santa Clara, CA 95054
ABSTRACT
Linear programming (LP) based methods are attractive for solving the placement problem because of their ability to model Half-Perimeter Wirelength (HPWL) and timing. However, it has been technically difficult to model overlaps in LP. This difficulty has restricted the domain of LP-based methods to incremental placers, where LP is used to calculate the optimal locations of a small subset of cells with no regard to overlaps. In this paper, we enlarge the scope of LP-based methods from operating on a small subset of cells to operating on all cells of a functional block circuit. We show how to model, reduce and prevent overlaps in LP-based placement flows. We use our ideas to construct (1) a global optimal whitespace allocator, and (2) a global overlap remover and cell spreader. We also modify our methods to fit in a timing-driven placement flow. Compared to our default industrial flow, our results show an average improvement of 7.64% in wirelength and an average improvement of 21% in total negative slack. Furthermore, we conduct a benchmarking study, where we surprisingly show that academic placers fail to consistently produce good results on relatively small functional blocks.

Categories and Subject Descriptors: B.7.2 [Design Aids]: Placement and routing.
General Terms: Algorithms, Measurement, Performance.
Keywords: linear programming, relative placement, whitespace management, timing-driven placement.
1. INTRODUCTION
Hierarchical physical design methodologies continue to play an important role in state-of-the-art microprocessor design projects. There are probably a number of reasons for this, including: (1) ECOs are easier to handle than in flat physical design, (2) physical design can proceed in parallel once the hierarchy is created, (3) the iterative nature of the design process would require a tremendous amount of turnaround time with flat methods, and (4) the hierarchy gives better management of the clock tree structures. The hierarchical design methodology, with its relatively small circuits, or Functional Unit Blocks (FUBs), shifts the focus to performance rather than scalability.

In flat placement, recent research, besides improving general performance, has focused on scalability and on handling large-scale requirements such as the presence of fixed blocks or the simultaneous placement of cells and blocks [15, 8, 7, 2, 19], where the placer can be regarded as performing floorplanning and placement simultaneously [1]. The recent release of the IBM ISPD’05 benchmark [16] exemplifies these requirements. While these requirements play an important role in multi-million gate flat designs, they are of less importance in designs extracted from hierarchical methods.

Linear Programming (LP) has been shown to model both HPWL [13] and timing, using static timing analysis constraints, with a high degree of accuracy [6]. For instances arising in hierarchical design methodologies, LP typically requires practical runtime. In this paper:
• We extend LP methods to model cell overlap removal, thus demonstrating for the first time how to efficiently solve global placement problems using LP.
• We give a direct confirmation that maintaining relative cell ordering is indeed a good spreading strategy [20].
• We extend the scope of LP-based methods from merely relaxation-based methods, that is, methods working on a small subset of cells, to large-scale methods that can optimize the entire placement, whether for wirelength or timing.
• We show significant improvements in both wirelength (7.64%) and timing (21%) using our LP-based methodology.
• We cover benchmarking methodologies [7, 2, 16, 14]. We surprisingly show that some current academic placers do not produce good placements for small-scale instances. These instances are parts of current microprocessors that are ubiquitous in our daily lives.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISPD’06, April 9–12, 2006, San Jose, California, USA. Copyright 2006 ACM 1-59593-299-2/06/0004 ...$5.00.

The organization of this paper is as follows. Section 2 describes HPWL modeling using LP. Section 3 describes our cell overlap modeling method. We discuss possible uses of our method, whether for global overlap reduction or for global whitespace distribution. In Section 4, we extend our method to timing-driven placement. Section 5 gives all experimental results for both timing and wirelength. We also benchmark a number of academic placers on a variety of microprocessor units. Finally, Section 6 highlights the main contributions of this work and gives a number of future directions.
2. WIRELENGTH (HPWL) MODELING
One of the most attractive features of LP is its ability to model HPWL exactly [13]. To develop LP-based methods for whitespace allocation or cell spreading, we show (1) how to model HPWL, (2) how to capture the relative placement in an input placement, and (3) how to translate the relative positions to constraints in the LP. We start with HPWL modeling. Each cell $u$ is characterized by a width $w_u$ and a height $h_u$, and its center position is $(x_u, y_u)$. For each cell $u$, four constraints are added to limit its movement within the FUB’s layout area:

\[ x_u \ge \mathit{fub}_{left} + \frac{w_u}{2}, \qquad x_u \le \mathit{fub}_{right} - \frac{w_u}{2}, \]
\[ y_u \ge \mathit{fub}_{bottom} + \frac{h_u}{2}, \qquad y_u \le \mathit{fub}_{top} - \frac{h_u}{2}. \]

Each net $j$ has four variables $\mathit{left}_j$, $\mathit{right}_j$, $\mathit{lower}_j$ and $\mathit{upper}_j$ corresponding to its four boundary edges. We add to the LP four constraints for every object $u \in j$:

\[ \mathit{left}_j \le x_u - \frac{w_u}{2}, \qquad \mathit{right}_j \ge x_u + \frac{w_u}{2}, \]
\[ \mathit{lower}_j \le y_u - \frac{h_u}{2}, \qquad \mathit{upper}_j \ge y_u + \frac{h_u}{2}, \]

where the HPWL of net $j$ is half the perimeter of the bounding box enclosing all the cells of the net. It is straightforward to adjust the previous inequalities for pin-to-pin HPWL by adding the appropriate pin offsets. Since every net has a lower bound on its HPWL (basically the case where all of its cells abut, so that a bounding box enclosing cells of total area $A$ has half-perimeter at least $2\sqrt{A}$), we add the following¹:

\[ \mathit{right}_j - \mathit{left}_j + \mathit{upper}_j - \mathit{lower}_j \ge 2\sqrt{\sum_{u \in j} w_u \times h_u} \tag{1} \]

Finally, the objective function is given by:

\[ \min \sum_{j \in N} \left( \mathit{right}_j - \mathit{left}_j + \mathit{upper}_j - \mathit{lower}_j \right), \tag{2} \]

where $N$ is the set of nets of the design. We next discuss how to model cell overlaps.
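
As an illustration of the quantities this section models, the following minimal Python sketch evaluates the HPWL of a fixed placement and the area-based lower bound of a net. It is not the LP itself: it only computes the values that the net boundary variables take once they are tight around the cells. The function names and the `(x, y, w, h)` tuple layout are assumptions made for this example.

```python
import math

def net_bounding_box(net, cells):
    """Return (left, right, lower, upper) of the tightest box around a net.

    `cells` maps a cell name to (x, y, w, h), where (x, y) is the cell center.
    """
    left = min(cells[u][0] - cells[u][2] / 2 for u in net)
    right = max(cells[u][0] + cells[u][2] / 2 for u in net)
    lower = min(cells[u][1] - cells[u][3] / 2 for u in net)
    upper = max(cells[u][1] + cells[u][3] / 2 for u in net)
    return left, right, lower, upper

def hpwl(net, cells):
    """Half-perimeter of the bounding box enclosing all cells of the net."""
    left, right, lower, upper = net_bounding_box(net, cells)
    return (right - left) + (upper - lower)

def hpwl_lower_bound(net, cells):
    """Area-based bound: a box enclosing total cell area A has half-perimeter >= 2*sqrt(A)."""
    area = sum(cells[u][2] * cells[u][3] for u in net)
    return 2 * math.sqrt(area)

def total_hpwl(nets, cells):
    """Value of the wirelength objective for a fixed placement."""
    return sum(hpwl(net, cells) for net in nets)

# Hypothetical three-cell example.
cells = {"a": (1.0, 1.0, 2.0, 2.0), "b": (3.0, 1.0, 2.0, 2.0), "c": (7.0, 5.0, 2.0, 2.0)}
nets = [["a", "b"], ["b", "c"]]
print(total_hpwl(nets, cells), hpwl_lower_bound(nets[0], cells))
```

In this example cells a and b abut, so the HPWL of the first net (6) is close to its area-based bound (about 5.66).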
3. CELL OVERLAP MODELING
After modeling HPWL, we now show how to (1) find out the relative order relationship between all cells, and (2) translate the relative order relationships to constraints in the LP, and create constraints for overlap reduction and prevention.
3.1 Capturing the Relative Positions of Cells

The problem of calculating relative positions in one dimension is trivial: in the horizontal 1-D case, an object is either to the left or to the right of another object. For the two-dimensional case, we can define for each object $u$ four sets $L_u$, $R_u$, $U_u$, and $B_u$ corresponding to the sets of objects to the left of, to the right of, above, and below $u$ respectively. This approach not only requires quadratic space and runtime to calculate, but might also be too restrictive, since there is little point in defining a relative-order relationship between a pair of cells that are quite far from each other. It is possible to reduce the space complexity by making use of transitive relationships. For example, if $v$ is to the left of $u$, and $w$ is to the left of $v$, then we need only add $v$ to $L_u$ and $w$ to $L_v$, and make use of transitivity to deduce that $w$ is to the left of $u$. This approach reduces the space complexity to linear, but still requires quadratic runtime [11].

¹ Notice that the problem of determining the least possible HPWL of a net with arbitrarily shaped objects is NP-hard. This can be seen by reducing the general floorplan problem to it. An optimal floorplan of a set of objects has a bounding box which is equivalent to the least HPWL of the net connecting all of the objects. This can be solved using integer linear programming [18]. The problem is, however, quite simple in the case of unit-size objects [5].

Figure 1: Calculating the closest points from a given point.

The main question we address in this section is: is it possible to have a fast heuristic method to calculate the relative-order relationships? We propose the use of two fast methods that augment each other. The first method, Q-adjacency, is a simple technique we suggest, and the second method is based on the Delaunay triangulation. For our purposes, we consider each cell to be represented by a point located at its center.

Delaunay triangulation: Given the positions of a set $P$ of points, the Delaunay triangulation is a set of lines connecting each point to its neighbors. Two points $p_i$ and $p_j$ are neighbors if and only if there is a closed disk that contains $p_i$ and $p_j$ on its boundary and does not contain any other points of $P$. We use a publicly available Delaunay triangulation package [10]. The runtime complexity of the Delaunay triangulation is $O(|P| \log |P|)$.

Q-adjacency graph: Given the positions of a set $P$ of points, we calculate the closest point for each point $p_i \in P$ in each of $p_i$’s four quadrants (top-right, top-left, bottom-left, and bottom-right) as shown in Figure 1. Thus, each point has up to four outgoing relationships with respect to its neighbors.
Notice that this relationship is not necessarily reversible, i.e., if $p_i$ is the closest top-right point to $p_j$ then this does not necessarily imply that $p_j$ is the closest bottom-left point to $p_i$. Figure 2 gives simple pseudocode for calculating the top-right adjacent neighbor of every point. It is straightforward to extend the algorithm to handle the other neighbors. The algorithm’s runtime is $O(|P| \log |P| + K|P|)$, where $K$ depends on how many points need to be examined before stopping at line 5. Empirically, we have found that the algorithm runtime is dominated by the sorting stage.

Input: A set P of two-dimensional data points.
Output: A set of edges E representing the Q-adjacency graph.
1. sort all points of P vertically
2. for each point p_i in order:
3.   let best = ∞
4.   for each point p_j, where j > i:
5.     if p_j.y − p_i.y > best: break
6.     let d = p_j.x − p_i.x + p_j.y − p_i.y
7.     if p_j.x > p_i.x and d < best then
8.       best = d and p_b = p_j
9.   declare p_b as p_i’s closest object in the top-right quadrant.
Figure 2: Q-adjacency diagram.

In our experimental setup, we have found that using the combined graph from both the Q-adjacency graph and the Delaunay triangulation graph leads to better and faster results than using only one of them. This is because neither the Delaunay triangulation nor the Q-adjacency graph provides, on its own, sufficient relative-order relationships for overlap reduction and prevention. Constructing both graphs and combining them takes $O(|P| \log |P| + K|P|)$, since the number of edges is linear in the number of points.

Figure 3: Reducing overlaps by increasing the separation between the centers of cells. Panels (a), (b), and (c).
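
The Q-adjacency computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: the function names are hypothetical, points are plain `(x, y)` tuples, ties in the y-coordinate are handled only by sort order, and the other three quadrants are obtained by mirroring coordinates and reusing the top-right search.

```python
import math

def top_right_neighbors(points):
    """Map each point index to the index of its closest top-right point.

    Distance is the Manhattan distance. Points are scanned in increasing y, and
    the inner loop stops once the vertical gap alone exceeds the best distance.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][1])
    edges = {}
    for pos, i in enumerate(order):
        xi, yi = points[i]
        best, best_j = math.inf, None
        for j in order[pos + 1:]:
            xj, yj = points[j]
            if yj - yi > best:          # no later point can be closer
                break
            d = (xj - xi) + (yj - yi)
            if xj > xi and d < best:
                best, best_j = d, j
        if best_j is not None:
            edges[i] = best_j
    return edges

def q_adjacency(points):
    """Directed edges (i, j): j is i's closest point in one of i's four quadrants."""
    edges = set()
    for sx, sy in ((1, 1), (-1, 1), (-1, -1), (1, -1)):
        # Mirroring turns every quadrant into a top-right quadrant.
        mirrored = [(sx * x, sy * y) for x, y in points]
        for i, j in top_right_neighbors(mirrored).items():
            edges.add((i, j))
    return edges

points = [(0, 0), (2, 1), (1, 3), (-1, 2)]
print(top_right_neighbors(points))
print(sorted(q_adjacency(points)))
```

Because each point contributes at most one edge per quadrant, the edge count stays linear in the number of points, which matches the complexity argument above.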
3.2 Translating the Relative-Order Relationships to LP Constraints

After constructing the combined adjacency graph, we translate each edge $\{u, v\}$ in it into a number of constraints that guarantee that (1) the relative positions of objects $u$ and $v$ stay the same with respect to each other, and (2) if $u$ and $v$ overlap then the amount of overlap is reduced; otherwise, they stay non-overlapped. Let $(x_u^0, y_u^0)$ and $(x_v^0, y_v^0)$ denote the current center locations of objects $u$ and $v$ respectively. Suppose that $x_v^0 \ge x_u^0$ and that $y_v^0 \ge y_u^0$, i.e., $v$ lies to the top-right of $u$.² Then there are two possible cases:

Case 1. $u$ and $v$ overlap: In this case, we want the distance separating $u$ and $v$ to increase in order to reduce the amount of overlap. We require that the next separation be at least the current separation plus the minimum amount of displacement required to remove the overlap. Mathematically, this translates to:

\[ x_v - x_u + y_v - y_u \ge \min\left( y_v^0 - y_u^0 + \frac{w_u + w_v}{2},\; x_v^0 - x_u^0 + \frac{h_u + h_v}{2} \right) \tag{3} \]
\[ x_v - x_u \ge 0 \tag{4} \]
\[ y_v - y_u \ge 0 \tag{5} \]

Constraints 4 and 5 ensure that the relative positions of $u$ and $v$ stay the same. Constraint 3 ensures that the amount of overlap is reduced after solving the LP. For example, suppose we have the overlapping condition of Figure 3.a; then applying the previous constraints increases the distance between the centers of the two cells. This can lead to two possible results: (i) complete overlap removal as in 3.b, or (ii) a reduction in the amount of overlap as in 3.c. If result 3.c occurs then the LP can be constructed again and solved iteratively until the overlap is removed. However, regardless of which result occurs, there is a reduction in the amount or area of overlap.

Case 2. $u$ and $v$ do not overlap: In this case, we want to make sure $u$ and $v$ stay non-overlapped in any future placement. We here again assume that $v$ is at an upper-right location with respect to $u$.

\[ x_v - x_u + y_v - y_u \ge \frac{h_u + h_v}{2} + \frac{w_u + w_v}{2} \tag{6} \]
\[ x_v - x_u \ge 0 \tag{7} \]
\[ y_v - y_u \ge 0 \tag{8} \]

Constraints 7 and 8 ensure that the relative positions of the two cells stay the same. Constraint 6 ensures that the two cells stay non-overlapped.

² Similar constructions for the other orientations (bottom-right, bottom-left, top-left) are straightforward to deduce.

With the HPWL and overlap reduction/prevention both modeled, we can run the LP on a given placement and optimize with respect to HPWL. This can be used in two settings: (1) if the given input placement is legal then our method will “re-shape” the placement to distribute whitespace for the sake of wirelength optimization, and (2) if the input placement is highly overlapped then our method acts as a global whitespace allocator and a global cell spreader under the constraint of keeping relative cell positions the same. For example, Figure 4 shows how our method can take a heavily overlapped relative placement and spread it. We refer to our tool as MidX. We will exploit both settings in the experimental section. We now move to timing-driven placement.

Figure 4: Illustration of different steps of our LP-based placement flow. (a) Input relative placement. (b) Placement after MidX spreading. (c) Placement after legalization.

4. TIMING-DRIVEN PLACEMENT

Given an input legal or non-legal relative placement, our LP-based timing optimization works in two phases. In the first phase, the relaxed timing-optimal placement of the entire FUB is calculated, and in the second phase, we use a modified version of our LP formulation to reduce or eliminate the overlaps.

We model timing in LP using the techniques of [6]. We omit the details of the LP modeling for lack of space. However, we stress that, contrary to the approach of [6], which works on a few cells, we work on all cells, finding their optimal relaxed locations. After executing a relaxed placement on the entire FUB, we spread the placement in a way that conserves both wirelength and timing. Instead of just minimizing the total HPWL, we minimize the total HPWL plus the distance moved by the critical cells. Thus if $C$ is the set of critical cells, we attempt to minimize $\sum_{j \in N} (\mathit{right}_j - \mathit{left}_j + \mathit{upper}_j - \mathit{lower}_j) + \sum_{v \in C} (K|x_v - x_v^0| + K|y_v - y_v^0|)$, where $K$ is some appropriate constant. To model the absolute terms $|\cdot|$ in LP, we introduce for each critical cell $v$ two variables, $\delta x_v$ and $\delta y_v$, and add the following constraints:

\[ x_v - x_v^0 \le \delta x_v, \qquad -x_v + x_v^0 \le \delta x_v, \]
\[ y_v - y_v^0 \le \delta y_v, \qquad -y_v + y_v^0 \le \delta y_v. \]

The objective function then becomes:

\[ \min \sum_{j \in N} \left( \mathit{right}_j - \mathit{left}_j + \mathit{upper}_j - \mathit{lower}_j \right) + \sum_{v \in C} \left( K\,\delta x_v + K\,\delta y_v \right) \tag{9} \]

Solving the previous LP, together with the overlap reduction constraints of Section 3, spreads cells while minimizing both wirelength and the distance moved by the top timing-critical cells.

5. EXPERIMENTAL RESULTS

We use nine circuits or FUBs for wirelength experiments, and five FUBs for timing-driven results. The characteristics of these FUBs are given in Table 1. While our FUBs contain only standard cells, a small percentage of these cells have double heights. The FUBs also have varying amounts of whitespace, from 21% to 71%. Another distinguishing feature of these FUBs is their relatively small size in comparison to full ASICs. This is, however, expected in hierarchical designs where circuits or FUBs complement each other’s role in the design. For our experiments, we use Intel Xeon-based workstations, and we use CPLEX 8.00, integrated in our placer, as our LP solver.

Bench     Cells   Nets    IO Pads   WS(%)
WFUB01    21205   21901    9708     53%
WFUB02    12175   12780    7851     55%
WFUB03     8260    8542    7522     36%
WFUB04     1454    1935    2214     52%
WFUB05    16419   20002    8404     49%
WFUB06    19121   22737   15073     21%
WFUB07    15334   15960    9271     27%
WFUB08    18724   19525    9219     43%
WFUB09    16783   17817   16473     54%
TFUB01     2785    3478    3142     60%
TFUB02     3477    2319    3757     39%
TFUB03     7203    2189    7581     71%
TFUB04     8435    7909    9312     66%
TFUB05    19339   20350    5633     78%
Table 1: Benchmark Characteristics. Cells is the number of single and double height cells in each FUB. Nets is the number of nets. IO Pads is the total number of I/O pads. WS is the whitespace percentage.

Flow I - Global Whitespace Allocation: In the first series of experiments, we assess the use of our method for global whitespace distribution. Our original default flow is based on running a global placer, followed by legalization and detailed placement. The input to our LP-based tool, MidX, is the legalized placement from the global placer. After MidX, the new placement is legalized to remove the few overlaps that may result. The new legalized placement is then fed to a detailed placer. We can view the use of our method in this flow as an additional step between global and detailed placement. In Table 2 we report the wirelength results of our default flow and our modified flow. The introduction of this additional step has improved wirelength by an average of about 4.10%.

                   Flow I            Flow II
Bench     Def.     New     Imp.      New     Imp.
WFUB01    88.0     83.6    5.00%     74.0    15.91%
WFUB02    58.2     54.7    7.73%     53.4     8.25%
WFUB03    29.2     29.0    0.68%     27.9     4.45%
WFUB04    7.95     7.50    5.66%     7.34     7.34%
WFUB05    88.3     87.7    0.68%     86.7     1.81%
WFUB06    107      104     3.74%     97       9.70%
WFUB07    68.0     66.9    1.62%     65.4     6.54%
WFUB08    77.2     74.7    7.57%     69.2     6.92%
WFUB09    90.7     86.8    4.30%     79.0     7.90%
average                    4.10%              7.64%
Table 2: Using our methods for global whitespace allocation. Def. gives our default flow results. New gives our new flow results. Imp. gives the percentage improvement.

Flow II - Large-Scale Overlap Reduction: In the second series of experiments, we assess the use of our method for simultaneous global overlap removal and whitespace allocation. Toward this goal, we run our global placer, which is a force-directed placer [9]³, for a few iterations to establish a relative ordering of cells, until the maximum cell density reaches 10, where the cell density is calculated by constructing a grid over the layout area and dividing the area of cells inside each grid block by the capacity of the grid block. Then our LP method is executed iteratively for as long as the maximum cell density keeps decreasing, or until the density reaches 1. After that, legalization and detailed placement follow. The results of the new flow and our default flow are reported in Table 2. Our LP-based technique has successfully reduced the overlaps and improved our overall flow results by about 7.64%. These results directly demonstrate and confirm that spreading cells while maintaining their relative positions is a good overlap-reduction strategy⁴.

³ Our attempts to use LP instead of a force-directed placer to establish the initial relative order did not lead to good final results, since a tremendous number of cells resided at the same location without “sufficient” relative order.
⁴ Note that detailed placement will certainly change the relative order of cells in a local fashion. However, maintaining the overall order in a global fashion is a good strategy and is not restrictive.
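
Two ingredients of the iterative flow described above can be sketched in Python: the grid-based maximum cell density used as the stopping criterion, and the right-hand side of the Case 1 separation requirement (current separation plus the minimum displacement that removes the overlap). This is a hedged illustration only. The function names, the `(x, y, w, h)` tuple layout, and the choice of taking a grid block's capacity to be its full area are assumptions of this sketch, not details given in the text.

```python
def max_cell_density(cells, layout, nx, ny):
    """Maximum over grid blocks of (cell area inside the block) / (block capacity).

    cells: list of (x, y, w, h) with (x, y) the cell center.
    layout: (left, bottom, right, top) of the placement area.
    Assumption: the capacity of a grid block equals its area.
    """
    left, bottom, right, top = layout
    bw, bh = (right - left) / nx, (top - bottom) / ny
    area = [[0.0] * ny for _ in range(nx)]
    for x, y, w, h in cells:
        x0, x1 = x - w / 2, x + w / 2
        y0, y1 = y - h / 2, y + h / 2
        for i in range(nx):
            bx0 = left + i * bw
            ox = min(x1, bx0 + bw) - max(x0, bx0)   # horizontal overlap with block column i
            if ox <= 0:
                continue
            for j in range(ny):
                by0 = bottom + j * bh
                oy = min(y1, by0 + bh) - max(y0, by0)
                if oy > 0:
                    area[i][j] += ox * oy
    return max(a for column in area for a in column) / (bw * bh)

def overlap_removal_separation(u, v):
    """Required Manhattan separation for an overlapping pair, v to the top-right of u.

    u, v: (x0, y0, w, h) current centers and sizes. Returns None when the pair
    does not overlap, in which case the non-overlap case applies instead.
    """
    dx0, dy0 = v[0] - u[0], v[1] - u[1]
    half_w, half_h = (u[2] + v[2]) / 2, (u[3] + v[3]) / 2
    if dx0 >= half_w or dy0 >= half_h:
        return None
    # Either slide apart horizontally (keeping dy0) or vertically (keeping dx0).
    return min(dy0 + half_w, dx0 + half_h)

stacked = [(1, 1, 2, 2)] * 3 + [(3, 3, 2, 2)]
print(max_cell_density(stacked, (0, 0, 4, 4), 2, 2))
print(overlap_removal_separation((0, 0, 4, 2), (1, 0.5, 4, 2)))
```

In the example, three cells stacked on the same location give a density of 3 in their grid block, and the overlapping pair currently separated by 1.5 must be separated by at least 3.0 after the next LP solve.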
Benchmarking Methodologies and Comparisons: In the third series of experiments we surprisingly show that academic placers are not capable of yielding consistently good results on small functional blocks (FUBs), which are part of microprocessors used in our daily-life devices. To ensure a fair benchmarking framework, we first modify our FUBs by projecting their I/O pads to the FUB periphery. This is necessary because area I/O pins cannot be modeled in the BOOKSHELF format and would be considered obstacles by academic placers. Second, we use a single tool (the bookshelf utility) to report the pin-to-pin HPWL of all placements, whether for our placer or academic tools⁵. We use Flow II as our reference point. For detailed placement, we use both (1) cell reordering [4], and (2) optimal interleaving [12]. In addition to our placer, we use the following academic placers: Dragon 4.01 [19]⁶; the better of mPL5.0 and mPL4.0 on each individual benchmark [8]⁷; the better of FengShui5.1 and FengShui2.6 on each individual benchmark [3]; Capo9.3 [17]⁸; and APlace2.0 [15]⁹. The results of our experiments are given in Table 3. Note that due to the projection of area I/O pins, our results in Table 3 are different from our results in Table 2. The average results are in comparison to our flow. Some of the placers failed to complete, crashing during execution; these cases are labeled “fail” in the corresponding entries. On average, we find that, except for APlace, our flow provides the best average results. APlace is better on average than our placer by a small difference of only 1.34%. Our placer gives better results than APlace on two FUBs. Following APlace and our placer, the relative ranking of placers is: Capo, Dragon, mPL, and FengShui. We also give the placement layouts of different placers in Figure 5. We can see that different placers generate a variety of placement outlines, depending on how they distribute whitespace.
From our results, we conclude that: (1) placers fail to consistently obtain good results on FUBs with much smaller sizes, and with less “challenging” structures, than the traditional publicly available large-scale IBM benchmarks; (2) the relative performance of placers on the FUBs is different from their relative performance on the IBM ISPD’05 placement contest benchmarks; and (3) whitespace management techniques need further improvement.

⁵ The double-height cells are treated as small movable blocks by academic placers.
⁶ We found a few overlaps in Dragon final placements.
⁷ mPL6.0 is not yet publicly available.
⁸ Capo9.3 is a release subsequent to the contest release.
⁹ The results of APlace are courtesy of the authors of [15].

Flow III - Timing-Driven Placement Optimization: The input to our timing-optimization method is a netlist and a placement that have already been optimized in our global optimization flow (GlobalOpt) using small-scale LP-based timing optimization techniques [6], as well as gate sizing and buffering operations. The input netlist and placement were thought to be our best possible results. Our approach proceeds in two phases. In the first phase, we find the relaxed optimal timing placement of the entire FUB, and in the second phase, we expand the placement to reduce the overlaps in a timing-driven mode as explained in Section 4. We follow this with a legalization step to remove any overlaps. We give the results of our approach in Table 4. In Table 4, we give the results from three stages: the initial placement, after the relaxed placement, and after our optimization and legalization. For each stage, we report the number of critical paths (CP), the total negative slack (TNS), and the worst negative slack (WNS). Obviously, we expect the relaxed placement to have better timing than the input placement, although it is plagued with overlaps. The final new placement shows much better timing than the input placement. The average improvement in TNS is about 21%. We also find good improvements in both WNS and the number of critical paths.

Runtime Requirements: Finally we give the runtime requirements for our methods. For the largest benchmark, WFUB01, one iteration of LP solving using CPLEX takes about 568 seconds. Thus, the whitespace allocation flow (Flow I) requires one iteration of 568 seconds. The global overlap removal flow (Flow II) usually takes 3 to 4 LP iterations (2722 seconds) to complete. The global timing-driven optimization flow (Flow III) takes two iterations to complete, one for getting the relaxed placement and one for timing-driven spreading, which translates to 1136 seconds. While these runtime requirements are practical for instances encountered in hierarchical design, runtime will certainly be a bottleneck for regular flat designs.
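
The timing-driven spreading objective of Section 4, which Flow III relies on, can be illustrated with a short Python sketch that evaluates it for a fixed placement. It is only an evaluation of the objective value, not an LP solve, and the function names, the dictionary layout, and the example value of K are assumptions made for this illustration. The helper `smallest_delta` shows why the two inequalities on each displacement variable reproduce an absolute value: the smallest value satisfying both is the larger of the two signed differences.

```python
def smallest_delta(x, x0):
    """Smallest delta with x - x0 <= delta and -x + x0 <= delta, i.e. |x - x0|."""
    return max(x - x0, x0 - x)

def timing_driven_objective(nets, cells, original, critical, K):
    """Total HPWL plus K times the Manhattan distance moved by the critical cells.

    cells: name -> (x, y, w, h), with (x, y) the new cell center.
    original: name -> (x0, y0), the centers in the relaxed timing-optimal placement.
    critical: names of the timing-critical cells.
    K: weight of the displacement term (a hypothetical value is used below).
    """
    total = 0.0
    for net in nets:
        left = min(cells[u][0] - cells[u][2] / 2 for u in net)
        right = max(cells[u][0] + cells[u][2] / 2 for u in net)
        lower = min(cells[u][1] - cells[u][3] / 2 for u in net)
        upper = max(cells[u][1] + cells[u][3] / 2 for u in net)
        total += (right - left) + (upper - lower)
    for v in critical:
        dx = smallest_delta(cells[v][0], original[v][0])
        dy = smallest_delta(cells[v][1], original[v][1])
        total += K * dx + K * dy
    return total

cells = {"a": (0.0, 0.0, 2.0, 2.0), "b": (4.0, 0.0, 2.0, 2.0)}
original = {"a": (1.0, 0.0)}
print(timing_driven_objective([["a", "b"]], cells, original, ["a"], K=2.0))
```

A larger K keeps critical cells closer to their relaxed timing-optimal locations at the cost of wirelength, which is the trade-off the constant controls.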
6. CONCLUSIONS

In this paper we have examined a number of placement problems. We have extended the scope of LP-based optimization techniques from working on a few cells to working on entire circuits. Our LP-based method iteratively reduces and prevents overlaps until any desired cell density is achieved. We have used our LP-based method as a global whitespace allocator and as a global overlap remover, leading to average improvements of about 4.10% and 7.64% respectively. We have also proposed a timing-driven version of our LP-based method and used it as a global timing optimizer. This leads to an average improvement of about 21% in TNS, as well as significant improvements in both worst negative slack and number of critical paths. We have also conducted a benchmarking experiment on a number of state-of-the-art microprocessor functional blocks. We surprisingly show that placers fail to get consistently good results. Thus, it is not clear that placement of small-scale designs is a well-solved problem. We believe that benchmark reporting should continue on both new and old (smaller) public benchmarks.
Figure 5: Placement layouts of different placers. (a) Ours (804). (b) Capo (823). (c) mPL (881). (d) APlace (802). (e) FengShui (891). (f) Dragon (849). Plot titles: gplace_dp HPWL = 8.058e+07; capo HPWL = 8.235e+07; mPL_plot HPWL = 8.81e+07; aplace HPWL = 8.016e+07; fs51 HPWL = 8.923e+07; dragon HPWL = 8.495e+07 (each with #Cells = 27951, #Nets = 19525).

Bench     OUR      Capo9.3   FS5.1     APlace2.0   mPL5.0    Dragon4.0
WFUB01    888      927       fail      882         1053      947
WFUB02    574      562       fail      539         651       fail
WFUB03    326      325       456       fail        349       339
WFUB04    94       93        131       fail        128       fail
WFUB05    106      109       136       107         110       113
WFUB06    114      123       117       118         119       fail
WFUB07    693      673       765       652         699       714
WFUB08    804      821       891       802         872       849
WFUB09    111      106       190       110         117       106
Average   0.00%    -0.71%    -28.93%   1.34%       -10.90%   -3.56%
Table 3: Wirelength results of different placers.

          GlobalOpt                 Relaxed                   Timing-driven MidX + Leg
Bench     CP     TNS      WNS       CP     TNS     WNS        CP     TNS      WNS
TFUB01    143    -2.19    -0.044    121    -1.42   -0.033     132    -1.94    -0.042
TFUB02    34     -0.72    -0.054    31     -0.55   -0.048     32     -0.64    -0.049
TFUB03    74     -0.63    -0.025    64     -0.54   -0.025     64     -0.55    -0.027
TFUB04    540    -13.99   -0.094    414    -8.82   -0.075     428    -10.34   -0.076
TFUB05    1694   -35.00   -0.908    1442   -26.0   -0.908     1532   -27.00   -0.855
Table 4: Average improvement in TNS = 21%. CP is the number of critical paths. TNS is the total negative slack. WNS is the worst negative slack.

7. REFERENCES

[1] S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, and I. L. Markov, “Unification of Partitioning, Placement and Floorplanning,” in Proc. IEEE International Conference on Computer Aided Design, 2004, pp. 550–557.
[2] S. N. Adya, M. C. Yildiz, I. L. Markov, P. G. Villarrubia, P. N. Parakh, and P. H. Madden, “Benchmarking for Large-Scale Placement and Beyond,” in Proc. ACM/IEEE International Symposium on Physical Design, 2003, pp. 95–103.
[3] A. Agnihotri, S. Ono, and P. Madden, “Recursive Bisection Placement: Feng Shui 5.0 Implementation Details,” in Proc. ACM/IEEE International Symposium on Physical Design, 2005, pp. 230–232.
[4] A. E. Caldwell, A. B. Kahng, and I. L. Markov, “Optimal Partitioners and End-case Placers for Standard-cell Layout,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19(11), pp. 1304–1313, 2000.
[5] C. Chang, J. Cong, and M. Xie, “Optimality and Scalability Study of Existing Placement Algorithms,” in Proc. IEEE Asia and South Pacific Design Automation Conference, 2003, pp. 621–627.
[6] A. Chowdhary et al., “How Accurately Can we Model Timing in a Placement Engine?” in Proc. ACM/IEEE Design Automation Conference, 2005, pp. 801–806.
[7] J. Cong, T. Kong, J. R. Shinnerl, M. Xie, and X. Yuan, “Large-Scale Circuit Placement: Gap and Promise,” in Proc. IEEE International Conference on Computer Aided Design, 2003, p. to appear.
[8] J. Cong, J. R. Shinnerl, M. Xie, T. Kong, and X. Yuan, “Large-Scale Circuit Placement,” ACM Transactions on Design Automation of Electronic Systems, vol. 10(2), pp. 389–430, 2005.
[9] H. Eisenmann and F. M. Johannes, “Generic Global Placement and Floorplanning,” in Proc. ACM/IEEE Design Automation Conference, 1998, pp. 269–274.
[10] S. Fortune, “A Sweepline Algorithm for Voronoi Diagrams,” Algorithmica, vol. 2, pp. 153–174, 1987.
[11] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “VLSI Module Placement Based on Rectangle-Packing by the Sequence Pair,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15(12), pp. 1518–1524, 1996.
[12] S. W. Hur and J. Lillis, “Relaxation and Clustering in a Local Search Framework: Application to Linear Placement,” in Proc. ACM/IEEE Design Automation Conference, 1999, pp. 360–366.
[13] S. W. Hur and J. Lillis, “Mongrel: Hybrid Techniques for Standard Cell Placement,” in Proc. IEEE International Conference on Computer Aided Design, 2000, pp. 165–170.
[14] A. B. Kahng and S. Reda, “Evaluation of Placer Suboptimality Via Zero-Change Netlist Transformations,” in Proc. ACM/IEEE International Symposium on Physical Design, 2005, pp. 208–215.
[15] A. B. Kahng, S. Reda, and Q. Wang, “Architecture and Details of a High Quality, Large-Scale Analytical Placer,” in Proc. IEEE International Conference on Computer Aided Design, 2005, p. to appear.
[16] G.-J. Nam, C. Alpert, P. Villarrubia, B. Winter, and M. Yildiz, “The ISPD2005 Placement Contest and Benchmark Suite,” in Proc. ACM/IEEE International Symposium on Physical Design, 2005, pp. 216–219.
[17] J. Roy et al., “Capo: Robust and Scalable Open-Source Min-Cut Floorplacer,” in Proc. ACM/IEEE International Symposium on Physical Design, 2005, pp. 224–226.
[18] S. Sutanthavibul, E. Shragowitz, and J. Rosen, “An Analytical Approach to Floorplan Design and Optimization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 10(6), pp. 761–769, 1991.
[19] T. Taghavi, X. Yang, B. K. Choi, M. Wang, and M. Sarrafzadeh, “DRAGON2005: Large-Scale Mixed-Size Placement Tool,” in Proc. ACM/IEEE International Symposium on Physical Design, 2005, pp. 245–247.
[20] N. Viswanathan and C. Chu, “FastPlace: Efficient Analytical Placement Using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model,” in Proc. ACM/IEEE International Symposium on Physical Design, 2004, pp. 26–33.