INVITED PAPER
Progress and Challenges in VLSI Placement Research Extensive research studies performed over the last 50 years addressed numerous aspects of global and detailed placement, a fundamental step in the physical design of integrated circuits. This paper surveys the history of placement research, the progress leading up to the state of the art, and outstanding challenges. By Igor L. Markov, Fellow IEEE , Jin Hu, and Myung-Chul Kim
ABSTRACT | Given the significance of placement in integrated circuit (IC) physical design, extensive research studies performed over the last 50 years addressed numerous aspects of global and detailed placement. The objectives and the constraints dominant in placement have been revised many times over, and continue to evolve. Additionally, the increasing scale of placement instances affects the algorithms of choice for high-performance tools. We survey the history of placement research, the progress leading up to the state of the art, and outstanding challenges. KEYWORDS
|
Circuit placement; large-scale optimization;
layout
I. INTRODUCTION Integrated circuits (ICs), whose invention by Jack Kilby and Robert Noyce was honored by the 2000 Nobel prize in Physics, have captured the minds of scientists, engineers, students, entrepreneurs, as well as users of industrial and consumer electronics. Buoyed by high demand, ICs grew in size over the last 50 years from several hundred components to over a billion components on a single semiconductor chip. Their increasing sophistication helped automate the design process [114], [206] and necessary bookkeeping, as well as perform efficient global optimization. Physical design is particularly involved [10], [95], due to its comManuscript received September 11, 2014; revised July 5, 2015 and August 25, 2015; accepted September 2, 2015. Date of publication October 9, 2015; date of current version October 26, 2015. This work was supported in part by the U.S. National Science Foundation under Award 1162087; by the Semiconductor Research Corporation under Project 2264; and by the Mentor Graphics Corporation. I. L. Markov is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail:
[email protected]). J. Hu is with IBM Corporation, Hopewell Junction, NY 12533 USA. M.-C. Kim is with IBM Corporation, Austin, TX 78758 USA. Digital Object Identifier: 10.1109/JPROC.2015.2478963
putational complexity, multiobjective optimization, and sensitivity to advances in semiconductor manufacturing. Here we focus on circuit placementVthe challenge of finding good locations for individual circuit components. The importance of the topic can be seen from previous surveys [30], book chapters [10], [114], [206], and entire books on circuit placement [141], [173]. This survey covers developments in the field up to 2015 and discusses the necessary interactions with other aspects of modern physical design. Research on very large scale integration (VLSI) placement can be traced back to the 1960s, when the first netlist partitioning methods were developed in the industry, and subsequently motivated improvements in graph partitioning heuristics. Analytical placers1 started appearing in the early 1980s, but were eclipsed by combinatorial techniques when simulated annealing was invented. Annealing-based placers [183] dominated industry use and academic results for a decade, but by the mid-1990s, annealing was no longer scalable for newer and larger designs. Despite the steady improvement of analytical placement, partitioning-based methods improved enough to provide leading-edge performance: 1) (multilevel) Fiduccia–Mattheyses (FM) [63] heuristics produced much better results much faster than previous methods; 2) the use of end-case techniques (optimal partitioning and end-case placement) during top– down layout optimization provided high-quality detailed placement [18]; and 3) the use of flat and multilevel FM heuristics was carefully optimized, including cutline selection and hierarchical whitespace allocation [24]. By 2005, several analytical techniques have matured to the point where they reliably outperformed min-cut placement on contemporary large global placement instances. In addition to the innovations in algorithms, this was due to 1 Analytical placers model interconnect length by differentiable functions and use smooth optimization techniques.
0018-9219 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1985
Markov et al.: Progress and Challenges in VLSI Placement Research
the change in the nature of placement instances. In particular, having between 100 000 and 10 million movable objects during global placement provided a better justification to modeling each object as a dimensionless dot. Industry methodologies provided global placement with a large number of fixed objects [input/ouput (I/O) pads, fixed pins, macro blocks, etc.]. Due to concerns of routability, physical synthesis, power density, large size of I/O pads, large IP blocks, etc., core placement area now often includes a large amount of unused space [141] which provides analytical algorithms with useful freedom. In terms of algorithms, high-quality detailed placement was developed, resolving a long-standing handicap in analytical engines. These improvements fueled algorithmic developments based on multivariate calculus, numerical analysis, and combinatorial optimization [141]. Resulting reductions in interconnect length enhanced many types of semiconductor designsVfrom field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to central processing units (CPUs) and mixed-signal systems on a chip (SoCs). They have surpassed the length reduction typical for a new technology node. Despite such major progress, there is currently no agreement on which algorithms are considered best. Comparisons remain largely empirical [141] and are often affected by the quality and maturity of software implementations, use of parallel processing and high-performance libraries, as well as reporting methodologies. Fig. 1 depicts the evolution of academic global
placers and their correspondence to commercial placers. Experimental results suggest that academic placers are competitive to industry placers on very specific objectives, but less effective on multiobjective optimizations prevalent in commercial designs [94]. Recent advances in placement algorithms include: 1) high-performance wirelength-driven placement; 2) mixed-sized placement (i.e., simultaneous placement of both cells and macros); 3) routability-driven placement; 4) timing- and power-driven placement; and 5) the integration of global placement into physical synthesis.
I I. WIRELENGTH-DRI VEN PLACE MENT Modern placement is an optimization problem with many objectives and constraints. However, the most common approach is to first develop a wirelength-driven global placement engine that solves a straightforward mathematical formulation and is competitive on common benchmarks. This is a prerequisite for strong performance in multiobjective optimization, and the handling of additional objectives and constraints can be implemented as the next step. We cast global placement [95] as constrained optimization and explain how the objective is approximated by smooth differentiable functions to facilitate fast numerical methods. Common types of global placement algorithms are reviewed next, followed by legalization and detailed placement.
Fig. 1. Historical development of global placement algorithms and implementations in academia. Asterisks indicate placers that were developed or implemented in the industry (often by recent students)Vat IBM, Synopsys, and Mentor Graphics. Daggers indicate adoption of academic work and commercialization: TimberWolf was used by most EDA flows in the late 1980s, algorithms in PROUD were adapted in Cadence QPlace throughout the 1990s, KraftWerk algorithms were used by Intel and Magma, the Capo codebase was commercialized by Synplicity and Achronix, NTUPlace was commercialized by MediaTec, and mPL was commercialized for FPGA layout. Most academic placers presaged respective industry placers, and are described in greater detail in the literature. Part of APlace’s contribution was deconstructing and describing Naylor’s original patent [142].
1986
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
A. The Objectives and the Constraints The intuition for placement objectives is based on viewing a circuit as a graph where vertices model gates. The shorter the wires connecting the gates, the faster is the circuit. Therefore, we add up the lengths of edges and minimize the sum. Given that wire routes consist of only horizontal and vertical segments, lengths must be measured in the Manhattan planeVas the sum of x-span and y-span. When a single logic gate drives multiple logic gates, we generalize edges to hyperedges by replacing twoelement vertex sets with larger sets of vertices. The length contribution of a hyperedge is also calculated as the sum of its x-span and y-span. We now formalize this intuition. Global placement [95, Ch. 4] of a netlist N ¼ ðE; VÞ with nets E and n nodes (cells) V seeks a set of planar node locations ð~ x;~ yÞ 2 ½xmin ; xmax n ½ymin ; ymax n that minimize the weighted Half-Perimeter WireLength (wHPWL). Locations of individual circuit elements ðxi ; yi Þ are combined in two vectors ~ x ¼ ðx1 ; . . . ; xn Þ and ~ y ¼ ðy1 ; . . . ; yn Þ. While less intuitive, this notation helps dealing with the x and y components separately. Net weights are captured as vector ~ w ¼ ðwÞi , then wHPWLN ð~ x;~ yÞ ¼ wHPWLN ð~ xÞþ wHPWLN ð~ yÞ, where wHPWLN ð~ xÞ ¼ S e2E we ½max xi min xi : i2e
i2e
(1)
The HPWL function is continuous and convex, but not everywhere differentiable. It is amenable to combinatorial optimization, especially linear programming and network flows. However, these techniques do not combine well with other co-objectives, and are less scalable than smooth optimization applied to approximations of HPWL. Moreau’s theorem from convex analysis [177, Prop. IV.1.8] implies that the gradient of such a convex function is defined almost everywhere and can be approximated by the Yosida approximation of the function whose accuracy is controlled by the positive parameter ! 0. While this specific approximation has not been used in placement applications, a number of specialized approximations of HPWL have been proposed (Section II-B). Minimizing a convex objective is not difficult in itself. The main source of complexity in global placement is the nonconvex constraints. Despite the variety of constraints imposed in practice, two types are considered first. The fixed-position constraints enforce given locations of certain cells or macros. They are convex, can be substituted directly into the objective function, and do not require dedicated computations. The density constraints 1) ensure porosity, which is critical for routability and timingoptimization transformations; and 2) prevent movable objects from concentrating in small regions to ensure that legalization can find nonoverlapping positions for all objects with minimum disturbance from a global placement. When analytical techniques for global placement are used, fixed-
position constraints play the important role of spreading movable nodes between fixed positions as a side effect of interconnect optimization before density constraints are seriously considered. A second use, now commonly seen in quadratic placers, is to temporarily extend the netlist with fake nets (pseudonets) and fake cells (anchors) fixed at carefully chosen locations. Again, the purpose is to decrease peak density in the course of interconnect optimization.
B. Smooth Approximations of HPWL Modern placement algorithms (Section II-C) rely on approximations of the HPWL function, briefly surveyed below. Quadratic approximations used in many placers are obtained by pricing every edge by its squared length, with a certain weight. Their general form is Q ð~ x;~ yÞ ¼ ~ xT Qx~ x=2 þ ~ fx~ x þ~ yT Qy~ y=2 þ ~ fy~ y
(2)
with matrices Qx ; Qy derived from the netlist and vectors ~ fx ; ~ fy that reflect connections to fixed objects. When sufficiently many nodes in a connected netlist are fixed, Q is strictly convex. The x and y components can be optimized separately and quickly by solving sparse systems of linear equations Qx~ x ¼ ~ fx and Qy~ y ¼ ~ fy . To make such quadratic functions more appropriate for HPWL modeling, they are linearized [178] by adjusting the approximations at every global placement iteration. In particular, single-edge terms of the form wij ðxi xj Þ2 are changed to ðwij ðxi xj Þ2 Þ=ðjx0i x0j j þ "Þ where the primed values are constants based on the result of the last iteration. These adjustments modify nonzero elements of Qx and Qy , but not the sparsity pattern. To approximate HPWL by a quadratic objective, the netlist N is first transformed into graphs Gx and Gy that preserve the node set V and represent each two-pin net by a single edge with weight 1=length. Larger nets are decomposed depending on the relative placement of vertices using the Bound2Bound (B2B) model [62], [181]Vfor each p-pin net, the extreme nodes (min and max) are connected to each other and to each internal node by edges, with the following weight:
wB2B x;ij ¼
1
ðp 1Þ jxi xj j þ "
:
(3)
The B2B model is inherently local as it is instantiated for a particular set of x and y values and accurately represents HPWL only near those values. As x and y locations change, the decomposition is recalculated, and the sparsity patterns of matrices Qx and Qy change, while preserving the numbers of nonzeros.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1987
Markov et al.: Progress and Challenges in VLSI Placement Research
Nonquadratic approximations of HPWL, such as the log–sum–exp technique [170], approximates the bracketed term in (1) for ! 0 by the convex function ! X xk X x = k log e þ log e ! ½max xi min xi : (4) k2e
k2e
i2e
i2e
rium states correspond to spread out placements. Acknowledging that the density function ðx; yÞ may not be smooth, Kraftwerk [62], [181] looks for a twicedifferentiable potential function uðx; yÞ whose gradient ~ F ¼ ru represents a conservative force pointing away from dense regions.2 The latter condition is formalized in terms of flux over contour C that H an arbitraryR closed R bounds region R: C ð~ F nC Þd~ c¼ dx dy. Then, using R the divergence theorem and ~ F ¼ ru, we have
Other such techniques are surveyed and compared in [26], [77], and [117]. As with quadratic approximations, gradients are computed in closed form, but their numerical values must be updated as the placement changes. Numerical minimization requires more than solving two sparse linear systems.
C. Modern Global Placement Algorithms We now review main types of modern placement algorithms and refer to specific publications for additional details. The common classification that we follow is not disjoint, and some placement software combines multiple ideas. Quadratic placement is the foundation of BonnPlace [17], DPlace2.0 [131], mFAR [80], Kraftwerk [181], FastPlace [195], and RQL [195], as well as SimPL [103], [105], MAPLE [107], and ComPLx [106]. The netlist is modeled by a graph, for which the quadratic objective is formulated (Section II-B). The placer alternates between quadratic optimization and steps that involve density estimation (including fixed obstacles), in some cases along with dedicated spreading of movable objects. To decrease peak density, placers employ a variety of combinatorial and numerical techniques based on 1) network flows (BonnPlace); 2) self-contained estimation of density gradients (FastPlace and RQL); and 3) ‘‘full spreading’’ (SimPL) and derived algorithms. In all cases, the objective in a previously solved quadratic program is modified, and the program is resolved. Force-directed placement models interconnects by coil springs as described by Hooke’s law, and formulates equations for force equilibrium, and solves these equations using quadratic programming. We illustrate force-directed placement Kraftwerk [181], where three forces are involved in force-equilibrium equationVthe spring force, the hold force, and the density-based (spreading) force. The spring force corresponds to a quadratic approximation of the HPWL objective (Section II-B), the hold force is its ‘‘opposite action’’ in the spirit of Newton’s Third Law of motion, and the density force seeks to even out cell density. Let ðx; yÞ denote the cell density at ðx; yÞ. The density force is then, loosely speaking, weight-averaged r [see (6)]. The hold and density forces acting on a cell are cumulatively represented by a pseudonet connected to a carefully placed fake fixed cell (anchor). Density-based force computations were pioneered in the context of force-directed placement [62], [181] as a specific choice for the additional forces such that equilib1988
I
ð~ F nC Þd~ c¼
ZZ
Du dx dy ¼
R
C
ZZ dx dy:
(5)
R
This condition is satisfied by solutions of Poisson’s equation Du ¼ , which can be interpreted as finding the electrostatic potential u for the curl-free field ru generated by a spatial charge distribution (positive charges represent cells and negative charges represent unused cell cites) A word of caution: Poisson’s equation only gives a way to satisfy properties postulated for ~ F. The developers of mPL6 [27] noted that Poisson’s equation can be ill-defined and added a new term, producing the nonhomogenous Helmholtz (screened Poisson’s) equation Du "u ¼ , both of which are linear second-order elliptic partial differential equations (PDEs) [144, Figs. 2 and 3]. This smoothing effect is formalized in [53] for both Poisson and Helmholtz PDEs by representing solutions as convolutions (over the entire placement region R) of with certain Green’s functions Gðx; sÞ dependent on the boundary conditions
uðÞ ¼
Z Gð; ÞðÞd
(6)
R
where and represent ðx; yÞ pairs in R. This observation leads to a particularly efficient computation of ru. For example, for Poisson’s equation 1) without boundary conditions, and 2) with zero gradients at infinity
1) 2)
1 k k ln k k : Gð; Þ ¼ 2 Gð; Þ ¼
(7)
In a recent development, ePlace [125] presents a different way to cast density leveling as an electrostatic equilibrium problem. This approach yields closed-form gradients for global density reduction and theoretically 2 A conservative force ~ F satisfies the following equivalent conditions: H F 1) 9u : ~ F ¼ ru (a potential function exists); 2) 8 closed contour C: C ~ d~ c ¼ 0 (path independence); and 3) r ~ F ¼ 0 (curl-free force).
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
guarantees progress toward configurations with lower potential energy. Alternatively, density leveling can be modeled by the discrete transportation problem and solved by reduction to network flows [16]. Approaches in this category approximate the nonconvex global-placement problem by a sequence of convex quadratic problems, often using additional forces (or gradients) to construct intermediate problems. Nonconvex optimization placers such as APlace [99], mPL6 [27], NTUPlace3 [39], NTUPlace4 [78], and ePlace [125] typically approximate HPWL by nonquadratic objective functions for which gradients (and Hessians) can be computed analytically (Section II-B). The contribution of fixed obstacles (e.g., macros) is modeled by 2-D step functions, and then approximated by bell-shaped [39], [99] functions, but often postprocessed by Gaussian smoothing and leveling [39]. The combined optimization function is nonconvex, in stark contrast to quadratic and forcedirected placement. Additional details can be found in [30] and the survey in [114]. Nonlinear optimization employed by these algorithms is time consuming. It is often accelerated using combinatorial netlist clustering. Using netlist clustering in placement typically improves the runtime of global optimization at the cost of a small loss in solution quality. However, in some cases, it also improves quality of result [107], [215]. Global placement uses clustering in a similar fashion to multilevel partitioningVthe netlist is first coarsened by clustering its components. After optimization is performed quickly on the coarse netlist, the netlist is refined by dissolving some clusters into smaller clusters. Then, optimization is applied incrementally, so as to refine the locations of smaller clusters (which are initially given the location of the containing cluster). Such refinement iterations can be repeated, although some placers only perform it once. An algebraic scheme for refinement is described in [34]. Commonly used clustering algorithms include First-Choice [100] (Capo, NTUplace) and Best-Choice [6] (APlace, mPL6, FastPlace3, RQL, MAPLE). These algorithms proceed bottom–up and sequentially select small groups (or just pairs) of nodes/clusters to merge. The Best-Choice algorithm sets clustering scores between pair of objects using a function that decreases with their areas, and stores them in a lazily updated priority queue. The SafeChoice algorithm [215] is more conservative and time consumingVit seeks groups of objects which can be merged without worsening placement quality. When used with several different global placement algorithms, SafeChoice improved results by explicitly considering current cell locations and wirelength gradients during clustering. MAPLE [107] pays particular attention to incremental placement optimization after cluster refinement, which helps recover solution quality and even improve it. Such optimizations blur the boundaries between global and detailed placement. Next, we discuss locality, force orientation, and force modulation. Density gradients in APlace [99] and NTU-
Fig. 2. High-level flow of SimPL [103]. After initial wirelength-driven placement, SimPL performs rounds of upper and lower bound global placement iterations. The bottom graph depicts the progression of the placement solution with respect to wirelength.
Place [39] are based on local information and, e.g., do not account for possible paths around fixed obstacles. This may require numerous global placement iterations. Kraftwerk and mPL6 find density gradients by solving linear elliptic PDEs as explained above, incorporating more global perspective into their density gradients. However, it remains unclear how well this accounts for possible paths around fixed obstacles. Whether gradients are calculated based on local information or identify the globally best direction, the scaling of density gradients for proper balance with interconnect-based forces often remains unclear. To this end, the work in [195] distinguishes the tasks of force orientation and force modulation. It demonstrates that force modulation in FastPlace can be improved by reducing the magnitude of 10% strongest forces, as implemented in RQL. These challenges are addressed in a more systematic way in SimPL [103], [105] by lookahead legalization (LAL), which globally identifies target locations for all cells so that most overlaps are removed (Figs. 2 and 3).
D. Legalization and Detailed Placement The cell locations from global placement may overlap, and typically do not align with power rails. The global placement must then be legalized, where all cell overlap is removed without undermining design objectives. Legalization removes all cell overlap while minimizing total cell displacement, and is necessary not only after global placement, but also after incremental changes, e.g., physical synthesis optimizations [7]. Unlike cell spreading during global placement, legalization is typically performed when cells are both 1) well distributed over the entire region; and 2) have relatively small overlap. A legalized placement can be improved with respect to a given objective by detailed placement, e.g., swapping neighboring cells to reduce total wirelength, or sliding cells to one side of the row when
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1989
Markov et al.: Progress and Challenges in VLSI Placement Research
Fig. 3. Illustration of SimPL [103] during global placement. The left-hand column depicts the lower bound placements attained at individual iterations, while the right-hand column depicts the upper bound placements. Each upper bound iterate is derived from the lower bound iterate through lookahead legalization and is used to define forces that enhance the legality of lower bound iterates with minimal increase in wirelength.
Detailed placers preserve legality and often other design constraints, such as routing congestion or placement density, while improving solutions by relocating movable cells [108]. Branch-and-bound placers [24] reorder groups of neighboring cells in a row by a sliding-window technique, where cells are reordered optimally inside each window. However, this approach can handle typically up to eight cells at a time. A more scalable optimization, handling up to 20 cells at a time, splits the cells in a given window into left and right halves, and optimally interleaves the two groups while preserving the relative order of cells from each group [85]. FastPlace-DP [149] improves wirelength by swapping pairs of nonadjacent cells, and by cycling triplets. When unused space is available between cells in a row, these cells can be shifted to either side or to intermediate locations. Wirelength-optimal locations in a given row can be found by a polynomial-time algorithm [98], which is practical in many applications. ECO-System [166], integrated in Capo [24], [168], identifies areas where cells need to be replaced, then applies Capo to perform both legalization and detailed placement, simultaneously and consistently in all such regions. Modern circuit layout is experiencing an increasing number of design constraints and objectives, as well as design rules. Therefore, wirelength-optimized global placement solutions must undergo numerous local transformations to generate layouts that optimize timing, routability, and manufacturing yield (Sections VI and VIII). Such transformations may adversely affect wirelength and routing demand, which is why fast and effective wirelength recovery is invoked periodically in practical physical-design flows. While different from the traditional detailed placement setting, this context leverages algorithmic techniques developed for detailed placement. This was illustrated in the 2013 International Conference on Computer-Aided Design (ICCAD2013) detailed-placement contest [108] TABLE 1 Legalization and Detailed Placement
unused space is available. Table 1 summarizes methods for legalization and detailed placement. Legalization algorithms can be classified as 1) local approaches, where cells are moved one at a time or in small groups; and 2) global strategies that relocate large groups of cells. Some local legalizers, such as Tetris [73], greedily assign each cell to its nearest legal location while respecting row capacity. Abacus [180] also finds the best row the cell belongs to, but uses dynamic programming to replace the already placed cells such that the total displacement is minimized. Ho and Liu [74] and Lee et al. [116] also explicitly minimize global wirelength during legalization. Global strategies can be illustrated by the use of network flows to find optimal solutions under mild assumptions. Extensions include incorporating history [44] and modifying path augmentation algorithms [12]. Additional techniques are shown in Table 1. 1990
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
TABLE 2 Mixed-Size Placement Techniques
that focused on fast wirelength recovery subject to celldensity constraints and maximum displacement limits. The top three teams achieved very similar results, despite using different algorithms. This may indicate that known algorithms exhaust wirelength-driven detailed placement and the extended formulation above. Two algorithms used by top contestants, BraveDP and RippleDP, are described in [47] and [156], respectively. BraveDP performs cell swapping with lazy profit updates to reduce HPWL and density peaks without violating displacement limits. RippleDP first moves cells to their optimal HPWL regions subject to displacement limits, and then locally improved locations by inter-row moves, cell reordering, and compaction. Challenges for legalization and detailed placement include incorporating additional geometric and performance constraints, as well as various design objectives. For example, legalization can simultaneously minimize wirelength [74], [116] and improve routability [78]. Practical detailed placers must incorporate timing optimization [196], detailed routability [101], [199], as well as manufacturability and yield optimization [112], [220]. Power minimization, temperature gradient constraints, and new design rules for sub-16-nm technology nodes pose additional challenges for ASIC detailed placement. In contrast, state-of-the-art FPGAs emphasize compliance with highly structured programmable fabrics, including carry chains, embedded arithmetic blocks, and memories. Circuit timing for FPGAs is modeled nonconvex discrete functions that reflect hierarchical ‘‘express wires’’ available at fixed locations throughout the chip.
II I. MIXED-SIZE PLACEMENT The number of macros included in modern ICs is growing [207] in response to technology and business trends. Finding locations of larger circuit modules and placing standard cells are essentially the same from an optimization viewpoint, distinguished only by 1) the scale relative to the size of layout regions; and 2) the shaping and rotations of macros in floorplanning. Mixed-sized placement carefully
combines floorplanning techniques, to pack (the relatively few) large blocks, and placement techniques, to handle the millions of small standard cells. Several available approaches are summarized in Table 2 and illustrated in Fig. 4.
A. Simultaneous Flows Simultaneous flows do not separate the placement of standard cells and macros into separate stages. One method to handle macros is to divide them into ‘‘shreds’’ comparable in size to standard cells [2]. These shreds can be connected by fake nets during wirelength optimization so as to keep them close together [2], [17], e.g., when shaping soft blocks [164]. In contrast, Kim and Markov [106] only shred macros during geometric spreading. Other placement-based approaches explicitly shift macros or cells during placement [39], [107], [197], or relegalize after every placement iteration [27], [56]. Techniques that simultaneously move macros and standard cells can be classified into force directed [62], [197], nonconvex [27], [39], [76], min-cut [164], and flow based [45]. However, many ideas are applicable in different contexts. One strategy is to legalize and fix macros that are comparable in size to the magnitude of cell displacements at the current iteration [27], [167]. B. Sequential Flows Sequential flows separate macro and standard-cell placement. Some flows place all macros at once after tentatively placing the full netlist, and before standard-cell placement, such as packing macros at the chip periphery of [38]. Other approaches [2], [216] 1) cluster standard cells into soft blocks; 2) use a floorplanner on the original macros and new soft blocks; 3) dissolve soft blocks; and 4) place standard cells using established methods. Several techniques [35], [38], [216] account for macro flipping and rotation. C. Postprocessing Placement Methods Postprocessing placement methods remove overlap between macros and standard cells by floorplan repair
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1991
Markov et al.: Progress and Challenges in VLSI Placement Research
[139], detailed placement [17], [55], [181], and forcedirected techniques [62], [164].
I V. ROUTABILITY-DRIVEN PLACEMENT With increasing design complexity, optimizing traditional placement metrics is insufficient for successful routing [8], [165]. To mitigate routing failures, routability-driven placers incorporate route estimation as part of their flow.
A. Congestion Maps Congestion maps indicate regions where routing will be difficult, and are used to guide optimization during placement. They are generated using: 1) static approaches, where the congestion map is fixed for a placement instance; 2) probabilistic approaches, where net topologies are not fixed, and probabilistically determined; and 3) constructive approaches, where a simplified global router generates approximate net routes (Fig. 5). Traditionally, the first two options have been the most popular, but the last option has been gaining traction due to advancements in global routers, where they are designed to handle greater layout complexity. Table 3 summarizes these approaches.
Fig. 4. Illustration of different techniques for simultaneously placing large macro blocks (red outlines) and standard cells (blue). The top image illustrates multilevel placement to spread standard cells around macro blocks. The bottom image demonstrates the placement of large macro blocks by representing each one as many smaller standard-cell-sized blocks and maintaining tight bonds between the smaller blocks.
B. Placement Optimizations Placement optimizations are applied throughout the entire placement flow: 1) during global placement; 2) modifying intermediate solutions; 3) during legalization and detailed placement; and 4) as a postplacement processing step (Table 4). In global placers, the most popular techniques are cell bloating and whitespace injection. Depending on the placer type, e.g., quadratic and min-cut, the implementation of these techniques will require placer modification, including changing the optimization function. In detailed
Fig. 5. Illustration of congestion alleviation during global placement using SimPLR [104]. In the left panel, the cells (depicted in blue) are tightly packed (right) with high areas of congestion shown in red and purple (left). In the middle panel, packing peanuts (right, depicted in red) encourage cells to spread, thereby decreasing local cell density and reduce congestion (left). In the right panel, packing peanuts have increased in size (left), and congestion has been substantially reduced (left).
1992
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
TABLE 3 Congestion Estimation for Placement
placers, the most popular techniques are cell swapping and cell shifting. Additional optimizations can be applied to intermediate (or near-final) placement solutions, and then passed on to the next step of the design flow.
C. Contests Researchers from IBM organized the 2011 International Symposium on Physical Design (ISPD 2011) Routabilitydriven Contest [192]. The benchmarks included between 483 000 and 1.29 million movable cells and a set of routing constraints (e.g., blockages), such that many of them were intentionally difficult to route. Placement solutions were evaluated by running a global router and counting violations. The results indicate that routability can be improved by increasing cell porosity. SimPLR [104] and Ripple [71] used congestion maps to temporarily bloat cells, modulate target aspect ratio of cell placement areas, and modify the TABLE 4 Routability-Driven Placement
anchor positions during quadratic placement. NTUPlace4 [78] uses smoothened congestion maps when modeling pin density. To estimate congestion, SimPLR integrated a global router [83], whereas Ripple and NTUPlace4 adopted probabilistic congestion estimation [179]. The 2012 Design Automation Conference (DAC 2012) [193] and ICCAD 2012 Contest Benchmarks [194] were easier to route and emphasized the reduction of peak congestion. The evaluation metric combined runtime and scaled HPWL. Since then, further improvements [72], [84], [127] have followed.
V. TIMING- AND POWER-DRIVEN PL ACEMENT Timing-driven placement (TDP) seeks to optimize circuit delay. TDP identifies critical nets using static timing analysis (STA) [95, Sec. 8.2], and typically minimizes total negative slack (TNS), worst negative slack (WNS), or both. Table 5 outlines TDP; Fig. 6 illustrates the impact of net weights on selected critical nets.
A. Net-Based Approaches Net-based approaches optimize circuit delay by translating STA results and timing constraints into net weights TABLE 5 Timing-Driven Placement Approaches
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1993
Markov et al.: Progress and Challenges in VLSI Placement Research
Fig. 6. Illustration of modifying net weights to manipulate timing and wirelength. In the left panel, the highlighted net has no weighting, and is given a high degree of freedom. As a higher net weight is applied to this net, the placer will restrict the net’s movement and reduce its wirelength, as shown in subsequent panels. The higher the net weight, the more priority this net is given to reduce wirelength. However, other (less critical) nets’ wirelength will be implicitly increased, as they may be moved to cater toward more critical nets.
and net constraints, respectively. A higher net weight encourages interconnect optimization to preferentially shorten the net, whereas a net constraint limits net delay. Static net weights remain constant during placement, and are typically based on negative slack [20], [28], [61], [111] or sensitivity [66], [163], [213], where placers attempt to predict the impact each net has on timing. A net weight that is too high may shorten a net at the expense of upstream or downstream nets, increasing circuit delay. To avoid this, dynamic net weights are gradually updated based on slack change [20], [160] or net criticality [62], [160]. While more flexible, dynamic net weights may cause nets to oscillate between critical and noncritical [54]. To dampen oscillations, net weights are accumulated based on histories, and increased monotonically. Net constraints limit net length, which linearly correlates with the delays of buffered long wires [145], and do not require as accurate timing predictions as net weights. Common methods to generate delay budgets include the zero-slack algorithm (ZSA) [95, Sec. 8.2.2], [126], [140] and the iterative-minmax-PERT algorithm [219]. The usage of net constraints is placer dependent. Min-cut placers [64] modify cut costs, and analytic placers modify forces [159] or Lagrange multipliers [188]. In detailed placement, net constraints have also been integrated in cell movement [86], primary objective functions [68], and as a separate local-move step [45]. Net constraints are supported by differential timing analysis [162], which generalizes incremental STA [151].
mathematical program that maintains intermediate timing variables. Auxiliary techniques, e.g., partitioning [89] and Lagrangian relaxation [69], [182], can solve the program and improve quality. Other approaches solve linear programs in local regions [48] and use simulated annealing [183].
B. Path-Based Approaches Path-based approaches directly model timing on individual critical paths, and explicitly ensure that each considered path meets timing constraints. These approaches typically achieve better solutions than net-based approaches because of their global scope, but remain practical for only a limited number of paths. To deal with numerous critical paths in large designs, path-based approaches 1) embed a graph-based timing model; and 2) formulate a
E. Static-Power Reduction Static-power reduction techniques trade positive timing slack for power. When multiple supply voltages are present, cells can be moved closer to voltage sources in rows [218], where cells are in interleaving (half) rows of high and low VDD rows, and in regions [82], [113], [157], where cells are powered by the closest voltage island. Other approaches leverage cell hierarchy using clustering [124], and exploit the locality of connections [158].
1994
C. Compound Approaches Compound approaches seek to combine the scalability of net-based approaches and accuracy of path-based approaches. They can be illustrated by [130], which uses a hybrid path-based delay sensitivity function for net weights and minimizes critical nets’ wirelength. D. IC Power Optimization IC power optimization distinguishes dynamic power from static power, which does not directly depend on cell locations. With multiple voltages, static power can be reduced by changing voltage-island assignments. In contrast, dynamic power depends on interconnect lengths, which are determined by placement. To support higher clock frequencies, modern designs are heavily pipelined, and require more capacitive clock networks, which can contribute 30% of total IC power [134]. To support design scaling [88], placers must co-optimize 1) the hundreds of millions of signal nets, each consuming a small amount of power; and 2) clock networks with significant power consumption, as illustrated in [42] and [115]. Table 6 outlines powerdriven placement; Fig. 7 illustrates placement optimization within regions.
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
TABLE 6 Power-Driven Placement Techniques
F. Dynamic-Power Reduction Dynamic-power reduction can be accomplished by 1) reducing net activity; and 2) optimizing register locations and optimizing the clock tree. These classes are not mutually exclusive. Clustering registers at the clock-tree leaves facilitates inverter sharing [32], [92] and reduces clock-tree capacitance [42], [115]. Such optimizations can be performed using net weights based on activity factors [42], [143], [171], [190]. Register placement is described in [43], [115], [133], and [153], whereas clock-tree synthesis is incorporated into global placement in [50], [115], [153], [176], and [204]. In [115], this integration reduces clock trees by 30% and total dynamic power of the netlist and clock tree by 7%. Shen et al. [176] add clockgating logic, and further refine the tree with incremental placement techniques. An IBM physical-synthesis flow that accounts for gated clocks in high-performance ICs is described in [153].
VI . PLACEMENT IN PHYSICAL SYNTHES IS Physical synthesis [7] modifies the netlist based on placement information so as to 1) fix timing violations; and 2) optimize performance metrics. After a round of optimizations, the design can be replaced to facilitate further optimizations. Hence, the interaction between placement algorithms and physical synthesis is crucial for timing closure. Table 7 lists physical synthesis techniques that heavily interact with placement.
A. Logic Transformations Logic transformations [95, Sec. 8.5.3] such as cloning, gate decomposition, and combinational restructuring manipulate area-power-timing tradeoffs in combinational circuits. Newly inserted gates must be given valid locations, which can make or break a given transformation [60], as illustrated by 1) restructuring fanin trees [212] and cones [208]; and 2) simulation-driven restructuring that uses controllability and observability don’t-cares [155]. Fig. 7. Region-based optimization during placement. At the top, marked cells are not constrained. At the bottom, the cells’ locations are constrained to a small rectangle to shorten particular nets and reduce delay and dynamic power.
B. Interconnect Buffering Interconnect buffering [121], [191] improves circuit timing by speeding up signal transitions in long nets.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1995
Markov et al.: Progress and Challenges in VLSI Placement Research
TABLE 7 Placement-Aware Physical Synthesis
Approximately equal-spaced buffers break down timingcritical nets into shorter segments. Due to technology scaling, buffers comprise 10%–44% of standard-cell instances in large designs [152]. As the final buffer requirement is unknown in advance, placers reserve unused space (whitespace) throughout the layout [3]. Virtual buffering [152], [174] assigns buffers to long interconnects without changing the netlist, but accounts for their impact on area and timing. Porosity-aware buffer planning integrated in an analytical-placement framework adds buffer density to the objective function [36]. Porosity-aware Steiner trees place buffers in available sites [5].
C. Gate Sizing Gate sizing [81], [147] does not change connectivity but impacts timing-power tradeoffs, facilitating other optimizations. Papa et al. [151] develop placement-aware branch-andbound search using a discrete cell library. Alternatively, gate sizes can be extrapolated using continuous delay models within performance-driven physical design [184], [198]. D. Physical Retiming Physical retiming moves registers through combinational logic in the netlist in order to ease timing constraints. Despite numerous publications on retiming in the logic domain, practical implementations must account for interconnect delay, and therefore gate locations and buffering. A retiming-based physical-transformation system [150] uses virtual buffering and exploits the interaction between retiming, placement, and cloning in a unified mixed integer-linear program. E. Compound Optimizations Compound optimizations perform sophisticated areatiming-power tradeoffs along multiple design dimensions while limiting design iterations and turnaround time. Logic transformations, buffer insertions, and gate sizing may require legalization and detailed placement. Additionally, Li et al. [118] adjust floorplans to accommodate area changes. Industrial physical-synthesis flows are reported in [151], [153], [186], and [198]. 1996
VII. BENCHMARKING INFRASTRUCTURE AND COM PARISONS OF PLACEMENT ALGORITHMS Benchmarking activities for research on placement algorithms have a long tradition [1]. Key elements of benchmarking infrastructure have been developed by the placement research community over the last ten years: • rigorous geometry and netlist descriptions; • closed-form objective functions; • community-shared file formats, based on the Bookshelf format originally designed for the Capo placer [23]; • a variety of circuit designs contributed by industry (IBM Research and Mentor Graphics) for regular research contests held at ISPD and ICCAD, available on respective conference websites. This infrastructure made possible to measure and report even fairly small improvements to placement algorithms, which appear consistently over multiple benchmarks. Moreover, such improvements typically survive in the industry environment and gradually add up. In our experience, some algorithmic changes exhibited greater improvements on proprietary designs than on public benchmarks.
A. Research Contests The availability of a common data formats and industry benchmarks enabled research contests, which typically give graduate students several months to adapt their algorithms to a new set of industry netlists, constraints, and objectives, and develop high-performance optimizers. Such contests were held regularly in the placement community since ISPD 2005, and dozens groups built up extensive software libraries. In the last ten years, these contests have spearheaded and validated significant advances in placement research. Starting from the pure-wirelength-driven formulations in 2005, the contests broadened their interests toward cell density [141], global routability [192]–[194], detailed routability [21], [221], and timing [109]. The ISPD 2005 and 2006 contests on newly released large netlists have confirmed that analytic global placers have matured to the point where previously leading min-cut placers became uncompetitive [141]. Subsequent contests were often dominated by quadratic placers, which were faster than general nonlinear optimization placers that produced better solutions. Incorporating routability and runtime into the objectives identified algorithmic frameworks that could be best adapted to practical design flows. Contest organizers were EDA professionals involved in placement R&D in the industry, and this ensured prompt recognition of contest results, employment opportunities for best performing graduate students, and even commercialization of best ideas. B. Comparisons Between Academic and Industry Placement Tools Historically, academic placement tools have been closely related to their industrial counterparts. Many
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
Markov et al.: Progress and Challenges in VLSI Placement Research
academic placers presaged industry placers, whereas Nalyor’s patent [142] spearheaded nonquadratic academic placer development (Fig. 1). Even so, direct comparisons between academic and industry placers have been restricted by terms of specific license agreements, as well as mismatches in data models and details of the larger chip context. Recent research in [94] created a framework and specific benchmarks that enable comprehensive ‘‘apples-toapples’’ assessments across commercial and academic tools including placers, routers, and gate sizers. The results suggest that state-of-the-art academic tools can outperform industry tools on individual well-studied objectives, such as wirelength (measured after placement) and overflow (measured after routing), but are more likely to underperform in highly constrained multiobjective optimization. This is to be expected, since comprehensive, customerfocused software development is a lengthy and capitalintensive process that does not reflect the goals of academic research. Instead, academic tools focus on some aspects of the placement process and often pursue the objectives proposed in academic contests.
VIII . OPEN CHALLENGES As modern ICs continue to grow [88], flat placement will require new algorithms and data structures to support physical design and physical synthesis with numerous macros and multiple clock domains [9]. Most gate-level techniques for 3-D placement remain impractical due to high TSV costs [102].
A. Automatic Generation of Datapath Layout Automatic generation of datapath layout currently remains inferior to manual placement, but significant improvements have been made in the last few years [46], [205]. A key issue is whether general-purpose placement approaches can be extended to handle datapath layouts as successfully as unstructured layouts, or specialization is required. Recent studies [205] suggest the latter, pointing
out that conventional optimization objectives optimized by conventional placers have serious pitfalls in the case of datapaths (Fig. 8). In particular, datapaths require more accurate modeling of multipin nets and their routing. To facilitate specialized datapath placement techniques, it is important to identify datapath subcircuits and, after placement is performed, integrate the results into a larger layout. Specialized placement can also be viewed in terms of componentsValignment of bit slices, careful spacing of aligned groups, as well as structure-aware legalization algorithms [46], [205] (Fig. 8). The least studied aspect of such a design flow is the integration into a larger layout.
B. Layout-Friendly High-Level Synthesis Layout-friendly high-level synthesis [52] promises improvement in IC power and performance by invoking physical optimization much earlier in the design flow. Such operations require reasonably accurate modeling of key design parameters, fast layout estimation, and powerful high-level design transformations that would generate multiple configurations to choose from. Large datapaths offer significant freedom for such transformations and algebraic structure that facilitates high-level transformations. Moreover, datapaths arise as bottlenecks for power and performance in mass-market IC designs and science-support projects. Another key domain for layout-friendly high-level synthesis is in tradeoffs between computation and communication. To this end, layout information is necessary to determine 1) spatial resources available for compute modules; and 2) the cost of communication. Specialized bus routing is necessary to determine appropriate serialization/ deserialization tradeoffs. C. Integrated Timing and Power Optimizations As timing-critical nets are typically identified by signoff quality timing engines after placement, significant placement modification may be required in the presence of a large number of near-timing critical nets. Removing all slack violations on these critical nets during placement,
Fig. 8. Illustration of datapath optimization during global placement. The red or light lines (black or dark lines) show horizontal (vertical) alignments. Techniques such as weighting (center) and fixed-point alignment (right) significantly improve datapath alignment.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1997
Markov et al.: Progress and Challenges in VLSI Placement Research
however, can undermine placement quality, generate new critical nets, and hamper timing closure. Moreover, as the distance between metal layers (and neighboring nets) is reduced, coupling delay further complicates timing analysis, as it can now be induced by nets above and below in addition to the parallel wires on the same layer. Further challenges arise when integrating placement with clocknetwork synthesis to address process variation, useful skew, hold constraints, clock gating, and other low-power optimizations. In particular, short-path (hold) constraints pose surprisingly difficult challenges in commercial IC designs, and inserting buffers to mitigate these problems results in an expensive overhead for power and area. Clock gating optimizations can yield major savings in power, but also increase and complicate the connectivity of the combined netlist containing signal and clock nets. A typical netlist transformation in this context moves the gaters further away from the clock source and clones them to handle tree fanout.
E. Quantifying the Impact Given the significant investment in placement algorithms and tools, as well as the tangible progress achieved, it is important to quantify their impact on the cost and quality of ICs and semiconductor products. Experiments in [169] show that more effective congestion-driven placement facilitates die-size reductions, which translate into lower manufacturing costs and higher profits. A variety of cost functions and large-scale design-optimization effects must be evaluated conclusively. Due to the significant effort involved and the need for careful analysis of results, EDA tool vendors prepare circuit designs that demonstrate key layout issues, and then release these benchmarks to universities as part of student contests. Top performers then describe their techniques in scholarly publications. This open-research environment has been instrumental in advancing the state of the art in physical layout, with many contests targeting global placement.
D. Lithography-Aware Physical Synthesis Lithography-aware physical synthesis [146], [211] enables early manufacturability optimization by, e.g., controlling wire density to enhance chemical–mechanical polishing (CMP) [37] and improve timing yield [97]. Such optimizations are numerous, and their significance depends on the technology node. In particular, starting with 90–45-nm nodes global routers have to prevent excessive wire density to minimize crosstalk and avoid pitfalls in CMP. Since global placement holds significant influence over global routing (and the two have been integrated), wire density must be considered in global placement for large ICs. Starting at 45 nm, forbidden pitches and other lithography constraints required additional support. Manufacturing yield optimization created a new situation where many single violations of soft rules can be tolerated, but their numbers must be kept low. More recently, double- and triple-patterning required even more involved support during place-and-route and standard-cell synthesis.
I X. CONCLUSION
REFERENCES [1] S. N. Adya et al., ‘‘Benchmarking for large-scale placement and beyond,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 4, pp. 472–487, Apr. 2004. [2] S. N. Adya and I. L. Markov, ‘‘Combinatorial techniques for mixed-size placement,’’ ACM Trans. Design Autom. Electron. Syst., vol. 10, no. 1, pp. 58–90, 2005. [3] S. N. Adya, I. L. Markov, and P. G. Villarrubia, ‘‘On whitespace and stability in physical synthesis,’’ Integration, vol. 39, no. 4, pp. 340–362, 2006. [4] A. Agnihorti et al., ‘‘Fractional cut: Improved recursive bisection placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2003, pp. 307–310. [5] C. J. Alpert, G. Gandham, M. Hrkic, J. Hu, and S. T. Quay, ‘‘Porosity aware buffered
1998
[6]
[7]
[8]
[9]
[10]
Research in VLSI physical design advanced quickly in the last 30 years. The first ten years (and prior research) identified well-defined optimization stages such as placement, routing, and clock-tree synthesis, with established algorithms and intuitions. The next ten years of research discovered algorithms with increasing sophistication, better scalability, and competitive quality of results. The last ten years saw an explosion of benchmarking activities and software-based contests, which revealed significant room for improvement over prior art. Additionally, isolated physical-design steps were extended to multiobjective optimizations and coordinated with other aspects of IC design, to serve the needs of design flows focused on timing closure and manufacturing concerns. Research on placement has been at the forefront of these developments. It contributed profoundly to the intellectual underpinnings of physical design and its performance improvements in practice. Current commercial software tools for IC design draw heavily upon academic research, and further improvements are likely. h
Steiner tree construction,’’ in Proc. Int. Symp. Phys. Design, 2003, pp. 158–165. C. J. Alpert, A. Kahng, G.-J. Nam, S. Reda, and P. Villarrubia, ‘‘A semi-persistent clustering technique for VLSI circuit placement,’’ in Proc. Int. Symp. Phys. Design, 2005, pp. 200–207. C. J. Alpert et al., ‘‘Techniques for fast physical synthesis,’’ in Proc. IEEE, vol. 95, no. 3, pp. 573–599, Mar. 2007. C. J. Alpert et al., ‘‘What makes a design difficult to route,’’ in Proc. Int. Symp. Phys. Design, 2010, pp. 7–12. C. J. Alpert et al., ‘‘Placement: Hot or not?’’ in Proc. Int. Conf. Comput.-Aided Design, 2012, pp. 283–290. C. J. Alpert, D. P. Mehta, and S. S. Sapatnekar, Eds., Handbook of Algorithms for VLSI Physical Design Automation. Boca Raton, FL, USA: CRC Press, 2008.
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
[11] V. Betz and J. Rose, ‘‘VPR: A new packing, placement and routing tool for FPGA research,’’ in Field-Programmable Logic and Applications, ser. Lecture Notes in Computer Science, vol. 1304. Berlin, Germany: Springer-Verlag, 1997, pp. 213–222. [12] U. Brenner, ‘‘VLSI legalization with minimum perturbation by iterative augmentation,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2012, pp. 1385–1390. [13] U. Brenner and A. Rohe, ‘‘An effective congestion driven placement framework,’’ in Proc. Int. Symp. Phys. Design, 2002, pp. 6–11. [14] U. Brenner and J. Vygen, ‘‘Faster optimal single-row placement with fixed ordering,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2000, pp. 117–121. [15] U. Brenner and J. Vygen, ‘‘Legalizing a placement with minimum total movement,’’
Markov et al.: Progress and Challenges in VLSI Placement Research
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
IEEE Trans. Comput.-Aided Design, vol. 23, no. 12, pp. 1597–1613, Dec. 2004. U. Brenner and M. Struzyna, ‘‘Faster and better global placement by a new transportation algorithm,’’ in Proc. Design Autom. Conf., 2005, pp. 591–596. U. Brenner, M. Struzyna, and J. Vygen, ‘‘BonnPlace: Placement of leading-edge chips by advanced combinatorial algorithms,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 9, pp. 1607–1620, Sep. 2008. M. Breuer, ‘‘Min-cut placement,’’ J. Design Autom. Fault-Tolerant Comput., vol. 10, pp. 343–382, 1977. R. E. Burkard, S. E. Karisch, and F. Rendl, ‘‘QAPLIBVA quadratic assignment problem library,’’ Global Optim., no. 10, pp. 391–403, 1997. M. Burstein and M. N. Youssef, ‘‘Timing influenced layout design,’’ in Proc. Design Autom. Conf., 1985, pp. 124–130. I. S. Bustany, D. G. Chinnery, J. R. Shinnerl, and V. Yutsis, ‘‘ISPD 2015 benchmarks with fence regions and routing blockages for detailed-routing-driven placement,’’ in Proc. Int. Symp. Phys. Design, 2015, pp. 157–164. A. E. Caldwell, A. B. Kahng, S. Mantik, I. L. Markov, and A. Zelikovsky, ‘‘On wirelength estimations for row-based placement,’’ IEEE Trans. Comput.-Aided Design, vol. 18, no. 9, pp. 1265–1278, Sep. 1999. A. E. Caldwell, A. B. Kahng, and I. L. Markov, ‘‘Can recursive bisection alone produce routable placements?’’ Proc. Design Autom. Conf., 2000, pp. 477–482. A. E. Caldwell, A. B. Kahng, and I. L. Markov, ‘‘Optimal partitioners and end-case placers for standard-cell layout,’’ IEEE Trans. Comput.-Aided Design, vol. 19, no. 11, pp. 1304–1313, Nov. 2000. S. Cauley, V. Balakrishnan, Y. C. Hu, and C.-K. Koh, ‘‘A parallel branch-and-cut approach for detailed placement,’’ ACM Trans. Design Autom. Electron. Syst., vol. 16, no. 2, pp. 18:1–18:19, 2011. T. F. Chan, J. Cong, and K. Sze, ‘‘Multilevel generalized force-directed method for circuit placement,’’ in Proc. Int. Symp. Phys. Design, 2005, pp. 185–192. T. F. Chan et al., ‘‘mPL6: Enhanced multilevel mixed-size placement with congestion control,’’ in Modern Circuit Placement, ser. Integrated Circuits and Systems, New York, NY, USA: Springer-Verlag, 2007, pp. 247–288. H. Chang et al., ‘‘Net criticality revisited: An effective method to improve timing in physical design,’’ in Proc. Int. Symp. Phys. Design, 2002, pp. 155–160. C. Chang, J. Cong, and X. Yuan, ‘‘Multi-level placement for large-scale mixed-size IC design,’’ in Proc. Asia South Pacific Design Autom. Conf., 2003, pp. 325–330. Y.-W. Chang, Z.-W. Jiang, and T.-C. Chen, ‘‘Essential issues in analytical placement algorithms,’’ IPSJ Trans. Syst. LSI Design Methodol., vol. 2, pp. 145–166, 2009. T. F. Chan, J. Cong, and E. Radke, ‘‘A rigorous framework for convergent net weighting schemes in timing-driven placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2009, pp. 288–294. Y.-T. Chang, C.-C. Hsu, M. P.-H. Lin, Y.-W. Tsi, and S.-F. Chen, ‘‘Post-placement power optimization with multi-bit flip-flops,’’ in Proc. Int. Conf. Comput.-Aided Design, 2010, pp. 218–223.
[33] S. Chatterjee and R. K. Brayton, ‘‘A new incremental placement algorithm and its application to congestion-aware divisor extraction,’’ in Proc. Int. Conf. Comput.-Aided Design, 2004, pp. 541–548. [34] H. Chen et al., ‘‘An algebraic multigrid solver for analytical placement with layout based clustering,’’ in Proc. Design Autom. Conf., 2003, pp. 794–799. [35] H.-C. Chen, Y.-L. Chuang, Y.-W. Chang, and Y.-C. Chang, ‘‘Constraint graph-based macro placement for modern mixed-size circuit designs,’’ in Proc. Int. Conf. Comput.-Aided Design, 2008, pp. 218–223. [36] T.-C. Chen, A. Chakraborty, and D. Z. Pan, ‘‘An integrated nonlinear placement framework with congestion and porosity aware buffer planning,’’ in Proc. Design Autom. Conf., 2008, pp. 702–707. [37] T.-C. Chen, M. Cho, D. Z. Pan, and Y.-W. Chang, ‘‘Metal-density-driven Placement for CMP variation and routability,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 12, pp. 2145–2155, Dec. 2008. [38] T.-C. Chen, P.-H. Yuh, Y.-W. Chang, F.-J. Huang, and T.-Y. Liu, ‘‘MP-trees: A packing-based macro placement algorithm for modern mixed-size designs,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 9, pp. 1621–1634, Sep. 2008. [39] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang, ‘‘NTUPlace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 7, pp. 1228–1240, Jul. 2008. [40] C. E. Cheng, ‘‘RISA: Accurate and efficient placement routability modeling,’’ in Proc. Int. Conf. Comput.-Aided Design, 1994, pp. 650–695. [41] C.-K. Cheng and E. S. Kuh, ‘‘Module placement based on resistive network optimization,’’ IEEE Trans. Comput.-Aided Design, vol. CAD-3, no. 3, pp. 218–225, Jul. 1984. [42] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, ‘‘Power-aware placement,’’ in Proc. Design Autom. Conf., 2005, pp. 795–800. [43] M.-F. Chiang, T. Okamoto, and T. Yoshimura, ‘‘Register placement for high-performance circuits,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2009, pp. 1470–1475. [44] M. Cho, H. Ren, H. Xiang, and R. Puri, ‘‘History-based VLSI legalization using network flow,’’ in Proc. Design Autom. Conf., 2010, pp. 286–291. [45] W. Choi and K. Bazargan, ‘‘Hierarchical global floor placement using simulated annealing and network flow area migration,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2003, pp. 1104–1105. [46] S. Chou, M.-K. Hsu, and Y.-W. Chang, ‘‘Structure-aware placement for datapath-intensive circuit designs,’’ in Proc. Design Autom. Conf., 2012, pp. 762–767. [47] W.-K. Chow, J. Kuang, X. He, W. Cai, and E. F. Y. Young, ‘‘Cell density-driven detailed placement with displacement constraint,’’ in Proc. Int. Symp. Phys. Design, 2014, pp. 3–10. [48] A. Chowdhary et al., ‘‘How accurately can we model timing in a placement engine,’’ in Proc. Design Autom. Conf., 2005, pp. 801–806. [49] C. Chu and M. Pan, ‘‘IPR: An integrated placement and routing algorithm,’’ in Proc. Design Autom. Conf., 2007, pp. 59–62.
[50] Y.-L. Chuang, H.-T. Lin, T.-Y. Ho, Y.-W. Chang, and D. Marculescu, ‘‘PRICE: Power reduction by placement and clock-network co-synthesis for pulsed-latch designs,’’ in Proc. Int. Conf. Comput.-Aided Design, 2011, pp. 85–90. [51] Y.-L. Chuang et al., ‘‘Design-hierarchy aware mixed-size placement for routability optimization,’’ in Proc. Int. Conf. Comput.-Aided Design, 2010, pp. 663–668. [52] J. Cong, B. Liu, G. Luo, and R. Prabhakar, ‘‘Towards layout-friendly high-level synthesis,’’ in Proc. Int. Symp. Phys. Design, 2012, pp. 165–172. [53] J. Cong, G. Luo, and E. Radke, ‘‘Highly efficient gradient computation for density-constrained analytical placement,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 12, pp. 2133–2144, Dec. 2008. [54] J. Cong, J. R. Shinnerl, M. Xie, T. Kong, and X. Yuan, ‘‘Large-scale circuit placement,’’ ACM Trans. Design Autom. Electron. Syst., vol. 10, no. 2, pp. 389–430, 2005. [55] J. Cong and M. Xie, ‘‘A robust detailed placement for mixed-size IC designs,’’ in Proc. Asia South Pacific Design Autom. Conf., 2006, pp. 188–194. [56] J. Cong, M. Romesis, and J. R. Shinnerl, ‘‘Robust mixed-size placement under tight white-space constraints,’’ in Proc. Int. Conf. Comput.-Aided Design, 2005, pp. 165–172. [57] K.-R. Dai, C.-H. Lu, and Y.-L. Li, ‘‘GRPlacer: Improving routability and wirelength of global routing with circuit replacement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2009, pp. 351–356. [58] K. Doll, F. M. Johannes, and K. J. Antreich, ‘‘Iterative placement improvement by network flow methods,’’ IEEE Trans. Comput.-Aided Design, vol. 13, no. 10, pp. 1189–1200, Oct. 1994. [59] K. Doll, F. M. Johannes, and G. Sigl, ‘‘DOMINO: Deterministic placement improvement with hill-climbing capabilities,’’ in Proc. Very Large Scale Integr. (VLSI) Conf., 1991, pp. 91–100. [60] W. E. Donath et al., ‘‘Transformational placement and synthesis,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2000, pp. 194–201. [61] A. E. Dunlop et al., ‘‘Chip layout optimization using critical path weighting,’’ in Proc. Design Autom. Conf., 1984, pp. 133–136. [62] H. Eisenmann and F. M. Johannes, ‘‘Generic global placement and floorplanning,’’ in Proc. Design Autom. Conf., 1998, pp. 269–274. [63] C. M. Fiduccia and R. M. Mattheyses, ‘‘A linear-time heuristic for improving network partitions,’’ in Proc. Design Autom. Conf., 1982, pp. 175–181. [64] T. Gao, P. M. Vaidya, and C. L. Liu, ‘‘A performance driven macro-cell placement algorithm,’’ in Proc. Design Autom. Conf., 1992, pp. 147–152. [65] S. Ghiasi, E. Bozorgzadeh, P.-K. Huang, R. Jafari, and M. Sarrafzadeh, ‘‘A unified theory of timing budget management,’’ IEEE Trans. Comput.-Aided Design, vol. 25, no. 11, pp. 2364–2375, Nov. 2006. [66] B. Halpin, C. Y. R. Chen, and N. Sehgal, ‘‘A sensitivity based placer for standard cells,’’ in Proc. Great Lakes Symp. VLSI, 2000, pp. 193–196. [67] B. Halpin, C. Y. R. Chen, and N. Sehgal, ‘‘Timing driven placement using physical net constraints,’’ in Proc. Design Autom. Conf., 2001, pp. 780–783.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
1999
Markov et al.: Progress and Challenges in VLSI Placement Research
[68] B. Halpin, C. Y. R. Chen, and N. Sehgal, ‘‘Detailed placement with net length constraints,’’ in Proc. Int. Workshop System-on-Chip Real-Time Appl., 2003, pp. 22–27. [69] T. Hamada, C. K. Cheng, and P. M. Chau, ‘‘Prime: A timing-driven placement tool using a piecewise linear resistive network approach,’’ in Proc. Design Autom. Conf., 1993, pp. 531–536. [70] P. S. Hauge, R. Nair, and E. J. Yoffa, ‘‘Circuit placement for predictable performance,’’ in Proc. Int. Conf. Comput.-Aided Design, 1987, pp. 88–91. [71] X. He et al., ‘‘Ripple: An effective routability-driven placer by iterative cell movement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2011, pp. 74–79. [72] X. He et al., ‘‘Ripple 2.0: High quality routability-driven placement via global router integration,’’ in Proc. Design Autom. Conf., 2013, pp. 1–6. [73] D. Hill, ‘‘Method and system for high speed detailed placement of cells within an integrated circuit design,’’ U.S. Patent 6370673, 2001. [74] T.-Y. Ho and S.-H. Liu, ‘‘Fast legalization for standard cell placement with simultaneous wirelength and displacement minimization,’’ in Proc. VLSI System on Chip Conf., 2010, pp. 369–374. [75] W. Hou et al., ‘‘A new congestion-driven placement algorithm based on cell inflation,’’ in Proc. Asia South Pacific Design Autom. Conf., 2001, pp. 723–728. [76] M.-K. Hsu and Y.-W. Chang, ‘‘Unified analytical global placement for large-scale mixed-size circuit designs,’’ in Proc. Int. Conf. Comput.-Aided Design, 2010, pp. 657–662. [77] M.-K. Hsu, Y.-W. Chang, and V. Balabanov, ‘‘TSV-aware analytical placement for 3D IC designs,’’ in Proc. Design Autom. Conf., 2011, pp. 664–669. [78] M.-K. Hsu, S. Chou, T.-H. Lin, and Y.-W. Chang, ‘‘Routability-driven analytical placement for mixed-size circuit designs,’’ in Proc. Int. Conf. Comput.-Aided Design, 2011, pp. 80–84. [79] B. Hu and M. Marek-Sadowska, ‘‘Congestion minimization during placement without estimation,’’ in Proc. Int. Conf. Comput.-Aided Design, 2002, pp. 739–745. [80] B. Hu and M. Marek-Sadowska, ‘‘Multilevel fixed-point-addition-based VLSI placement,’’ IEEE Trans. Comput.-Aided Design, vol. 24, no. 8, pp. 1188–1203, Aug. 2005. [81] J. Hu, A. B. Kahng, S.-H. Kang, M.-C. Kim, and I. L. Markov, ‘‘Sensitivity-guided metaheuristics for accurate discrete gate sizing,’’ in Proc. Int. Conf. Comput.-Aided Design, 2012, pp. 233–239. [82] J. Hu, Y. Shin, N. Dhanwada, and R. Marculescu, ‘‘Architecting voltage islands in core-based system-on-a-chip designs,’’ in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 180–185. [83] J. Hu, J. A. Roy, and I. L. Markov, ‘‘Completing high-quality routes,’’ in Proc. Int. Symp. Phys. Design, 2010, pp. 35–41. [84] J. Hu, M.-C. Kim, and I. L. Markov, ‘‘Taming the complexity of coordinated place and route,’’ in Proc. Design Autom. Conf., 2013, pp. 1–7. [85] T. C. Hu and K. Moerder, ‘‘Multiterminal flows in a hypergraph,’’ in VLSI Circuit Layout: Theory and Design, T. C. Hu and E. S. Kuh, Eds. New York, NY, USA: IEEE Press, 1985.
2000
[86] S. Hur et al., ‘‘Force directed mongrel with physical net constraints,’’ in Proc. Design Autom. Conf., 2003, pp. 214–219. [87] S.-W. Hur and J. Lillis, ‘‘Mongrel: Hybrid techniques for standard cell placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2000, pp. 165–170. [88] International Technology Roadmap for Semiconductors (ITRS). [Online]. Available: http://public.itrs.net [89] M. A. B. Jackson and E. S. Kuh, ‘‘Performance-driven placement of cell based ICs,’’ in Proc. Design Autom. Conf., 1989, pp. 370–375. [90] D. Jariwala and J. Lillis, ‘‘RBI: Simultaneous placement and routing optimization technique,’’ IEEE Trans. Comput.-Aided Design, vol. 26, no. 1, pp. 127–141, Jan. 2007. [91] Z.-W. Jiang, B.-Y. Su, and Y.-W. Chang, ‘‘Routability-driven analytical placement by net overlapping removal for large-scale mixed-size designs,’’ in Proc. Design Autom. Conf., 2008, pp. 167–172. [92] H.-R. Jiang, C.-L. Chang, and Y.-M. Yang, ‘‘INTEGRA: Fast multibit flip-flop clustering for clock power saving,’’ IEEE Trans. Comput.-Aided Design, vol. 31, no. 2, pp. 192–204, Feb. 2012. [93] A. B. Kahng, ‘‘Classical floorplanning harmful?’’ in Proc. Int. Symp. Phys. Design, 2000, pp. 207–213. [94] A. B. Kahng, H. Lee, and J. Li, ‘‘Horizontal benchmark extension for improved assessment of physical CAD research,’’ in Proc. Great Lakes Symp. VLSI, 2014, pp. 27–32. [95] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure. New York, NY, USA: Springer-Verlag, 2011. [96] A. B. Kahng, I. L. Markov, and S. Reda, ‘‘On legalization of row-based placements,’’ in Proc. Great Lakes Symp. VLSI, 2004, pp. 214–219. [97] A. B. Kahng, C.-H. Park, P. Sharma, and Q. Wang, ‘‘Lens aberration aware placement for timing yield,’’ ACM Trans. Design Autom. Electron. Syst., vol. 14, no. 1, pp. 16:1–16:26, 2009. [98] A. B. Kahng, P. Tucker, and A. Zelikovsky, ‘‘Optimization of linear placements for wirelength minimization with free sites,’’ in Proc. Asia South Pacific Design Autom. Conf., 1999, pp. 241–244. [99] A. B. Kahng and Q. Wang, ‘‘A faster implementation of APlace,’’ in Proc. Int. Symp. Phys. Design, 2006, pp. 218–220. [100] G. Karypis and V. Kumar, ‘‘Multilevel k-way hypergraph partitioning,’’ in Proc. Design Autom. Conf., 1999, pp. 343–348. [101] A. A. Kennings, N. K. Darv, and L. Behjat, ‘‘Detailed placement accounting for technology constraints,’’ in Proc. VLSI System on Chip Conf., 2014, DOI: 10.1109/VLSI-SoC. 2014.7004188. [102] J. Knechtel, I. L. Markov, and J. Lienig, ‘‘Assembling 2-D blocks into 3-D chips,’’ IEEE Trans. Comput.-Aided Design, vol. 31, no. 2, pp. 228–241, Feb. 2012. [103] M.-C. Kim, D.-J. Lee, and I. L. Markov, ‘‘SimPL: An effective placement algorithm,’’ in Proc. Int. Conf. Comput.-Aided Design, 2010, pp. 649–656. [104] M.-C. Kim, J. Hu, D.-J. Lee, and I. L. Markov, ‘‘A SimPLR method for routability-driven placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2011, pp. 67–73.
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
[105] M.-C. Kim, D.-J. Lee, and I. L. Markov, ‘‘SimPL: An effective placement algorithm,’’ IEEE Trans. Comput.-Aided Design, vol. 31, no. 1, pp. 50–60, Jan. 2012. [106] M.-C. Kim and I. L. Markov, ‘‘ComPLx: A competitive primal-dual Lagrange optimization for global placement,’’ in Proc. Design Autom. Conf., 2012, pp. 747–752. [107] M.-C. Kim, N. Viswanathan, C. J. Alpert, I. L. Markov, and S. Ramji, ‘‘MAPLE: Multilevel adaptive placement for mixed-size designs,’’ in Proc. Int. Symp. Phys. Design, 2012, pp. 193–200. [108] M.-C. Kim, N. Viswanathan, Z. Li, and C. J. Alpert, ‘‘ICCAD-2013 CAD contest in placement finishing and benchmark suite,’’ in Proc. Int. Conf. Comput.-Aided Design, 2013, pp. 268–270. [109] M.-C. Kim, J. Hu, and N. Viswanathan, ‘‘ICCAD-2014 CAD contest in incremental timing-driven placement and benchmark suite,’’ in Proc. Int. Conf. Comput.-Aided Design, 2014, pp. 361–366. [110] J. M. Kleinhans, G. Sigl, F. M. Johannes, and K. Antreich, ‘‘GORDIAN: VLSI placement by quadratic programming and slicing optimization,’’ IEEE Trans. Comput.-Aided Design, vol. 10, no. 3, pp. 356–365, Mar. 1991. [111] T. Kong, ‘‘A novel net weighting algorithm for timing-driven placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2002, pp. 172–176. [112] J. Kuang, W.-K. Chow, and E. F. Y. Young, ‘‘Tripple patterning lithography aware optimization for standard cell based design,’’ in Proc. Int. Conf. Comput.-Aided Design, 2014, pp. 108–115. [113] D. E. Lackey et al., ‘‘Managing power and performance for system-on-chip designs using voltage islands,’’ in Proc. Int. Conf. Comput.-Aided Design, 2002, pp. 195–202. [114] L. Lavagno, G. Martin, and L. Scheffer, EDA for IC Implementation, Circuit Design, Process Technology. Boca Raton, FL, USA: CRC Press, 2006. [115] D.-J. Lee and I. L. Markov, ‘‘Obstacle-aware clock-tree shaping during placement,’’ IEEE Trans. Comput.-Aided Design, vol. 31, no. 2, pp. 205–216, Feb. 2012. [116] Y.-M. Lee, T.-Y. Wu, and P.-Y. Chang, ‘‘A hierarchical bin-based legalizer for standard-cell designs with minimal disturbance,’’ in Proc. Asia South Pacific Design Autom. Conf., 2010, pp. 568–573. [117] C. Li and C.-K. Koh, ‘‘Recursive function smoothing of half-perimeter wirelength for analytical placement,’’ in Proc. Int. Symp. Quality Electron. Design, 2007, pp. 829–834. [118] C. Li, C.-K. Koh, and P. H. Madden, ‘‘Floorplan management: Incremental placement for gate sizing and buffer insertion,’’ in Proc. Asia South Pacific Design Autom. Conf., 2005, pp. 349–354. [119] C. Li, M. Xie, C.-K. Koh, J. Cong, and P. H. Madden, ‘‘Routability-driven placement and white space allocation,’’ IEEE Trans. Comput.-Aided Design, vol. 26, no. 5, pp. 858–871, May 2007. [120] S. Li and C.-K. Koh, ‘‘Mixed integer programming models for detailed placement,’’ in Proc. Int. Symp. Phys. Design, 2012, pp. 87–94. [121] Z. Li, C. N. Sze, C. J. Alpert, J. Hu, and W. Shi, ‘‘Making fast buffer insertion even faster via approximation techniques,’’ in Proc. Asia South Pacific Design Autom. Conf., 2005, pp. 13–18.
Markov et al.: Progress and Challenges in VLSI Placement Research
[122] Z. Li, W. Wu, and X. Hong, ‘‘Congestion driven incremental placement algorithm for standard cell layout,’’ in Proc. Asia South Pacific Design Autom. Conf., 2003, pp. 723–728. [123] T. Lin and C. Chu, ‘‘POLAR 2.0: An effective routability-driven placer,’’ in Proc. Design Autom. Conf., 2014, pp. 1–6. [124] B. Liu, Y. Cai, Q. Zhou, and X. Hong, ‘‘Power driven placement with layout aware supply voltage assignment for voltage island generation in dual-Vdd designs,’’ in Proc. Asia South Pacific Design Autom. Conf., 2006, pp. 582–587. [125] J. Lu et al., ‘‘ePlace: Electrostatics based placement using Nesterov’s method,’’ in Proc. Design Autom. Conf., 2014, DOI: 10.1145/2593069.2593133. [126] W. K. Luk, ‘‘A fast physical constraint generator for timing driven placement,’’ in Proc. Design Autom. Conf., 1991, pp. 626–631. [127] T. Lin and C. Chu, ‘‘POLAR 2.0: An effective routability-driven placer,’’ in Proc. Design Autom. Conf., 2013, pp. 1–6. [128] W.-H. Liu, W.-C. Kao, Y.-L. Li, and K.-Y. Chao, ‘‘Multi-threaded collision-aware global routing with bounded-length maze routing,’’ in Proc. Design Autom. Conf., 2010, pp. 200–205. [129] W.-H. Liu, C.-K. Koh, and Y.-L. Li, ‘‘Optimization of placement solutions for routability,’’ in Proc. Design Autom. Conf., 2013, pp. 1–9. [130] T. Luo, D. Newmark, and D. Z. Pan, ‘‘A new LP based incremental timing driven placement for high performance designs,’’ in Proc. Design Autom. Conf., 2006, pp. 1115–1120. [131] T. Luo and D. Z. Pan, ‘‘DPlace2.0: A stable and efficient analytical placement based on diffusion,’’ in Proc. Asia South Pacific Design Autom. Conf., 2008, pp. 346–351. [132] T. Luo, H. Ren, C. J. Alpert, and D. Z. Pan, ‘‘Computational geometry based placement migration,’’ in Proc. Design Autom. Conf., 2007, pp. 41–47. [133] Y. Lu et al., ‘‘Navigating registers in placement for clock network minimization,’’ in Proc. Design Autom. Conf., 2005, pp. 176–181. [134] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, ‘‘Interconnect-power dissipation in a microprocessor,’’ in Proc. Int. Workshop Syst. Level Interconnect Prediction, 2004, pp. 7–13. [135] F. Mo and R. K. Brayton, ‘‘Fishbone: A block-level placement and routing scheme,’’ in Proc. Int. Symp. Phys. Design, 2003, pp. 204–209. [136] F. Mo and R. K. Brayton, ‘‘Placement based multiplier rewiring for cell-based designs,’’ in Proc. Int. Conf. Comput.-Aided Design, 2008, pp. 430–433. [137] F. Mo, A. Tabbara, and R. K. Brayton, ‘‘A force-directed macro-cell placer,’’ in Proc. Int. Conf. Comput.-Aided Design, 2000, pp. 177–180. [138] F. Mo, A. Tabbara, and R. K. Brayton, ‘‘A timing-driven macro-cell placement algorithm,’’ in Proc. Int. Conf. Comput. Design, 2001, pp. 322–332. [139] M. D. Moffitt, J. A. Roy, I. L. Markov, and M. E. Pollack, ‘‘Constraint-driven floorplan repair,’’ ACM Trans. Design Autom. Electron. Syst., vol. 13, no. 4, pp. 1–13, 2008. [140] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, ‘‘Generation of performance constraints for layout,’’ IEEE Trans. Comput.-Aided Design, vol. 8, no. 8, pp. 860–874, Aug. 1989.
[141] G.-J. Nam and J. Cong, Modern Circuit Placement: Best Practices and Results. New York, NY, USA: Springer-Verlag, 2007. [142] W. C. Naylor, R. Donelly, and L. Sha, ‘‘Non-linear optimization system and method for wire length and delay optimization for an automatic electric circuit placer,’’ U.S. Patent 6 301 693, Oct. 2001. [143] B. Obermeier and F. M. Johannes, ‘‘Temperature-aware global placement,’’ in Proc. Asia South Pacific Design Autom. Conf., 2004, pp. 143–148. [144] B. Obermeier, H. Ranke, and F. M. Johannes, ‘‘Kraftwerk: A versatile placement approach,’’ in Proc. Int. Symp. Phys. Design, 2005, pp. 242–244. [145] R. H. J. M. Otten and R. K. Brayton, ‘‘Performance planning,’’ Integration, vol. 29, no. 1, pp. 1–24, 2000. [146] M. Orshansky, S. Nassif, and D. Boning, Design for Manufacturability and Statistical Design: A Constructive Approach (Integrated Circuits and Systems). New York, NY, USA: Springer-Verlag, 2010. [147] M. M. Ozdal et al., ‘‘The ISPD-2012 discrete cell sizing contest and benchmark suite,’’ in Proc. Int. Symp. Phys. Design, 2012, pp. 161–164. [148] M. Pan and C. Chu, ‘‘FastRoute: A step to integrate global routing into placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2006, pp. 59–62. [149] M. Pan, N. Viswanathan, and C. Chu, ‘‘An efficient and effective detailed placement algorithm,’’ in Proc. Int. Conf. Comput.-Aided Design, 2005, pp. 48–55. [150] D. A. Papa, S. Krishnaswamy, and I. L. Markov, ‘‘SPIRE: A retiming-based physical-synthesis transformation system,’’ in Proc. Int. Conf. Comput.-Aided Design, 2010, pp. 373–380. [151] D. A. Papa, M. D. Moffitt, C. J. Alpert, and I. L. Markov, ‘‘Speeding up physical synthesis with transactional timing analysis,’’ IEEE Design Test, vol. 27, no. 5, pp. 14–25, Sep./Oct. 2010. [152] D. A. Papa et al., ‘‘RUMBLE: An incremental, timing-driven, physical- synthesis optimization algorithm,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 12, pp. 2156–2168, Dec. 2008. [153] D. A. Papa et al., ‘‘Physical synthesis with clock-network optimization for large systems on chips,’’ IEEE Micro, vol. 31, no. 4, pp. 51–62, Jul./Aug. 2011. [154] P. N. Parakh, R. B. Brown, and K. A. Sakallah, ‘‘Congestion driven quadratic placement,’’ in Proc. Design Autom. Conf., 1998, pp. 275–278. [155] S. Plaza, I. L. Markov, and V. Bertacco, ‘‘Optimizing non-monotonic interconnect using functional simulation and logic restructuring,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 12, pp. 2107–2119, Dec. 2008. [156] S. Popovych et al., ‘‘Density-aware detailed placement with instant legalization,’’ in Proc. Design Autom. Conf., 2014, DOI: 10.1145/2593069.2593120. [157] R. Puri, L. Stok, and S. Bhattacharya, ‘‘Keeping hot chips cool,’’ in Proc. Design Autom. Conf., 2005, pp. 285–288. [158] R. Puri et al., ‘‘Pushing ASIC performance in a power envelope,’’ in Proc. Design Autom. Conf., 2003, pp. 788–793. [159] K. Rajagopal et al., ‘‘Timing driven force directed placement with physical net constraints,’’ in Proc. Int. Symp. Phys. Design, 2003, pp. 60–66.
[160] B. M. Reiss and G. G. Ettelt, ‘‘SPEED: Fast and efficient timing driven placement,’’ in Proc. Int. Symp. Circuits Syst., 1995, pp. 377–380. [161] H. Ren, D. Z. Pan, C. J. Alpert, and P. G. Villarrubia, ‘‘Diffusion-based placement migration,’’ in Proc. Design Autom. Conf., 2005, pp. 515–520. [162] H. Ren, D. Z. Pan, C. J. Alpert, G.-J. Nam, and P. G. Villarrubia, ‘‘Hippocrates: First-do-no-harm detailed placement,’’ in Proc. Asia South Pacific Design Autom. Conf., 2007, pp. 141–146. [163] H. Ren, D. Z. Pan, and D. S. Kung, ‘‘Sensitivity guided net weighting for placement driven synthesis,’’ IEEE Trans. Comput.-Aided Design, vol. 24, no. 5, pp. 711–721, May 2005. [164] J. A. Roy, S. N. Adya, D. A. Papa, and I. L. Markov, ‘‘Min-cut floorplacement,’’ IEEE Trans. Comput.-Aided Design, vol. 25, no. 7, pp. 1313–1326, Jul. 2006. [165] J. A. Roy and I. L. Markov, ‘‘Seeing the forest and the trees: Steiner wirelength optimization in placement,’’ IEEE Trans. Comput.-Aided Design, vol. 23, no. 4, pp. 632–644, Apr. 2007. [166] J. A. Roy and I. L. Markov, ‘‘ECO-system: Embracing the change in placement,’’ IEEE Trans. Comput.-Aided Design, vol. 26, no. 12, pp. 2173–2185, Dec. 2007. [167] J. A. Roy, A. N. Ng, R. Aggarwal, V. Ramachandran, and I. L. Markov, ‘‘Solving modern mixed-size placement instances,’’ Integration, vol. 42, no. 2, pp. 262–275, 2009. [168] J. A. Roy et al., ‘‘Capo: Robust and scalable open-source min-cut floorplacer,’’ in Proc. Int. Symp. Phys. Design, 2005, pp. 224–226. [169] J. A. Roy, N. Viswanathan, G.-J. Nam, C. J. Alpert, and I. L. Markov, ‘‘CRISP: Congestion reduction by iterated spreading during placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2009, pp. 357–362. [170] A. E. Ruehli, P. K. Wolff, and G. Goertzel, ‘‘Analytical power/timing optimization technique for digital systems,’’ in Proc. Design Autom. Conf., 1977, pp. 142–146. [171] S. M. Sait, M. R. Minhas, and J. A. Khan, ‘‘Performance and low power driven VLSI standard cell placement using tabu search,’’ in Proc. Congr. Evol. Comput., 2002, pp. 372–377. [172] M. Sarrafzadeh and M. Wang, ‘‘NRG: Global and detailed placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 1997, pp. 532–537. [173] M. Sarrafzadeh, M. Wang, and X. Yang, Modern Placement Techniques. New York, NY, USA: Springer-Verlag, 2002. [174] P. Saxena, ‘‘On controlling perturbation due to repeaters during quadratic placement,’’ IEEE Trans. Comput.-Aided Design, vol. 25, no. 9, pp. 1733–1743, Sep. 2006. [175] N. Selvakkumaran, P. N. Parakh, and G. Karypis, ‘‘Perimeter-degree: A priori metric for directly measuring and homogenizing interconnection complexity in multilevel placement,’’ in Proc. IEEE Conf. Syst. Level Interconnect Prediction, 2003, pp. 53–59. [176] W. Shen, Y. Cai, X. Hong, and J. Hu, ‘‘Activity and register placement aware gated clock network design,’’ in Proc. Int. Symp. Phys. Design, 2008, pp. 182–189. [177] R. E. Showalter, ‘‘Monotone operators in Banach space and nonlinear partial differential equations,’’ Math. Surv. Monographs, no. 49, pp. 162–163, 1997.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
2001
Markov et al.: Progress and Challenges in VLSI Placement Research
[178] G. Sigl, K. Doll, and F. M. Johannes, ‘‘Analytical placement: A linear or a quadratic objective function?’’ in Proc. Design Autom. Conf., 1991, pp. 427–432. [179] P. Spindler and F. M. Johannes, ‘‘Fast and accurate routing demand estimation for efficient routability-driven placement,’’ in Proc. Design Autom. Test Eur. Conf. Exhibit., 2007, pp. 1226–1231. [180] P. Spindler, U. Schlichtmann, and F. M. Johannes, ‘‘Abacus: Fast legalization of standard cell circuits with minimal movement,’’ in Proc. Int. Symp. Phys. Design, 2008, pp. 47–53. [181] P. Spindler, U. Schlichtmann, and F. M. Johannes, ‘‘Kraftwerk2VA fast force-directed quadratic placement approach using an accurate net model,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 8, pp. 1398–1411, Aug. 2008. [182] A. Srinivasan, K. Chaudhary, and E. S. Kuh, ‘‘Ritual: A performance driven placement algorithm for small cell ICs,’’ in Proc. Int. Conf. Comput.-Aided Design, 1991, pp. 48–51. [183] W. Swartz and C. Sechen, ‘‘Timing driven placement for large standard cell circuits,’’ in Proc. Design Autom. Conf., 1995, pp. 211–215. [184] H. Tennakoon and C. Sechen, ‘‘Nonconvex gate delay modeling and delay optimization,’’ IEEE Trans. Comput.-Aided Design, vol. 27, no. 9, pp. 1583–1594, Sep. 2003. [185] M. Terai, K. Takahashi, and K. Sato, ‘‘A new min-cut placement algorithm for timing assurance layout design meeting net length constraint,’’ in Proc. Design Autom. Conf., 1990, pp. 96–102. [186] L. Trevillyan, D. Kung, R. Puri, L. N. Reddy, and M. A. Kazda, ‘‘An integrated environment for technology closure of deep-submicron IC designs,’’ IEEE Design Test, vol. 21, no. 1, pp. 14–22, Jan./Feb. 2004. [187] R.-S. Tsay, E. S. Kuh, and C.-P. Hsu, ‘‘Proud: A fast sea-of-gates placement algorithm,’’ in Proc. Design Autom. Conf., 1988, pp. 318–323. [188] R.-S. Tsay and J. Koehl, ‘‘An analytic net weighting approach for performance optimization in circuit placement,’’ in Proc. Design Autom. Conf., 1991, pp. 636–639. [189] K. Tsota, C. Koh, and V. Balakrishnan, ‘‘Guiding global placement with wire density,’’ in Proc. Int. Conf. Comput.-Aided Design, 2008, pp. 212–217. [190] H. Vaishnav and M. Pedram, ‘‘PCUBE: A performance driven placement algorithm for low power designs,’’ in Proc. Design Autom. Conf./EURO-VHDL, 1993, pp. 72–77. [191] L. P. P. P. van Ginneken, ‘‘Buffer placement in distributed RC-tree network for minimal Elmore delay,’’ in Proc. Int. Symp. Circuits Syst., 1990, pp. 865–868. [192] N. Viswanathan, C. J. Alpert, Z. Li, G.-J. Nam, and J. A. Roy, ‘‘The ISPD-2011 routability-driven placement contest and benchmark suite,’’ in Proc. Int. Symp. Phys. Design, 2011, pp. 141–146. [193] N. Viswanathan, C. J. Alpert, C. N. Sze, Z. Li, and Y. Wei, ‘‘The DAC 2012
2002
[194]
[195]
[196]
[197]
[198]
[199]
[200]
[201]
[202]
[203]
[204]
[205]
[206]
[207]
routability-driven placement contest and benchmark suite,’’ in Proc. Design Autom. Conf., 2012, pp. 774–782. N. Viswanathan, C. J. Alpert, C. N. Sze, Z. Li, and Y. Wei, ‘‘ICCAD-2012 CAD contest in design hierarchy aware routability-driven placement and benchmark suite,’’ in Proc. Int. Conf. Comput.-Aided Design, 2012, pp. 345–348. N. Viswanathan, G.-J. Nam, C. J. Alpert, P. G. Villarrubia, and H. Ren, ‘‘RQL: Global placement via relaxed quadratic spreading and linearization,’’ in Proc. Design Autom. Conf., 2007, pp. 453–458. N. Viswanathan et al., ‘‘ITOP: Integrating timing optimization within placement,’’ in Proc. Int. Symp. Phys. Design, 2010, pp. 83–90. N. Viswanathan, M. Pan, and C. Chu, ‘‘FastPlace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control,’’ in Proc. Asia South Pacific Design Autom. Conf., 2007, pp. 135–140. M. Vujkovic, D. Wadkins, B. Swartz, and C. Sechen, ‘‘Efficient timing closure without timing driven placement and routing,’’ in Proc. Design Autom. Conf., 2004, pp. 268–273. C.-K. Wang et al., ‘‘Closing the gap between global and detailed placement: Techniques for improving routability,’’ in Proc. Int. Symp. Phys. Design, 2015, pp. 149–156. M. Wang and M. Sarrafzadeh, ‘‘On the behaviour of congestion minimization during placement,’’ in Proc. Int. Symp. Phys. Design, 1999, pp. 145–150. M. Wang and M. Sarrafzadeh, ‘‘Model and minimization of routing congestion,’’ in Proc. Asia South Pacific Design Autom. Conf., 2000, pp. 185–190. M. Wang, X. Yang, and M. Sarrafzadeh, ‘‘Congestion minimization during placement,’’ IEEE Trans. Comput.-Aided Design, vol. 19, no. 10, pp. 1140–1148, Oct. 2000. M. Wang, X. Yang, K. Eguro, and M. Sarrafzadeh, ‘‘Multicenter congestion estimation and minimization during placement,’’ in Proc. Int. Symp. Phys. Design, 2000, pp. 147–152. Y. Wang, Q. Zhou, X. Hong, and Y. Cai, ‘‘Clock-tree aware placement based on dynamic clock-tree building,’’ in Proc. Int. Symp. Circuits Syst., 2007, pp. 2040–2043. S. Ward, D. Ding, and D. Z. Pan, ‘‘PADE: A high-performance placer with automatic datapath extraction and evaluation through high dimensional data learning,’’ in Proc. Design Autom. Conf., 2012, pp. 756–761. L.-T. Wang, Y.-W. Chang, and K.-T. Cheng, Electronic Design Automation: Synthesis, Verification, Test. San Francisco, CA, USA: Morgan Kaufmann, 2009. E. Wein and J. Benkoski, ‘‘Hard macros will revolutionize SoC design,’’ EE Times, Aug. 20, 2004. [Online]. Available: http://www.eetimes.com/news/design/ showArtical.jhtml?articalID=26807055.
Proceedings of the IEEE | Vol. 103, No. 11, November 2015
[208] J. Werber, D. Rautenbach, and C. Szegedy, ‘‘Timing optimization by restructuring long combinatorial paths,’’ in Proc. Int. Conf. Comput.-Aided Design, 2007, pp. 536–543. [209] J. Westra, C. Bartels, and P. Groeneveld, ‘‘Probabilistic congestion prediction,’’ in Proc. Int. Symp. Phys. Design, 2004, pp. 204–209. [210] J. Westra and P. Groeneveld, ‘‘Is probabilistic congestion estimation worthwhile?’’ in Proc. Int. Workshop Syst. Level Interconnect Prediction, 2005, pp. 99–106. [211] B. P. Wong et al., Nano-CMOS Design for Manufacturability. New York, NY, USA: Wiley, 2009. [212] H. Xiang et al., ‘‘Logical and physical restructuring of fan-in trees,’’ in Proc. Int. Symp. Phys. Design, 2010, pp. 67–74. [213] Z. Xiu and R. Rutenbar, ‘‘Timing-driven placement by grid-warping,’’ in Proc. Design Autom. Conf., 2005, pp. 585–590. [214] Y. Xu, Y. Zhang, and C. Chu, ‘‘FastRoute 4.0: Global router with efficient via minimization,’’ in Proc. Asia South Pacific Design Autom. Conf., 2009, pp. 576–581. [215] J. Z. Yan, C. Chu, and W.-K. Mak, ‘‘SafeChoice: A novel approach to hypergraph clustering for wirelength-driven placement,’’ IEEE Trans. Comput.-Aided Design, vol. 30, no. 7, pp. 1020–1033, Jul. 2011. [216] J. Z. Yan, N. Viswanathan, and C. Chu, ‘‘Handling complexities in modern large-scale mixed-size placement,’’ in Proc. Design Autom. Conf., 2009, pp. 436–441. [217] X. Yang, B.-K. Choi, and M. Sarrafzadeh, ‘‘Routability-driven white space allocation for fixed-die standard-cell placement,’’ IEEE Trans. Comput.-Aided Design, vol. 22, no. 4, pp. 410–419, Apr. 2003. [218] C. Yeh, Y. Kang, S. Shieh, and J. Wang, ‘‘Layout techniques supporting the use of dual supply voltages for cell-based designs,’’ in Proc. Design Autom. Conf., 1999, pp. 62–67. [219] H. Youssef and E. Shragowitz, ‘‘Timing constraints for correct peformance,’’ in Proc. Int. Conf. Comput.-Aided Design, 1990, pp. 24–27. [220] B. Yu et al., ‘‘Methodology for standcard cell compliance and detailed placement for triple patterning lithography,’’ IEEE Trans. Comput.-Aided Design, vol. 34, no. 5, pp. 726–739, May 2015. [221] V. Yutsis, I. Bustany, D. G. Chinnery, J. R. Shinnerl, and W.-H. Liu, ‘‘ISPD 2014 benchmarks with sub-45 nm technology rules for detailed-routing-driven placement,’’ in Proc. Int. Symp. Phys. Design, pp. 161–168. [222] Y. Zhang and C. Chu, ‘‘CROP: Fast and effective congestion refinement of placement,’’ in Proc. Int. Conf. Comput.-Aided Design, 2009, pp. 344–350. [223] K. Zhong and S. Dutt, ‘‘Algorithms for simultaneous satisfaction of multiple constraints and objective optimization in a placement flow with application to congestion control,’’ in Proc. Design Autom. Conf., 2002, pp. 854–859.
Markov et al.: Progress and Challenges in VLSI Placement Research
ABOUT THE AUTHORS Igor L. Markov (Fellow, IEEE) received the Ph.D. degree in computer science from the University of California Los Angeles (UCLA), Los Angeles, CA, USA. He is a Professor of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, MI, USA, currently on leave at Google. He researches computers that make computers. He has coauthored five books, four U.S. patents, and over 200 refereed publications, some of which were honored by the best-paper awards at the Design Automation and Test in Europe Conference (DATE), the International Symposium on Physical Design (ISPD), the International Conference on Computer-Aided Design (ICCAD), and the IEEE Transactions on Computer-Aided Design (TCAD). During the 2011 redesign of the ACM Computing Classification System, he led the effort on the Hardware tree. Prof. Markov is an Association for Computing Machinery (ACM) Distinguished Scientist. He is the recipient of a DAC Fellowship, an ACM SIGDA Outstanding New Faculty award, an NSF CAREER award, an IBM Partnership Award, a Microsoft A. Richard Newton Breakthrough Research Award, and the inaugural IEEE CEDA Early Career Award. He has served on the Executive Board of ACM SIGDA and Editorial Boards of several ACM and IEEE Transactions, Communications of the ACM and IEEE Design & Test.
Dr. Hu won first place at the 2012 International Conference on Computer-Aided Design (ICCAD 2012) Routability-driven Placement Contest, and has been the organizer of the annual TAU Workshop Timing Contest in 2014 and 2015, and the Co-Chair for the ICCAD 2014 and 2015 Timing-driven Placement Contests. Myung-Chul Kim received the B.S. degree in electronic and electrical engineering from the Pohang University of Science and Technology (POSTECH), Gyeongsangbuk-do, Korea, in 2006 and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2009 and 2012, respectively. He is with IBM Corporation, Austin, TX, USA. His research interests include VLSI physical-design automation with emphasis on placement, clocking, gate sizing, and timing analysis. Dr. Kim is the recipient of the IEEE/ACM William J. McCalla Best Paper Award at the 2010 International Conference on Computer-Aided Design (ICCAD 2010) for his placement work. During his Ph.D., he won the ISPD 2010 clock-network synthesis contest and ICCAD 2012 Routability-driven Placement Contest. He has served as a Topic Chair for physical design problems at ICCAD contests from 2013 through 2015, and he has been a Chair of ACM/SIGDA CADathlon programming contest since 2014.
Jin Hu received the B.S. degrees in electrical and computer engineering from Northwestern University, Evanston, IL, USA, in 2006 and the M.S.E. and Ph.D. degrees from the University of Michigan, Ann Arbor, MI, USA, in 2008 and 2012, respectively. She is currently with IBM Corporation, Hopewell Junction, NY, USA, in the EDA Department working on timing modeling and statistical analysis. Her research interests include physical design, particularly in global routing and routability-driven placement, as well as timing analysis and modeling.
Vol. 103, No. 11, November 2015 | Proceedings of the IEEE
2003