PROUD: A SEA-OF-GATES PLACEMENT ALGORITHM
REN-SONG TSAY, ERNEST S. KUH, CHI-PING HSU*
University of California, Berkeley

The authors propose an efficient method for placing modules in large and highly complex sea-of-gates chips that include preplaced I/O pads and macrocells. Proud repeatedly solves sparse linear equations. A resistive network analogy of the placement problem and convexity of the objective function are key concepts in this algorithm. For a triple-metal-layer, 100,000-gate sea-of-gates design with 26,000 instances, the constructive phase took 50 minutes on a VAX 8650 and yielded excellent results for total wire length.
As ASIC technology becomes more popular and the gate count on a chip continues to increase, we need to readdress the quality and speed of automatic layout algorithms. Module placement is especially crucial to the final outcome of the design. For a sea-of-gates chip with 100,000+ gates, random algorithms, such as simulated annealing, are not the answer because they could take a prohibitive amount of time to reach an acceptable solution.

We propose a hierarchical placement method that exploits the sparsity of connectivity information inherent in the placement specification. We use the quadratic placement formulation first proposed by Hall, except that we do not find the eigenvalues and eigenvectors of a large matrix. Our method takes the I/O pad specification as input and successively solves sparse linear equations. The idea is based on the concept of resistive network optimization, but it bypasses the slot constraints and uses a simpler partitioning scheme. Another unique feature of our approach is that in the hierarchical partitioning, we use the BGS (block Gauss-Seidel) iteration to resolve the placement interactions among different blocks after partitioning.
GLOBAL PLACEMENT

We solve the global placement problem by assuming that all modules are point modules and all nets are two-pin nets. Thus, we preprocess multipin nets and replace them with two-pin nets. Let $c_{ij}$ be the connectivity between module $i$ and module $j$; $c_{ij}$ could be the number of nets connecting the two modules, for example. We introduce a symmetric connectivity matrix $C = [c_{ij}]$ with $c_{ii} = 0$. We use the sum of the squared wire lengths as the objective function. Then, with $n$ modules, the objective function is

$$L(x, y) = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] = x^T B x + y^T B y \tag{1}$$
*Now with Hughes Aircraft Co.

0740-7475/88/1200-44$1.00 © 1988 IEEE
IEEE DESIGN & TEST OF COMPUTERS, DECEMBER 1988

where $(x_i, y_i)$ represents the coordinates of module $i$ and $B$ is a modified connectivity matrix defined as

$$B = D - C \tag{2}$$

where $D$ is a diagonal matrix with

$$d_{ii} = \sum_{j=1}^{n} c_{ij} \tag{3}$$

and where $x$ and $y$ are $n$-vectors that specify the coordinates of the $n$ modules. The usual placement problem is then to minimize $L$ subject to the slot constraints: all point modules are placed on an evenly spaced, two-dimensional grid. The following discussion summarizes the background material as well as our method for placement.

Hall [2] treats the linear placement problem first, considering only $x^T B x$ in the objective function $L$. He uses a second-order slot constraint, $x^T x = 1$, which leads to the Lagrangian multiplier formulation with

$$L' = x^T B x + \lambda x^T x \tag{4}$$
The solution amounts to finding the least nonzero eigenvalue and its associated eigenvector of $B$. Since the method takes into account only the second-order constraint, the solution does not satisfy the slot condition and is thus only an approximation of the final placement. Hall gets an approximate solution to the two-dimensional placement problem by taking the $y$-coordinates from the eigenvector associated with the second smallest nonzero eigenvalue. One significant difficulty with this approach is that calculating the eigenvalues and eigenvectors of a large matrix, even a sparse one, is not simple, and the solution gives only a rough initial placement. Other methods are needed as a follow-up.

Cheng and Kuh [3] derive higher order constraints to force modules onto slots, but the nonlinear programming problem so formulated is difficult, if not impossible, to solve. By using only a first-order linear constraint, the authors introduce a resistive network model to obtain the initial solution explicitly from the well-known Kuhn-Tucker conditions. They follow that model with relaxation and successive partitioning to obtain the final solution.

Our method also takes advantage of the resistive network concept, but we ignore all constraints. Instead of solving the optimization problem directly, we solve a set of linear equations by a generalized Gauss-Seidel method, called successive over-relaxation, or SOR.

Consider the electric network analogy of the placement problem. We can interpret the objective function $x^T B x$ as the power dissipation of an $n$-node linear resistive network, where $x$ corresponds to an $n$-vector whose components represent the voltages at the nodes. For the rest of this article, $x$ designates either the coordinate or the voltage vector. The modified connectivity matrix $B$ is exactly the indefinite admittance matrix of the $n$-node resistive network, with $-b_{ij}$ equal to the conductance between node $i$ and node $j$. The placement problem is then equivalent to that of choosing the node voltages of the $n$-node network for which the power dissipation is a minimum. In electric network theory, minimum power dissipation is implied by the two Kirchhoff laws. Thus, we can reformulate our linear placement problem in terms of a linear resistive-network problem as long as we include the I/O pad specifications. We can then model the I/O pads as fixed voltage sources applied to the network. The coordinates of the movable modules are the node voltages to be determined.

Figure 1 shows a network in which nodes 1 to $m$ represent movable modules and nodes $m+1$ to $n$ represent fixed modules with voltages specified by I/O pads. We can then write the $n$-port network equation as

$$B_{11} x_1 + B_{12} x_2 = 0 \tag{5}$$

$$B_{21} x_1 + B_{22} x_2 = i_2 \tag{6}$$
where $B_{11}$, $B_{12}^T = B_{21}$, and $B_{22}$ are the familiar submatrices of the node admittance matrix of the network. The voltage vector $x_1$ is of dimension $m$ and is to be determined, where $m$ is the number of movable modules. The vector $x_2$ is the voltage source, and the vector $i_2$ is the current flowing into the $n-m$ terminals from the sources in Figure 1. Equation 5 is the key equation, which we can write as

$$A x_1 = b_x \tag{7}$$

where

$$A = B_{11}, \qquad b_x = -B_{12} x_2 \tag{8}$$

We have thus converted the optimization problem into a linear algebraic problem. Similarly, for the other dimension, we have $A y_1 = b_y$ with $b_y = -B_{12} y_2$. Since $A$ is sparse, usually with only 0.1% to 1% nonzero elements, we can use familiar sparse-matrix techniques to solve for $x_1$. We have chosen the SOR method to solve the sparse linear equations. To summarize this method, let
$$A = \Lambda (L + I + U) \tag{9}$$

where $\Lambda$ is a diagonal, positive definite matrix, $L$ is a lower triangular matrix, and $U$ is an upper triangular matrix. We solve for the vector $x_1$ iteratively using the SOR recursive formula with a relaxation parameter.

Figure 1. An n-terminal linear, passive-resistive network whose first m nodes are floating. The remaining n-m nodes are connected to voltage sources.
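For reference, the textbook SOR recursion for the splitting in Equation 9 can be written as below. This is a sketch under stated assumptions, not a reproduction of the article's own notation: it assumes $L$ and $U$ are the strictly lower and upper triangular parts of $\Lambda^{-1} A$, writes $b_x$ for the right-hand side of the x-direction system and $a_{ij}$ for the entries of $A$, and introduces the sweep index $k$ and the symbol $\omega$ for the relaxation parameter purely for illustration.

```latex
\begin{aligned}
(I + \omega L)\, x_1^{(k+1)} &= \bigl[(1 - \omega) I - \omega U\bigr]\, x_1^{(k)} + \omega \Lambda^{-1} b_x, \\
\text{or, componentwise,}\quad
x_{1,i}^{(k+1)} &= (1 - \omega)\, x_{1,i}^{(k)}
  + \frac{\omega}{a_{ii}} \Bigl( b_{x,i} - \sum_{j < i} a_{ij}\, x_{1,j}^{(k+1)} - \sum_{j > i} a_{ij}\, x_{1,j}^{(k)} \Bigr).
\end{aligned}
```

The componentwise form shows why sparsity is preserved: each update touches only the nonzero entries in row $i$ of $A$.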
The parameter $\omega$ to be chosen is in the range of zero to two. Usually, 1.7 or 1.8 is a robust value for $\omega$. For the special case $\omega = 1$, the SOR method reduces to the familiar Gauss-Seidel method. The advantage of the SOR method is that it preserves the sparsity of $A$ in the iteration. Also, since $A$ is real, symmetric, positive definite, and diagonally dominant, convergence is guaranteed. Because $A$ is sparse, the running time of each iteration is linear in the number of modules.

The I/O pads on the boundary of a chip define the coordinates of the fixed modules in the x-direction by projecting all pads to the x-axis, as Figure 2 shows. This one-dimensional problem is a linear placement problem. The figure also shows the fixed modules for the y-direction linear-placement problem. The solutions of these two problems give the coordinates of the movable modules in two dimensions, which constitute our global placement.

The result of global placement is the optimal solution in terms of our original objective function. The reason is that $L(x, y)$ is a convex quadratic function, so there is a unique global minimum. However, we have assumed that all modules are point modules and have ignored the slot constraint. In practice, modules occupy a finite nonzero area, and their shapes vary. Although the slot constraint is not pertinent to our global placement problem, replacing the point modules with real modules may cause module overlaps at this step.
Figure 2. Fixed I/O pads are projected to the x-axis and y-axis, respectively, in defining the two linear placement problems.
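The one-dimensional global placement step can be illustrated with a small, self-contained sketch. It is not the authors' C implementation: the function names `build_b` and `sor_solve`, the dense list-of-lists storage (a real implementation would keep the matrix sparse), and the five-node chain example are all assumptions made for illustration. The sketch builds the modified connectivity matrix, extracts the system for the movable modules, solves it by SOR, and then checks numerically that the pairwise squared-length cost equals the quadratic form in Equation 1.

```python
def build_b(n, nets):
    """Dense B = D - C for n modules; nets holds (i, j, c_ij) two-pin connections."""
    b = [[0.0] * n for _ in range(n)]
    for i, j, c in nets:
        b[i][j] -= c          # off-diagonal entries are -c_ij
        b[j][i] -= c
        b[i][i] += c          # diagonal entries accumulate d_ii = sum_j c_ij
        b[j][j] += c
    return b


def sor_solve(a, rhs, omega=1.7, tol=1e-12, max_sweeps=10_000):
    """Solve a x = rhs by successive over-relaxation; omega = 1 is Gauss-Seidel."""
    x = [0.0] * len(rhs)
    for _ in range(max_sweeps):
        change = 0.0
        for i, row in enumerate(a):
            off_diag = sum(row[j] * x[j] for j in range(len(x)) if j != i)
            new = (1.0 - omega) * x[i] + omega * (rhs[i] - off_diag) / row[i]
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            break
    return x


# Chain: pad at 0.0 -- m0 -- m1 -- m2 -- pad at 4.0 (indices 3 and 4 are the pads).
nets = [(3, 0, 1.0), (0, 1, 1.0), (1, 2, 1.0), (2, 4, 1.0)]
B = build_b(5, nets)
m = 3                                   # number of movable modules
x2 = [0.0, 4.0]                         # fixed pad coordinates
A = [row[:m] for row in B[:m]]          # A = B11
b_x = [-sum(B[i][m + k] * x2[k] for k in range(len(x2))) for i in range(m)]
x1 = sor_solve(A, b_x)                  # movable modules spread evenly between the pads

# Equation 1 check on the full vector: pairwise cost equals x^T B x.
x = x1 + x2
pairwise = sum(c * (x[i] - x[j]) ** 2 for i, j, c in nets)
quadratic = sum(x[i] * B[i][j] * x[j] for i in range(5) for j in range(5))
```

Each net is listed once in `nets`, which is why the pairwise sum carries no factor of one half: the double sum in Equation 1 counts every pair twice.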
PARTITIONING AND ITERATION

To resolve the problem of module overlapping, we use an efficient partitioning method with iterations. After partitioning, each block is separated into two. To handle placement interactions among blocks, we use the efficient block Gauss-Seidel, or BGS, iteration method to obtain an optimal solution. The key feature is that we again solve linear sparse equations, repeating the process in a top-down hierarchy to obtain the final solution.

As Figure 3 shows, we introduce a vertical partition line to divide the modules into two sets. We determine the location of the partition line as follows: We sort the modules from left to right and add up the module areas until we get roughly half the total module area. We then make the partition. If the partition line happens to coincide with the center line (cut line), the partition is done. If not, we move all the modules that lie between the center line and the partition line to the other side of the center line. At the end of this process, the center line divides the modules into two sets. Each set occupies half the area of the chip required by all the modules. Thus, the initial placement is modified in both halves.

To obtain a new global placement after partitioning, we introduce a simple heuristic. Assume, without loss of generality, that the partition line is to the right of the center, as shown in Figure 3. All modules to the right of the partition line are projected horizontally onto the center line to form part of the fixed modules. Only modules that lie to the left of the partition line are in the set of movable modules. We then solve the global placement problem in the left half by the SOR method and determine a temporary top-bottom partitioning in that half. We next project the movable modules in the left half horizontally onto the center line as part of the fixed modules to calculate the global placement in the right half. We also determine a temporary top-bottom partitioning there. We repeat this process two or more times to fix the partition in each half.

Figure 3. Projecting modules from the right half to the center line to determine the left-half placement.

We repeat the same procedure at the next hierarchical level and continue until each block contains one module to be placed at the only legal location in the block. By legal location, we mean the location that satisfies the slot constraints. Since our partitioning scheme takes into account the available area, we will have no overlap. For row-structured designs, we perform a pure linear placement based on sorting when a block contains only one row.

This method works well for a number of reasons. The first is that problems of partitioning and placement are similar in mathematical structure [6]. In particular, bipartitioning is equivalent to linear placement. Thus, in our approach, the result of initial linear placement gives a near-optimum min-cut partitioning. Other placement techniques based on min-cut partitioning are successful for the same reason [7, 8]. Because there is minimum crossing between the two sides of a cut, we reduce the global routing requirements between them. This design strategy is obviously beneficial for any style of layout.

Another reason that the algorithm works well follows from the concept of convexity. We can easily prove from the property of the modified connectivity matrix $B$ that all movable modules are restricted to the inside of the convex hull of the fixed modules on the boundary. This is also obvious from the electric network analogy. For passive-resistive networks, the voltage of any node cannot exceed the maximum source voltage in the circuit.
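The bound can be seen from one row of the linear system: at the optimum, each movable module sits at the connectivity-weighted mean of its neighbors, and a weighted mean of points never leaves the region spanned by those points. The sketch below checks this numerically on a made-up example. The function name `relax`, the module and pad names, and the net weights are hypothetical, and the check uses the bounding box of the pads, which contains the convex hull and is therefore a slightly weaker test than the statement in the text.

```python
def relax(movable, pads, nets, sweeps=100):
    """Gauss-Seidel sweeps in 2-D: each movable module moves to the
    connectivity-weighted mean of its neighbors' current positions."""
    pos = {**movable, **pads}
    neighbors = {name: [] for name in pos}
    for a, b, c in nets:
        neighbors[a].append((b, c))
        neighbors[b].append((a, c))
    for _ in range(sweeps):
        for name in movable:
            total = sum(c for _, c in neighbors[name])
            px = sum(c * pos[other][0] for other, c in neighbors[name]) / total
            py = sum(c * pos[other][1] for other, c in neighbors[name]) / total
            pos[name] = (px, py)
    return {name: pos[name] for name in movable}


# Four pads on the corners of a 10 x 8 chip; three movable modules start at the center.
pads = {"p0": (0.0, 0.0), "p1": (10.0, 0.0), "p2": (10.0, 8.0), "p3": (0.0, 8.0)}
movable = {"a": (5.0, 4.0), "b": (5.0, 4.0), "c": (5.0, 4.0)}
nets = [("p0", "a", 2.0), ("a", "b", 1.0), ("b", "c", 3.0),
        ("c", "p2", 1.0), ("b", "p1", 1.0), ("a", "p3", 1.0)]
placed = relax(movable, pads, nets)

xs = [p[0] for p in pads.values()]
ys = [p[1] for p in pads.values()]
inside = all(min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)
             for x, y in placed.values())
```

Because every update is a weighted mean, the containment holds after each sweep, not only at convergence.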
Thus, we are assured that the optimum location of the point modules in each hierarchy obtained from the solution of linear equations remains within the given region and the min-cut results will not be perturbed.

Finally, our approach works because we perform an important global optimization step, the BGS iteration, at each level of hierarchy. Suppose that after a vertical cut, we partition the movable modules into the left half and right half of the center line according to the global placement in the x direction. In the next hierarchy, we make a horizontal cut in each half in the y direction. We use the BGS iteration to find a global solution for $y_1$. Let the coordinates of the movable modules be depicted with subindices according to the two halves. Thus we partition $y_1$ into the subvectors $y_{1a}$ and $y_{1b}$, and Equation 5 extends to a pair of coupled block equations, one per half, whose right-hand side is the contribution from the fixed modules or the given I/O pads, as in Equation 8. To get the global solution for $y_1$ after the modules are partitioned, we modify these two block equations and solve the resulting pair, in which $y_{1a}^*$ and $y_{1b}^*$ denote the perturbed solutions from $y_{1a}$ and $y_{1b}$ because of the partitioning process or linear placement. This iterative improvement is the same as the BGS method, which is why we call it a BGS iteration. Two to five BGS iterations give very good results at each level of hierarchy. The BGS scheme maintains the global optimization aspect of the unconstrained solution phase and is considered a crucial step in achieving high-quality placement. The most important effect is that it takes into account the placement interaction among the blocks created by partitioning (Equations 15 and 16).

The algorithm we have described is called Proud-2 to signify global placement followed by successive two-way partitioning. We have also implemented a four-way partitioning scheme, called Proud-4, but we do not discuss it here. The improved algorithm discussed below is called Proud-I.
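The block iteration described in this section can be sketched on a toy system. This is a plain two-block Gauss-Seidel iteration, offered as an illustration rather than the article's exact procedure: it omits the perturbation introduced by partitioning, starts from zero instead of from the previous global placement, and uses a small dense solver in place of SOR. The names `solve_dense` and `block_gauss_seidel`, the block index lists, and the four-module chain are assumptions.

```python
def solve_dense(a, rhs):
    """Gaussian elimination with partial pivoting, for small illustrative blocks."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def block_gauss_seidel(a, rhs, block_a, block_b, iterations):
    """Alternately solve each block with the other block held at its latest values."""
    y = [0.0] * len(rhs)
    for _ in range(iterations):
        for own, other in ((block_a, block_b), (block_b, block_a)):
            sub = [[a[r][c] for c in own] for r in own]
            coupled = [rhs[r] - sum(a[r][c] * y[c] for c in other) for r in own]
            for r, value in zip(own, solve_dense(sub, coupled)):
                y[r] = value
    return y


# Four movable modules in a chain between pads at 0 and 5 (a toy system).
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b_y = [0.0, 0.0, 0.0, 5.0]
exact = solve_dense(A, b_y)                      # global solution of the full system
few = block_gauss_seidel(A, b_y, [0, 1], [2, 3], iterations=5)
many = block_gauss_seidel(A, b_y, [0, 1], [2, 3], iterations=40)
```

Even from a zero start, five block sweeps bring every coordinate of this example close to the global solution; starting from an existing global placement, as the article does, leaves less to correct.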
IMPROVING PLACEMENT

By considering actual pin positions and local perturbations (such as module rotation, I/O pad position adjustment, and module swap or insertion), we can further improve the quality of placement. So far, we have used a point-module model to represent the locations of all pins on a module. The model represents all the pin locations by the same point, usually the center or the lower left corner of the module, and hence some detail is lost. To refine placement quality, we use actual pin locations to precisely evaluate the specified objective function, which in this discussion is half-perimeter wire length.

Depending on the design style, some circuits allow module rotation, but some may not. Currently, we provide an option to flip modules along the x-axis, which mirrors the module shape. This operation usually takes 0.6% to 1.8% of the total runtime but accounts for 4.2% to 23.1% of the total improvement. Also, in some designs, the positions of I/O pads are not fixed. For these cases, we simply do pairwise interchange to adjust the pad positions. The runtime for this operation is 1.2% to 4.4% of the total runtime, but it contributes 6.2% to 11.6% of the total improvement.

We swap and insert modules by scanning through every module and picking the best one for swapping or the best position for insertion in a small, prespecified square window around the module. For all experiments reported here, we arbitrarily set the window to four buckets high and forty buckets wide. (We describe below how to form a bucket.) We swap or insert modules only when such action will reduce cost. In our current implementation, we maintain a legal placement result at all times. When the destination space for a module is not big enough, we shift the modules on one side of the gap until there is enough room. If the row would become too long, we give up trying to move the module to that space.

To speed up the search for candidates around the scanned module, we partition the chip into a two-dimensional array of buckets and attach each module to an appropriate bucket. Each bucket is a rectangle whose width and height are the minimum module width and the minimum module height, respectively. Whenever the lower left corner of a module is inside the bucket or on the bottom or left edge of the bucket, the module is attached to that bucket. We can easily show that, with this bucket size, each bucket in the array contains at most one module. Thus we can update information and search for candidate modules in constant time. For row-structured designs, we set the height of the bucket to the minimum row separation. The memory required for the two-dimensional bucket array is roughly linear in the number of modules, if we assume the modules are approximately the same size.

Swapping and insertion are responsible for most of the running time, but they also contribute most of the improvement. In our experiments, module swapping took 53.3% to 66.7% of the total runtime and contributed 41.1% to 60.1% of the improvement. In comparison, module insertion took 31.0% to 40.6% of the total runtime, for 23.9% to 28.9% of the total improvement.
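The bucket array can be sketched as a dictionary keyed by bucket column and row. This is a minimal illustration, not the data structure of the authors' program: the class name `BucketGrid`, its methods, and the symmetric window defined by `half_cols` and `half_rows` are assumptions, and the sketch stores a list per bucket rather than relying on the at-most-one-module property.

```python
from collections import defaultdict


class BucketGrid:
    """Maps each module to the bucket containing its lower left corner."""

    def __init__(self, bucket_width, bucket_height):
        self.bw = bucket_width
        self.bh = bucket_height
        self.buckets = defaultdict(list)   # (column, row) -> module names
        self.home = {}                     # module name -> (column, row)

    def bucket_of(self, x, y):
        # Floor division puts corners on the left or bottom edge into this bucket.
        return (int(x // self.bw), int(y // self.bh))

    def attach(self, name, x, y):
        if name in self.home:              # moving a module: detach it first
            self.buckets[self.home[name]].remove(name)
        key = self.bucket_of(x, y)
        self.buckets[key].append(name)
        self.home[name] = key

    def candidates(self, name, half_cols, half_rows):
        """Other modules whose buckets lie in a window centered on name's bucket."""
        col, row = self.home[name]
        found = []
        for c in range(col - half_cols, col + half_cols + 1):
            for r in range(row - half_rows, row + half_rows + 1):
                found.extend(n for n in self.buckets.get((c, r), []) if n != name)
        return found


grid = BucketGrid(bucket_width=10.0, bucket_height=5.0)
grid.attach("a", 0.0, 0.0)
grid.attach("b", 30.0, 5.0)
grid.attach("c", 500.0, 0.0)
near_a = grid.candidates("a", half_cols=20, half_rows=2)   # roughly 40 wide, 4 high
grid.attach("b", 10.0, 0.0)                                 # a corner on a left edge
```

With a fixed window, the number of buckets visited per module is a constant, which is what keeps each candidate search independent of the chip size.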
COMPLEXITY

The core of Proud-2 has only 1500 lines, the detailed placement part in Proud-I has 1700 lines, and the input and output take up another 1500 lines. The program is written in C and can run on a VAX machine under Unix or VMS or on an Apollo machine under Aegis. Figure 4 lists the pseudocode.

PROUD-PLACEMENT:
Input: netlist, cell library, I/O pad location, net weighting, number of hierarchies, number of BGS iterations.
Output: placement result.
Algorithm:
    Assign all modules to one internal block, and assign all I/O pads to one I/O block.
    Construct the modified connectivity matrix in the sparse form.
    For each hierarchy
    {
        Repeat number of BGS iterations (repeat only once for the first hierarchy)
        {
            For each internal block
            {
                Reconstruct the modified connectivity matrix for this block.
                Calculate the right-hand side of the linear equation according to the boundary condition determined by the projections of modules or I/O pads in other blocks.
                For global placement, use the SOR method to solve for relative module positions.
                Temporarily cut the block into two subblocks according to the cut method for this hierarchy.
                Assign the modules into separate subblocks by sorting.
            }
        }
        Fix the cut of each internal block and permanently associate the modules with the corresponding subblocks, which form a new set of internal blocks for the next hierarchy.
    }
    Do detailed placement improvement.

Figure 4. Pseudocode for the Proud algorithm.

We assume that the total number of pins is linearly proportional to the total number of modules. The key consideration is how we store the $B$ matrix. Since the netlist specification is usually given by the pin-net connectivity, we need to express the complexity in terms of the total number of pins. Since the number of pins is proportional to the number of modules, the memory space complexity in our sparse matrix formulation is $O(n)$, where $n$ is the number of modules. We use $O(n)$ buckets to guide detailed placement, so the total memory complexity of our algorithm is $O(n)$.

We assume that the number of iterations of the SOR method is smaller than a constant, which makes global placement and the iterative linear-system solver linear in terms of runtime complexity. The complexity of the sorting algorithm is $O(n \log n)$ at each partitioning step. The hierarchical cut creates a binary tree that can have at most $\log n$ levels. The complexity of the detailed placement depends on the size of the scanning window, since we essentially try all possible swaps or insertions for every bucket in the window. Assume that each test swap or insertion takes a constant time $K$ and that the size of the window is $l_1 \cdot l_2$. Then the complexity of the detailed placement step is $O(K \cdot l_1 \cdot l_2 \cdot n)$. In our experiments, we choose a fixed window size for all cases. Thus, the runtime complexity of our method is $O(n \log^2 n)$: an $O(n \log n)$ sort at each of at most $\log n$ levels. However, since $K \cdot l_1 \cdot l_2$ is a large number, detailed placement can dominate runtime.
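The sorting cost in the estimate above comes from the area-balanced cut made at every partitioning step. A minimal sketch of that cut follows. The function name `split_by_area`, the tuple layout, and the midpoint rule used to decide what "roughly half" means are assumptions made for illustration; the article does not specify a tie-breaking rule.

```python
def split_by_area(modules):
    """modules: list of (name, x, area). Returns (left_names, right_names)."""
    ordered = sorted(modules, key=lambda mod: mod[1])      # O(n log n) sort by x
    half = sum(area for _, _, area in ordered) / 2.0
    covered = 0.0
    cut = len(ordered)
    for index, (_, _, area) in enumerate(ordered):
        # Stop before the first module whose area midpoint lies past the halfway mark.
        if covered + area / 2.0 > half:
            cut = index
            break
        covered += area
    return [mod[0] for mod in ordered[:cut]], [mod[0] for mod in ordered[cut:]]


# Three unit-area modules and one module of area 3: the total area is 6.
modules = [("a", 3.0, 1.0), ("b", 1.0, 1.0), ("c", 2.0, 1.0), ("d", 4.0, 3.0)]
left, right = split_by_area(modules)
```

The linear scan after the sort is cheap, so the sort dominates each step, consistent with the per-level cost used in the estimate.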
EXPERIMENTAL RESULTS

We tested the algorithm on nine real circuits. Seven contain 1,000 to 4,000 modules. The other two are more complex, with one having 14,000 modules and the other 26,000 modules. The runtimes in Tables 1 through 5 are for test runs on a VAX 8650 (a 6-MIPS machine). Memory storage is less than linear mainly because of the sparse matrix structure, and runtime is only slightly worse than linear.
Table 1. Results of running Proud-2 and Proud-I compared with results of TimberWolf4.2: design h1.
Design h1: No. of Modules = 1,180; No. of Nets = 1,704; No. of Pins = 4,815
(Values in parentheses are relative to Proud-2.)

Algorithm    Wire Length (half perimeter)    Runtime (seconds)    Memory (Mbytes)
Proud-2      4,025,290 (1)                   34 (1)               0.8
Proud-I      3,612,446 (0.897)               274 (8.06)           1.17
TWOLF4.2     3,758,618 (0.933)               3,279 (96)           N/A

Table 2. Results of running Proud-2 and Proud-I compared with results of TimberWolf4.2: design h7.
Design h7: No. of Modules = 3,816; No. of Nets = 3,094; No. of Pins = 10,778
(Values in parentheses are relative to Proud-2.)

Algorithm    Wire Length (half perimeter)    Runtime (seconds)    Memory (Mbytes)
Proud-2      6,312,228 (1)                   325 (1)              1.54
Proud-I      5,512,771 (0.873)               1,750 (5.38)         2.67
TWOLF4.2     5,828,984 (0.923)               24,108 (74)          N/A

Table 3. Results of running Proud-2 and Proud-I: design h9 (TimberWolf4.2 could not complete this design).
Design h9: No. of Modules = 26,277; No. of Nets = 29,151; No. of Pins = 92,530
(Values in parentheses are relative to Proud-2.)

Algorithm    Wire Length (half perimeter)    Runtime (seconds)    Memory (Mbytes)
Proud-2      50,028,939 (1)                  3,048 (1)            11.08
Proud-I      42,709,617 (0.854)              10,447 (3.43)        21.55
We integrated Proud-2 into the SPARS layout system, and all the circuits it has placed have been routed successfully. In practice, routability is a better measure of the quality of placement algorithms, but for simplicity, we adopted the commonly used half-perimeter wire length for comparison. Tables 1 through 3 summarize results on three real designs and compare them with TimberWolf4.2 [10]. For all these tests, we do not allow I/O repositioning or module rotation. Our Proud-2 results in terms of total wire length are comparable to those of TimberWolf4.2. However, with detailed placement improvement, as implemented in Proud-I, we get results superior to those of TimberWolf4.2 whenever comparisons are possible. TimberWolf was unable to finish the design in Table 3, for example.

We also tried two benchmark standard cell examples: Primary1 and Primary2. For Primary1, the modules are placed in 17 rows in a 5,420μ × 4,320μ area. For Primary2, the modules are placed in 29 rows in a 9,240μ × 9,080μ area. We picked the row numbers and chip areas from the reported results of the 1987 Physical Design Workshop for fair comparison. Tables 4 and 5 show the placement results. Again, we note that Proud-2 does not allow module rotation and does not do I/O assignment. All I/O positions are given. We do allow Proud-I, however, to perform I/O assignments and module flipping along the x-axis for these two benchmark examples. The results we obtained are better than those from all others.

Table 4. Results of running Proud-2 and Proud-I compared with results of other algorithms: design Primary1 (standard cell).
Design Primary1: No. of Modules = 833; No. of Nets = 904; No. of Pins = 3,303
(Values in parentheses are relative to Proud-2.)

Algorithm    Wire Length (half perimeter)    Runtime (seconds)    Memory (Mbytes)
Proud-2      1,017,824 (1)                   16.9 (1)             0.90
Proud-I      826,096 (0.812)                 171 (10.1)           0.92
Others       ≥ 1,070,000                     N/A                  N/A

Table 5. Results of running Proud-2 and Proud-I compared with results of other algorithms: design Primary2 (standard cell).
Design Primary2: No. of Modules = 3,014; No. of Nets = 3,029; No. of Pins = 12,014
(Values in parentheses are relative to Proud-2.)

Algorithm    Wire Length (half perimeter)    Runtime (seconds)    Memory (Mbytes)
Proud-2      5,319,366 (1)                   126 (1)              2.83
Proud-I      4,610,701 (0.867)               1,012 (8.03)         2.89
Others       ≥ 4,960,000                     N/A                  N/A
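The half-perimeter wire length reported in the tables is the sum, over all nets, of half the perimeter of the bounding box of each net's pins. A minimal sketch follows; the function name, the pin dictionary, and the two example nets are hypothetical and are not taken from the benchmark data.

```python
def half_perimeter_wire_length(nets, pins):
    """Sum over nets of the half perimeter of each net's pin bounding box."""
    total = 0.0
    for net in nets:
        xs = [pins[p][0] for p in net]
        ys = [pins[p][1] for p in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


pins = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0), "d": (3.0, 10.0)}
nets = [["a", "b", "c"], ["b", "d"]]
length = half_perimeter_wire_length(nets, pins)   # (3 + 4) + (0 + 6) = 13
```

Unlike the squared-length objective used for global placement, this measure handles multipin nets directly, with no conversion to two-pin nets.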
EXTENSIONS

One of the most viable approaches for future VLSI circuit designs is using sea-of-gates technology with a combination of cell sizes on one chip, such as the small basic NANDs and NORs together with macrocells such as memories and PLAs. Memories or PLA controllers have a regular structure, which we can easily compose in compact form as macrocells. We can apply our two-dimensional placement algorithm to designs with preplaced macrocells. However, we must make three minor modifications: we need to alter the fixed-pin projection, partitioning, and cut-line positioning.
Fixed-pin projection: Since the sizes of macrocells and basic cells are very different, we have to use the actual pin locations instead of the point-module representation of the fixed modules. The fixed modules include both I/O pads and preplaced macrocells. The pins on the fixed modules are called fixed pins. In obtaining a global placement result from Equation 7, we project the fixed pins, not the fixed modules, to the x-axis and y-axis, respectively, to define the two one-dimensional problems. We can do this easily because our partitioning scheme depends only on a sorted order of the modules.

Figure 5. Proportionally assign the movable modules to the two halves according to the ratio of available space.

Figure 6. The cut-line positioning depends on the distance of the edge line to the center line.
Partitioning: We are better off placing the cut line along the edge of the macrocells, so the cut line may not be at the center of the chip. Therefore, depending on the ratio of the available space, we assign the movable modules proportionally to the left and right sectors defined by the cut line (Figure 5).

Cut-line positioning: As we just stated, we want to place the cut line along the edge of the macrocells, but we also want a low runtime complexity. Therefore, we need to minimize the depth of the binary tree created by the two-way partitioning. This means we want the cut line close to the center. In determining where to cut, then, we use the principle that the widths (or heights) of the two sectors determined by the cut line must be as close as possible. We form edge lines by extending the horizontal and vertical boundaries of the macrocells. A strip is defined as the region between adjacent cut lines specified by the previous partition (Figure 6). Additionally, we treat chip boundaries as previous cut lines to determine strips. All the blocks in the same strip are partitioned by the same cut line. Suppose we want a vertical cut line to be specified on a vertical strip. Let the width of this strip be normalized to 1. Assume that the nearest edge line to the center line divides the strip in two. The larger substrip has width $x$, while the smaller one has width $1 - x$. In dividing the strip, we make sure that one half the width of the larger substrip is not greater than the width of the smaller one. Thus the critical value occurs when $x/2 = 1 - x$, or when $x = 2/3$. In other words, if the width of the larger substrip is smaller than 2/3 of the width of the strip, then we place the cut line along the nearest edge line. Otherwise, we place the cut line along the center line. The worst-case depth of the binary tree is $\log n / (\log 3 - \log 2)$.
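The 2/3 rule and the depth bound can be written down directly. The sketch below is an illustration under assumed names (`choose_cut`, `worst_case_depth`) and an assumed interface in which a strip is given by its two boundary coordinates and the candidate edge lines by a list of coordinates.

```python
import math


def choose_cut(strip_lo, strip_hi, edge_lines):
    """Return the cut position for a strip, given candidate macrocell edge lines."""
    width = strip_hi - strip_lo
    center = (strip_lo + strip_hi) / 2.0
    inside = [e for e in edge_lines if strip_lo < e < strip_hi]
    if not inside:
        return center
    nearest = min(inside, key=lambda e: abs(e - center))
    larger = max(nearest - strip_lo, strip_hi - nearest)
    # Use the edge line only if the larger substrip is under 2/3 of the strip.
    return nearest if 3.0 * larger < 2.0 * width else center


def worst_case_depth(n):
    """Depth bound log n / (log 3 - log 2) when each cut keeps at most 2/3."""
    return math.log(n) / (math.log(3.0) - math.log(2.0))


cut_on_edge = choose_cut(0.0, 12.0, [5.0])     # larger substrip is 7/12 of the strip
cut_on_center = choose_cut(0.0, 12.0, [2.0])   # larger substrip is 10/12 of the strip
depth_bound = worst_case_depth(26277)          # module count of design h9
```

The bound follows because every cut leaves at most 2/3 of the strip in the larger part, so the block size shrinks by a factor of at least 3/2 per level.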
We tried placement for a practical design from industry with 20 preplaced macrocells, 1,735 movable modules, and 57 I/O pads (fixed modules). We achieved the placement in Figure 7 in 50 seconds on a VAX 8650.

In addition to accommodating circuit placement for designs with preplaced macrocells, we can extend the algorithm to adopt an eigenvector approach to I/O pad assignment. When optimizing the objective function in Equation 1, we take advantage of the preplaced I/O pads and simplify the placement problem by solving linear equations. Since the optimum, unconstrained, movable module locations depend on preplaced I/O locations, Equation 7 suggests that if we optimize I/O pad locations we will get better placement results. Before we place internal movable modules, we can extract an abstract representation of the interconnection of all I/O pads to serve as a guide in assigning I/O pads. The eigenvectors of the modified connectivity matrix of the I/O pads play an important role in this assignment. If we substitute Equation 7 into Equation 5, we obtain an expression in terms of $B^*$, where $B^* = B_{22} - B_{21} B_{11}^{-1} B_{12}$ is the modified connectivity matrix of the I/O pads. Then the problem of optimizing I/O pad locations amounts to solving another quadratic programming problem. This quadratic programming problem is in a special form that we can solve by using an eigenvector method.

We use a simple circuit such as that in Figure 8 to illustrate this method. The circuit has four I/O pads and four internal modules. Each segment represents a net. The layout is obviously optimum if the modules are to be placed on two-by-two grids. The two eigenvectors corresponding to the two least nonzero eigenvalues of $B^*$ are

$$[0.5, -0.5, -0.5, 0.5] \quad \text{and} \quad [0.5, 0.5, -0.5, -0.5]$$

which yields a perfect I/O pad assignment (Figure 8b). If the eigenvector solutions reside exactly on the legal I/O pad locations, then the result is optimum. However, in practice, we have to map the eigenvector solutions to the chip boundary where the I/O pads are located. One mapping scheme is the minimum-squared-error assignment. This scheme minimizes the difference between the ideal eigenvector solution and the assigned pad location. If the I/O pads are uniformly distributed on a square chip boundary, we can approximate the square as a circle and reduce optimum mapping to a circular ordering problem. We assign a legal location to a pad if both the location and the corresponding eigenvector solution of the pad have the same circular order. The ordering is an $O(n \log n)$ problem, which is much more efficient than the minimum-squared-error assignment. Computing $B^*$ costs roughly the runtime of a single hierarchical step of Proud-2 multiplied by the number of I/O pads. A pairwise interchange of I/O pads is more efficient and effective in practice, however.
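The circular-ordering assignment can be sketched by sorting both the ideal pad coordinates and the legal locations by angle about their respective centroids and pairing them in that order. This is a simplified illustration, not the authors' procedure: the function name `circular_assign`, the pad names, and the four corner locations are hypothetical, and the sketch aligns the two orderings at angle zero instead of searching over rotations for the best match.

```python
import math


def circular_assign(ideal, slots):
    """Pair pads with legal slots so that both sequences share one circular order."""
    def centroid(points):
        return (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))

    def angle(point, center):
        return math.atan2(point[1] - center[1], point[0] - center[0]) % (2.0 * math.pi)

    ideal_center = centroid(list(ideal.values()))
    slot_center = centroid(slots)
    pad_order = sorted(ideal, key=lambda name: angle(ideal[name], ideal_center))
    slot_order = sorted(slots, key=lambda s: angle(s, slot_center))
    return dict(zip(pad_order, slot_order))


# Pad coordinates taken from the two eigenvectors quoted in the text.
e1 = [0.5, -0.5, -0.5, 0.5]
e2 = [0.5, 0.5, -0.5, -0.5]
ideal = {f"pad{k}": (e1[k], e2[k]) for k in range(4)}
slots = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]   # hypothetical legal locations
assignment = circular_assign(ideal, slots)
```

The two sorts dominate the cost, which matches the $O(n \log n)$ figure given for the ordering.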
Figure 7. A design from industry with 20 macrocells.
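The circular-ordering mapping described above can be sketched as follows. This is a minimal illustration, not the Proud-2 implementation: the function names, the slot coordinates, and the choice of the first eigenvector as x and the second as y are all assumptions. A brute-force minimum-squared-error assignment is included only as a cross-check on this tiny example.

```python
import math
from itertools import permutations

def polar_angle(point):
    """Angle of (x, y) about the origin, in (-pi, pi]."""
    return math.atan2(point[1], point[0])

def circular_order_assignment(ideal, slots):
    """Pair pads and slots that occupy the same rank in angular order: O(n log n)."""
    if len(ideal) != len(slots):
        raise ValueError("need exactly one legal slot per pad")
    pads_sorted = sorted(ideal, key=lambda name: polar_angle(ideal[name]))
    slots_sorted = sorted(slots, key=polar_angle)
    return dict(zip(pads_sorted, slots_sorted))

def min_squared_error_assignment(ideal, slots):
    """Brute-force reference: try every permutation (only viable for tiny n)."""
    names = list(ideal)

    def cost(perm):
        return sum((ideal[n][0] - s[0]) ** 2 + (ideal[n][1] - s[1]) ** 2
                   for n, s in zip(names, perm))

    best = min(permutations(slots), key=cost)
    return dict(zip(names, best))

# Ideal pad coordinates read off the two eigenvectors in the text
# (ASSUMPTION: first eigenvector gives x, second gives y).
ideal = {"pad0": (0.5, 0.5), "pad1": (-0.5, 0.5),
         "pad2": (-0.5, -0.5), "pad3": (0.5, -0.5)}
# ASSUMED legal slots on the boundary of a 10-by-10 chip centred on the origin.
slots = [(5, 4), (-4, 5), (-5, -4), (4, -5)]

by_order = circular_order_assignment(ideal, slots)
by_error = min_squared_error_assignment(ideal, slots)
print(by_order)
print(by_order == by_error)
```

Sorting from a fixed reference angle settles the rotation of the pairing arbitrarily. A fuller version might try each of the n rotations and keep the cheapest one.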
Figure 8. A simple circuit with four I/O pads and four internal modules (a) and the coordinates calculated from the two eigenvectors corresponding to the smallest nonzero eigenvalues (b).
We have presented a novel approach to module placement. The algorithm gives better results than the latest available simulated annealing approach. Moreover, the algorithm runs an order of magnitude faster than simulated annealing. Our approach is especially suitable for a sea-of-gates design. The method takes advantage of the inherent sparsity of the placement problem and depends on solving sparse linear equations repeatedly. In the future, we hope to extend this method in several ways. We think a good routing area estimation can improve routability dramatically. We are now studying an algorithm that intelligently distributes the modules according to local routing requirements. We could also modify our algorithm to be a floor planner, since it runs very fast. Finally, we can use computation techniques to make the algorithm even faster. The SOR iterative method for linear systems is ideal for parallel processing. We can use either the block Gauss-Seidel or the block SOR method. We can also do detailed placement in parallel, since all the computation is in a local window area.
Ren-Song Tsay is with the Electronics Research Laboratory at the University of California, Berkeley, where he is also working toward a PhD in electrical engineering and computer science. His research interests include VLSI circuit layout algorithms, optimization techniques, matrix computation, and parallel processing. Previously, Tsay was with Integrated CMOS Systems and Hughes Aircraft Co. He holds a BS from National Taiwan University and an MS from the University of California at Santa Barbara.
ACKNOWLEDGMENTS We thank Keiko Chow, Scott Evans, Chih Lee, and Jui Tang for helping with the implementation of Proud-2. Discussions with Michael A.B. Jackson and Joe Higgins were also useful. Research for this article is supported in part by the National Science Foundation under grant MIP-8506901, by the Joint Services Electronics Program under contract F49620-87-C-0041, and by Hughes Aircraft Co.
Ernest S. Kuh is a professor of electrical engineering at the University of California, Berkeley. Previously, he was chair of the Electrical Engineering and Computer Sciences Department there, as well as dean of the College of Engineering. Kuh holds a BS from the University of Michigan, an SM from the Massachusetts Institute of Technology, and a PhD from Stanford University, all in electrical engineering. He is a member of the National Academy of Engineering and a fellow of the IEEE.
Chi-Ping Hsu is with CAD activities at Hughes Microelectronics Center, where his interests are behavioral synthesis, logic and layout synthesis, and mixed analog/digital design automation. At Hughes, he is responsible for developing automatic gate-array layout systems, standard cell layout systems, high-complexity sea-of-gates layout systems, floorplanning/building-block layout systems, and a high-density multichip interconnection design system. Hsu holds a BS in electrical engineering from National Taiwan University and a PhD in electrical engineering and computer science from the University of California, Berkeley.
REFERENCES
1. C-P. Hsu et al., “An Effective Hierarchical Approach to High Complexity Circuit Layout,” Proc. Custom IC Conf., 1987, pp. 614-617.
2. K. Hall, “An r-Dimensional Quadratic Placement Algorithm,” Management Science, Vol. 17, No. 3, Nov. 1970, pp. 219-229.
3. C-K. Cheng and E. Kuh, “Module Placement Based on Resistive Network Optimization,” IEEE Trans. Computer-Aided Design, Vol. CAD-3, No. 3, July 1984, pp. 218-225.
4. E. Polak, Computational Methods in Optimization, Academic Press, New York, 1971, pp. 7-12.
5. G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, Baltimore, 1983, pp. 353-372.
6. R-S. Tsay and E. Kuh, “A Unified Approach to Circuit Partitioning and Placement,” Proc. Princeton Conf. Information Sciences and Systems, 1986, pp. 155-160.
7. M. Breuer, “Min-Cut Placement,” J. Design Automation and Fault-Tolerant Computing, Vol. 1, No. 4, Oct. 1977, pp. 343-362.
8. U. Lauther, “A Min-Cut Placement Algorithm for General Cell Assemblies Based on a Graph Representation,” Proc. Design Automation Conf., 1979, pp. 1-10.
9. C-P. Hsu et al., “Automated Layout of Channelless Gate Array,” Proc. IEEE CICC, 1986, pp. 281-284.
10. C. Sechen and K-W. Lee, “An Improved Simulated Annealing Algorithm for Row-Based Placement,” Proc. Int’l Conf. Computer-Aided Design, 1987, pp. 478-481.
IEEE DESIGN & TEST OF COMPUTERS, DECEMBER 1988