
Hierarchical Partitioning

Dirk Behrens

Klaus Harbich

Erich Barke

Institute of Microelectronic Systems, Department of Electrical Engineering, University of Hanover, D-30167 Hanover, Germany
E-mail: {behrens, harbich, barke}@ims.uni-hannover.de

Abstract

Partitioning of digital circuits has become a key problem area during the last five years. Benefits from new technologies like Multi-Chip Modules or logic emulation strongly depend on partitioning results. Most published approaches are based on abstract graph models constructed from flat netlists, which consider only connectivity information. The approach presented in this paper uses information on design hierarchy in order to improve partitioning results and reduce problem complexity. Designs of up to 150k gates have been successfully partitioned by descending and ascending the hierarchy. Compared to a standard k-way iterative improvement partitioning approach, results are improved by up to 65% and runtimes are decreased by up to 99%.

1 Introduction

With continuously rising design complexity, partitioning has become a key problem in circuit and system design. Depending on the specific task, partitioning objectives vary:
• A design might be split into a predefined number of parts with minimum interconnections and balanced part sizes [San89], [RiSc95].
• A fixed interconnect architecture can be given [ObGl95]. For new technologies such as MCM or emulation, the number of parts should be minimized under a given upper limit for part sizes and interconnections, e.g. [ChLi95].
• During prototyping or when implementing a design with multiple heterogeneous standard devices like CPLDs or FPGAs, costs have to be minimized by choosing a minimum-cost device distribution, e.g. [KuBr94].
• Further objectives are delay optimization, e.g. [BrSa94], avoidance of cycles spanning more than one part, e.g. [CoLi94], and power minimization, e.g. [KhMa95].


The approach presented in this paper particularly addresses the second objective: Minimization of device count with given upper limits for device size and interconnections.

2 Previous Approaches

Partitioning approaches can be classified into four different groups: Iterative improvement algorithms, also known as refinement or top-down algorithms, start from a (e.g. randomly generated) initial partition and optimize an objective function by interchanging elements [KeLi69] or moving elements [FiMa82]. Based on Ford and Fulkerson's max-flow min-cut theorem [FoFu62], some algorithms use repeated max-flow min-cut [YaWo94] or single- [Pla90] or multi-commodity [LeRa88] flow techniques for partitioning. Recently, algorithms based on quadratic programming techniques, also referred to as spectral or eigenvalue-based methods, have been presented and show good partitioning results [RiDo94], [HaKa91]. In contrast to iterative improvement algorithms, many approaches based on clustering, also known as bottom-up or constructive algorithms, have been proposed. These recursively collapse elements, controlled by an appropriate objective function. Usually, such clustering algorithms are used as preprocessing steps for two-way iterative improvement algorithms to improve partitioning results as well as runtime due to the reduced problem complexity. Some approaches optimize an already calculated partition through replication, e.g. [HwGa95], [KrNe91], retiming [LiSh93] or encoding of signals on interconnects [BaSa93] in order to improve cut size or meet timing demands.

The hierarchical partitioning approach presented in this paper is also a kind of clustering approach, but without any subsequent iterative improvement or other post-processing algorithm. It is specifically aimed at FPGA-based logic emulation. Therefore, we use the number of needed FPGAs (devices) as the objective function with a given upper limit for part sizes (gatecount) gc_limit and interconnections (pincount) pc_limit. Like [ChLi95], all devices are supposed to have the same size. According to the resources of the Xilinx 3090 FPGA, we use default values of 5000 for gc_limit and 144 for pc_limit, in contrast to the 2700 for gc_limit and 184 for pc_limit used in [ChLi95].

In the next two chapters we first analyze known clustering approaches and then show that design hierarchy is useful for partitioning. In Chapter 3 hierarchical structures are classified with respect to the partitioning scheme introduced in Chapter 5. Experimental results from different industrial circuits using our new hierarchical approach are presented in Chapter 6.

2.1 Clustering Algorithms

Generally speaking, it is not possible to draw a strict line between k-way partitioning (k >> 2) and clustering approaches. Using clustering algorithms reduces the complexity of the problem. This results in tremendous savings in runtime as well as memory requirements, while simultaneously increasing result quality. Previous work can be grouped into approaches using local or global connectivity information. Simple pairwise local clustering has been presented based on conjunctivity and disjunctivity [ScUl72], average degree [BuHe83], kl-edge connectivity [GaPr90] and time cycles [ShKu93]. Other approaches use cluster radius and sparsity [AwPe90], a compaction heuristic [BuHe89], Rent's Rule [DiHo93], a recursive ratio cut approach [WeCh90], labeling with delay information [MuBr91], random walks [HaKa92], compaction of moved nodes during an iterative improvement algorithm [Saab93], recursive collapsing of small cliques in a graph [CoSm93] and different vertex orderings [AlKa94]. All these approaches use only topological information about a design, except for [ShKu93] and [MuBr91], which incorporate timing information.

2.2 Design Driven Clustering

The restriction of most partitioning and clustering algorithms to topological information is caused by the standard partitioning flow, in which a (hyper)graph is usually first created from a flat netlist. Due to this step, design information like the cell types of nodes, clocked nets and other useful knowledge is lost. This leads to suboptimal partitioning results. However, even more design information is useful for partitioning: design hierarchy, sequential and combinational parts, signal directions, feedback loops, busses, large nets (power, global clock), combinational cones, critical paths and, most important, high-level structures like registers or counters.

With the approach presented in this paper we use hierarchy information to improve partitioning results. This is accomplished by building up a cluster structure from the hierarchy and then merging clusters into feasible parts. We use the term "cluster" for a set of at least one element. The term "part" is used for an element of a partition constructed from different clusters. A physical FPGA with limited size and pincount is called "device". Each part will be mapped onto one device.

3 Design Hierarchy

Design hierarchy is one way of handling design complexity. Hierarchy has been used by designers to split a system into feasible blocks (top-down design) and to build larger blocks from smaller ones (bottom-up design). Today, only physical design automation tools for layout verification or layout extraction use hierarchy information efficiently to reduce complexity and runtime.

Figure 1: Circuit (a), functional (b), spatial (c) and combined (d) decomposition [Rubi87]

Design hierarchy is always the result of combined top-down decomposition and bottom-up assembly, and it strongly depends on decisions made by the designer. There are two main approaches to hierarchy: functional decomposition and spatial decomposition [Rubi87]. In functional decomposition the module hierarchy is dominated by data flow or control signals; thus, it is always connectivity driven (Figure 1b). In spatial decomposition the hierarchy reflects the distance of modules in the schematic (what can be seen on one screen) or in the layout (Figure 1c). Usually, functional and spatial methods are combined (Figure 1d), but the designers' focus is mainly on interconnections.

Hierarchy arises by dividing modules into subblocks. The number of blocks is limited by the designer's power of imagination: empirical experiments show that an ordinary person can handle up to 5 - 7 tasks simultaneously [Rubi87]. Due to this "branching factor", a hierarchy always has a small number of blocks per level and therefore low complexity. Optimization of interconnections and low complexity at each hierarchy level are the main reasons why we believe in our hierarchical partitioning approach.


Figure 2: Categories of hierarchical elements

3.1 Classification of Hierarchical Elements

Design elements at each hierarchy level usually consist of instances of hierarchical cells H and instances of library cells (leafcells) L (Figure 2). Regarding the size gc(H) and pincount pc(H) of instances, we distinguish feasible (H_feasible), almost feasible (H_nearfeasible) and unfeasible (H_unfeasible) elements in the following manner:

Feasible elements H_feasible:
  gc(H_feasible) ≤ gc_limit  and  pc(H_feasible) ≤ pc_limit

Almost feasible elements H_nearfeasible:
  gc(H_nearfeasible) = gc_limit + ε  and  pc(H_nearfeasible) ≤ pc_limit,  or
  gc(H_nearfeasible) ≤ gc_limit  and  pc(H_nearfeasible) = pc_limit + ε

Unfeasible elements H_unfeasible:
  gc(H_unfeasible) > gc_limit  and  pc(H_unfeasible) > pc_limit

Additionally, elements can be classified by the number of contained hierarchical cells n_H and/or leafcells n_L into pure hierarchical (H_HHH), almost hierarchical (H_HHL), almost flat (H_HLL) and flat (H_LLL) elements. Figure 2 gives examples of all four types.
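To make the classification concrete, the following Python sketch checks an instance's gatecount and pincount against the device limits used above. The Element type, the example values and the slack thresholds eps_gc/eps_pc are assumptions (the paper only introduces an unspecified ε); anything exceeding a limit by more than this slack is treated as unfeasible here.

from dataclasses import dataclass

GC_LIMIT, PC_LIMIT = 5000, 144     # Xilinx 3090 defaults used in this paper

@dataclass
class Element:                     # hypothetical container for one instance
    name: str
    gc: int                        # gatecount
    pc: int                        # pincount

def classify(e: Element, eps_gc: int = 500, eps_pc: int = 16) -> str:
    """Return 'feasible', 'nearfeasible' or 'unfeasible' for an element."""
    over_gc = e.gc - GC_LIMIT
    over_pc = e.pc - PC_LIMIT
    if over_gc <= 0 and over_pc <= 0:
        return "feasible"
    if (0 < over_gc <= eps_gc and over_pc <= 0) or \
       (0 < over_pc <= eps_pc and over_gc <= 0):
        return "nearfeasible"      # exactly one limit exceeded, and only slightly
    return "unfeasible"

print(classify(Element("H12", gc=5300, pc=120)))   # -> nearfeasible
print(classify(Element("H3",  gc=9000, pc=300)))   # -> unfeasible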

4 Feasible Elements

One way to use hierarchical information, which can easily be incorporated into known partitioning approaches, is to traverse the design hierarchy for additional clustering information. All feasible elements H_feasible are used as predefined clusters (Figure 3). This will be called the "cut hierarchy" (ch) approach.

Figure 3: Feasible and unfeasible hierarchical elements

In preliminary examinations (Table 1), two industrial designs have been partitioned as flat netlists with a standard iterative improvement algorithm (flat+FM) and by using the cut hierarchy approach (ch+FM). Our results show that cutting out hierarchical elements reduces the cutsize drastically in some cases, while in other cases there is only a small benefit.

Design   cutsize (flat+FM)   cutsize (ch+FM)   RatioCut (flat+FM)   RatioCut (ch+FM)
ind1     141                 47                8.75E-05             1.72E-04
ind2     559                 515               1.96E-05             1.79E-05

Table 1: Bi-partitioning results for the standard FM and the cut hierarchy approach

In the following paragraphs we analyze the disadvantages of this simple hierarchy cutting approach using the example of Figure 3: First, not all hierarchy information is used. Boundaries of elements which do not fit the given area/pin constraints (H_unfeasible or H_nearfeasible) are not considered any further. This results in unfavorable partitions where elements which are not directly neighbored (H111, H312) may be put together into one part. Also, almost feasible elements H_nearfeasible, which do not fit because of slightly too many gates or pins, are decomposed. Second, resolving unfeasible and almost feasible elements (H12, H21, ...) results in a huge number of free leafcells L_free that are difficult and time-consuming to handle. Third, cutting the hierarchy discards all information about feasible elements, although in some cases it would be better to split a feasible element in order to save devices.

As shown in Figure 3, the subblocks H41, H42 and H43 of H4 are all feasible elements, but each needs more than half of the resources of a device, i.e. gc(H4i) > gc_limit/2 and/or pc(H4i) > pc_limit/2. This results in three devices needed for H4. However, if H42 could be split into two clusters H42a and H42b, H4 would need only two devices by combining H41 with H42a and H43 with H42b, respectively.
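With assumed gatecounts for the three subblocks (the figure gives no exact numbers), this device-count argument can be checked directly; the names and values below are purely illustrative.

GC_LIMIT = 5000
gc = {"H41": 2600, "H42": 2600, "H43": 2600}   # assumed sizes, each > GC_LIMIT / 2

# Without splitting, no two subblocks fit into one device together:
pairs = [("H41", "H42"), ("H41", "H43"), ("H42", "H43")]
print(all(gc[a] + gc[b] > GC_LIMIT for a, b in pairs))                 # True -> three devices

# Splitting H42 into two halves H42a/H42b allows packing into two devices:
h42a = gc["H42"] // 2
h42b = gc["H42"] - h42a
print(gc["H41"] + h42a <= GC_LIMIT and gc["H43"] + h42b <= GC_LIMIT)   # True -> two devices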

5 Hierarchical Partitioning Approach

There are several ways to overcome the disadvantages of the cut hierarchy approach: merging feasible elements by using a local scope, extending this scope to more than one hierarchy level if design regularity is detected, handling "free" leafcells more efficiently, and making almost feasible elements feasible by extracting subelements. A similar approach can be found in [Field95], where logic block placement for FPGAs has been optimized by regrouping the hierarchy of a design.

5.1 Local Merging

In order to include information about unfeasible elements we use a local scope during partitioning. For elements of type H_HHH, H_HHL and H_HLL we change the scope to the actual hierarchy level H_actual. In this way, the algorithm locally combines feasible hierarchical elements into larger feasible elements. Merging of elements is accomplished in two steps: First, the elements Hi and Hj with the highest connectivity index are combined by using a connectivity matrix over all H_feasible, as long as gc(Hi ∪ Hj) ≤ gc_limit and pc(Hi ∪ Hj) ≤ pc_limit hold. In a second step, unconnected feasible elements are combined using a vector bin-packing approach [Spiek92].
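A minimal sketch of the first, connectivity-driven merging step, assuming a hypothetical Cluster type that stores a gatecount and the set of its external nets; a net shared by two clusters is assumed to become internal when they are merged, which is what lets the pincount shrink. A bin-packing sketch for unconnected clusters follows at the end of Section 5.

GC_LIMIT, PC_LIMIT = 5000, 144

class Cluster:                                   # hypothetical cluster model
    def __init__(self, name, gc, nets):
        self.name, self.gc, self.nets = name, gc, set(nets)

def merged(a, b):
    """Gatecount and external nets of a ∪ b (shared nets become internal)."""
    shared = a.nets & b.nets
    return a.gc + b.gc, (a.nets | b.nets) - shared

def local_merge(clusters):
    """Repeatedly merge the most strongly connected pair that still fits a device."""
    clusters = list(clusters)
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                conn = len(a.nets & b.nets)              # connectivity index
                gc, nets = merged(a, b)
                if conn > 0 and gc <= GC_LIMIT and len(nets) <= PC_LIMIT:
                    if best is None or conn > best[0]:
                        best = (conn, i, j, gc, nets)
        if best is None:
            return clusters                              # no connected pair fits anymore
        _, i, j, gc, nets = best
        survivor = Cluster(clusters[i].name + "+" + clusters[j].name, gc, nets)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [survivor]

cs = [Cluster("H11", 2000, {1, 2, 3}), Cluster("H12", 2500, {2, 3, 4}), Cluster("H21", 1800, {9})]
print([c.name for c in local_merge(cs)])                 # H11 and H12 merge, H21 stays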

5.2 Regularity

Sometimes, local merging cannot improve results, but one hierarchy level higher an improvement is possible: in Figure 3, for example, H312 and H322 cannot be merged if the actual scope is set to H31 or H32, respectively. Switching to the higher scope H3 makes the combination of H312 and H322 possible. This improvement technique is extremely useful for designs with a highly regular hierarchical structure. In our approach we look for multiple instances of one cell. If the instance count exceeds a predefined threshold, we set a regularity flag to extend the local scope to more than one hierarchy level (Figure 4). This also improves partitioning results.

Figure 4: Local, extended local and global merging
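A sketch of this regularity test; the instance list format and the threshold value are assumptions, since the paper only speaks of a predefined threshold.

from collections import Counter

REGULARITY_THRESHOLD = 4          # assumed value

def regular_masters(instances):
    """instances: list of (master_cell, instance_name) pairs at the current scope.
    Returns the master cells whose instance count triggers the regularity flag,
    i.e. for which the local scope is extended by one or more hierarchy levels."""
    counts = Counter(master for master, _ in instances)
    return {m for m, n in counts.items() if n >= REGULARITY_THRESHOLD}

instances = [("RAMBLK", f"u{i}") for i in range(6)] + [("CTRL", "u_ctrl")]
print(regular_masters(instances))  # -> {'RAMBLK'}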

5.3 Leafcell Migration

After identifying feasible elements and performing local and/or extended local merging for H_HHL and H_HLL elements, a number of leafcells often still exists. We use a standard k-way iterative improvement algorithm (FM) to partition the leafcells of these H_HHL and H_HLL elements. To avoid many runs of the FM algorithm and to reduce complexity, free leafcells are first clustered into already found feasible elements using the same local scope (Figure 5).

Figure 5: Leafcell migration

This so-called "leafcell migration" step consists of different phases: In a first phase, all free leafcells L_free migrate into connected elements H_feasible if the pincount of the element is not increased. Usually there are some remaining leafcells after this phase. If their number is lower than a predefined threshold n_Lmax, all remaining free leafcells are allowed to migrate into a connected element H_feasible even if its pincount is increased, as long as H_feasible ∪ L_free still meets the area and pin constraints. Due to this forced leafcell migration, additional clusters consisting of only a small number of leafcells can be avoided in most cases for elements of type H_HHL.

If n_Lfree ≥ n_Lmax, we first try to put all free instances into one cluster if it meets the area/pin constraints. Otherwise we apply a standard k-way iterative improvement algorithm based on [FiMa82] to partition ∪ L_free into feasible clusters. Figure 6 shows the described leafcell migration in pseudo code.

forall L_free
  forall H_feasible that L_free is connected to
    if pc(H_feasible ∪ L_free) ≤ pc(H_feasible)
       & gc(H_feasible ∪ L_free) ≤ gc_limit
       & pc(H_feasible ∪ L_free) ≤ pc_limit
    then H_feasible = H_feasible ∪ L_free

if n_Lfree < n_Lmax then
  forall L_free
    forall H_feasible that L_free is connected to
      if gc(H_feasible ∪ L_free) ≤ gc_limit
         & pc(H_feasible ∪ L_free) ≤ pc_limit
      then H_feasible = H_feasible ∪ L_free
else if gc(∪ L_free) ≤ gc_limit & pc(∪ L_free) ≤ pc_limit then
  H_new = ∪ L_free
else
  Apply k-way FM algorithm to ∪ L_free

Figure 6: Pseudo code listing of leafcell migration

At the end, a partition consists of feasible hierarchical elements, merged feasible hierarchical elements (with migrated leafcells) and parts with only leafcells:

  P = { {H_feasible}, ..., {H_feasible ∪ H_feasible ∪ ... ∪ L_free}, ..., {L_free, ...}, ... }
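The following Python sketch mirrors the two migration phases of Figure 6 under simplifying assumptions: clusters and leafcells are modeled as hypothetical objects carrying a gatecount and the set of their external nets, and a net shared between a leafcell and the absorbing cluster is assumed to become internal. The fallback for many remaining leafcells (one new cluster or a k-way FM run) is only indicated in the final comment.

GC_LIMIT, PC_LIMIT = 5000, 144
N_L_MAX = 20                                  # assumed threshold n_Lmax

class Node:                                   # used for both clusters and free leafcells
    def __init__(self, name, gc, nets):
        self.name, self.gc, self.nets = name, gc, set(nets)

def absorb(cluster, leaf):
    """Gatecount and external nets after the cluster absorbs the leafcell."""
    shared = cluster.nets & leaf.nets
    return cluster.gc + leaf.gc, (cluster.nets | leaf.nets) - shared

def migrate(clusters, leafcells):
    def run(leafs, allow_pin_growth):
        remaining = []
        for leaf in leafs:
            for c in clusters:
                if not (c.nets & leaf.nets):
                    continue                   # migrate only into connected clusters
                gc, nets = absorb(c, leaf)
                fits = gc <= GC_LIMIT and len(nets) <= PC_LIMIT
                if fits and (allow_pin_growth or len(nets) <= len(c.nets)):
                    c.gc, c.nets = gc, nets
                    break
            else:
                remaining.append(leaf)
        return remaining

    left = run(leafcells, allow_pin_growth=False)      # phase 1: pincount must not grow
    if 0 < len(left) < N_L_MAX:
        left = run(left, allow_pin_growth=True)        # phase 2: forced migration
    return left   # if many cells remain: build one new cluster or run k-way FM on them

cluster = Node("H11", 4000, {1, 2, 3})
print([l.name for l in migrate([cluster], [Node("u1", 300, {2}), Node("u2", 300, {7})])])
# -> ['u2']: u1 is absorbed (shared net 2), u2 stays free (not connected)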

5.4 Element Extraction

As described, we have hierarchically partitioned unfeasible and also almost feasible elements of type H_HHH, H_HHL and H_HLL by descending hierarchy levels, using local scopes for leafcell migration and a two-phase two-step merging process. In addition, almost feasible elements can be treated in a different way to improve partitioning results: Almost feasible elements H_nearfeasible can be changed into feasible ones H_feasible by extracting hierarchical elements H_extract with gc(H_extract) ≥ ε_gc or pc(H_extract) ≥ ε_pc, respectively (Figure 7). We choose H_extract as the smallest H in H_nearfeasible which lets H_nearfeasible become feasible. After the extraction of H_extract, H_nearfeasible has been split into one or two feasible elements, depending on whether H_extract itself is a feasible element. The column labeled "extract" in Table 2 shows drastic improvements by using extraction. Runtime is reduced by avoiding the resolution of almost feasible elements, thereby saving several iterative improvement runs, especially for elements of type H_HLL.

Figure 7: Element extraction for elements of type H_HHH, H_HHL and H_HLL
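A sketch of the extraction rule described above, under the simplifying assumption that removing a sub-element subtracts its gatecount and (optimistically) its pincount from the parent; in the real flow the parent's pincount has to be re-derived from the netlist. The child list and numbers are illustrative only.

GC_LIMIT, PC_LIMIT = 5000, 144

def choose_extract(parent_gc, parent_pc, children):
    """children: list of (name, gc, pc) sub-elements of an almost feasible parent.
    Returns the smallest child whose extraction makes the parent feasible, or None."""
    candidates = []
    for name, gc, pc in children:
        rest_gc = parent_gc - gc
        rest_pc = parent_pc - pc              # optimistic, see remark above
        if rest_gc <= GC_LIMIT and rest_pc <= PC_LIMIT:
            candidates.append((gc, name))
    return min(candidates)[1] if candidates else None

# H_nearfeasible with 5200 gates and 130 pins:
children = [("A", 100, 10), ("B", 800, 20), ("C", 1700, 40)]
print(choose_extract(5200, 130, children))    # -> 'B', the smallest sufficient child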

This preliminary partition has a poor device utilization for some devices. Therefore, a final merging step combines small parts into larger feasible ones. This step also decreases the number of needed devices. The presented recursive hierarchical partitioning procedure can be described in pseudo code as presented in Figure 8.

HierPart(H_actual):
  // First handle hierarchical elements
  forall hierarchical elements Hi in H_actual do
    if Hi is feasible then
      Create new cluster with Hi
    else if Hi contains hierarchical elements then
      // Hi is of type H_HHH, H_HHL or H_HLL
      if Hi is feasible after extracting H_extract then
        Create new cluster with Hi
        if H_extract is feasible then
          Create new cluster with H_extract
        else
          // -> Go down one hierarchy level
          HierPart(H_extract)
      else
        // Don't know how to partition Hi
        // -> Go down one hierarchy level
        HierPart(Hi)
    else
      // Hi is of type H_LLL
      Apply k-way partitioning algorithm to Hi

  // Now consider free leafcells
  if free leafcells exist (n_Lfree > 0) then
    // H_actual is of type H_HHL or H_HLL
    Let all clusters grow without increasing pincount
    if free leafcells still exist (n_Lfree > 0) then
      if number of remaining free instances n_Lfree ≤ n_Lmax then
        // Element is of type H_HHL
        Grow clusters even if pincount increases
      else
        // Element is of type H_HLL
        Apply k-way partitioning to leafcells

  // Now all hierarchical elements and free instances are clustered.
  Merge already created clusters using connectivity matrix and vector binpacking
  return

Figure 8: Pseudo code for the presented Hierarchical Partitioning approach (HIERPART)
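The merging of unconnected clusters in the last step of HierPart uses a vector bin-packing formulation; the sketch below is a simple first-fit-decreasing heuristic over (gatecount, pincount) vectors and only approximates the branch-and-bound approach of [Spiek92]. Summing pincounts is pessimistic, since nets shared between merged clusters are ignored here.

GC_LIMIT, PC_LIMIT = 5000, 144

def vector_bin_pack(clusters):
    """clusters: list of (name, gc, pc). Returns parts (lists of names), each part
    respecting both device limits. First-fit decreasing by gatecount."""
    parts = []                                          # each part: [names, gc_sum, pc_sum]
    for name, gc, pc in sorted(clusters, key=lambda c: c[1], reverse=True):
        for part in parts:
            if part[1] + gc <= GC_LIMIT and part[2] + pc <= PC_LIMIT:
                part[0].append(name)
                part[1] += gc
                part[2] += pc
                break
        else:
            parts.append([[name], gc, pc])              # open a new device
    return [p[0] for p in parts]

print(vector_bin_pack([("c1", 2600, 60), ("c2", 2300, 70), ("c3", 2000, 40), ("c4", 900, 30)]))
# -> [['c1', 'c2'], ['c3', 'c4']]: two devices instead of four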

6 Results

To test our approach, we partitioned five industrial designs ranging from 1387 to 140k gate equivalences (ge). Two small designs, ind1 and ind2, show a high primary IO-count and high net/instance ratios. Table 2 shows results using a standard k-way iterative improvement approach on the flat netlist (flat+FM), the cut hierarchy approach (ch+FM) and our new hierarchical partitioning method with different parameter settings.

Design   Size (ge)   I/O   flat+FM   ch+FM   local opt.   global opt.   extract
ind1     1.4k        341   12        11      6            5             4
ind2     11k         285   38        31      19           17            12
ind3     14k         222   9         6       31           7             5
ind4     24k         119   37        28      27           26            24
ind5     143k        106   -         -       -            -             65

Table 2: Number of devices using different options for hierarchical partitioning

In the column labeled "local opt." local merging and leafcell migration are enabled. In the column "extract" the element extraction described above is enabled as well. A 140k industrial design (ind5) has been successfully partitioned. Due to its high complexity and very long runtime it could not be processed with the flat or cut hierarchy approach. With an average utilization of 2213 GE/device, HierPart outperforms many partitioning approaches built into commercial logic emulators as well as the approach presented in [ChLi95], which achieved a utilization of up to 1100 GE/device, although it uses a much higher pin limit of 184 pins per FPGA. As shown in Table 2, local optimization always outperforms the FM-based and cut hierarchy approaches by up to 50% (ind1, ind2). In addition, extraction decreases the number of devices by up to 25% (ind2). In the column labeled "global opt." we use a different merging scheme: All clusters found by hierarchical partitioning are preserved without any local or extended local merging. Only at the end are all clusters merged using our two-phase two-step merging process. This seems to produce better results than local optimization but in most cases needs more time (Table 3: ind2, ind4).

Design   ch+FM   local opt.   global opt.   extract
ind1     100%    58%          51%           10%
ind2     100%    19%          27%           2%
ind3     100%    1%           5%            5%
ind4     100%    41%          35%           41%

Table 3: Runtime improvement using different options for hierarchical partitioning

Table 3 shows runtime improvements compared to the cut hierarchy approach. Obviously, extraction gives the best speedup for hierarchical partitioning, but even local or global optimization results in a considerable improvement.


Figure 9: Distribution of part sizes and pins for ind1-ind4, gc_limit = 5000, pc_limit = 144 (Xilinx 3090)

Figure 9 shows the size distribution for the four industrial designs and indicates good average part utilization. For ind4, numerous small parts exist due to several runs of the k-way iterative improvement algorithm, caused by many free leafcells.

7 Conclusion and Prospects

As shown by the presented results, considering hierarchy information and using a local scope for optimization improves partitioning results as well as performance. Using higher-level logical and structural clusters in a design not only decreases the cutsize but also offers the opportunity to include other objectives like timing optimization and the avoidance of feedback loops over device boundaries. The presented algorithm offers several options to enable different merging schemes and thresholds for device parameters. We are currently investigating the effect of different parameter settings for different hierarchy structures in order to implement an adaptive hierarchical partitioning scheme. Additional speedup can be achieved by using regularity in the form of temporarily saved partitioning results to avoid repartitioning multiple instances of the same cell. By descending the hierarchy and using a local scope, complexity has been reduced so drastically that full-search approaches become feasible, promising near-optimum partitioning results. The extraction process can be improved by considering more than one element during extraction. Decomposition of feasible elements can also improve the results, but is not yet implemented. In our opinion, bottom-up approaches like the presented HierPart are favored candidates for handling upcoming design complexities. Moreover, the presented partitioning results show that considering domain-specific information for partitioning promises more improvement than optimizing any purely graph-based approach.

8 References

[AlKa94] Alpert, C.J.; Kahng, A.B.; "A General Framework for Vertex Orderings With Applications to Netlist Clustering", International Conference on Computer Aided Design, pp. 63-67, 1994
[AwPe90] Awerbuch, B.; Peleg, D.; "Sparse Partitions", IEEE Annual Symposium on Foundations of Computer Science, pp. 503-513, 1990
[BrSa94] Brasen, D.; Saucier, G.; "FPGA Partitioning for Critical Paths", Proceedings of the European Design Automation Conference, pp. 99-103, 1992
[BuHe89] Bui, T.; Heigham, C.; Jones, C.; Leighton, T.; "Improving the Performance of the Kernighan-Lin and Simulated Annealing Graph Bisection Algorithms", Proceedings of the Design Automation Conference, pp. 775-778, 1989
[ChLi95] Chou, N.-C.; Liu, L.-T.; Cheng, C.-K.; Dai, W.-J.; Lindelof, R.; "Local Ratio Cut and Set Covering Partitioning for Huge Logic Emulation Systems", IEEE Transactions on Computer Aided Design, vol. 14, no. 9, pp. 1085-1092, 1995
[CoLi94] Cong, J.; Li, Z.; Bagrodia, R.; "Acyclic Multi-Way Partitioning of Boolean Networks", Proceedings of the Design Automation Conference, pp. 670-675, 1994
[CoSm93] Cong, J.; Smith, M.; "A Parallel Bottom-Up Clustering Algorithm with Applications to Circuit Partitioning in VLSI Design", Proceedings of the Design Automation Conference, pp. 755-760, 1993
[DiHo93] Ding, C.; Ho, C.; "A New Optimization Driven Clustering Algorithm for Large Circuits", Proceedings of the European Design Automation Conference, pp. 28-32, 1993
[FiMa82] Fiduccia, C.; Mattheyses, R.; "A Linear-Time Heuristic for Improving Network Partitions", Proceedings of the Design Automation Conference, pp. 175-181, 1982
[Field95] Fields, C.A.; "Creating Hierarchy in HDL-Based High Density FPGA Design", Proceedings of the European Design Automation Conference, pp. 594-599, 1995
[FoFu62] Ford, L.R.; Fulkerson, D.R.; "Flows in Networks", Princeton University Press, 1962
[GaPr90] Garbers, J.; Prömel, H.J.; Steger, A.; "Finding Clusters in VLSI Circuits", International Conference on Computer Aided Design, pp. 520-523, 1990
[HaKa91] Hagen, L.; Kahng, A.B.; "Fast Spectral Methods for Ratio Cut Partitioning and Clustering", International Conference on Computer Aided Design, pp. 10-13, 1991
[HaKa92] Hagen, L.; Kahng, A.B.; "New Spectral Methods for Ratio Cut Partitioning and Clustering", IEEE Transactions on Computer Aided Design, vol. 11, no. 9, pp. 1074-1085, 1992
[HwGa95] Hwang, J.; El Gamal, A.; "Min-Cut Replication in Partitioned Networks", IEEE Transactions on Computer Aided Design, vol. 14, no. 1, pp. 96-106, 1995
[KeLi69] Kernighan, B.; Lin, S.; "An Efficient Heuristic Procedure for Partitioning Graphs", The Bell System Technical Journal, pp. 291-307, 1969
[KhMa95] Khan, S.A.; Madisetti, V.K.; "System Partitioning of MCMs for Low Power", IEEE Design & Test of Computers, Spring 1995, pp. 41-52, 1995
[KrNe91] Kring, C.; Newton, A.; "A Cell-Replicating Approach to Mincut-Based Circuit Partitioning", International Conference on Computer Aided Design, pp. 2-5, 1991
[KuBr94] Kuznar, R.; Brglez, F.; Zajc, B.; "Multi-way Netlist Partitioning into Heterogeneous FPGAs and Minimization of Total Device Cost and Interconnect", Design Automation Conference, pp. 238-243, 1994
[LiSh93] Liu, L.-T.; Shih, M.; Chou, N.-C.; Cheng, C.-K.; Ku, W.; "Performance-Driven Partitioning Using Retiming and Replication", International Conference on Computer Aided Design, pp. 296-299, 1993
[MuBr91] Murgai, R.; Brayton, R.; Sangiovanni-Vincentelli, A.; "On Clustering for Minimum Delay/Area", Proceedings of the Design Automation Conference, pp. 6-8, 1991
[ObGl95] Ober, U.; Glesner, M.; "Multiway Netlist Partitioning onto FPGA-based Board Architectures", Proceedings of the European Design Automation Conference, pp. 150-155, 1995
[Pla90] Plaisted, D.A.; "A Heuristic Algorithm for Small Separators in Arbitrary Graphs", SIAM Journal on Computing, vol. 19, no. 2, pp. 267-280, 1990
[RiDo94] Riess, B.M.; Doll, K.; Johannes, F.M.; "Partitioning Very Large Circuits Using Analytical Placement Techniques", Proceedings of the Design Automation Conference, pp. 646-651, 1994
[RiSc95] Riess, B.M.; Schoene, A.; "Architecture Driven K-Way Partitioning for Multichip Modules", Proceedings of the European Design Automation Conference, pp. 71-76, 1995
[Rubi87] Rubin, S.M.; "Computer Aids for VLSI Design", Addison-Wesley, pp. 15 ff., 1987
[Saab93] Saab, Y.; "Post-Analysis Based Clustering Dramatically Improves The Fiduccia-Mattheyses Algorithm", pp. 22-27, 1993
[San89] Sanchis, L.A.; "Multiple-way network partitioning", IEEE Transactions on Computers, vol. 38, no. 1, pp. 62-81, 1989
[ScUl72] Schuler, D.; Ulrich, E.; "Clustering and Linear Placement", Design Automation Workshop, pp. 50-56, 1972
[ShKu93] Shih, M.; Kuh, E.S.; "Quadratic Boolean Programming for Performance-Driven System Partitioning", Proceedings of the Design Automation Conference, pp. 761-765, 1993
[Spiek92] Spieksma, F.C.R.; "A Branch-And-Bound Algorithm for the Two-Dimensional Vector Binpacking Problem", Computers & Operations Research, vol. 21, pp. 19-25, 1994
[WeCh90] Wei, Y.; Cheng, C.; "A Two-Level Two-Way Partitioning Algorithm", International Conference on Computer Aided Design, pp. 516-519, 1990
[YaWo94] Yang, H.; Wong, D.F.; "Efficient Network Flow Based Min-Cut Balanced Partitioning", International Conference on Computer Aided Design, pp. 50-55, 1994