The PARTY Partitioning-Library User Guide - Version 1.1

Robert Preis*
HEINZ NIXDORF INSTITUT, Universität Paderborn, Germany
[email protected]

and

Ralf Diekmann†
Department of Mathematics and Computer Science, Universität Paderborn, Germany
[email protected]

September 16, 1996

Abstract

The problem of partitioning a graph into a number of pieces is one of the fundamental tasks in computer science and has a number of applications, e.g. in parallel programming or VLSI design. Finding optimal partitions according to different measures is in most cases NP-complete. Nevertheless, a large number of efficient partitioning heuristics have been developed during recent years. The performance of these methods in terms of computation time as well as quality of approximation is heavily influenced by choices of parameters and certain implementation details. Fortunately, the partitioning problem itself is clearly defined and its description leads to a small interface. Thus, efficient implementations of approximation heuristics can be re-used for different applications.

The PARTY partitioning library provides a variety of different partitioning methods in a very simple and easy way. Instead of implementing the methods directly, the user may take advantage of the ready-implemented methods of the library. All implementations include the latest developments to increase the performance of the partitioning heuristics. Two kinds of interfaces allow the use as a stand-alone tool as well as the inclusion into application codes. The data structures used as interface to PARTY are simple and easy to generate. Several research projects currently use the PARTY partitioning library to solve the partitioning problem.

* This work is supported by the DFG Graduiertenkolleg "Parallele Rechnernetze in der Produktionstechnik", GRK 124/2-96, and the EC HC&M Project MAP.

† This work is supported by the DFG Sonderforschungsbereich 376 "Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen" and the EU ESPRIT Long Term Research Project 20244 (ALCOM-IT).

Contents

1 Introduction
2 Installing the PARTY Partitioning Library
3 Executable Code: party
  3.1 Example with Default Values
  3.2 Options
  3.3 Graph Input File Format
4 Interface Code: party_lib.h and libparty.a
  4.1 General Partitioning Procedure party_lib
  4.2 Parameters in the Interface
    4.2.1 Method - Parameters
    4.2.2 Graph - Parameters
    4.2.3 Partitioning - Parameters
    4.2.4 Result - Parameters
    4.2.5 Information - Parameters
  4.3 Single Partitioning Procedures
    4.3.1 Global Partitioning Procedures
    4.3.2 Local Partitioning Procedures
  4.4 Utility Procedures
    4.4.1 I/O Procedures
    4.4.2 Check and Information Procedures
    4.4.3 Memory and Time Procedures
5 The Partitioning Problem
6 Partitioning Methods
  6.1 Classification: Global and Local Methods
  6.2 Global Methods
    6.2.1 Optimal
    6.2.2 Linear
    6.2.3 Scattered
    6.2.4 Random
    6.2.5 Gain
    6.2.6 Farhat
    6.2.7 Coordinate Sorting
    6.2.8 Multilevel
    6.2.9 Spectral
    6.2.10 Inertial
  6.3 Local Methods
    6.3.1 Kernighan-Lin
    6.3.2 Helpful-Set
    6.3.3 Local Partitioning by Multiple Local Bisection
7 Planned Future Extensions

1 Introduction

Graph partitioning problems occur in a wide range of applications. The task is to divide the set of vertices of a graph into a given number of parts while respecting restrictions and optimizing a cost function. A common restriction is that the parts must contain a balanced number of vertices, and a common cost function to be minimized is the number of edges crossing between vertices that belong to different parts. Calculating an optimal solution is NP-complete, so efficient heuristic methods have to be used to calculate sufficiently good solutions in reasonable time.

The number of existing partitioning tools has markedly increased in recent years. Each application that needs to solve a partitioning problem generally uses its own exclusive partitioning tool. Transferring the code of that specific partitioning tool to a different application usually requires considerable effort in re-implementing the code. An additional problem arises with new partitioning methods, which are very complex. Even originally simple methods may become very complex if the partitioning problem is generalized, for example by different weights for vertices or edges, arbitrary numbers of final partitions, specific load balance criteria or a focus on even more complex cost functions. The performance of partitioning methods depends highly on the implementation, and it can be very time consuming for the user to optimize the partitioning code.
The PARTY partitioning library tries to solve all these problems. Its goal is to provide the user with

• several partitioning methods of different character,
• efficient implementations to guarantee high performance,
• a variety of generalizations of the methods to reflect the specific constraints of the application more precisely,
• a simple interface which guarantees ease of use as well as strong control of the methods.

Several other libraries exist to solve graph partitioning or similar problems, such as Chaco ([HL94]) by Hendrickson and Leland, Jostle ([WCE95]) by Walshaw, Metis ([KK95]) by Karypis and Kumar, Scotch ([PR96]) by Pellegrini and TOP/DOMDEC by Farhat and Simon ([FS93]). Interfaces between those different libraries allow the methods of several libraries to be used within the environment of one single library. Therefore, the PARTY partitioning library provides interfaces to the Chaco library, and some methods therein can be invoked from the PARTY environment.

For immediate use of PARTY, Section 2 shows how to install it, and Sections 3 and 4 describe the two different ways of accessing the library code. Sections 5 and 6 give more background on the partitioning problem as viewed in this library and a short description of the implemented methods.

2 Installing the PARTY Partitioning Library

To obtain the code of the PARTY Partitioning Library, please contact the authors or take a look at the WWW page http://wwwhni.uni-paderborn.de/graduierte/preis/party.html. The whole package of the PARTY Partitioning Library is stored in the shipped file PARTY_1.1.tar.gz. Please use the following instructions to unpack the package:

1. gunzip PARTY_1.1.tar.gz
2. tar -xvf PARTY_1.1.tar

This will create the directory PARTY_1.1 and five subdirectories doc, src, include, bin and graphs with further subdirectories and files.

• doc contains this User Guide and the License Agreement in PostScript format.
• src contains the source files of the package. The code is written in C and consists of approx. 3,000 lines altogether.
• include contains only one file, party_lib.h, which declares all available procedures of the library. This header file is used in the interface mode (Section 4).
• bin contains several subdirectories, one for each specific type of system. The names of the systems follow the widely used environment variable PVM_ARCH, and the following systems are included: SUN4, SUN4SOL2, SUNMP, SGI5, SGIMP and PowerPVM. If you are using a different system, you have to compile the code for your system: change to the src directory, set the variable SYS in the Makefile to your system and execute make. Each subdirectory of bin contains an executable file party (used in the executable mode, Section 3) and a binary link-library file libparty.a (used in the interface mode, Section 4). It is advisable to add the appropriate subdirectory to the environment variable of your executable search path.
• graphs contains the files Grid32x32 and Grid32x32.xyz, which describe a 32x32 square grid graph stored in a specific graph input file format (Section 3.3). The file Grid32x32 contains the adjacency information and the file Grid32x32.xyz the coordinate information of the graph.

As an example, execute PARTY_1.1/bin/{SUN4,SUN4SOL2,...}/party (the executable file of the subdirectory depending on the system you are using), or just party if the directory is in your execution path, and the partitioning library will be invoked with default values for all options. Please refer to the following section concerning the default and alternative values.
The disadvantage of this executable mode is that the user's application and the partitioning code communicate by passing the graph information and the partitioning results via files, which is very time consuming. Therefore, the second mode uses an interface which directly connects the procedures of the library to the user's application. PARTY provides the header file party_lib.h and the object file libparty.a for this purpose. To use the implementations of the partitioning methods, the user has to write a short interface code of usually about 50 lines. This is necessary because each application uses its own representation of information about the graph and the partitioning problem. Although the same is true for the partitioning heuristics, the methods in PARTY use a very simple representation as interface, which is described in Section 4.

If the Chaco library is available, it may be linked to the PARTY library, and some of its methods can then be used to enlarge the number of available methods in PARTY. The user first has to create a chaco.h header file with the declaration of the Chaco interface procedure and a libchaco.a library file from the Chaco source code. Then the CHACO_* variables in src/Makefile have to be adjusted to those Chaco files. After a new compilation (type make in the src directory), the new PARTY code will have integrated some methods from the Chaco library.

3 Executable Code: party

The executable mode of PARTY allows the partitioning methods to be used without being connected to a specific application. It works 'stand alone'.

3.1 Example with Default Values

The command party at the prompt performs a partitioning example with default values as shown in Figure 1.

    [PARTY_1.1]->party
    party [-f graphfile] [-xyz coorfile] [-g global] [-l local] [-# runs] [-p parts]
          [-b balance] [-r recursive] [-s partfile] [-o output] [-t times]
    -f graphfile (Grid32x32)                   -s save partition in partfile ()
    -xyz coordinate file ()                    -o output: 0..4 (1)
    -g global method:{opt,lin,sca,ran,gai,far, -t times output: 0..4 (2)
       coo,mul,spm,spl,ine,all,<file>} (all)
    -l local method: {hs,kl,ckl,no} (hs)
    -# number of runs (1)
    -p number of parts (2)
    -b additional imbalance (0.0)              < Default values are in
    -r recursive?: 0/1 (1)                       brackets (). >

    Example with default values:
    ========== PARTY Version 1.1, 16 September 1996 ==========================
    Graphfile         : Grid32x32
    Global / Local    : all / hs
    Parts / add. Bal. : 2 / 0.00
    ---------- Graph Information ---------------------------------------------
    # Vertices/Edges  : 1024 / 1984
    Degree (min/ave/max/tot)    2  3.88  4  3968
    Interval of X coordinates: [0.000000,31.000000]
    Interval of Y coordinates: [0.000000,31.000000]
    Components        : 1
    --------------------------------------------------------------------------
    ---------- Partition Information -----------------------------------------
    VERTEX-based:
      Size      min/ave/max/tot:  512  512.00  512  1024
      Internal  min/ave/max/tot:  480  480.00  480   960
    EDGE-based:
      Part Deg. min/ave/max/tot:    1    1.00    1     2
      External  min/ave/max/tot:   32   32.00   32    64
    Total cut size is 32
    --------------------------------------------------------------------------
    MEMORY (current/max): 0 / 109780
    TIME : 0.36 sec   Pre : 0.00   Part : 0.35   Rest : 0.00
    ========== PARTY End =====================================================

Figure 1: The executable party with default values

All output is directed to stdout. A list of the options that can be changed in future executions of party is shown at the beginning. As an example, party is then executed with default values to give a feeling for working with the Partitioning-Library. In the pre-processing part of the code, information about the input data is shown. First comes the version number of the code, followed by the values of the major options: a 32 by 32 square grid is to be partitioned with all global methods (all) combined with the local Helpful-Set method (hs), the resulting number of parts is 2 and no additional imbalance (0.0) is allowed. Please refer to the next section for a detailed description of the options.

The following block contains general information about the graph, such as the number of vertices and edges, an analysis of vertex degrees, the intervals of the provided vertex coordinates and the number of connected components in the graph.

The calculation of a partition, the main step of the code, may take some time, depending on the given problem and on the computational power of the machine, but does not produce any output with the settings described so far.

After completion of the partition calculation, the post-processing step gives an analysis of several interesting values. It starts with a block of information about the partition quality, showing an analysis of several vertex- and edge-based values. The most important value is the 'Total cut size' at the end of the block. The output finishes with some information about the memory usage (current and maximum dynamically allocated memory) and the time performance (total time and split time for the different steps).

3.2 Options

The executable party has several options:

    party [-f graphfile] [-xyz coorfile] [-g global] [-l local] [-# runs] [-p parts]
          [-b balance] [-r recursive] [-s partfile] [-o output] [-t times]

You can change the default partitioning example to the desired one by changing one or more of the options, which are listed with their default values in brackets:

-f graphfile (Grid32x32)

To perform the partitioning on a graph other than the default graph of a 32 by 32 square grid, the graph has to be described in a file according to the graph input file format of Section 3.3.

-xyz coordinate file (Grid32x32.xyz)

Some partitioning methods are based on geometric information about the graph (i.e. vertex coordinates) and can only be used if coordinates are provided. The coordinates of the vertices have to be stored in an extra coordinate file. The format is described in Section 3.3.

-g global method: {lin,sca,ran,gai,far,mul,spm,spl,ine,coo,opt,all,<file>} (all)

As will be described in Section 5, each partitioning process consists of at least one global partitioning method. The partitioning methods Linear (lin), Scattered (sca), Random (ran), Gain (gai), Farhat (far), Coordinate (coo) and Optimal (opt) are implemented as described in Section 6.2. Additionally, if the Chaco library is available, the methods Multilevel (mul), Spectral/Multilevel (spm), Spectral/Lanczos (spl) and Inertial (ine) may be used. A special case is the choice `all': all available global methods (except opt) are performed and the best of these results is taken as the final one. The methods coo and ine are based on geometric information about the graph and may only be used if a coordinate file is provided.

In addition, a previously calculated global partition may be read from a file. In this case, the name of the file in which the partition is saved has to be passed with the -g parameter. The partition has to be stored according to the partition format described with the option -s below.

-l local method: {hs,kl,ckl,no} (hs)

The partitioning results of the global methods may be improved by using a local partitioning heuristic. PARTY includes implementations of the Helpful-Set (hs) and Kernighan-Lin (kl) heuristics, which are described in Section 6.3. If the Chaco library is also available, ckl refers to the Kernighan-Lin implementation of Chaco.

-# number of runs (1)

When using nondeterministic methods (currently only the global partitioning method ran), it makes sense to perform several runs of the same partitioning method. This parameter performs more than one run of the same method and returns the best of all results.

-p number of parts (2)

The number of final parts may be an arbitrary positive number. If the recursive partitioning option is chosen (default, see option -r below), the number of parts is reduced to the next lower power of two, e.g. a value of 21 would be reduced to 16.

-b additional imbalance (0.0)

Load balance is handled very strictly in the graph partitioning problem, i.e. no part should have a size of more than $\left\lceil \frac{\text{\# of vertices}}{\text{\# of parts}} \right\rceil$. Some applications allow a larger imbalance in favor of a lower cut size. If a value of x is provided with this option, partitions with a size of up to $\left\lceil \frac{\text{\# of vertices}}{\text{\# of parts}} + x \right\rceil$ are considered. This increases the number of considered partitions and might result in one with a lower cut size than is possible with strictly balanced partitions. A more detailed discussion is given in Section 5.

-r recursive?: 0/1 (1)

The partitioning problem becomes much simpler if the graph is only split into 2 parts (bisection problem). A value of 1 refers to recursive bisection, i.e. the partitioning problem is solved by recursively applying the bisection heuristic to the remaining parts (in this case the number of parts given by the option `-p' above is reduced to the next lower power of 2). A value of 0 refers to direct partitioning into the specified number of parts. Most partitioning methods in this library were originally developed for the bisection problem, and the recursive mode is recommended.

-s save partition in partfile ()

This option saves the partition in a file with the given filename. The file has as many lines as there are vertices in the graph, and each line consists of one number giving the partition number (range [0, p-1]) of the corresponding vertex. Note that the vertices are in the same order as in the graph description of the graph input file.

-o output: 0..4 (1)

To show the performance of the partitioning process, a certain amount of information may be printed on the screen. The amount of output can be controlled by using values between 0 (no output) and 4 (maximum output). The default value of 1 shows the output as in Figure 1.

-t times output: 0..4 (2)

This option prints timing information on the screen to analyze the time requirements of the individual steps of the code. A value of 0 produces no output, whereas a value of 4 produces the most detailed output. The default value of 2 shows the output as in Figure 1.
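The effect of the options -p and -b on the problem that is actually solved can be sketched in a few lines of C. The helper names below are illustrative and not part of PARTY; the sketch only assumes the two rules stated above, namely that the number of parts is reduced to the next lower power of two in recursive mode and that a part may hold up to the ceiling of (# of vertices / # of parts + x) vertices.

```c
#include <assert.h>

/* Hypothetical helper (not part of PARTY): number of parts actually
   produced for a requested value p. In recursive mode the request is
   reduced to the next lower power of two, e.g. 21 becomes 16. */
int effective_parts(int p, int recursive)
{
    int q = 1;

    assert(p >= 1);
    if (!recursive)
        return p;
    while (q <= p / 2)      /* stop before q would exceed p */
        q *= 2;
    return q;
}

/* Hypothetical helper: largest part size that is still considered,
   i.e. the ceiling of n/p + add_bal for unweighted vertices. */
long max_part_size(int n, int p, double add_bal)
{
    double bound;
    long size;

    assert(n >= 0 && p >= 1 && add_bal >= 0.0);
    bound = (double)n / (double)p + add_bal;
    size = (long)bound;             /* truncates toward zero */
    if ((double)size < bound)       /* round up if a fraction remains */
        size++;
    return size;
}
```

For the default example (1024 vertices, 2 parts, no additional imbalance) max_part_size yields 512, which matches the part sizes reported in Figure 1.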

3.3 Graph Input File Format


The executable code party needs an input file representing the graph. By default, the file Grid32x32 containing the adjacency information of a 32x32 square grid is provided. The basic structure of the graph input file format is a subset of the format used in the Chaco library [HL94]. In fact, if no weights or identity information is provided, both formats are identical. The format of the adjacency file is shown in Figure 2.

    % The format of the adjacency file
    % Possible comment lines at the top start with % or #...
    % |V|+1 non comment lines follow
    %
    |V| |E| [1](1)
    [Vertex-weight] Neighbor_1 (Edge-weight_1) Neighbor_2 (Edge-weight_2)...
    .
    .
    .

Figure 2: The format of the adjacency file

Comment lines at the top of the file starting with `%' or `#' are ignored. The first non-comment line is the head line, which gives the number of vertices (|V|), the number of edges (|E|) and, optionally, a third code number consisting of 2 digits. This code number has to be provided if weights for vertices or weights for edges are included in the graph description: if the tens digit is nonzero, vertex weights are provided, and if the ones digit is nonzero, edge weights are provided.

Then |V| lines follow, each of them describing a single vertex. The vertex description starts with the weight of the vertex (type float) if the corresponding digit of the code is set. The description continues with a list of all neighbors of the particular vertex. A neighbor is specified by its position in this graph description, starting with 1 for the first vertex and ending with |V| for the last one. This differs from the internal numbering of PARTY and from the description in Section 4.2.2, where the vertices are numbered from 0 to |V| - 1. If edge weights are provided and the corresponding digit in the code number is set, each neighbor is immediately followed by the weight of the edge. Note that only integer values are allowed for edge weights.

The format is illustrated using the graph of Figure 3. It has 5 vertices (numbered from 1 to 5) and 8 edges. Both vertices and edges are weighted.

[Figure 3: Example graph. Five vertices with their weights in parentheses: 1 (4.5), 2 (1.4), 3 (5.0), 4 (0.7), 5 (3.0). The eight edges are labeled with their integer weights.]

Figure 4 shows the adjacency file of the example graph. Note that each single edge is specified twice in the description, i.e. in the neighbor list of each of its incident vertices. Also, if a digit of the code number is set, the corresponding information has to be provided for all vertices or edges.

    % The adjacency file of the example graph
    %
    5 8 11
    4.5 2 100 5 10 4 16
    1.4 1 100 5 7 3 3
    5.0 2 3 5 8 4 3
    0.7 3 3 5 5 1 16
    3.0 1 10 2 7 3 8 4 5

Figure 4: The adjacency file of the example graph

If geometric information is to be considered in addition to the adjacency information, it has to be provided in a second coordinate file listing the coordinates of the vertices (like the file Grid32x32.xyz).

    0.0 1.0
    1.0 1.0
    1.0 0.0
    0.0 0.0
    0.5 0.5

Figure 5: The coordinate file of the example graph

Figure 5 shows the coordinate file of the example graph. The number of lines is equal to the number of vertices, and each line contains the x- and y-coordinates of the corresponding vertex of the example. 3-dimensional examples with z-coordinates would have three values in each line. Please note that PARTY reads the first line to determine the number of coordinates. All vertices must have the same number of coordinates, i.e. each line of the coordinate file has to have the same number of values.
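An application that uses the executable mode has to generate such an adjacency file itself. The following C sketch shows one way to do this from 0-based adjacency arrays of the kind described in Section 4.2.2. The function name write_adjacency_file is an assumption made for this illustration and not a PARTY procedure; vertex weights are printed with %g, so a weight of 5.0 appears as 5.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of PARTY): write a graph held in
   0-based adjacency arrays to `out` in the adjacency file format.
   vertex_w and edge_w may be NULL if no weights are used.
   Returns 0 on success and 1 if writing failed. */
int write_adjacency_file(FILE *out, int n,
                         const float *vertex_w, const int *edge_p,
                         const int *edge, const int *edge_w)
{
    int i, k;

    assert(out != NULL && n >= 0 && edge_p != NULL);

    /* Head line: |V|, |E| and, if weights exist, the two-digit code.
       Every edge is stored twice, so |E| is half of edge_p[n]. */
    fprintf(out, "%d %d", n, edge_p[n] / 2);
    if (vertex_w != NULL || edge_w != NULL)
        fprintf(out, " %d%d", vertex_w != NULL, edge_w != NULL);
    fputc('\n', out);

    for (i = 0; i < n; i++) {
        const char *sep = "";

        if (vertex_w != NULL) {
            fprintf(out, "%g", vertex_w[i]);
            sep = " ";
        }
        for (k = edge_p[i]; k < edge_p[i + 1]; k++) {
            /* The file numbers vertices from 1, the arrays from 0. */
            fprintf(out, "%s%d", sep, edge[k] + 1);
            if (edge_w != NULL)
                fprintf(out, " %d", edge_w[k]);
            sep = " ";
        }
        fputc('\n', out);
    }
    return ferror(out) ? 1 : 0;
}
```

Called with the arrays of the example graph, the sketch reproduces the non-comment lines of Figure 4 up to the formatting of the vertex weights.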

4 Interface Code: party_lib.h and libparty.a

The second access mode to the library is via the interface code. Each partitioning method is implemented in its own procedure (see Section 4.3). A specific procedure party_lib is provided to combine the different methods more easily. As described in Section 2, the user may access the procedures by including the header file party_lib.h in the application code and linking the library libparty.a to the executable. The following steps have to be performed in the user's code in order to use the PARTY interface:

1. transform the graph from the representation in the application into the simple interface representation described by the parameters of the procedures in party_lib.h
2. invoke one or more of the procedures of party_lib.h
3. reflect the result of the procedures back to the application

This section first describes the general partitioning procedure party_lib, which controls the use of all implemented partitioning methods. It then takes a closer look at the parameters needed to describe the interface and continues with a description of the single partitioning procedures. It finishes with a description of several utility procedures.


4.1 General Partitioning Procedure party_lib

Although there are many methods implemented in this library, there is one central procedure party_lib which performs the partitioning by combining several single methods.

    int party_lib (
        char *Global, char *Local, int runs,
        int n, float *vertex_w, float *x, float *y, float *z,
        int *edge_p, int *edge, int *edge_w,
        int p, float add_bal, int recursive,
        int *part, int *cutsize,
        int Output);

The party_lib procedure needs several parameters to specify the partitioning methods (1st row of parameters), the graph information (2nd and 3rd row), the partitioning problem (4th row), the result of the calculated partition (5th row) and the amount of information output (6th row). The parameters are described in more detail in the following section. party_lib returns a value of 0 on success and a value of 1 on failure.
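A call of party_lib might look like the following sketch. So that it compiles without the library, the partitioning procedure is passed as a function pointer with the signature given above; in an application one would include party_lib.h, link libparty.a and pass party_lib itself. The names bisect_square and stub_partition, the 4-vertex test graph and the choice of lin and kl are assumptions made for this illustration. The sketch also assumes that part points to an array with one entry per vertex and that an Output value of 0 suppresses all output, as with the -o option.

```c
#include <assert.h>
#include <stddef.h>

/* Function pointer type with the signature of party_lib. */
typedef int (*party_lib_fn)(char *Global, char *Local, int runs,
                            int n, float *vertex_w,
                            float *x, float *y, float *z,
                            int *edge_p, int *edge, int *edge_w,
                            int p, float add_bal, int recursive,
                            int *part, int *cutsize, int Output);

/* Hypothetical wrapper: bisect the unweighted square 0-1-2-3-0.
   part must provide room for 4 entries. Returns the result of the
   partitioning procedure (0 on success, 1 on failure). */
int bisect_square(party_lib_fn partition, int *part, int *cutsize)
{
    char global[] = "lin";              /* Linear global method */
    char local[]  = "kl";               /* Kernighan-Lin local method */
    int edge_p[5] = {0, 2, 4, 6, 8};    /* two neighbors per vertex */
    int edge[8]   = {1, 3,  0, 2,  1, 3,  0, 2};

    assert(partition != NULL && part != NULL && cutsize != NULL);
    return partition(global, local, 1,
                     4, NULL, NULL, NULL, NULL,  /* no weights, no coordinates */
                     edge_p, edge, NULL,         /* no edge weights */
                     2, 0.0f, 1,                 /* 2 parts, balanced, recursive */
                     part, cutsize,
                     0);                         /* assumed: 0 = no output */
}

/* Stand-in with the same signature, used only to exercise the wrapper
   without the library: the first half of the vertices goes to part 0,
   the rest to part 1, and the cut edges are counted. */
int stub_partition(char *Global, char *Local, int runs,
                   int n, float *vertex_w, float *x, float *y, float *z,
                   int *edge_p, int *edge, int *edge_w,
                   int p, float add_bal, int recursive,
                   int *part, int *cutsize, int Output)
{
    int i, k, twice = 0;

    /* The stand-in ignores methods, weights, coordinates and options. */
    (void)Global; (void)Local; (void)runs; (void)vertex_w;
    (void)x; (void)y; (void)z; (void)edge_w;
    (void)add_bal; (void)recursive; (void)Output;

    if (p != 2 || part == NULL || cutsize == NULL)
        return 1;
    for (i = 0; i < n; i++)
        part[i] = (i < n / 2) ? 0 : 1;
    for (i = 0; i < n; i++)
        for (k = edge_p[i]; k < edge_p[i + 1]; k++)
            if (part[i] != part[edge[k]])
                twice++;
    *cutsize = twice / 2;   /* every edge was seen from both ends */
    return 0;
}
```

With the library linked, the call would read bisect_square(party_lib, part, &cutsize); a nonzero return value signals failure and should be checked before part is used.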

4.2 Parameters in the Interface

Parameters play an important role in the interface between applications and the library. The goal is to keep them simple enough to be easily understood, but complex enough to pass all required information and to give adequate control of the implemented methods. In general, the parameters of party_lib can be divided into five categories. First, there are parameters to specify the combination of global and local partitioning methods and the number of runs. Second, there are parameters which provide information about the graph. Third, there are parameters to specify the partitioning problem and, fourth, parameters which are used to pass back the results of the calculated partition. The amount of information output is controlled by the fifth set of parameters. Depending on the complexity of the problem, usually only a subset of all parameters will be relevant for a specific application. The structure of the parameters in this library is inspired by the interface structure in [HL94].

4.2.1 Method - Parameters

As will be discussed in Section 6.1, each partitioning calculation consists of a global step and possibly further local steps. Therefore, global and local methods have to be specified. Apart from the random global method, all partitioning algorithms are deterministic and produce reproducible results.

Global: pointer to a string of characters, global partitioning methods

This parameter specifies the global method or methods that shall be applied. There is a choice between opt (Optimal), lin (Linear), sca (Scattered), ran (Random), gai (Gain), far (Farhat), coo (Coordinate), mul (Multilevel), spm (Spectral/Multilevel), spl (Spectral/Lanczos) and ine (Inertial). The methods mul, spm, spl and ine may only be used if the Chaco library is available, and the methods ine and coo may only be used when geometric information is passed with the parameters x, y and z. Section 6.2 explains the methods in more detail. To make use of all methods at once, the choice of all as global method performs all available global methods (except opt) and takes the best as the final result.

It is also possible to use a previously calculated partition. The choice of part assumes that the previously calculated partition is saved in the array part. Any other string is assumed to be the name of a file storing a partition according to the format of the -s option of Section 3.2.

Local: pointer to a string of characters, local partitioning method

This parameter specifies the local partitioning method. Valid choices are the Kernighan-Lin (`kl') and Helpful-Set (`hs') heuristics, which are described in Section 6.3. In addition, if the Chaco library is available, the choice of `ckl' invokes the Chaco implementation of the Kernighan-Lin method. Any other string is ignored.

runs: integer, number of runs

The random partitioning method is the only nondeterministic algorithm implemented in PARTY. Different runs of this method usually lead to different results. In this case, the runs parameter specifies the number of independent runs that are performed. The best of all runs is taken as the final result.

4.2.2 Graph - Parameters

The central object in graph partitioning is the graph itself. Its representation in an adjacency list data structure as described below guarantees a high performance of the algorithms that are used. Depending on the application, a graph may consist of only vertices and edges, but may also be provided with different weights for vertices and edges. Although an application might have a lot of graph information to offer, each of the partitioning procedures in the library uses only some of it, i.e. only a few parameters with relevant information have to be passed. The following parameters are used to pass the graph information (an example is shown in Figure 3 and Table 1):

n: integer, number |V| of vertices in the graph

The value should be non-negative. Graphs with more than 10^6 vertices have been tested with the procedures in this library.

vertex_w: pointer to an array of n floats, weights of vertices

Many applications construct graphs whose vertices have different weights attached to them. The aim is then to calculate partitions which are balanced not with respect to the number of vertices, but with respect to the total weight of the vertices in the parts. Each vertex may have a nonnegative weight. If no vertex weights are considered, vertex_w has to be set to NULL and the load balance is performed with respect to the number of vertices.

x: pointer to an array of n floats, x-coordinates

Some partitioning methods are based on geometric information about the graph. For these methods, the coordinates of the vertices have to be provided with the parameters x, y and z. The NULL pointer should be passed if no geometric information is available.

y: pointer to an array of n floats, y-coordinates

This array contains the y-coordinates if 2- or 3-dimensional coordinates are provided. Otherwise, the NULL pointer should be passed.

z: pointer to an array of n floats, z-coordinates

This array contains the z-coordinates if 3-dimensional coordinates are provided. Otherwise, the NULL pointer should be passed.


edge_p: pointer to an array of n + 1 integers, neighbor pointer for each vertex

This array contains an index to the edge array for each vertex. Each vertex i has edge_p[i+1] - edge_p[i] neighbors, which are described further in the edge array below. Thereby, edge_p[0] is 0 and edge_p[n] is twice the total number of edges in the graph.

edge: pointer to an array of edge_p[n] integers, neighbor for each edge

This array contains the neighbor lists of all vertices. The neighbors of vertex i are listed from edge[edge_p[i]] to edge[edge_p[i+1]-1].

edge_w: pointer to an array of edge_p[n] integers, weights of edges

As in the case of vertices, edges may also have different weights. In this case, the cut size in the partitioning methods is based on the total weight of all cut edges instead of on their number. Edge weights may also have negative values. Please note that in the current version of the library, edge weights have to be integer values. If no edge weights are given, edge_w has to be set to NULL and the cut size will be based on the number of cut edges.

To illustrate the graph parameters, take another look at the graph of Figure 3. The graph parameters of this example graph are shown in Table 1.

parameter \ index  [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11] [12] [13] [14] [15]
n                  5
vertex_w           4.5  1.4  5.0  0.7  3.0
x                  0.0  1.0  1.0  0.0  0.5
y                  1.0  1.0  0.0  0.0  0.5
z                  NULL
edge_p             0    3    6    9    12   16
edge               1    4    3    0    4    2    1    4    3    0    2    4    0    1    2    3
edge_w             100  10   16   100  7    3    3    8    3    3    5    16   10   7    8    5

Table 1: Example graph parameters

Note that some methods require only a few graph parameters and if no weights shall be considered, the according parameters vertex_w and edge_w have to be set to NULL.
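As an illustration, the adjacency arrays described above can be written down directly in C. The following is a minimal sketch and not part of the library: it uses only the rows n, edge_p and edge of Table 1, and the helper names degree, is_neighbor and num_edges as well as the tab_ prefix are hypothetical.

```c
#include <assert.h>

/* Example graph of Table 1 (rows n, edge_p and edge only). */
enum { TAB_N = 5 };
static const int tab_edge_p[TAB_N + 1] = { 0, 3, 6, 9, 12, 16 };
static const int tab_edge[16] = {
    1, 4, 3,      /* neighbors of vertex 0 */
    0, 4, 2,      /* neighbors of vertex 1 */
    1, 4, 3,      /* neighbors of vertex 2 */
    0, 2, 4,      /* neighbors of vertex 3 */
    0, 1, 2, 3    /* neighbors of vertex 4 */
};

/* Hypothetical helper: number of neighbors of vertex i. */
static int degree(const int *edge_p, int i)
{
    assert(i >= 0);
    return edge_p[i + 1] - edge_p[i];
}

/* Hypothetical helper: 1 if w is listed as a neighbor of v, else 0. */
static int is_neighbor(const int *edge_p, const int *edge, int v, int w)
{
    for (int k = edge_p[v]; k < edge_p[v + 1]; k++)
        if (edge[k] == w)
            return 1;
    return 0;
}

/* Hypothetical helper: every undirected edge is stored twice. */
static int num_edges(int n, const int *edge_p)
{
    return edge_p[n] / 2;
}
```

Since every undirected edge appears in the neighbor lists of both of its end points, edge_p[n] counts each edge twice, which is why num_edges halves it.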

4.2.3 Partitioning - Parameters

The third important set of parameters deals with the partitioning problem that has to be solved. Besides the number of final parts, there are some further parameters:

p: integer, number of parts

The final number of parts should be in the range from 2 to n. If it is not a power of 2 in the (default) recursive mode, it will be reduced to the next lower power of 2.

add_bal: float, allowed imbalance

Load balancing is a widely discussed topic in graph partitioning. Generally, the total weight of all vertices should be distributed among the parts approximately evenly and only balanced partitions are considered as described in Section 5. In some applications, slightly imbalanced partitions might be favorable because of other advantages. In this case the procedures may accept non-balanced partitions where the amount of imbalance can be controlled by the add_bal parameter. To be more precise, all partitions $\pi$ with a balance $bal(\pi) < \mathit{max\_vertex\_weight} + \mathit{add\_bal}$ are considered (cf. Def. 5.2, p. 19 for the definition of balance). A value of 0.0 for add_bal will consider only balanced partitions.


recursive: integer, sets recursive or direct mode

A value of 0 refers to direct partitioning into the specified number of parts, and the default value of 1 refers to recursive bisection (in this case the number of parts given by the parameter p above is reduced to the next lower power of 2). Most partitioning methods in this library were originally developed for the bisection problem and may perform poorly if used for direct partitioning.
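The reduction of the number of parts in recursive mode can be sketched as follows. This is only an illustration of the rule stated above, not the code of the library; the function names lower_power_of_two and effective_parts are hypothetical.

```c
#include <assert.h>

/* Largest power of 2 that is less than or equal to p (requires p >= 1). */
static int lower_power_of_two(int p)
{
    assert(p >= 1);
    int q = 1;
    while (q <= p / 2)   /* q * 2 <= p, so the doubling cannot overflow */
        q *= 2;
    return q;
}

/* Number of parts that is actually produced for a requested p:
   unchanged in direct mode, reduced to a power of 2 in recursive mode. */
static int effective_parts(int p, int recursive)
{
    return recursive ? lower_power_of_two(p) : p;
}
```

For example, a request for 6 parts in recursive mode yields 4 parts, whereas the same request in direct mode yields 6 parts.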

4.2.4 Result - Parameters

All parameters so far were input parameters, i.e. they pass information to the procedure and their values will not be changed by the call. The parameters in this section are used to pass the results of the partitioning. These parameters are the only ones that will be changed by the called procedures.

part: pointer to an array of n integers, partitioning result

A partition of a graph is a function that gives each vertex the number of its part. Therefore, part is a pointer to an integer array. The space for the array (n integers) has to be allocated before the call of the procedure! When the procedure is finished, the array is filled with the part numbers of the according vertices. This array may then be used as the last step in the user's interface code to transfer the partitioning result to the application.

cutsize: integer pointer, resulting number of cut edges.

The current cost function of a partition is the number of cut edges which has to be minimized. This value will be passed after completion of the procedure giving the user an idea of the quality of the calculated partition. Please note that if edge weights were provided, the sum of the weights of all cut edges will be passed.

4.2.5 Information - Parameters

The following parameter is only used for additional information output about the performance of the procedures. Please note that additional information does not change the solution of the partition, but will result in slightly higher run time (due to the additional work for monitoring and data tracing).

Output: integer, general amount of information output

The part parameter of the previous section returns the partitioning result, but usually the user would like to have some more information about the characteristics of the partition or wants to know more about the performance of the partitioning methods during run time. All output will be directed to stdout. The Output parameter allows values between 0 (no output) and 4 (maximum amount of output).

4.3 Single Partitioning Procedures

The general partitioning procedure party_lib is very useful to handle several partitioning methods at once, but some users might want to perform one specific partitioning method only. Therefore, the user has access to the single partitioning procedures as described in the following. Most parameters used with these procedures are the same as in the previous section. For details, please refer to the description above.


int global_opt (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part, int *locked, int Output);
int global_linear (
    int n, float *vertex_w,
    int p, float add_bal, int *part);
int global_scattered (
    int n, float *vertex_w,
    int p, float add_bal, int *part);
int global_random (
    int n, float *vertex_w,
    int p, float add_bal, int *part);
int global_gain (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
int global_farhat (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
int global_coordinate (
    int n, float *vertex_w, float *x, float *y, float *z,
    int p, float add_bal, int *part);
int global_file (
    int n, char *filename,
    int p, int *part);
#ifdef CHACO
int global_multilevel (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
int global_spectral_m (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
int global_spectral_l (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
int global_inertial (
    int n, float *vertex_w, float *x, float *y, float *z,
    int p, float add_bal, int *part);
#endif

4.3.1 Global Partitioning Procedures

The global partitioning procedures listed above are implemented in the library. They refer to the methods described in Section 6.2 (except global_file, which reads a partition from a file). All global partitioning methods have some parameters to specify the graph (1st row) and some parameters to specify the partitioning problem and to pass the partitioning result (2nd row).

The parameter locked in the procedure global_opt is a pointer to an array of n integers and can be used to lock some vertices in a specific part. This will reduce the solution space and the computational requirement. A value between 0 and p - 1 for a vertex locks that vertex in the according part, whereas a value of -1 addresses an unlocked vertex. Please note that if you lock too many vertices, a balanced partition might not always be possible.

According to option '-s' in Section 3.2, a calculated partition can be stored in a file. The procedure global_file reads such a partition from a file and the name of the file has to be specified by the parameter filename.

The last four procedures are only accessible if the Chaco library is available. As a result, the procedures change the values in the integer array part; in case of success they return 0, otherwise 1.

4.3.2 Local Partitioning Procedures

As local partitioning methods, PARTY serves the Kernighan-Lin (cf. Section 6.3.1) and the Helpful-Set heuristics (cf. Section 6.3.2).

int local_kl (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part,
    int Output);
int local_hs (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part,
    int Output);
#ifdef CHACO
int local_ckl (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, float add_bal, int *part);
#endif

The local_kl, local_hs and local_ckl (only accessible if the Chaco library is available) procedures need parameters for the graph (1st row), the partitioning problem and the partitioning result (2nd row) and the amount of information output (3rd row). Unlike the former procedures, local partitioning methods have to be provided with an already existing partition. Therefore, the parameter part is a pointer to an array of integers containing the part numbers of the vertices. Usually, the result of a single global partitioning procedure is used for this parameter. The changes made by the local partitioning methods are described by changes of the values in the integer array part. Again, all procedures return 0 on success and 1 on failure.

4.4 Utility Procedures

Several utility procedures are provided for an easier use of the procedures described so far. These utility procedures include I/O support, check and information facilities and some analysis of the memory and time usage. If not stated differently, all procedures return 0 on success and 1 on failure.

4.4.1 I/O Procedures

The procedures in this section take care of the transfer of data between the storage in a file and the storage in memory.

int graph_load (
    char *graphfile, char *xyzfile,
    int *n, float **vertex_w, float **x, float **y, float **z,
    int **edge_p, int **edge, int **edge_w);
int graph_save (
    char *graphfile, char *xyzfile,
    int n, float *vertex_w, float *x, float *y, float *z,
    int *edge_p, int *edge, int *edge_w);
int graph_free (
    int n, float *vertex_w, float *x, float *y, float *z,
    int *edge_p, int *edge, int *edge_w);
int graph_print (
    int n, float *vertex_w, float *x, float *y, float *z,
    int *edge_p, int *edge, int *edge_w);
int part_save (
    char *partfile, int n, int *part);

The procedure graph_load loads a graph from the files 'graphfile' and 'xyzfile' (the file format has to be the same as described in Section 3.3) and stores it into the following parameters (if no coordinates are provided, please pass NULL with the 'xyzfile' parameter and the parameters x, y and z will be set to NULL). The following parameters correspond to the description in Section 4.2 with only one difference: instead of the parameters themselves, pointers to them are passed. This is important, because the appropriate memory for the various arrays will be allocated within graph_load. This dynamically allocated memory can be freed using the procedure graph_free.

The procedure graph_save does the opposite of graph_load: it stores a graph in files named 'graphfile' and 'xyzfile'. The procedure graph_print prints the arrays storing the graph information on stdout. It can be used to check the right values of the graph parameters on the screen. To save a partition in a file, the procedure part_save stores the partition array part in the file 'partfile'. The file will contain n lines, each of which contains the part number (range [0..p-1]) of the according vertex.

4.4.2 Check and Information Procedures

The procedures in this section help to check the consistency of the data structures and to give some information about the graph and the calculated partition.

int graph_check_and_info (
    int n, float *vertex_w, float *x, float *y, float *z,
    int *edge_p, int *edge, int *edge_w,
    int Output);
int cut_size (
    int n, int *edge_p, int *edge, int *edge_w,
    int *part);
int part_check (
    int n, float *vertex_w,
    int p, float add_bal, int recursive,
    int *part, int Output);
int part_info (
    int n, float *vertex_w, int *edge_p, int *edge, int *edge_w,
    int p, int *part, int Output);

Again, the parameters used for these procedures are the same as described in Section 4.2. The procedure graph_check_and_info checks the consistency of the graph data structure and puts error messages on stderr depending on the kind of error. The checks include wrong neighbor numbers, self-loops, double edges, single directed edges, different weights for the same edge and negative weights of vertices or edges. The parameter Output controls the amount of additional information output to stdout. A value of 0 for the parameter Output does not produce any information output (if no error occurs), whereas a value of 1 produces some and a value of 2 produces more detailed output.
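Some of the structural checks listed above can be sketched in a few lines. The following is a simplified illustration under our own assumptions, not the implementation of graph_check_and_info: it covers only wrong neighbor numbers, self-loops, double edges and single directed edges, and the function name check_graph and its return codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes of this sketch (not those of the library). */
enum { GRAPH_OK = 0, BAD_INDEX = 1, BAD_NEIGHBOR = 2, SELF_LOOP = 3,
       DOUBLE_EDGE = 4, ONE_WAY_EDGE = 5 };

/* Checks the adjacency arrays of a graph with n vertices and returns
   GRAPH_OK or the code of the first problem that is found. */
static int check_graph(int n, const int *edge_p, const int *edge)
{
    assert(edge_p != NULL);
    if (n < 0 || edge_p[0] != 0)
        return BAD_INDEX;
    for (int v = 0; v < n; v++)          /* edge_p must not decrease */
        if (edge_p[v + 1] < edge_p[v])
            return BAD_INDEX;

    for (int v = 0; v < n; v++) {
        for (int k = edge_p[v]; k < edge_p[v + 1]; k++) {
            int w = edge[k];
            if (w < 0 || w >= n)
                return BAD_NEIGHBOR;     /* wrong neighbor number */
            if (w == v)
                return SELF_LOOP;
            for (int j = k + 1; j < edge_p[v + 1]; j++)
                if (edge[j] == w)
                    return DOUBLE_EDGE;  /* w listed twice for v */
            int found = 0;               /* does w list v as well? */
            for (int j = edge_p[w]; j < edge_p[w + 1]; j++)
                if (edge[j] == v)
                    found = 1;
            if (!found)
                return ONE_WAY_EDGE;     /* single directed edge */
        }
    }
    return GRAPH_OK;
}
```

The sketch favors brevity over speed: the search for the reverse edge makes the run time depend on the vertex degrees, which is acceptable for a one-time consistency check.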


The procedure cut_size calculates and returns the number of cut edges of the passed partition. If edge weights are provided, it returns the sum of edge weights of all cut edges instead. Please note that this procedure returns the cut size and not a value determining the success or failure of the procedure itself. The procedure part_check checks the balance of a partition and produces an error statement if a part has a higher load than allowed. The parameter Output should be within [0..2]. A given partition will be analyzed by part_info and detailed information will be printed on stdout. The amount of information, again, depends on the parameter Output (range [0..2]).
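The quantity described for cut_size can be computed from the adjacency arrays as sketched below. This is an illustration only, not the code of the library; it assumes a consistent graph in which every edge is stored in both directions with the same weight, and the name cut_of_partition is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cut edges of a partition, or the sum of their weights if
   edge_w is not NULL. Every cut edge is seen from both of its end
   points, so the accumulated value is halved at the end. */
static long cut_of_partition(int n, const int *edge_p, const int *edge,
                             const int *edge_w, const int *part)
{
    assert(n >= 0);
    long twice_cut = 0;
    for (int v = 0; v < n; v++) {
        for (int k = edge_p[v]; k < edge_p[v + 1]; k++) {
            if (part[v] != part[edge[k]])
                twice_cut += (edge_w != NULL) ? edge_w[k] : 1;
        }
    }
    return twice_cut / 2;
}
```

For the adjacency arrays of Table 1 and the partition 0, 0, 1, 1, 1, the edges {0,4}, {0,3}, {1,4} and {1,2} are cut, so the sketch returns 4 when no edge weights are passed.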

4.4.3 Memory and Time Procedures

The efficiency of a partitioning heuristic depends not only on the quality of the partition, but also on the memory- and time-requirements.

void print_alloc_statistics ();
void party_lib_times_start ();
void party_lib_times_output (int Times);

The procedure print_alloc_statistics produces some output to stdout showing the current and maximum amount of dynamically allocated memory. The procedure party_lib_times_start initializes the time values. After invocation of any of the partitioning procedures, a call of the procedure party_lib_times_output will produce output to stdout showing some information about the elapsed time. Here, the parameter Times controls the amount of information output on the screen (range [0..2]).

5 The Partitioning Problem

The history of comparing different partitioning methods is as old as the methods themselves. But in many cases, comparisons are unfair because of different definitions of the partitioning problem. This section tries to clarify the partitioning problem as viewed in PARTY and shows the major definitions.

In the following let $G = (V, E)$ be a graph with vertices $V$ and undirected edges $E$. The vertices are listed from $v_0$ to $v_{|V|-1}$. If different weights for vertices are considered, each vertex $v_i$ has a weight $W(v_i) \in \mathbb{R}^+$. Otherwise, $W(v_i)$ will be set to 1.0 for all vertices. The weight is extended from a vertex to a set $U \subseteq V$ by $W(U) = \sum_{v \in U} W(v)$. The same holds for different weights of edges: each edge has a weight $W(\{v, w\}) \in \mathbb{N}$, which is set to 1 if equal weights are considered.

Definition 5.1 (partition) Let

$$\pi : V \to \{0, 1, \ldots, p-1\}$$

be a partition of a graph $G$ that distributes the vertices among $p$ parts $V_0, V_1, \ldots, V_{p-1}$.

The major characteristics of a partition are its balance and its cut size, which are defined as follows.


Definition 5.2 (balance) Let

$$bal(\pi) := \max\left\{ |V_i| - \frac{|V|}{p} \; ; \; 0 \le i < p \right\}$$

be the balance of $\pi$. It is generalized to

$$bal(\pi) := \max\left\{ W(V_i) - \frac{W(V)}{p} \; ; \; 0 \le i < p \right\}$$

if unequal vertex weights are considered.

A low balance of a partition ensures an even distribution of the total vertex weight among all parts. A partition $\pi$ is called a balanced partition if $bal(\pi) < 1$ (i.e. $|V_i| \le \lceil |V|/p \rceil$ for all $0 \le i < p$). To generalize it to vertex weights, let $\mathit{max\_vertex\_weight} = \max\{W(v_i) \; ; \; 0 \le i < |V|\}$ be the maximum weight of any vertex. It is easy to see that a partition with a balance less than $\mathit{max\_vertex\_weight}$ exists for all graphs. Therefore, if $bal(\pi) < \mathit{max\_vertex\_weight}$, $\pi$ is considered to be a balanced partition.
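Definition 5.2 and the balance criterion can be evaluated as in the following minimal sketch. It is not taken from the library: the function names balance, max_vertex_weight and is_balanced are hypothetical, a NULL pointer for vertex_w stands for unit weights as in Section 4.2, and the parameter add_bal follows the description of the allowed imbalance given there.

```c
#include <assert.h>
#include <stdlib.h>

/* Computes bal(pi) = max_i ( W(V_i) - W(V)/p ) into *bal_out.
   Returns 0 on success and 1 if the scratch array cannot be allocated. */
static int balance(int n, const float *vertex_w, int p, const int *part,
                   double *bal_out)
{
    assert(n >= 0 && p >= 1 && bal_out != NULL);
    double *load = malloc((size_t)p * sizeof *load);
    if (load == NULL)
        return 1;
    for (int k = 0; k < p; k++)
        load[k] = 0.0;

    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double w = (vertex_w != NULL) ? vertex_w[i] : 1.0;
        assert(part[i] >= 0 && part[i] < p);
        load[part[i]] += w;              /* W(V_i) of the vertex's part */
        total += w;                      /* W(V) */
    }

    double bal = load[0] - total / p;
    for (int k = 1; k < p; k++)
        if (load[k] - total / p > bal)
            bal = load[k] - total / p;
    free(load);
    *bal_out = bal;
    return 0;
}

/* Maximum weight of any vertex (1.0 for unit weights). */
static double max_vertex_weight(int n, const float *vertex_w)
{
    if (vertex_w == NULL)
        return 1.0;
    double m = 0.0;
    for (int i = 0; i < n; i++)
        if (vertex_w[i] > m)
            m = vertex_w[i];
    return m;
}

/* 1 if bal(pi) < max_vertex_weight + add_bal, otherwise 0. */
static int is_balanced(int n, const float *vertex_w, int p, const int *part,
                       float add_bal)
{
    double bal;
    if (balance(n, vertex_w, p, part, &bal) != 0)
        return 0;
    return bal < max_vertex_weight(n, vertex_w) + add_bal;
}
```

With five unit-weight vertices and p = 2, the partition 0, 0, 1, 1, 1 has a balance of 0.5 and is balanced, whereas the partition 0, 1, 1, 1, 1 has a balance of 1.5 and is only accepted with an add_bal of more than 0.5.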

Definition 5.3 (cut size) Let

$$cut(\pi) := |\{\{v, w\} \in E \; ; \; \pi(v) \ne \pi(w)\}|$$

be the cut size of $\pi$. It is generalized to

$$cut(\pi) := \sum_{\{v, w\} \in E \; ; \; \pi(v) \ne \pi(w)} W(\{v, w\})$$

if unequal edge weights are considered.

The cut size is the number (sum of weights) of edges that are incident to vertices of different parts. The task of the partitioning problem is to find a balanced partition $\pi$ that minimizes the cut size. It is interesting to ask how many possible balanced partitions a graph may have and how difficult it is to compute the best possible partition. Let us consider the simple case in which a graph is to be partitioned into p = 2 parts and no weights are assigned to vertices or edges. This is called the bisection problem and it leads to the following definition:

Definition 5.4 (bisection width) Let

$$bw(G) = \min\{cut(\pi) \; ; \; \pi \text{ is a balanced bisection of } G\}$$

be the bisection width of graph $G$.

The number of possible bisections of a graph is $\frac{1}{2}\binom{|V|}{|V|/2} = O(2^{|V|})$ (if $|V|$ is even) or $\binom{|V|}{(|V|-1)/2} = O(2^{|V|})$ (if $|V|$ is odd). This exponential increase leads to a total number of only 126 different bisections for a graph with 10 vertices, but to about $5 \cdot 10^{28}$ different bisections for a graph with 100 vertices. The problem of calculating the bisection width for an arbitrary graph is NP-complete (see e.g. [GJS76]). The bisection width is known for some regular graphs and can be computed for very small graphs with fewer than 100 vertices (using efficient enumeration schemes), but it is not practical to compute the bisection width for graphs with several thousand vertices. Therefore, heuristics are used to compute in adequate time a balanced partition with a cut size as low as possible.
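The two counts quoted above can be reproduced numerically. The following is a minimal sketch that evaluates the binomial formulas in double precision; the function names binomial and count_bisections are hypothetical, and for large graphs the result is only an approximation.

```c
#include <assert.h>

/* Binomial coefficient C(n, k) in double precision. After step i the
   value equals C(n - k + i, i), so small results are computed exactly. */
static double binomial(int n, int k)
{
    assert(k >= 0 && k <= n);
    double r = 1.0;
    for (int i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Number of balanced bisections of a graph with n vertices:
   C(n, n/2) / 2 for even n, C(n, (n-1)/2) for odd n. */
static double count_bisections(int n)
{
    assert(n >= 1);
    if (n % 2 == 0)
        return binomial(n, n / 2) / 2.0;
    return binomial(n, (n - 1) / 2);
}
```

For even n the division by 2 reflects that exchanging the two halves yields the same bisection. The sketch gives 126 for 10 vertices and a value of roughly $5 \cdot 10^{28}$ for 100 vertices, in agreement with the figures in the text.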




6 Partitioning Methods

6.1 Classification: Global and Local Methods

The partitioning heuristics are usually divided into global and local methods. Global methods are sometimes called construction heuristics because they take the graph description as input and generate a balanced partition. Local methods are called improvement heuristics. They take the graph and a balanced partition as input and try to improve the partition. Figure 6 shows the combination of global and local methods.

[Figure 6: Combination of global and local methods. A global heuristic constructs a bisection $\pi_1$ (cut size 9, balance 0); local heuristics then produce $\pi_2$ (cut size 7, balance 0) and $\pi_3$ (cut size 6, balance 0).]

Each partitioning algorithm first applies a global heuristic to construct a partition $\pi_1$. The main task of a global heuristic is to force the partition to be balanced, while trying to cut through sparse areas of the graph. In the second step a local heuristic can be applied to construct a partition $\pi_2$ from the partition $\pi_1$. The main task of the local heuristic is to refine the partition locally in order to obtain a lower cut size. The resulting partition has to be balanced, too, thus the local heuristic has to determine two equal sized sets of vertices in both parts of the cut. The exchange of those sets will result in the balanced partition $\pi_2$. The same or a different local heuristic can be applied on a partition $\pi_i$ to construct a further partition $\pi_{i+1}$.

The combination of global and local heuristics leads to the question: "Which heuristic should be focused on?" Very simple as well as highly complicated heuristics already exist for both steps. We will describe the methods implemented in PARTY in the next two sections.

6.2 Global Methods

As mentioned in the previous section, each partitioning algorithm has to include a global method. The Optimal method produces the optimal result, but has an exponential run time behavior. The heuristic methods Linear, Scattered and Random depend exclusively on the vertex position in the given vertex list, whereas the Gain and Farhat methods take the adjacency information into account. Coordinate information is needed by the Coordinate Sorting method. The methods Multilevel, Spectral and Inertial are implemented in the Chaco library and can be used from the PARTY library through a built-in interface.

All partitioning methods implemented in PARTY are capable of considering unequal weights of vertices and edges. But for an easier understanding of the methods themselves, they will be described in the following sections without considering any weights.

6.2.1 Optimal

The Optimal method searches the whole solution space of all possible balanced graph partitions using Branch&Bound and passes one partition with the lowest possible cut size. The time requirement is exponential and only very small graphs (fewer than 50 vertices) can be handled in appropriate time.

6.2.2 Linear

$$\pi_{Linear} : \quad \pi(v_i) = i \ \mathrm{DIV} \ p$$

The parts of the partition will be assigned according to the numbering of the list of vertices. In the generation process of some graphs, vertices of dense areas of the graph are grouped closely together in the list of vertices. In these cases, the linear partitioning method can succeed in obtaining a partition with a low cut size, but usually it results in partitions with high cut sizes, because it does not take notice of edges connecting the vertices. The main advantage of this method is that it is very simple and fast.

6.2.3 Scattered

$$\pi_{Scattered} : \quad \pi(v_i) = i \ \mathrm{MOD} \ p$$

The Scattered method distributes the vertices among the parts of the partition by their index modulo p. Like the Linear method, the Scattered method may produce good partitions for specific types of graphs, and it is also very simple and fast. Generally, it produces partitions with a large cut, because it does not consider any edges.
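Both index-based methods fit into a few lines of C. The sketch below is an illustration and not the code of global_linear or global_scattered; the function names are hypothetical and vertex weights are ignored. For the Linear method it assumes that consecutive blocks of $\lceil |V|/p \rceil$ vertices are assigned to the same part, which is our reading of "assigned according to the numbering of the list of vertices" and keeps all part numbers in the range 0 to p - 1.

```c
#include <assert.h>

/* Linear: consecutive blocks of ceil(n / p) vertices share a part
   (assumption of this sketch, see the lead-in). */
static void assign_linear(int n, int p, int *part)
{
    assert(n >= 0 && p >= 1);
    int block = (n + p - 1) / p;         /* ceil(n / p) */
    for (int i = 0; i < n; i++)
        part[i] = i / block;             /* block >= 1 whenever n >= 1 */
}

/* Scattered: vertex i is assigned to part i MOD p. */
static void assign_scattered(int n, int p, int *part)
{
    assert(n >= 0 && p >= 1);
    for (int i = 0; i < n; i++)
        part[i] = i % p;
}
```

For n = 10 and p = 4 the linear sketch yields the parts 0 0 0 1 1 1 2 2 2 3, while the scattered sketch yields 0 1 2 3 0 1 2 3 0 1.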

6.2.4 Random

The Random method randomly distributes the vertices from $V$ among the parts. Starting with all parts being empty, the vertices are one after another assigned to a part by randomly choosing one which has fewer than $\lceil |V|/p \rceil$ vertices. The Random partitioning method produces partitions with a cut size of approximately $(1 - \frac{1}{p})|E|$, which is usually much higher than the lowest possible. Although this method by itself is not a good approach for the partitioning problem, one may use it as starting solution for efficient local partitioning methods.

6.2.5 Gain

The Gain method (e.g. [Pre94]) is a simple greedy strategy based on adjacency information to construct a balanced partition. It starts with part $V_{p-1}$ holding the complete graph, the other parts being empty. At this point the cut size is 0. It then proceeds with filling all parts from $V_0$ to $V_{p-2}$ one after another. Each part $V_i$ is filled by repeatedly moving vertices from $V_{p-1}$ to $V_i$. For each move, a vertex from $V_{p-1}$ is chosen that increases the cut size least of all. After $\lceil |V|/p \rceil$ moves the part $V_i$ is large enough and the moving process is changed to the following part $V_{i+1}$. This strategy takes only slightly more time than the previous ones, but often produces partitions with reasonable cut sizes.

6.2.6 Farhat

Similar to the previous method, the algorithm of Farhat ([Far88]) is a greedy approach. It starts with assigning a vertex with the minimum degree to $V_0$. It then assigns further vertices to $V_0$ in breadth-first manner until $V_0$ is of size $\lceil |V|/p \rceil$. Then, another vertex of the remaining graph which is adjacent to a vertex of $V_0$ is taken as new seed for $V_1$. $V_1$ and all following parts are filled in the same way as $V_0$. This strategy is reasonably fast and produces very compact parts resulting in quite low cut sizes.

6.2.7 Coordinate Sorting

The Coordinate Sorting method (e.g. [FS93]) is based only on the vertex coordinates. It determines which of the x-, y- or z-coordinates has the widest range. The vertices are sorted according to this coordinate and the list is cut in linear fashion like in the Linear method (the graph is cut orthogonal to the axis of the according coordinate). Although this method does not consider any connectivity information of the graph, it tries to assign vertices which are close together in space to the same part. For many graphs of typical applications (e.g. FEM-simulations), this results in reasonable cut sizes and the required time is dominated by the coordinate sorting, which can be done very fast.
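The steps of this method can be sketched with the standard qsort function. The following is a simplified illustration, not the code of global_coordinate: vertex weights are ignored, the sorted list is cut into blocks of $\lceil |V|/p \rceil$ vertices (an assumption of this sketch), and the names keyed_vertex, range_of and coordinate_sort_partition are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float key; int vertex; } keyed_vertex;

/* Orders by coordinate, ties are broken by the vertex number. */
static int cmp_keyed(const void *a, const void *b)
{
    const keyed_vertex *ka = a, *kb = b;
    if (ka->key < kb->key) return -1;
    if (ka->key > kb->key) return 1;
    return ka->vertex - kb->vertex;
}

/* Range (max - min) of a coordinate array, -1 if it is not provided. */
static float range_of(int n, const float *c)
{
    if (c == NULL || n == 0)
        return -1.0f;
    float lo = c[0], hi = c[0];
    for (int i = 1; i < n; i++) {
        if (c[i] < lo) lo = c[i];
        if (c[i] > hi) hi = c[i];
    }
    return hi - lo;
}

/* Fills part[] and returns 0 on success, 1 on failure. */
static int coordinate_sort_partition(int n, const float *x, const float *y,
                                     const float *z, int p, int *part)
{
    assert(n >= 0 && p >= 1);
    const float *axis = x;               /* axis with the widest range */
    float best = range_of(n, x);
    if (range_of(n, y) > best) { axis = y; best = range_of(n, y); }
    if (range_of(n, z) > best) { axis = z; }
    if (axis == NULL)
        return 1;                        /* no coordinates available */
    if (n == 0)
        return 0;

    keyed_vertex *order = malloc((size_t)n * sizeof *order);
    if (order == NULL)
        return 1;
    for (int i = 0; i < n; i++) {
        order[i].key = axis[i];
        order[i].vertex = i;
    }
    qsort(order, (size_t)n, sizeof *order, cmp_keyed);

    int block = (n + p - 1) / p;         /* ceil(n / p) */
    for (int rank = 0; rank < n; rank++)
        part[order[rank].vertex] = rank / block;
    free(order);
    return 0;
}
```

Sorting dominates the run time of the sketch, in line with the remark above that the required time is dominated by the coordinate sorting.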

6.2.8 Multilevel

The Multilevel method [HL95b] is implemented in the Chaco library ([HL94]) and an interface for its use is included in PARTY. It is based on a coarsening strategy that decreases the size of a graph in several levels using matching techniques. It then performs the Spectral method (described in the following section) on the smallest graph (the Spectral method has reasonable time and space requirements for small graphs). Finally, the graph is blown up again and the partition of the coarse graph is extrapolated to the original one. Such coarsening strategies are becoming very popular for graph partitioning and usually reduce the time-requirements while preserving the solution quality. Please refer to [HL94] and the literature cited there for further details.

6.2.9 Spectral

Spectral methods [BS94, HL95a, PSL90] are based on algebraic graph theory. A matrix similar to the adjacency matrix of the graph is constructed and some specific eigenvectors of this matrix are determined, which is the major computational task of this method. The vertices are assigned to the parts according to their values in those eigenvectors. Spectral methods are very good in finding sparse areas of the graph to cut. The time- and space-requirements are quite high, but several approaches exist to reduce them. The Chaco library ([HL94]) offers several variants of the Spectral method and the PARTY library includes interfaces to Chaco for use of two different variants: Spectral with a 'Multilevel RQI/Symmlq' and with a 'Lanczos' eigen solver. Again, please refer to [HL94] for further details.


6.2.10 Inertial

Like Coordinate Sorting, the Inertial method [FS93, HL94] (implemented in the Chaco library; an interface is included in PARTY) is based only on geometric information of the vertices. Instead of cutting the graph orthogonal to one of the coordinate axes, it is cut orthogonal to its largest elongation. Vertices of the graph are considered as point masses and the principal axis of this structure is likely to be a direction in which the graph is elongated. Again, please refer to [HL94] and the literature cited there for further details.

6.3 Local Methods

Although a global partitioning method already produces a balanced partition, local methods try to improve it concerning the cut size. The potential for improvement depends on the difference between the current cut size and the (unknown) best possible cut size. Most local partitioning methods were originally developed for improving graph bisections (due to the much simpler problem). The two local partitioning methods in PARTY are also implemented as local bisection methods. The following two sections describe the local partitioning methods as bisection methods and Section 6.3.3 shows how they are used to improve graph partitions with an arbitrary number of parts.

6.3.1 Kernighan-Lin

The Kernighan-Lin heuristic [KL70] (KL) is the most frequently used local bisection method. It uses a sequence of logical vertex pair exchanges to determine the sets that have to be exchanged physically and needs a total of $O(|V|^3)$ steps in each pass. Fiduccia and Mattheyses [FM82] modified the KL-method and use a sequence of single vertex moves to determine the sets. They also use a very efficient data structure called Bucket, which reduces the number of steps for each pass to only $O(|V| + |E|)$. The influence of a move of a vertex $v$ on the cut size of the bisection is of considerable importance and is pointed out by the definition of the diff-value.

Definition 6.1 (diff-value)

The diff-value of a vertex $v$ is the difference of the number of its external edges and the number of its internal edges:

$$diff(v) := |\{w \in V \; ; \; \{v, w\} \in E, \ \pi(v) \ne \pi(w)\}| - |\{w \in V \; ; \; \{v, w\} \in E, \ \pi(v) = \pi(w)\}|.$$

The value of $diff(v)$ describes the decrease in cut size if $v$ is moved to the other part and plays a major role in the KL-algorithm and all modifications of it.

Figure 7 shows the KL algorithm as modified for its use in PARTY. The REPEAT-loop invokes a pass of the algorithm. In each pass, the diff-values of all vertices are computed first. The pass then progresses by moving one unlocked vertex at a time to the other part. Both the source part and the vertex are chosen carefully. The source part is chosen in the following way. Consider the two directions in which vertices can be moved. If exactly one moving direction will result in an unbalanced bisection, then the other moving direction is taken. Otherwise, the part with the highest diff-value of any remaining unlocked vertices is chosen (part 0 is taken in the possible case of a tie). Then, an unlocked vertex $v \in V_i$ with maximum diff-value is taken and moved logically to the other part. This implies a change of the diff-values of its neighbors, which have to be updated. The move also results in a new bisection, and the counter new_steps counts the number of vertices moved since the last balanced bisection with improved cut size. This is important for the termination of a pass.

REPEAT
    compute the diff-values of all vertices and set new_steps to 0;
    WHILE V_0 and V_1 have unlocked vertices and new_steps < |V|/4
        choose part i in {0, 1} and an unlocked vertex v in V_i with diff(v) maximal;
        move v logically to the other part and lock it;
        update the diff-values of the neighbors of v;
        IF result is a balanced bisection with lowest cut size so far
            new_steps = 0;
        ELSE
            new_steps = new_steps + 1;
    physically move the sequence of vertices up to the balanced bisection with lowest cut size;
UNTIL cut size is not improved

Figure 7: Improving graph bisections based on the KL algorithm

The original KL-algorithm modified by Fiduccia and Mattheyses terminates the pass only when all vertices are moved to the other part. Several people [HL94, Pre94] experienced that the final balanced bisection with the lowest cut size usually occurs very early in a pass. To reduce the run time, a pass additionally terminates if no balanced bisection with an improved cut size could be achieved in the last $|V|/4$ moves. After each pass, the change of the cut size throughout all moves is analyzed (cf. Figure 8).

[Figure 8: Change of the cut size in one pass of KL (cut size plotted over the steps of a pass; the minimum cut size is reached at step x and the pass ends at step x+n/4).]

Only the sequence of moved vertices up to the balanced bisection with the lowest cut size is physically moved to the other part, resulting in a new balanced bisection with an improved cut size. Further passes of the algorithm will be carried out on the resulting bisection until no further improvement is made. In general the KL method is very robust and reliable. The results are convincing, provided KL is started with a fairly satisfactory global bisection.
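The diff-value of Definition 6.1, on which each move of a KL pass is based, can be computed directly from the adjacency arrays of Section 4.2. The sketch below is an illustration for unweighted graphs, not the code of local_kl; the function names diff_value and count_cut are hypothetical.

```c
#include <assert.h>

/* diff(v): number of external edges minus number of internal edges of v
   with respect to the partition part[]. */
static int diff_value(const int *edge_p, const int *edge, const int *part,
                      int v)
{
    assert(v >= 0);
    int external = 0, internal = 0;
    for (int k = edge_p[v]; k < edge_p[v + 1]; k++) {
        if (part[edge[k]] != part[v])
            external++;
        else
            internal++;
    }
    return external - internal;
}

/* Number of cut edges; every cut edge is seen from both end points. */
static int count_cut(int n, const int *edge_p, const int *edge,
                     const int *part)
{
    int twice_cut = 0;
    for (int v = 0; v < n; v++)
        for (int k = edge_p[v]; k < edge_p[v + 1]; k++)
            if (part[v] != part[edge[k]])
                twice_cut++;
    return twice_cut / 2;
}
```

For the adjacency arrays of Table 1 and the bisection 0, 0, 1, 1, 1, vertex 0 has two external edges and one internal edge, so its diff-value is 1; moving it to part 1 lowers the cut size from 4 to 3, i.e. by exactly its diff-value.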

6.3.2 Helpful-Set

Just as the KL algorithm, the Helpful-Set heuristic is based on local rearrangements. It, too, has to search for two sets of equal size (one in each part), which will improve the cut size if they both change the parts. The main difference to KL is that it considers not only single vertices, but also whole sets, to take part in the exchange steps.


The Idea of Helpful Sets: The idea is to improve the cut size in several rounds, each of which consists of 2 steps. In the first step a set is searched for in one part and is moved over to the other part. Because the resulting bisection is not balanced, the second step searches for an equally weighted set on the over-weighted part and moves it to the under-weighted part. The sets are chosen very carefully to force an improvement of the cut size after their exchange. The definition of the diff-value of Section 6.3.1 is extended from a single vertex to a set of vertices:

Definition 6.2 (Helpful Set) Let $S \subset V_i$, $i \in \{0, 1\}$, be a subset of vertices from one part.

$$H(S) = \sum_{v \in S} \mathit{diff}(v) + 2 \cdot |\{\{v, w\} \in E ;\ v, w \in S\}|$$

is the helpfulness of $S$. $S$ is called $H(S)$-helpful.
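As a small illustration of Definition 6.2, the helpfulness of a set can be computed directly from the diff-values and the number of edges inside the set. The sketch below assumes an unweighted adjacency dictionary and takes the diff-value to be the external minus the internal degree of a vertex; the function names are hypothetical and not part of PARTY.

```python
def diff(adj, part, v):
    """External minus internal degree of v (assumed diff-value definition)."""
    ext = sum(1 for w in adj[v] if part[w] != part[v])
    return ext - (len(adj[v]) - ext)


def helpfulness(adj, part, S):
    """H(S): sum of the diff-values in S plus twice the number of edges inside S."""
    S = set(S)
    # Every edge inside S is seen from both end points, hence the division by 2.
    inner_edges = sum(1 for v in S for w in adj[v] if w in S) // 2
    return sum(diff(adj, part, v) for v in S) + 2 * inner_edges
```

The correction term is needed because every edge inside $S$ is counted as internal by both of its end points, although it does not enter the cut when the whole set changes sides.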

Examples of 2-helpful sets are shown in Figure 9 (vertices are marked with their diff-values).

Figure 9: 2-helpful sets.

The helpfulness is a useful measure if a set $S$ is to be moved to the other part. In this case the cut size of the bisection is reduced by the helpfulness of the set $S$, i.e. the helpfulness plays the same role for a set as the diff-value does for a single vertex. Each round of the heuristic starts with a given bisection $\pi_1$. In the first step, a search for a helpful set $S$ (with $H(S) > 0$) is conducted on one part of the bisection $\pi_1$, and $S$ is then moved over to the other part as shown in Figure 10. In this example a 2-helpful set $S$ with 3 vertices was found in $V_0$ and moved over to $V_1$. The cut size of $\pi_2$ decreased from 9 to 7, but the balance went up to 3. In the second step a set is searched for as well, but there are more restrictions on this search, which are specified in the following definition.

Definition 6.3 (Balancing Set) Let $S \subset V_i$ be an $H(S)$-helpful set. A set $\bar{S} \subset V_j \cup S$, $j \ne i$, is called a balancing set of $S$ if $|\bar{S}| = |S|$ and $\bar{S}$ is at least $(-H(S) + 1)$-helpful.

Note that the cut size increases by not more than $(H(S) - 1)$ if $\bar{S}$ is moved from one part of the cut to the other. The balancing set $\bar{S}$ of $S$ guarantees that if it also changes the part, the resulting bisection becomes balanced and the cut size decreases by at least 1. In the example of Figure 10 a 0-helpful balancing set $\bar{S}$ with 3 vertices is found and moved to $V_0$. The final bisection $\pi_3$ is balanced and the final cut size is 7.

So far, it has been assumed that in a round a helpful set $S$ with $H(S) > 0$ and its corresponding balancing set can be found. In this case the round is called a successful round, because the cut size has been improved through a physical exchange of the two sets. Several further rounds can be applied to improve the cut size even more.
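The guarantee of at least one edge per successful round can be written out as a short calculation. The notation $\mathrm{cut}(\pi)$ for the cut size of a bisection $\pi$ and $\bar{S}$ for the balancing set is used here for illustration only; the helpfulness of $\bar{S}$ is measured in the intermediate bisection $\pi_2$, i.e. after $S$ has been moved.

```latex
\begin{align*}
\mathrm{cut}(\pi_2) &= \mathrm{cut}(\pi_1) - H(S), \\
\mathrm{cut}(\pi_3) &= \mathrm{cut}(\pi_2) - H(\bar{S})
  \le \mathrm{cut}(\pi_1) - H(S) - \bigl(-H(S) + 1\bigr)
  = \mathrm{cut}(\pi_1) - 1 .
\end{align*}
```

With the values of the example, $H(S) = 2$ and $H(\bar{S}) = 0$, this gives $\mathrm{cut}(\pi_3) = 9 - 2 - 0 = 7$.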

Figure 10: One round with Step 1 and Step 2. Step 1 moves the set $S$ with $H(S) = 2$ from $V_0$ to $V_1$ and turns $\pi_1$ (cut size 9, balance 0) into $\pi_2$ (cut size 7, balance 3). Step 2 moves the balancing set $\bar{S}$ with $H(\bar{S}) = 0$ to $V_0$ and yields $\pi_3$ (cut size 7, balance 0).

A round in which either no helpful set $S$ with $H(S) > 0$ can be found in Step 1 or no balancing set $\bar{S}$ can be found in Step 2 is called an unsuccessful round. In this case no physical exchanges occur and the bisection stays the same. The heuristic then tries to proceed with further rounds and weaker constraints on the sets. A new technique called Adaptive Limitation is used to control the constraints depending on the success of the previous round.

The Algorithm: Figure 11 shows the main algorithm. It basically consists of a WHILE-loop with the two steps inside. The new technique Adaptive Limitation uses the value limit to control the search for helpful sets and the termination of the WHILE-loop. Limit is initialized with cut size / 2 and works as a constraint for the search process in Step 1. Depending on a successful or unsuccessful search, limit is changed in each round. If it becomes 0, the algorithm terminates.

limit = cut size / 2;
WHILE limit > 0
    Step 1: search for a helpful set $S$ with $H(S) \ge$ limit independently in both parts;
        IF no such set found
            IF any helpful set $S$ with $H(S) > 0$ found
                $S$ = set with highest helpfulness found; limit = $H(S)$;
            ELSE
                $S = \emptyset$; limit = 0;
    IF $S \ne \emptyset$
        physically move $S$ to the other part;
        Step 2: search for a balancing set $\bar{S}$ of $S$;
            IF successful
                physically move $\bar{S}$ to the other part; limit = limit $\cdot$ 2;
            ELSE
                physically move $S$ back to its original part; limit = $\lfloor$limit / 2$\rfloor$;

Figure 11: The Helpful-Set algorithm.


In each round, the algorithm first performs Step 1: it searches for helpful sets in both parts of the bisection. This can be done independently in both parts. If no helpful set $S$ with $H(S) \ge$ limit is found, the set with the highest helpfulness is taken and limit is set to this value. If no helpful set $S$ with $H(S) > 0$ is found at all, $S$ is set to $\emptyset$, which prevents the algorithm from proceeding to Step 2, and limit is set to 0, which terminates the algorithm.

If a helpful set $S$ with $H(S) > 0$ is found, the algorithm moves $S$ from one part of the cut to the other, reducing the cut size by $H(S)$. It then performs Step 2: it searches for a balancing set $\bar{S}$. If such a set is found, the bisection is re-balanced and limit is increased, on the assumption that it may now be possible to find a more helpful set in the next round. The net improvement of the cut size in each successful round of the algorithm (move of $S$, re-balance with $\bar{S}$) is at least one edge. If the algorithm fails to re-balance, $S$ is moved back to its origin and the value of limit is decreased, on the assumption that a less helpful set will be smaller and easier to balance.

The number of rounds depends very much on the given problem; in real applications, only a small number of rounds are performed. The central steps of the algorithm, the searches for helpful and balancing sets, are described in more detail in [DMP95, Pre94].
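The control flow of Adaptive Limitation can be sketched independently of the two searches themselves. In the following Python sketch the searches and the physical move are passed in as callables, all of which are hypothetical: `search_helpful(limit)` is assumed to return the most helpful set found together with its helpfulness, `search_balancing(S)` a balancing set or `None`, and `move(S)` is assumed to flip the part of every vertex in `S`. Integer division for the initial value of limit is also an assumption.

```python
def helpful_set_rounds(cut_size, search_helpful, search_balancing, move):
    """Run the rounds of Figure 11 and return the value of limit after each round."""
    limit = cut_size // 2
    trace = []
    while limit > 0:
        # Step 1: best set found; its helpfulness h may lie below limit.
        S, h = search_helpful(limit)
        if h < limit:
            if h > 0:
                limit = h              # accept the weaker set, lower the bar
            else:
                S, limit = None, 0     # nothing helpful left: terminate
        if S:
            move(S)
            # Step 2: try to restore the balance.
            S_bar = search_balancing(S)
            if S_bar:
                move(S_bar)
                limit *= 2             # successful round: be more ambitious
            else:
                move(S)                # undo Step 1
                limit //= 2            # unsuccessful round: relax the constraint
        trace.append(limit)
    return trace
```

Keeping the searches behind callables is only a design choice of this sketch; it makes the doubling and halving of limit visible without committing to a particular search strategy.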

6.3.3 Local Partitioning by Multiple Local Bisection

The local bisection methods described in the previous sections only work for $p = 2$. There are two different ways to extend them to partitioning for arbitrary $p$. The first way is to extend the vertex exchange from between just two parts to an exchange between several parts. This implies several complex changes in the algorithms. Instead of one diff-value, a vertex would have several diff-values, depending on its external edges to the different parts. In addition, sets of vertices would not only have to be exchanged between two parts; a cyclic exchange of sets would also have to be considered.

An easier way is to perform pairwise local optimization between different parts of the partition. In this approach, the simple local bisection algorithms can be used for each single local bisection. This leads to two questions: Which pair of parts should be chosen for the next local bisection? And should a local bisection between two parts be performed only once, or should it be repeated after other vertex changes in one of the two parts?

At the current stage, PARTY performs local partitioning by applying local bisection to pairs of parts. A local bisection between two parts only needs to be performed if there is at least one edge connecting vertices from those two parts. Therefore, a list of all pairs of parts that are connected via at least one edge is constructed first. As long as this list is not empty, a pair is removed, the local bisection is performed on it, and some new pairs may be added to the list depending on the exchanged vertices. The local partitioning terminates when the list is empty.

The central question is which pairs should be added to the list after a local bisection. Consider the stage after a local bisection between parts $V_i$ and $V_j$ ($i \ne j$) and a vertex $v_a$ that got exchanged (let $v_a$ be moved from $V_j$ to $V_i$). If $v_a$ is adjacent to a vertex $v_b \in V_k$ which is not in one of the two participating parts ($V_k \ne V_i$ and $V_k \ne V_j$), a pair $\{V_i, V_k\}$ is constructed and added to the list of pairs if it is not already in it. This is repeated for all vertices exchanged between $V_i$ and $V_j$. Note that if no improvement could be made in a local bisection, no vertices got exchanged and no further pairs are added to the list.
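The list handling described above amounts to a simple worklist. The Python sketch below illustrates it under several assumptions: the graph is an adjacency dictionary, `part` maps each vertex to the index of its part, and `local_bisection(adj, part, i, j)` is a hypothetical callable that optimizes the pair of parts $i$ and $j$, updates `part` in place and returns the vertices that changed sides. The order in which pairs are taken from the list is not specified by the text; a first-in, first-out order is assumed here.

```python
from collections import deque


def multiple_local_bisection(adj, part, local_bisection):
    """Apply local bisection to pairs of parts until the list of pairs is empty.

    Returns the number of local bisections performed.
    """
    # Initial list: all pairs of parts connected by at least one edge.
    pairs = {frozenset((part[v], part[w]))
             for v in adj for w in adj[v] if part[v] != part[w]}
    queue = deque(sorted(pairs, key=sorted))
    queued = set(queue)
    runs = 0
    while queue:
        pair = queue.popleft()
        queued.discard(pair)
        i, j = sorted(pair)
        moved = local_bisection(adj, part, i, j)
        runs += 1
        for va in moved:
            for vb in adj[va]:
                k = part[vb]
                if k not in (i, j):
                    # va now lies in part[va]; pair it with the third part k.
                    new_pair = frozenset((part[va], k))
                    if new_pair not in queued:
                        queue.append(new_pair)
                        queued.add(new_pair)
    return runs
```

The loop terminates as soon as the local bisections stop exchanging vertices, since only exchanged vertices can add new pairs.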


7 Planned Future Extensions

The primary goal of PARTY at this stage is to provide a platform for the easy use of partitioning methods. Further partitioning methods will be included in the future, as well as different strategies combining several methods. Recently, several coarsening strategies have been published that try to reduce the time requirements of the partitioning methods while preserving the solution quality (e.g. [HL94, KK95]). It is planned to extend the graph and partitioning problem definitions. A cost function may be introduced in the future to weight the importance of cut size and load balance more precisely.

Acknowledgments

We would like to thank Bruce Hendrickson, Robert Leland, David Pritchard, Jürgen Schulze and Carsten Spraner.

References

[BS94] S. T. Barnard and H. D. Simon. Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience, 6(2):101-117, 1994.
[DMP95] R. Diekmann, B. Monien, and R. Preis. Using helpful sets to improve graph bisections. In Hsu, Rosenberg, and Sotteau, editors, Interconnection Networks and Mapping and Scheduling Parallel Computations, volume 21 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 57-73. American Mathematical Society, 1995.
[Far88] C. Farhat. A simple and efficient automatic FEM domain decomposer. Computers & Structures, 28(5):579-602, 1988.
[FM82] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Proc. of the 19th IEEE Design Automation Conference, pages 175-181, 1982.
[FS93] C. Farhat and H. D. Simon. Top/domdec - a software tool for mesh partitioning and parallel processing. Technical Report RNR-93-011, NASA Ames Research Center, 1993.
[GJS76] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified NP-complete graph problems. Theoretical Computer Science, 1:237-267, 1976.
[HL94] B. Hendrickson and R. Leland. The Chaco user's guide: Version 2.0. Technical Report SAND94-2692, Sandia National Laboratories, Albuquerque, NM, Oct 1994.
[HL95a] B. Hendrickson and R. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput., 16(2):452-469, 1995.
[HL95b] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. In Proc. Supercomputing '95. ACM, Dec 1995.
[HM92] J. Hromkovic and B. Monien. The bisection problem for graphs of degree 4 (configuring transputer systems). In Buchmann, Ganzinger, and Paul, editors, Festschrift zum 60. Geburtstag von Günter Hotz, pages 215-234. B. G. Teubner, Stuttgart-Leipzig, 1992.
[KK95] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. Technical Report 95-035, Department of Computer Science, University of Minnesota, 1995.
[KL70] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, pages 291-308, Feb 1970.
[PR96] F. Pellegrini and J. Roman. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In Proceedings of HPCN'96, pages 493-498, Apr 1996.
[Pre94] R. Preis. Efficient partitioning of very large graphs with the new and powerful helpful-set heuristic. Diplomarbeit, Universität-GH Paderborn, Germany, Dec 1994.
[PSL90] A. Pothen, H. D. Simon, and K. P. Liu. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 11(3):430-452, 1990.
[WCE95] C. Walshaw, M. Cross, and M. G. Everett. A localised algorithm for optimising unstructured mesh partitions. Int. J. Supercomputer Appl., 9(4):280-295, 1995.