Journal of Heuristics, 10: 315–336, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Parallel Multilevel Metaheuristic for Graph Partitioning

R. BAÑOS
C. GIL*
Departamento de Arquitectura de Computadores y Electronica, Universidad de Almeria, La Cañada de San Urbano s/n, 04120 Almeria, Spain
email: [email protected]
email: [email protected]

J. ORTEGA
Departamento de Arquitectura y Tecnologia de Computadores, Universidad de Granada, Campus de Fuentenueva, Granada, Spain
email: [email protected]

F.G. MONTOYA
Departamento de Ingenieria Civil, Universidad de Granada, Campus de Fuentenueva, Granada, Spain
email: [email protected]

Submitted in September 2003 and accepted by Enrique Alba in March 2004 after 1 revision
Abstract. Graph partitioning is a significant optimisation problem that occurs in many scientific areas. Several heuristics that obtain high quality partitions have been put forward, and multilevel schemes can further improve the quality of the solutions. However, in many applications the graphs are very large, making it impossible to explore the search space effectively. In these cases, parallel processing becomes a very useful tool for overcoming this problem. In this paper, we propose a new parallel algorithm that uses a hybrid heuristic within a multilevel scheme. It is able to obtain very high quality partitions, improving on those obtained by other previously proposed algorithms.

Key Words: graph partitioning, parallel optimisation, multilevel optimisation, metaheuristic, simulated annealing, tabu search
*Author to whom all correspondence should be addressed.

The Graph Partitioning Problem (GPP) occurs in many areas: for example, VLSI design (Banerjee, 1994; Alpert and Kahng, 1995), test pattern generation (Klenke, Williams, and Aylor, 1992; Gil et al., 2002), data-mining (Mobasher et al., 1996), efficient storage of large databases (Shekhar and Liu, 1996), geographical information systems (Guo, Trinidad, and Smith, 2000), etc. The critical issue is finding a partition of the vertices of a graph into a given number of roughly equal parts, whilst ensuring that the number of edges connecting the vertices in different sub-graphs is minimised. As the problem is NP-complete (Garey
and Johnson, 1979), efficient procedures providing high quality solutions in a reasonable amount of time are very useful. Different strategies have been proposed to solve the GPP. Graph partitioning algorithms can be classified as follows:

• Local vs. Global Methods: If the partitioning algorithm uses a previously obtained initial partition, it is called a local method (Kernighan and Lin, 1970). If the algorithm also obtains the initial partition, it is called global (Simon, 1991).

• Geometric vs. Coordinate-free Methods: If the algorithm takes into account the spatial location of the vertices (Gilbert, Miller, and Teng, 1998), it is called geometric. If only the connectivity among vertices is considered, it is called a coordinate-free method (Simon, 1991).

• Static vs. Dynamic Partitioning: The static partitioning model divides the graph only once (Simon, 1991). Sometimes, due to the characteristics of the application where the GPP occurs, the graph structure changes dynamically, making it necessary to apply the optimisation repeatedly. Such algorithms (Schloegel, Karypis, and Kumar, 2001) are called dynamic.

• Multilevel vs. Non-Multilevel Schemes: If the algorithm directly divides the target graph (Kernighan and Lin, 1970), it is called non-multilevel. If the graph is instead coarsened several times, divided at the lowest level and then uncoarsened back up to the target graph, it is called multilevel (Karypis and Kumar, 1998a).

• Serial vs. Parallel Algorithms: The typical approach to solving the GPP is based on algorithms that run on a single processor (Karypis and Kumar, 1998a); these are called serial algorithms. In other cases, parallel processing is used either to speed up the serial version or to explore different areas of the search space (Karypis and Kumar, 1998b).

In this paper, an efficient parallel multilevel algorithm for the GPP is presented.
This algorithm uses a multilevel scheme in the search process, which includes a metaheuristic based on mixing Simulated Annealing (SA) and Tabu Search (TS). Further, parallel processing is used to allow a cooperative and simultaneous search of the solution space. The result is a global, coordinate-free, parallel multilevel algorithm for static graph partitioning, whose partitions often improve on those previously obtained by other algorithms.

Section 1 provides a more precise definition of the GPP and describes the cost function used in the optimisation process. Section 2 describes the proposed metaheuristic to solve the problem. Section 3 presents the design of the multilevel scheme, which includes the metaheuristic described in Section 2. Section 4 offers a detailed explanation of the parallelisation of the multilevel algorithm, while the analysis of the results obtained is provided in Section 5. Finally, Section 6 gives the conclusions and suggests areas for future work.
1. The graph partitioning problem (GPP)
Given a graph G = (V, E), where V is the set of vertices, with |V | = n, and E the set of edges which determines the connectivity among the vertices, the GPP consists of dividing
Figure 1. Two possible alternatives to divide a graph.
V into K balanced sub-graphs, V1, V2, ..., VK, such that Vi ∩ Vj = ∅ for all i ≠ j, and Σ_{k=1..K} |Vk| = |V|. The balance condition is defined as the maximum sub-domain weight, S = max(|Vk|), for k = 1, ..., K, divided by the perfect balance, n/K. If a defined imbalance, x%, is allowed, then the GPP tries to find a partition such that the number of cuts is minimised subject to the constraint that S ≤ (n/K)·((100 + x)/100). Whenever the vertices and edges have weights, |v| denotes the weight of vertex v, and |e| denotes the weight of edge e. All the test graphs used to evaluate the quality of our algorithm have vertices and edges with weight equal to one (|v| = 1 and |e| = 1); however, our procedure is able to process graphs with any weight values.

Figure 1 shows two possible partitions of a given graph. In figure 1(a) the graph is divided into two equally balanced sub-graphs, but the number of cuts is not minimised. On the other hand, figure 1(b) shows a partition with the optimal number of cuts; however, this partition does not fulfil the requirement of load balancing. This example clearly shows the conflict of objectives. In order to weigh one against the other, the following cost function can be considered:

    c(s) = α · ncuts(s) + β · Σ_{k=1}^{K} 2^{imbalance(k)(s)}        (1)
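As an illustration, the two terms of the cost function (1) can be computed as in the following sketch. The adjacency-list graph representation and the percentage definition of imbalance(k) are our assumptions for the example, not fixed by the paper.

```python
# Sketch of cost function (1): alpha * ncuts(s) + beta * sum_k 2^imbalance(k).
# The graph is assumed to be an adjacency list {vertex: [neighbours]} and a
# partition a map {vertex: sub-graph index}; all names here are illustrative.
from collections import Counter

def cost(adj, part, K, alpha=1.0, beta=0.0):
    # ncuts(s): number of edges whose endpoints lie in different sub-graphs
    ncuts = sum(1 for u, nbrs in adj.items()
                for v in nbrs if u < v and part[u] != part[v])
    n = len(adj)
    sizes = Counter(part.values())
    # imbalance(k): assumed here to be the percentage excess of |Vk| over n/K
    total = 0.0
    for k in range(K):
        imb = max(0.0, (sizes.get(k, 0) - n / K) / (n / K) * 100.0)
        total += 2.0 ** imb
    return alpha * ncuts + beta * total
```

With α = 1 and β = 0 (the setting used in the experiments of Section 5), the function reduces to the plain cut count under the balance restriction.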
2. The metaheuristic: Refined Mixed Simulated Annealing and Tabu Search (rMSATS)
In this section, we provide a description of the hybrid heuristic proposed in Gil, Ortega, and Montoya (2000). Next, we detail the two phases of rMSATS.
2.1. Obtaining the initial partition of the graph
The first step to solve the GPP is to make an initial partitioning of the target graph. In this step rMSATS uses a procedure known as Graph Growing Algorithm (GGA) (Karypis
and Kumar, 1998a). This algorithm starts from a randomly selected vertex, which is assigned to the first sub-graph, as are its adjacent vertices. This recursive process is repeated until the sub-graph reaches n/K vertices. From this point, the following visited vertices are assigned to a new sub-graph, and the process is repeated until all the vertices are assigned to their respective sub-graphs. As the position of the initial vertex determines the structure of the primary partition, its random selection offers a very useful diversity.
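The growing process just described can be sketched as follows. This is an illustrative reconstruction of GGA, not the authors' implementation; names and the handling of disconnected components are ours.

```python
# Sketch of the Graph Growing Algorithm (GGA): a breadth-first search from a
# seed vertex fills each sub-graph with n/K vertices before moving to the next.
from collections import deque
import random

def gga(adj, K, seed=None):
    n = len(adj)
    target = n / K                       # perfect balance per sub-graph
    part = {}
    unassigned = set(adj)
    queue = deque([seed if seed is not None else random.choice(list(adj))])
    k, count = 0, 0
    while unassigned:
        if not queue:                    # disconnected component: restart anywhere
            queue.append(next(iter(unassigned)))
        v = queue.popleft()
        if v not in unassigned:
            continue
        part[v] = k
        unassigned.discard(v)
        count += 1
        if count >= target and k < K - 1:
            k, count = k + 1, 0          # following vertices go to a new sub-graph
        queue.extend(u for u in adj[v] if u in unassigned)
    return part
```

Choosing a different seed vertex yields a different initial partition, which is precisely the diversity that PMSATS later exploits.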
2.2. Optimisation by mixing simulated annealing and Tabu search
The application of an algorithm such as GGA is insufficient to obtain a partition of adequate quality. Therefore, a refinement phase is required to effectively explore the search space. The problem with local search and hill climbing techniques is that the search may stop at local optima. In order to overcome this drawback, rMSATS uses Simulated Annealing (SA) (Kirkpatrick, Gelatt, and Vecchi, 1983) and Tabu Search (TS) (Glover and Laguna, 1993). The combined use of both heuristics results in a hybrid strategy that allows the search process to escape from local minima, while simultaneously preventing the occurrence of cycles.

SA uses a variable called temperature, t, whose value diminishes over successive iterations by a factor termed Tfactor. The variable t appears in the Metropolis function and acts, simultaneously, as a control variable for the number of iterations of the algorithm and as a probability factor for a given solution to be accepted. The decrease of t implies a reduction in the probability of accepting movements which worsen the cost function. On the other hand, when the search space is explored, the use of a limited neighbourhood can cause the appearance of cycles. To avoid this problem, TS complements SA: when an adverse movement is accepted by the Metropolis function, the vertex is moved and included in the tabu list. In the current iteration, the vertices included in the tabu list cannot be moved; they are removed from the list at the end of the next one.

Experimental outcomes (Gil, Ortega, and Montoya, 2000) indicate that the use of both techniques improves the results obtained when only SA or TS is applied individually. These experiments also show good results in comparison with other multilevel algorithms; in many cases, the results obtained by rMSATS outperform those obtained by the METIS library (Karypis and Kumar, 1998a).
Algorithm 1 shows the pseudo-code of the rMSATS procedure. The input parameters are acquired in the first step. The initial partition is obtained in the second step by using the GGA algorithm. The loop defined in Step 3 continues while the number of iterations is smaller than R and t is larger than the established threshold. In each iteration of this loop, the boundary vertices are evaluated (Step 3.a). For each boundary vertex, rMSATS evaluates the cost of moving this vertex to the neighbouring sub-graph (Step 3.a.1) and, as a function of this cost and the value of t, the movement is either accepted or rejected (Step 3.a.2). If the movement is accepted, then rMSATS tests two things: firstly, if the movement implies a worsening of the cost function, the vertex is added to the tabu list (Step 3.a.2.a); further, rMSATS verifies whether the new solution is better than the previous one, in which case (Step 3.a.2.b) the solution is saved. Before starting the next iteration, the temperature and the iteration counter are updated (Step 3.a.3). Finally, in Step 4 the best solution found is returned.
Algorithm 1: refined Mixed Simulated Annealing and Tabu Search (rMSATS).
1) Input: graph, K, max_imb, Ti, Tfactor, R;
2) Obtain initial partition of graph by applying GGA; t = Ti;
3) While (current_iteration ≤ R) AND (t ≥ 0.1)
   3.a) For each (boundary vertex v) do
      3.a.1) cost = cost of the movement of v to the neighbour subgraph;
      3.a.2) If Metropolis(cost, t) is accepted then perform_movement(v, boundary subgraph);
         3.a.2.a) If cost_movement ≥ 0 then add v to the tabu list;
         3.a.2.b) If current_solution ≤ best_solution then best_solution = current_solution;
      3.a.3) t = t * Tfactor; current_iteration = current_iteration + 1;
4) Return the best solution;
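One acceptance step of Algorithm 1 (Steps 3.a.2 to 3.a.2.b) might look as follows in a minimal sketch. The helper names and the representation of the current/best solutions as scalar costs are ours, chosen only to illustrate the interplay of the Metropolis rule and the tabu list.

```python
# Sketch of one rMSATS acceptance step: the Metropolis rule decides on the
# move, and worsening moves that are accepted enter the tabu list.
import math
import random

def metropolis_accept(delta_cost, t, rng=random.random):
    """Accept improving moves always; worsening moves with prob exp(-delta/t)."""
    if delta_cost < 0:
        return True
    return rng() < math.exp(-delta_cost / t)

def try_move(v, delta_cost, t, tabu, best, current):
    if v in tabu:                       # vertices in the tabu list cannot move
        return current, best
    if metropolis_accept(delta_cost, t):
        current = current + delta_cost
        if delta_cost >= 0:             # Step 3.a.2.a: worsening move -> tabu
            tabu.add(v)
        if current < best:              # Step 3.a.2.b: keep best solution found
            best = current
    return current, best
```

As t decreases, exp(-delta_cost/t) shrinks, so worsening moves become progressively less likely to be accepted, exactly as described above.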
3. rMSATS within a multilevel scheme: MultiLevel rMSATS (MLrMSATS)
The multilevel paradigm has become an effective strategy for the GPP. Currently, most graph partitioning algorithms use a multilevel paradigm in combination with local search procedures, usually variants of the Kernighan and Lin algorithm (KL) (Kernighan and Lin, 1970). These algorithms (Karypis and Kumar, 1998a; Walshaw and Cross, 2000b) define the current state of the art.

The multilevel paradigm consists of three different phases; figure 2 provides a visual description of this strategy. In the first phase, the vertices of the target graph are used to build clusters that become the vertices of a new graph. By repeating this process, graphs with fewer and fewer vertices are created, until a sufficiently small graph is obtained. The second phase consists of making a first partition of that graph by using any of the existing procedures. However, the quality of this first partition is low, since the coarsening phase intrinsically implies a loss of accuracy. This is the reason for a third phase, in which the graph is projected back towards its original configuration while a refinement algorithm is applied at each level. In the following, the main characteristics of the multilevel version of rMSATS, MLrMSATS (Baños et al., 2003), are detailed.
3.1. Coarsening phase
MLrMSATS uses a deterministic matching strategy to coarsen the graph. This heavy-edge matching (HEM) strategy consists of matching the vertices according to the weight of the edges that connect them: each vertex not yet matched at this level is matched with an unvisited neighbour such that their common edge has the highest weight. The advantage of this alternative is that the heaviest edges are hidden during this phase, and the resulting graph is therefore built of light edges. Thus, the cuts obtained after the graph partitioning will
Figure 2. Multilevel paradigm.
tend to decrease. The coarsening process finishes when the number of vertices becomes less than the given threshold of z·K, where z is a parameter with a properly selected value (z = 15 in our experiments). Some studies (Karypis and Kumar, 1998a) have determined that the HEM strategy obtains better solutions than others such as Random Matching (RM).

A problem, independent of the matching strategy used to coarsen the graph, occurs when a vertex does not have any free neighbour with which it can be matched. These vertices pass directly to the next level, which creates an imbalance in the weights of the vertices of the new graph. As the union of vertices is made by matching two vertices, this problem gets worse in the coarser levels. Thus, the weights of the vertices in level i, G_i, take values in the interval [1, 2^i]. Suppose that all the vertices of the initial graph have weights equal to one and that the graph is coarsened over 10 levels. In the lowest level, G_10, it is possible to have a vertex u with weight |u| = 1, and another v with weight |v| = 2^10. This makes it difficult to obtain balanced sub-graphs, mainly in the coarsest levels, since the movement of very heavy vertices between neighbouring partitions cannot be accepted by the objective function. In order to solve this problem, MLrMSATS selects the vertices to be matched in ascending order of weight: the algorithm first tries to match the vertices with lower weights, i.e., those that have been isolated in previous levels.
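A single coarsening level with this matching policy can be sketched as follows, assuming an edge-weighted adjacency map. The names are illustrative; in particular, visiting vertices in ascending weight order is the MLrMSATS refinement of HEM described above.

```python
# Sketch of one coarsening level with heavy-edge matching (HEM), visiting
# vertices in ascending order of weight so that vertices isolated at previous
# levels are matched first. Illustrative names only.
def hem_level(adj_w, vweight):
    """adj_w: {u: {v: edge_weight}}, vweight: {u: vertex_weight}.
    Returns a matching as {vertex: partner} (a vertex maps to itself
    when it has no free neighbour and passes to the next level)."""
    match = {}
    for u in sorted(adj_w, key=lambda v: vweight[v]):   # lightest vertices first
        if u in match:
            continue
        free = [(w, v) for v, w in adj_w[u].items() if v not in match]
        if free:
            _, v = max(free)            # heaviest incident edge wins
            match[u] = v
            match[v] = u
        else:
            match[u] = u                # isolated: passes directly to next level
    return match
```

Each matched pair then collapses into one vertex of the coarser graph, its weight being the sum of the two matched weights, which is why unmatched vertices accumulate a weight disadvantage over the levels.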
3.2. Initial partitioning phase
After the coarsening phase, GGA is applied to obtain the first partition of the coarsest graph. GGA gives a totally balanced initial partition, but the efficiency of the solution with reference
to the number of cuts is reduced. Thus, it is necessary to apply a final phase to improve the quality of the partition.
3.3. Un-coarsening phase
It is necessary to apply a refinement technique to improve the quality of the initial partition. As previously stated, most algorithms use local search methods, usually variations of the Kernighan and Lin (1970) procedure. MLrMSATS, however, applies rMSATS at all levels: in each one, the solution obtained at the previous level is optimised by using rMSATS with the initial annealing values. The final solution is obtained by repeating this process until the highest-level graph is reached.

Algorithm 2: MultiLevel rMSATS (MLrMSATS).
1) Input: graph, K, max_imb, Ti, Tfactor, R, z;
2) Coarsen graph N levels by using the HEM strategy as a function of z and K;
3) Obtain initial partition of graph by applying GGA;
4) For i = N to 1 repeat
   4.a) Re-initialise parameters using input values;
   4.b) While (current_iteration ≤ R) do
      4.b.1) Apply rMSATS(current_iteration, max_imb, t);
5) Return the best solution;

Algorithm 2 formally defines the behaviour of MLrMSATS. Step 2 corresponds to the coarsening phase, where HEM is applied. In Step 3, GGA is used to obtain the initial partition, while the refinement process is performed in Step 4 using the rMSATS heuristic. The results obtained by MLrMSATS (Baños et al., 2003) in most cases improve on rMSATS, and also on the METIS library algorithms (Karypis and Kumar, 1998a).
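The multilevel flow of Algorithm 2 (coarsen, partition, project, refine) can be sketched as follows. The coarsen, initial_partition and refine callables are stand-ins for HEM, GGA and rMSATS respectively, and all names are ours rather than the paper's.

```python
# Minimal sketch of the multilevel scheme of Algorithm 2.
def multilevel(graph, K, z, coarsen, initial_partition, refine):
    # coarsen(g) -> (coarser_graph, vmap), where vmap sends every vertex of g
    # to the vertex of the coarser graph that absorbs it
    levels, maps = [graph], []
    while len(levels[-1]) > z * K:          # coarsening threshold: z*K vertices
        coarser, vmap = coarsen(levels[-1])
        levels.append(coarser)
        maps.append(vmap)
    part = initial_partition(levels[-1], K)  # GGA on the coarsest graph
    part = refine(levels[-1], part, K)       # first refinement, coarsest level
    for g, vmap in zip(reversed(levels[:-1]), reversed(maps)):
        part = {v: part[vmap[v]] for v in g}  # project to the finer graph
        part = refine(g, part, K)             # rMSATS applied at every level
    return part
```

Because the projection simply copies each coarse vertex's sub-graph label to the vertices it absorbed, the refinement at each finer level starts from a feasible, already reasonable partition.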
4. Parallel multilevel metaheuristic: Parallel Multilevel Simulated Annealing and Tabu Search (PMSATS)
In the previous section, the main characteristics of MLrMSATS were described. This multilevel algorithm has been parallelised in order to explore the search space using several initial partitions and annealing parameters. Since graph partitioning is an NP-complete problem and the graphs used in realistic situations are large, parallel processing is a very useful technique. Metaheuristics for combinatorial optimisation problems often also need to be parallelised (Cung et al., 2001), and some of these parallel metaheuristics have been successfully applied to the GPP (Diekmann et al., 1996; Randall, 1999). On the other hand, some multilevel approaches have also been used in parallel (Karypis and Kumar, 1996; Walshaw and Cross, 2000b) and obtain very high quality partitions in a reasonable amount of time.
The new algorithm proposed in this paper is a parallelisation of the multilevel metaheuristic described in the previous section. Our parallelisation does not aim to reduce the runtime of the serial version (MLrMSATS), but rather to optimise, as much as possible, the quality of the partitions by taking advantage of the characteristics of both the multilevel paradigm and the metaheuristic rMSATS. The idea consists of building a set of p solutions, each of which applies MLrMSATS in a different way. The coarsening phase is performed using the HEM strategy. Then, each solution applies GGA, starting at a randomly chosen vertex; thus, each solution starts the refinement phase with its own initial partition. In the refinement phase, each of the p solutions uses its own annealing values, as we describe later. In some iterations of the refinement phase, an elitist selection mechanism is applied in order to continue the search with the best solutions of the set while discarding the worse ones.

Figure 3 provides a graphical comparison of rMSATS, MLrMSATS and PMSATS. Figure 3(a) corresponds to rMSATS, where only one solution is optimised, by first obtaining the initial partition using GGA and then applying rMSATS in the optimisation phase. Figure 3(b) corresponds to MLrMSATS, where again one solution is optimised, but this time using the multilevel paradigm. Finally, figure 3(c) corresponds to PMSATS, where a population of p solutions is optimised simultaneously using parallel processing. Each solution applies HEM in the same way. In each solution, the initial partitioning is performed using GGA starting from a different initial vertex, thus obtaining different initial partitions before the refinement phase.
Finally, the refinement phase is carried out independently for each solution by applying rMSATS with its own annealing values (different from the others), although in some iterations there are interactions between the solutions that enable an elitist selection. PMSATS thus continues improving the quality of the best solutions while discarding the worst ones. The main characteristics of PMSATS are detailed as follows:

• Each solution applies GGA to the coarsest graph starting from a vertex chosen by random interval selection: let p be the number of solutions; then solution Pi, i = 1...p, makes the first partition starting from a vertex randomly chosen in the interval [((i−1)/p)·n + 1, (i/p)·n + 1). This strategy assures diversity in the initial partitions, even in the case of irregular graphs.

• In order to accommodate the effect of modifying the annealing parameters, each solution uses a different initial temperature, determined by its identifier, i, as we explain below. An interval of initial temperatures [Ti_Min, Ti_Max] and a fixed number of iterations, R, are established. Solution P1 starts at Ti_Min, solution Pp starts at Ti_Max, and the others are equally distributed along this interval. Then, Tfactor is computed for each solution as a function of R and its initial temperature. Figure 4 shows a clear example of this strategy. Here, an interval of initial temperatures, Ti = [150, 50], and a fixed number of iterations, R = 1000, have been established. Solution A has the highest value of Ti and a low Tfactor, thus determining a fast decrease of the temperature. On the other hand, solution J has the lowest value of Ti and a high Tfactor, determining a slow decrease of the temperature. With this strategy, our algorithm also provides a fair distribution of the workload, avoiding the loss of efficiency that would occur if each process randomly chose the values of Ti and Tfactor.
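The paper gives no closed form for Tfactor, so the sketch below is one plausible reading: initial temperatures are spaced evenly over [Ti_Min, Ti_Max], and a geometric Tfactor is chosen so that every solution reaches the same final temperature after R iterations. Both the final temperature t_end and the geometric schedule are our assumptions.

```python
# Hedged sketch of the per-solution annealing set-up described above.
def annealing_params(i, p, ti_min, ti_max, R, t_end=0.1):
    """Solution P1 starts at ti_min, Pp at ti_max; Tfactor is chosen so that
    ti * Tfactor**R == t_end for every solution (geometric cooling)."""
    ti = ti_min + (i - 1) * (ti_max - ti_min) / (p - 1)
    tfactor = (t_end / ti) ** (1.0 / R)
    return ti, tfactor
```

Note that a larger Ti yields a smaller Tfactor (a faster relative decrease), matching the behaviour of solutions A and J in figure 4.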
Figure 3. Graphical description of rMSATS (a), MLrMSATS (b), and PMSATS (c).
• PMSATS uses an elitist selection mechanism to best utilise the computational resources in the optimisation of the best solutions obtained. In some iterations of the refinement phase, the quality of the solutions is competitively compared. In a given iteration, if w solutions (w ≤ p) are evaluated, the winner of the tournament sends its current solution to the others, which continue the refinement process using this new solution together with their own annealing values. By using this elitist strategy, the algorithm continues working on the best solutions with different annealing parameters, instead of exploring other less efficient solutions. To pursue this idea, we need to resolve two different questions regarding the selection method. The first question is to determine the best way to perform the migration amongst processors.
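A minimal sketch of the tournament step, with a data layout and names of our own choosing (the paper does not prescribe either):

```python
# Sketch of the elitist tournament: the best of the compared solutions
# replaces the partition of the others, which keep their own annealing values.
def elitist_tournament(states, cost):
    """states: list of dicts with keys 'partition', 'ti', 'tfactor', ...
    cost: callable mapping a partition to its cost (lower is better)."""
    winner = min(states, key=lambda s: cost(s['partition']))
    for s in states:
        if s is not winner:
            s['partition'] = dict(winner['partition'])  # adopt the best solution
    return winner
```

Keeping each loser's own Ti and Tfactor is the essential point: the same good partition then continues to be refined under several different annealing schedules at once.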
Figure 4. Parallel temperature variation using different values of Ti and Tfactor.

Figure 5. Migration strategies used by PMSATS: (a) STR1 and (b) STR2.
With this purpose in mind, we have used two different strategies. The first one, STR1 (figure 5(a)), is based on specific communication between solution Pi and its close neighbours, i.e., with Pi−1 and Pi+1 alternately. The second strategy, STR2 (figure 5(b)), is based on broadcasting the best solution of the set to the rest of the processors, which continue the search with this new solution. The second question is to determine the optimum migration frequency. Two different alternatives have been implemented, taking into account the characteristics of the heuristic within the multilevel scheme. The first one, M1, consists of migrating the solutions after each refinement step: after applying rMSATS in the current level and before it is projected to the upper
one, the solutions are selected and migrated by using one of the proposed strategies. The second alternative, M2, allows communication only at the highest level, as follows: let C be the number of communications and R the number of refinement iterations; then the communications are performed in the target graph only at the iterations {R/C, 2·(R/C), ..., C·(R/C)}. This strategy makes an independent refinement of the solutions possible at all levels except the final one, where the solutions are better. In order to compare this strategy with M1, the value of C is set equal to the number of refinement levels, so that both alternatives have the same amount of communication.
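Under M2, the communication iterations can be computed as in this small sketch (assuming, for simplicity, that C divides R; the function name is ours):

```python
# Sketch of the M2 schedule: with C communications and R refinement iterations
# at the finest level, communicate at iterations R/C, 2*(R/C), ..., C*(R/C).
def is_communication_iteration(it, R, C):
    step = R // C                      # assumes C divides R for simplicity
    return it > 0 and it % step == 0

# e.g. R = 1500, C = 3 -> communicate at iterations 500, 1000 and 1500
```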
Algorithm 3 presents the PMSATS procedure. In the first step, the input parameters are acquired. Step 2 consists of coarsening the target graph N levels by using the HEM strategy; N depends on z and K, as previously stated. Step 3 calculates the annealing parameters used in the optimisation process. Step 4 obtains an initial solution (initial partition) of the graph by applying GGA with random interval selection as a function of the solution's identifier. Step 5 is repeated in each of the un-coarsening iterations until the target graph (the graph of level 1) is finally optimised. In order to apply the new optimisation loop at the current level, Step 5.a initialises the annealing parameters with the calculations of Step 3. Step 5.b controls the number of iterations of the rMSATS refinement process. In each cycle of the loop, the rMSATS algorithm is applied with its own parameters and variables. The elitist selection is performed using the input migration strategy when the selected migration frequency is M2, the current level is 1 (the finest level), and the current iteration is one of the iterations where communication has been established. Once Step 5.b has finished, and only if the selected migration frequency is M1, Step 5.c allows the communication amongst the processors. Then, the graph is ungrouped (Step 5), unless the last optimisation was done over the graph of level 1, i.e., the target graph. In this case, all the solutions are sent to the master processor (Step 6), which returns the best of all the received solutions (Step 7).

Algorithm 3: Parallel Multilevel Simulated Annealing and Tabu Search (PMSATS).
1) Input: graph, K, max_imb, z, p, Ti_min, Ti_max, R, mig_strat, mig_freq;
2) Coarsen graph N levels by using HEM as a function of z and K;
3) Determine Ti and Tfactor for this solution; t = Ti;
4) Obtain initial partition of graph by applying GGA;
5) For i = N to 1 repeat
   5.a) Re-initialise parameters using input values;
   5.b) While (current_iteration ≤ R) do
      5.b.1) Apply rMSATS(current_iteration, max_imb, t);
      5.b.2) If mig_freq == M2 and current_level == 1 and current_iteration is a communication iteration
             then Elitist_Tournament_Selection(mig_strat);
   5.c) If mig_freq == M1 then Elitist_Tournament_Selection(mig_strat);
6) Send best solution to the master;
7) Return best solution received from all the slaves;
5. Experimental results
The PMSATS executions were performed on a cluster of twelve dual Intel Xeon 2.4 GHz processors. The test graphs used have different sizes and topologies, and belong to a public domain set frequently used to compare and evaluate graph partitioning algorithms. Table 1 briefly describes them: number of vertices, number of edges, minimum connectivity (min), maximum connectivity (max) (the number of neighbours of the vertex with the largest neighbourhood), average connectivity (avg) and file size.

Table 1. Set of test graphs used to evaluate the experimental results.
Graph        |V|      |E|       min  max  avg    File size (KB)
add20        2395     7462      1    123  6.23   63
data         2851     15093     3    17   10.59  140
3elt         4720     13722     3    9    5.81   136
uk           4824     6837      1    3    2.83   70
add32        4960     9462      1    31   3.82   90
whitaker3    9800     28989     3    8    5.92   294
crack        10240    30380     3    9    5.93   297
wing nodal   10937    75488     5    28   13.80  768
fe 4elt2     11143    32818     3    12   5.89   341
vibrobox     12328    165250    8    120  26.81  1679
bcsstk29     13992    302748    4    70   43.27  1679
4elt         15606    45878     3    10   5.88   501
fe sphere    16386    49152     4    6    6.00   540
cti          16840    48232     3    6    5.73   532
memplus      17758    54196     1    573  6.10   536
cs4          22499    43858     2    4    3.90   506
bcsstk30     28924    1007284   3    218  69.65  11403
bcsstk31     35588    572914    1    188  32.2   6547
bcsstk32     44609    985046    1    215  44.16  11368
t60k         60005    89440     2    3    2.98   1100
wing         62032    121544    2    4    3.92   1482
brack2       62631    366559    3    32   11.71  4358
finan512     74752    261120    2    54   6.99   3128
fe tooth     78136    452591    3    39   11.58  5413
fe rotor     99617    662431    5    125  13.3   7894
598a         110971   741934    5    26   13.37  9030
fe ocean     143437   409593    1    6    5.71   5242
wave         156317   1059331   3    44   13.55  13479
m14b         214765   1679018   4    40   15.64  21996
These test graphs, together with the best known solutions for them, can be found in Walshaw's Graph Partitioning Archive (Graph Partitioning Archive, 2003). These solutions give the number of cuts classified by levels of imbalance (0%, 1%, 3% and 5%). Thus, the reduction in the number of cuts is considered the objective, while the imbalance degree (less than 5% in our experiments) is considered a restriction. Under these conditions, the cost function described in (1) has parameters α = 1 and β = 0, with imbalance(k) ≤ 5 for all k in the interval [1, K].
5.1. Parameter setting
Figure 6 shows the results obtained by PMSATS when GGA is applied using 1, 5 and 20 different solutions, with the random interval selection for the initial vertex explained above. All the solutions use the same temperature values, Ti = 100 and Tfactor = 0.995, and the number of iterations of the algorithm is set to 1500. As can be seen, the use of different initial partitions offers a diversity that helps to improve on the solutions obtained with fewer partitions.

Figure 6. Effect of applying PMSATS with different number of initial partitions.

Having shown that using more solutions, with GGA applied from different initial vertices, often improves the quality of the solutions, we can analyse the performance of the algorithm when the number of solutions and iterations is modified. Figure 7 compares the application of PMSATS with p = 10, R = 3000; p = 20, R = 1500; and p = 40, R = 750. Here, the annealing values are Ti = 100 and Tfactor = 0.995. As can be seen in the figure, the best configuration corresponds to p = 20, R = 1500. The advantage of using more solutions comes from the improvement in the diversity of the search process. However, the high complexity of the search requires the application of rMSATS during many iterations. Thus, the selected parameters are p = 20 and R = 1500.

Figure 7. Effect of modifying the number of solutions and iterations.

The determination of the interval of initial temperatures in the annealing process poses a problem: neither the adequate size of the range nor its extreme values are known. The number of solutions (p) is also an important factor, because as p increases the range should presumably also be increased. Furthermore, the irregularity of the test graphs increases the difficulty of selecting adequate values for these parameters. For these reasons, the selection of an optimal interval becomes another hard optimisation problem. To resolve this issue, we have applied the algorithm using a recursive division strategy. The idea consists of selecting a very large initial range, for example Ti* = [500, 2]. This interval is recursively divided until none of the sub-intervals of a certain level improves the solutions of one of the larger sub-intervals, which is then selected as the adequate interval. Figure 8 shows the average number of cuts obtained for each interval after applying the algorithm over the test graphs for some values of K (K = {4, 16, 64}). For p = 20 solutions, none of the average numbers of cuts for the four smallest sub-intervals is lower than the one obtained in the interval Ti* = [500, 250], which obtains the best average result. Therefore, we selected this interval for the next experiments.

Besides the improvement obtained by using different initial vertices in GGA and different annealing parameters, we have thus found an adequate range of values for Ti. Next, we determine the performance of the algorithm with respect to communication by using the migration strategies described in Section 4. Figure 9 shows the results obtained for these migration strategies; the variation of the migration frequencies is also analysed.
In these executions, we used a population of p = 20 solutions, R = 1500 iterations, and the range of initial temperatures previously selected, Ti∗ = [500, 250].
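The recursive division strategy for choosing the temperature interval can be sketched as follows. Note that `evaluate_interval` is a hypothetical stand-in for a full run of the partitioner over the chosen range (here it returns a fake, deterministic score so the sketch is runnable); the bisection descends while some half of the current interval improves on its parent and stops otherwise:

```python
import random

def evaluate_interval(t_hi, t_lo, trials=5):
    # Hypothetical stand-in for a full PMSATS run: should return the
    # average number of cuts obtained when the p initial temperatures
    # are drawn from [t_lo, t_hi]. Faked here with a seeded random
    # score so the sketch runs without the partitioner.
    random.seed(int(t_hi * 1000 + t_lo))
    return sum(random.uniform(t_lo, t_hi) for _ in range(trials)) / trials

def select_interval(t_hi, t_lo, depth=0, max_depth=4):
    # Recursively bisect [t_lo, t_hi]; keep descending while one of the
    # halves improves on the parent interval, otherwise keep the parent.
    parent_score = evaluate_interval(t_hi, t_lo)
    if depth == max_depth:
        return (t_hi, t_lo), parent_score
    mid = (t_hi + t_lo) / 2
    halves = [(t_hi, mid), (mid, t_lo)]
    best = min(halves, key=lambda h: evaluate_interval(*h))
    if evaluate_interval(*best) < parent_score:  # a sub-interval improves
        return select_interval(best[0], best[1], depth + 1, max_depth)
    return (t_hi, t_lo), parent_score            # no half improves: stop

interval, score = select_interval(500.0, 2.0)    # start from Ti* = [500, 2]
```

With the paper's starting range this procedure would, for example, first compare [500, 251] and [251, 2] against [500, 2], matching the behaviour summarised in figure 8.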
Figure 8. Performance of PMSATS considering different temperature ranges.
Figure 9. Performance of PMSATS by using different migration strategies and migration frequencies.
The results obtained indicate that, in most cases, STR1 improves the average number of cuts with respect to STR2. The reason for this behaviour is that broadcasting (STR2) breaks the independence of the search carried out from each solution. With respect to the migration frequency, the results indicate that it is better to communicate only in the finest level rather than performing migration at each change of level. This behaviour is
similar to the effect of broadcast migration: migration at each change of level breaks the independence of the searching process.

5.2. PMSATS versus the previous approaches
Using Ti = 100 and Tfactor = 0.995 as annealing parameters, PMSATS obtains very high quality partitions in comparison with rMSATS and MLrMSATS. For the subset of test graphs used in the previous comparison, PMSATS obtains better solutions than rMSATS (Gil et al., 2002) in all executions. PMSATS is also more efficient than MLrMSATS (Baños et al., 2003): in 90% of executions PMSATS improves the results of MLrMSATS, while the same partition is obtained in 6% of them.

5.3. PMSATS versus other public domain packages
In this section, we compare PMSATS against other previously proposed algorithms. As noted above, PMSATS uses parallel processing to perform the multilevel algorithm (MLrMSATS). We therefore compare PMSATS with two versions of JOSTLE, a high quality multilevel graph-partitioning algorithm, and against ParMETIS, a powerful parallel graph partitioning library. The two JOSTLE versions considered are JOSTLE Evolutionary and iterated JOSTLE, both of which follow the multilevel paradigm. The principal characteristic of JOSTLE is that it applies a variant of the KL procedure (Kernighan and Lin, 1970) during the refinement phase, using two different bucket sorting structures to perform the movements of boundary vertices as a function of their gains. The basic idea of JOSTLE Evolutionary (JE) (Soper, Walshaw, and Cross, 2000) is that each vertex is assigned a bias greater than or equal to zero, depending on its position with respect to the boundary, and each edge is assigned a weight derived from the biases of its end vertices. With these values, in the coarsening phase of JOSTLE the edges with the highest weights are matched first (as in HEM), and in the refinement phase vertex gains are calculated using the biased edge weights. The effect is that vertices with a small bias are more likely than those with a large bias to appear at the boundary of a sub-domain, and edges with lower weights are more likely than those with higher weights to become cut edges. The evolutionary scheme is based on obtaining successive offspring from the evolutionary search, whose crossover and mutation operators depend on the biases of the individuals of each generation. On the other hand, we have also compared PMSATS against iterated JOSTLE (iJ) (Walshaw, 2001), which is based on the repeated application of the multilevel algorithm JOSTLE.
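JOSTLE's internal data structures are not detailed here, but the bucket-sorting idea it borrows from KL-style refinement can be sketched generically: boundary vertices are grouped by gain so that the best candidate move is found without rescanning all vertices. The class below is illustrative only, not JOSTLE's actual code, and keeps a single bucket structure rather than one per direction of movement:

```python
from collections import defaultdict

class GainBuckets:
    """Toy bucket structure for KL-style refinement: vertices are
    grouped by gain (the reduction in cut size their move would
    cause), and the maximum-gain vertex is retrieved cheaply."""

    def __init__(self):
        self.buckets = defaultdict(list)   # gain -> vertices with that gain
        self.max_gain = None

    def insert(self, vertex, gain):
        self.buckets[gain].append(vertex)
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def pop_best(self):
        # Remove and return (vertex, gain) of maximum gain, or None.
        while self.max_gain is not None:
            bucket = self.buckets.get(self.max_gain)
            if bucket:
                return bucket.pop(), self.max_gain
            del self.buckets[self.max_gain]          # drop empty bucket
            self.max_gain = max(self.buckets) if self.buckets else None
        return None

b = GainBuckets()
b.insert("v1", 2)
b.insert("v2", -1)
b.insert("v3", 2)
```

Repeated calls to `pop_best` then yield the gain-2 vertices before the gain of -1 vertex, which is the access pattern the refinement loop relies on.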
In each iteration of iJ, the multilevel process is performed using information from the previous iteration, improving the previous solution. The iterative process finishes when, after a given number of iterations, there is no further improvement in the quality of the solution. Finally, we also compare the results of PMSATS against ParMETIS. ParMETIS (ParMETIS, 2003) is an MPI-based parallel library which implements a variety of algorithms for partitioning unstructured graphs and meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS (Karypis and Kumar, 1998a) and includes routines especially suited for parallel AMR computations and large-scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph partitioning algorithms described in Karypis and Kumar (1996), the adaptive repartitioning algorithm described in Schloegel, Karypis, and Kumar (2000b), and the parallel multi-constraint algorithms described in Schloegel, Karypis, and Kumar (2000a). Thus, we compare PMSATS against the parallel multilevel k-way graph partitioning algorithm provided by ParMETIS v.3.1.0. Tables 2 and 3 show the best results obtained by PMSATS versus the ParMETIS library (ParMETIS, 2003), and also the best known solutions obtained by other algorithms (Graph Partitioning Archive, 2003), over all the test graphs included in Table 1 and with an imbalance of less than 5%. In comparison with ParMETIS, PMSATS obtains better results in 95% of the cases, whilst ParMETIS only proves to be better in 1% of executions; in the remaining 4% of the cases, PMSATS and ParMETIS obtain the same partition. On the other hand, PMSATS obtains better results than the previously best known solutions in 40% of executions, and equals the best known solutions in 12% of cases. Most of the best known solutions included in the Graph Partitioning Archive have been obtained by JE and iJ. The run-times of PMSATS executions can vary from a few seconds to several hours, depending on the graph. In comparison with ParMETIS, the run-times of PMSATS are larger by approximately two orders of magnitude. Nevertheless, the run-times of JE and iJ are larger than those of PMSATS (e.g. the run-times of JE for large graphs are of several days). Figure 10 shows the number of best known solutions obtained by each of the algorithms included in the Graph Partitioning Archive (2003), with an imbalance of less than 5%, over all the test graphs described in Table 1 for K = {2, 4, 8, 16, 32, 64}.
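For reference, the two quantities reported throughout these comparisons, the edge cut and the imbalance, can be computed as in the following sketch (unit edge weights are assumed, and `part`, a map from vertex to sub-domain, is a hypothetical example input):

```python
def cut_and_imbalance(edges, part, k):
    """Edge cut: number of edges whose endpoints lie in different
    sub-domains. Imbalance: largest part size relative to the ideal
    size |V|/k, so 0.05 corresponds to the 5% bound used here."""
    cut = sum(1 for u, v in edges if part[u] != part[v])
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    ideal = len(part) / k
    imbalance = max(sizes) / ideal - 1.0
    return cut, imbalance

# Toy 4-vertex cycle split into k = 2 parts of two vertices each.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
cut, imb = cut_and_imbalance(edges, part, 2)   # cut = 2, imbalance = 0.0
```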
Figure 10. Number of best known solutions found by each algorithm included in Walshaw's Graph Partitioning Archive.

If two or more different algorithms obtain the same number of cuts for a certain graph and the value of
Table 2. Comparison of PMSATS, ParMETIS and best known solutions with imbalance less than 5%.

Graph        Algorithm            (K=2) cuts  (K=4) cuts  (K=8) cuts  (K=16) cuts  (K=32) cuts  (K=64) cuts
add20        PMSATS                      638        1184        1709         2107         2583         3182
             ParMETIS                    778        1327        2053         3060         3790         3927
             Graph Part. Archive         618        1184        1705         2186         2785         3266
data         PMSATS                      189         391         681         1147         1815         2803
             ParMETIS                    225         468         871         1441         4415         7668
             Graph Part. Archive         196         378         702         1195         1922         2911
3elt         PMSATS                       87         199         336          567          956         1535
             ParMETIS                    108         266         448          884         3328         4178
             Graph Part. Archive          87         199         334          566          958         1552
uk           PMSATS                       21          47          93          166          288          466
             ParMETIS                     24          75         147          334          527          795
             Graph Part. Archive          18          41          82          154          265          436
add32        PMSATS                       10          36          67          150          246          597
             ParMETIS                     10          33         309          462          677         1130
             Graph Part. Archive          10          33          69          117          212          624
whitaker3    PMSATS                      126         381         658         1100         1698         2544
             ParMETIS                    132         489         862         1404         5425         6110
             Graph Part. Archive         126         380         658         1092         1686         2535
crack        PMSATS                      182         361         676         1083         1690         2540
             ParMETIS                    209         412         845         1296         1988         2883
             Graph Part. Archive         183         360         676         1082         1679         2590
wing nodal   PMSATS                     1669        3564        5378         8332        11814        15789
             ParMETIS                   1908        3887        5929         9164        12787        17374
             Graph Part. Archive        1970        3566        5387         8316        12024        16102
fe 4elt2     PMSATS                      130         349         601         1012         1641         2520
             ParMETIS                    130         392         710         1143         1831         2796
             Graph Part. Archive         130         349         597         1007         1651         2516
vibrobox     PMSATS                    10630       20050       24338        33460        41356        47149
             ParMETIS                  12802       21239       28701        37882        45311        51552
             Graph Part. Archive       10310       19245       24158        31695        41176        50757
bcsstk29     PMSATS                     2818        8388       15047        23235        34843        56120
             ParMETIS                   2958        9617       18840        27456        41680        60938
             Graph Part. Archive        2818        8088       15314        24706        36731        58108
4elt         PMSATS                      137         322         532          939         1554         2583
             ParMETIS                    163         387         652         1103         1835         2938
             Graph Part. Archive         137         319         527          916         1537         2581
fe sphere    PMSATS                      384         776        1193         1719         2575         3623
             ParMETIS                    458         906        1395         2081         2966         4142
             Graph Part. Archive         384         766        1152         1692         2477         3547
cti          PMSATS                      318         889        1727         2781         4034         5738
             ParMETIS                    459        1104        2212         3452         4963         6753
             Graph Part. Archive         318         917        1716         2778         4236         5907
Table 3. Comparison of PMSATS, ParMETIS and best known solutions with imbalance less than 5% (cont.).

Graph        Algorithm            (K=2) cuts  (K=4) cuts  (K=8) cuts  (K=16) cuts  (K=32) cuts  (K=64) cuts
memplus      PMSATS                     5333        9393       11883        13939        15380        16761
             ParMETIS                   6143       10511       12703        15348        19636        21338
             Graph Part. Archive        5353        9427       11939        13279        14384        17409
cs4          PMSATS                      361         979        1535         2236         3210         4317
             ParMETIS                    435        1207        1877         2704         3654         4915
             Graph Part. Archive         363         936        1472         2126         3080         4196
bcsstk30     PMSATS                     6251       16602       35626        80309       119945       176287
             ParMETIS                   6676       17376       39360        79481       126567       186779
             Graph Part. Archive        6251       16617       34559        70768       117232       177379
bcsstk31     PMSATS                     2676        8177       14791        26196        41106        59155
             ParMETIS                   2749        8139       15113        26086        42827        63383
             Graph Part. Archive        2676        7879       13561        24179        38572        60446
bcsstk32     PMSATS                     5049       12417       23456        40627        67475       101501
             ParMETIS                   6105       12171       26564        43157        71287       102281
             Graph Part. Archive        4667        9728       21307        38320        62961        96168
t60k         PMSATS                       69         211         483          889         1473         2322
             ParMETIS                     96         247         545         1021         1663         2507
             Graph Part. Archive          72         211         467          852         1420         2221
wing         PMSATS                      787        1703        2664         4170         5980         8328
             ParMETIS                    995        1989        3169         4975         7113         9536
             Graph Part. Archive         778        1636        2551         4015         6010         8161
brack2       PMSATS                      660        2749        7156        11858        18005        25929
             ParMETIS                    783        3284        7988        13440        20717        29677
             Graph Part. Archive         668        2808        7080        11958        17954        26944
finan512     PMSATS                      162         324         648         1620         2592        17681
             ParMETIS                    162         324         648         1296         2592        11956
             Graph Part. Archive         162         324         648         1296         2592        10821
fe tooth     PMSATS                     3839        6942       11568        17771        25528        34795
             ParMETIS                   4416        8383       13566        20510        28497        39591
             Graph Part. Archive        3982        7152       12646        18435        26016        36030
fe rotor     PMSATS                     1956        7757       13651        20674        32616        46366
             ParMETIS                   2238        8242       14838        23548        36769        52236
             Graph Part. Archive        1974        8097       13184        20773        33686        47852
598a         PMSATS                     2336        8024       15685        25775        39098        56883
             ParMETIS                   2555        8646       17441        28564        44099        63516
             Graph Part. Archive        2339        7978       16031        26257        40179        58307
fe ocean     PMSATS                      312        1805        4548         8060        13007        20709
             ParMETIS                    557        2504        6722        12371        19091        27264
             Graph Part. Archive         311        1704        4019         7838        12746        21784
wave         PMSATS                     8610       16681       29292        43029        62585        84419
             ParMETIS                   9847       19028       34945        48716        69916        94018
             Graph Part. Archive        8868       18058       30583        44625        63725        88383
m14b         PMSATS                     3842       13401       27468        43501        66942        97143
             ParMETIS                   4219       14607       28689        49184        74007       108724
             Graph Part. Archive        3866       14013       27711        44174        68468       101385
K, then the winner is the algorithm that has the most balanced partition. If two or more algorithms obtain the same number of cuts and the imbalance is also the same, then the first algorithm to find the solution is considered the winner. As we can see in figure 10, PMSATS is clearly the best algorithm when the number of best known solutions is considered. In some cases, PMSATS also provides the same best known partition as other algorithms. JE and iJ also obtain an important number of recognised solutions, while the rest of the algorithms obtain the best result in only a small number of cases.

6. Conclusions
In this paper, we present a new parallel multilevel metaheuristic algorithm for static graph partitioning. This parallel algorithm uses a multilevel scheme that includes a hybrid metaheuristic combining Simulated Annealing and Tabu Search throughout the search process. The inclusion of this hybrid metaheuristic within the multilevel scheme, in many cases, outperforms other multilevel approaches based on the KL algorithm or its variants. The parallel implementation focuses on improving the quality of the partitions as much as possible. For this purpose, the parallel algorithm simultaneously optimises several solutions, each of which evolves independently, applying the multilevel paradigm with different annealing parameters. Eventually, during the refinement phase, an elitist selection mechanism is used in order to concentrate the computational resources on the best solutions. The first conclusion derived from the results is that the diversity of the initial partitions is essential in the search process. The selection of the best parameters for the hybrid heuristic is made difficult by the characteristics of the problem, and requires different values to be used in parallel. The consequences of modifying the number of solutions and the number of iterations in the refinement phase of the multilevel algorithm have also been analysed. Furthermore, several migration strategies have been considered, using different migration frequencies. As a result of this analysis, we have designed a robust parallel multilevel metaheuristic algorithm for graph partitioning whose solutions, in most cases, improve on, or equal, those obtained by other previously proposed efficient algorithms.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful comments. This work was supported by project TIC2002-00228 (CICYT, Spain).

References

Alpert, C.J. and A. Kahng. (1995).
“Recent Developments in Netlist Partitioning: A Survey.” Integration: the VLSI Journal 19(1/2), 1–81.
Banerjee, P. (1994). Parallel Algorithms for VLSI Computer Aided Design. Englewood Cliffs, New Jersey: Prentice Hall.
Baños, R., C. Gil, J. Ortega, and F.G. Montoya. (2003). “Multilevel Heuristic Algorithm for Graph Partitioning.” In Proceedings of the Third European Workshop on Evolutionary Computation in Combinatorial Optimization. Springer-Verlag, LNCS 2611, pp. 143–153.
Cung, V.D., S.L. Martins, C.C. Ribeiro, and C. Roucairol. (2001). “Strategies for the Parallel Implementation of Metaheuristics.” In C.C. Ribeiro and P. Hansen (eds.), Essays and Surveys in Metaheuristics. Kluwer, pp. 263–308.
Diekmann, R., R. Luling, B. Monien, and C. Spraner. (1996). “Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem.” Parallel Algorithms and Applications 8, 61–84.
Garey, M.R. and D.S. Johnson. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman & Company.
Gil, C., J. Ortega, and M.G. Montoya. (2000). “Parallel VLSI Test in a Shared Memory Multiprocessors.” Concurrency: Practice and Experience 12(5), 311–326.
Gil, C., J. Ortega, M.G. Montoya, and R. Baños. (2002). “A Mixed Heuristic for Circuit Partitioning.” Computational Optimization and Applications Journal 23(3), 321–340.
Gilbert, J., G. Miller, and S. Teng. (1998). “Geometric Mesh Partitioning: Implementation and Experiments.” SIAM Journal on Scientific Computing 19(6), 2091–2110.
Glover, F. and M. Laguna. (1993). “Tabu Search.” In C.R. Reeves (ed.), Modern Heuristic Techniques for Combinatorial Problems. London: Blackwell, pp. 70–150.
Graph Partitioning Archive. (2003). http://www.gre.ac.uk/∼c.walshaw/partition/. Accessed August 31, 2003.
Guo, J., G. Trinidad, and N. Smith. (2000). “MOZART: A Multi-Objective Zoning and AggRegation Tool.” In Proceedings of the 1st Philippine Computing Science Congress, pp. 197–201.
Karypis, G. and V. Kumar. (1996). “Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs.” Technical Report TR 96-036, Dept. of Computer Science, University of Minnesota, Minneapolis.
Karypis, G. and V. Kumar. (1998a). “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.” SIAM Journal on Scientific Computing 20(1), 359–392.
Karypis, G. and V. Kumar. (1998b).
“A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering.” Journal of Parallel and Distributed Computing 48(1), 71–95.
Kernighan, B.W. and S. Lin. (1970). “An Efficient Heuristic Procedure for Partitioning Graphs.” Bell Systems Technical Journal 49(2), 291–307.
Kirkpatrick, S., C.D. Gelatt, and M.P. Vecchi. (1983). “Optimization by Simulated Annealing.” Science 220(4598), 671–680.
Klenke, R.H., R.D. Williams, and J.H. Aylor. (1992). “Parallel-Processing Techniques for Automatic Test Pattern Generation.” IEEE Computer 25(1), 71–84.
Mobasher, B., N. Jain, E.H. Han, and J. Srivastava. (1996). “Web Mining: Pattern Discovery from World Wide Web Transactions.” Technical Report TR-96-050, Department of Computer Science, University of Minnesota, Minneapolis.
ParMETIS. (2003). http://www-users.cs.umn.edu/∼karypis/metis/parmetis/index.html. Accessed September 1, 2003.
Randall, M. and A. Abramson. (1999). “A Parallel Tabu Search Algorithm for Combinatorial Optimisation Problems.” In Proceedings of the 6th Australasian Conference on Parallel and Real Time Systems. Springer-Verlag, pp. 68–79.
Schloegel, K., G. Karypis, and V. Kumar. (2000a). “Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning.” In Proceedings of the 6th International Euro-Par Conference. Springer-Verlag, LNCS 1900, pp. 296–310.
Schloegel, K., G. Karypis, and V. Kumar. (2000b). “A Unified Algorithm for Load-balancing Adaptive Scientific Simulations.” In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing.
Schloegel, K., G. Karypis, and V. Kumar. (2001). “Wavefront Diffusion and LMSR: Algorithms for Dynamic Repartitioning of Adaptive Meshes.” IEEE Transactions on Parallel and Distributed Systems 12(5), 451–466.
Shekhar, S. and D.-R. Liu. (1996). “Partitioning Similarity Graphs: A Framework for Declustering Problems.” Information Systems Journal 21(6), 475–496.
Simon, H.D. (1991).
“Partitioning of Unstructured Problems for Parallel Processing.” Computing Systems in Engineering 2(2/3), 135–148.
Soper, A.J., C. Walshaw, and M. Cross. (2000). “A Combined Evolutionary Search and Multilevel Optimisation Approach to Graph Partitioning.” In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 674–681.
Walshaw, C. and M. Cross. (2000). “Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm.” SIAM Journal on Scientific Computing 22(1), 63–80.
Walshaw, C. and M. Cross. (2000). “Parallel Optimisation Algorithms for Multilevel Mesh Partitioning.” Parallel Computing 26(12), 1635–1660.
Walshaw, C. (2001). “Multilevel Refinement for Combinatorial Optimisation Problems.” Technical Report 01/IM/73, Computing and Mathematical Sciences, University of Greenwich, London.