Journal of Heuristics, 10: 315–336, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Parallel Multilevel Metaheuristic for Graph Partitioning

R. BAÑOS
C. GIL*
Departamento de Arquitectura de Computadores y Electronica, Universidad de Almeria, La Cañada de San Urbano s/n, 04120 Almeria, Spain
email: [email protected]
email: [email protected]

J. ORTEGA
Departamento de Arquitectura y Tecnologia de Computadores, Universidad de Granada, Campus de Fuentenueva, Granada, Spain
email: [email protected]

F.G. MONTOYA
Departamento de Ingenieria Civil, Universidad de Granada, Campus de Fuentenueva, Granada, Spain
email: [email protected]

Submitted in September 2003 and accepted by Enrique Alba in March 2004 after 1 revision
Abstract. Graph partitioning is a significant optimisation problem that occurs in many scientific areas. Several heuristics that obtain high quality partitions have been put forward, and multilevel schemes can further improve the quality of the solutions. However, in many applications the graphs are very large, making it impossible to explore the search space effectively. In these cases, parallel processing becomes a very useful tool for overcoming this problem. In this paper, we propose a new parallel algorithm that uses a hybrid heuristic within a multilevel scheme. It is able to obtain very high quality partitions, improving on those obtained by other previously proposed algorithms.

Key Words: graph partitioning, parallel optimisation, multilevel optimisation, metaheuristic, simulated annealing, tabu search
*Author to whom all correspondence should be addressed.

The Graph Partitioning Problem (GPP) occurs in many areas: for example, VLSI design (Banerjee, 1994; Alpert and Kahng, 1995), test pattern generation (Klenke, Williams, and Aylor, 1992; Gil et al., 2002), data-mining (Mobasher et al., 1996), efficient storage of large databases (Shekhar and Liu, 1996), geographical information systems (Guo, Trinidad, and Smith, 2000), etc. The critical issue is finding a partition of the vertices of a graph into a given number of roughly equal parts, whilst ensuring that the number of edges connecting the vertices in different sub-graphs is minimised. As the problem is NP-complete (Garey
and Johnson, 1979), efficient procedures providing high quality solutions in a reasonable amount of time are very useful. Different strategies have been proposed to solve the GPP. Graph partitioning algorithms can be classified as follows:

• Local vs. Global Methods: If the partitioning algorithm uses a previously obtained initial partition, it is called a local method (Kernighan and Lin, 1970). If the algorithm also obtains the initial partition, it is called global (Simon, 1991).

• Geometric vs. Coordinate-free Methods: If the algorithm takes into account the spatial location of the vertices (Gilbert, Miller, and Teng, 1998), it is called geometric. If only the connectivity among vertices is considered, it is called a coordinate-free method (Simon, 1991).

• Static vs. Dynamic Partitioning: The static partitioning model divides the graph only once (Simon, 1991). Sometimes, due to the characteristics of the application where the GPP occurs, the graph structure changes dynamically, making it necessary to apply the optimisation repeatedly. Such algorithms (Schloegel, Karypis, and Kumar, 2001) are called dynamic.

• Multilevel vs. Non-Multilevel Schemes: If the algorithm directly divides the target graph (Kernighan and Lin, 1970), it is called non-multilevel. If the graph is instead coarsened several times, divided at the lowest level and then uncoarsened back up to the target graph, it is called multilevel (Karypis and Kumar, 1998a).

• Serial vs. Parallel Algorithms: The typical approach to solving the GPP is based on algorithms that run on a single processor (Karypis and Kumar, 1998a); these are called serial algorithms. In other cases, parallel processing is used either to speed up the serial version or to explore different areas of the search space (Karypis and Kumar, 1998b).

In this paper, an efficient parallel multilevel algorithm for the GPP is presented.
This algorithm uses a multilevel scheme in the search process, which includes a metaheuristic based on mixing Simulated Annealing (SA) and Tabu Search (TS). Further, parallel processing is used to allow a cooperative and simultaneous search of the solution space. The result is a global, coordinate-free, parallel multilevel algorithm for static graph partitioning, whose partitions often improve on those previously obtained by other algorithms.

Section 1 provides a more precise definition of the GPP and describes the cost function used in the optimisation process. Section 2 describes the proposed metaheuristic to solve the problem. Section 3 presents the design of the multilevel scheme, which includes the metaheuristic described in Section 2. Section 4 offers a detailed explanation of the parallelisation of the multilevel algorithm, while the analysis of the results obtained is provided in Section 5. Finally, Section 6 gives the conclusions and suggests areas for future work.
1. The graph partitioning problem (GPP)
Given a graph G = (V, E), where V is the set of vertices, with |V | = n, and E the set of edges which determines the connectivity among the vertices, the GPP consists of dividing
Figure 1. Two possible alternatives to divide a graph.
V into K balanced sub-graphs, V1, V2, ..., VK, such that Vi ∩ Vj = ∅ for all i ≠ j, and Σ_{k=1..K} |Vk| = |V|. The balance condition is defined as the maximum sub-domain weight, S = max(|Vk|), for k = 1, ..., K, divided by the perfect balance, n/K. If a defined imbalance, x%, is allowed, then the GPP tries to find a partition such that the number of cuts is minimised subject to the constraint that S ≤ (n/K)·((100 + x)/100). Whenever the vertices and edges have weights, |v| denotes the weight of vertex v, and |e| denotes the weight of edge e. All the test graphs used to evaluate the quality of our algorithm have vertices and edges with weight equal to one (|v| = 1 and |e| = 1); however, our procedure is able to process graphs with any weight values.

Figure 1 shows two possible partitions of a given graph. In figure 1(a) the graph is divided into two equally balanced sub-graphs, but the number of cuts is not minimised. On the other hand, figure 1(b) shows a partition with the optimal number of cuts; however, this partition does not fulfil the requirement of load balancing. This example clearly shows the conflict of objectives. In order to weigh one against the other, the following cost function can be considered:

    c(s) = α · ncuts(s) + β · Σ_{k=1}^{K} 2^{imbalance(k)(s)}        (1)
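As an illustration, the two terms of the cost function (1) can be computed as in the following sketch. The adjacency-list graph representation and the percentage definition of imbalance(k) are our assumptions for the example, not fixed by the paper.

```python
# Sketch of cost function (1): alpha * ncuts(s) + beta * sum_k 2^imbalance(k).
# The graph is assumed to be an adjacency list {vertex: [neighbours]} and a
# partition a map {vertex: sub-graph index}; all names here are illustrative.
from collections import Counter

def cost(adj, part, K, alpha=1.0, beta=0.0):
    # ncuts(s): number of edges whose endpoints lie in different sub-graphs
    ncuts = sum(1 for u, nbrs in adj.items()
                for v in nbrs if u < v and part[u] != part[v])
    n = len(adj)
    sizes = Counter(part.values())
    # imbalance(k): assumed here to be the percentage excess of |Vk| over n/K
    total = 0.0
    for k in range(K):
        imb = max(0.0, (sizes.get(k, 0) - n / K) / (n / K) * 100.0)
        total += 2.0 ** imb
    return alpha * ncuts + beta * total
```

With α = 1 and β = 0 (the setting used in the experiments of Section 5), the function reduces to the plain cut count under the balance restriction.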
2. The metaheuristic: Refined Mixed Simulated Annealing and Tabu Search (rMSATS)
In this section, we provide a description of the hybrid heuristic proposed in Gil, Ortega, and Montoya (2000). Next, we detail the two phases of rMSATS.
2.1. Obtaining the initial partition of the graph
The first step to solve the GPP is to make an initial partitioning of the target graph. In this step rMSATS uses a procedure known as Graph Growing Algorithm (GGA) (Karypis
and Kumar, 1998a). This algorithm starts from a randomly selected vertex, which is assigned to the first sub-graph, as are its adjacent vertices. This recursive process is repeated until the sub-graph reaches n/K vertices. From this point, the following visited vertices are assigned to a new sub-graph, and the process is repeated until all the vertices are assigned to their respective sub-graphs. As the position of the initial vertex determines the structure of the primary partition, its random selection offers a very useful diversity.
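The growing process just described can be sketched as follows. This is an illustrative reconstruction of GGA, not the authors' implementation; names and the handling of disconnected components are ours.

```python
# Sketch of the Graph Growing Algorithm (GGA): a breadth-first search from a
# seed vertex fills each sub-graph with n/K vertices before moving to the next.
from collections import deque
import random

def gga(adj, K, seed=None):
    n = len(adj)
    target = n / K                       # perfect balance per sub-graph
    part = {}
    unassigned = set(adj)
    queue = deque([seed if seed is not None else random.choice(list(adj))])
    k, count = 0, 0
    while unassigned:
        if not queue:                    # disconnected component: restart anywhere
            queue.append(next(iter(unassigned)))
        v = queue.popleft()
        if v not in unassigned:
            continue
        part[v] = k
        unassigned.discard(v)
        count += 1
        if count >= target and k < K - 1:
            k, count = k + 1, 0          # following vertices go to a new sub-graph
        queue.extend(u for u in adj[v] if u in unassigned)
    return part
```

Choosing a different seed vertex yields a different initial partition, which is precisely the diversity that PMSATS later exploits.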
2.2. Optimisation by mixing simulated annealing and Tabu search
The application of an algorithm such as GGA is insufficient to obtain a partition of adequate quality. Therefore, a refinement phase is required to effectively explore the search space. The problem with local search and hill climbing techniques is that the search may stop at local optima. In order to overcome this drawback, rMSATS uses Simulated Annealing (SA) (Kirkpatrick, Gelatt, and Vecchi, 1983) and Tabu Search (TS) (Glover and Laguna, 1993). The combined use of both heuristics results in a hybrid strategy that allows the search process to escape from local minima, while simultaneously preventing the occurrence of cycles.

SA uses a variable called temperature, t, whose value diminishes over successive iterations by a factor termed Tfactor. The variable t appears in the Metropolis function and acts, simultaneously, as a control variable for the number of iterations of the algorithm and as a probability factor for a given solution to be accepted. The decrease of t implies a reduction in the probability of accepting movements which worsen the cost function. On the other hand, when the search space is explored, the use of a limited neighbourhood can cause the appearance of cycles. To avoid this problem, TS complements SA: when an adverse movement is accepted by the Metropolis function, the vertex is moved and included in the tabu list. In the current iteration, the vertices included in the tabu list cannot be moved; they are removed from the list at the end of the next one.

Experimental outcomes (Gil, Ortega, and Montoya, 2000) indicate that the use of both techniques improves the results obtained when only SA or TS is applied individually. These experiments also show good results in comparison with other multilevel algorithms; in many cases, the results obtained by rMSATS outperform those obtained by the METIS library (Karypis and Kumar, 1998a).
Algorithm 1 shows the pseudo-code of the rMSATS procedure. The input parameters are acquired in the first step. The initial partition is obtained in the second step by using the GGA algorithm. The loop defined in Step 3 continues while the number of iterations is smaller than R and t is larger than the established threshold. In each iteration of this loop, the boundary vertices are evaluated (Step 3.a). For each boundary vertex, rMSATS evaluates the cost of moving this vertex to the neighbouring sub-graph (Step 3.a.1) and, as a function of this cost and the value of t, the movement is either accepted or rejected (Step 3.a.2). If the movement is accepted, then rMSATS tests two things: firstly, if the movement implies a worsening of the cost function, the vertex is added to the tabu list (Step 3.a.2.a); further, rMSATS verifies whether the new solution is better than the previous one, in which case (Step 3.a.2.b) the solution is saved. Before starting the next iteration, the temperature and the iteration counter are updated (Step 3.a.3). Finally, in Step 4 the best solution found is returned.
Algorithm 1: refined Mixed Simulated Annealing and Tabu Search (rMSATS).
1) Input: graph, K, max_imb, Ti, Tfactor, R;
2) Obtain initial partition of graph by applying GGA; t = Ti;
3) While (current_iteration ≤ R) AND (t ≥ 0.1)
   3.a) For each (boundary vertex v) do
      3.a.1) cost = cost of the movement of v to the neighbour subgraph;
      3.a.2) If Metropolis(cost, t) is accepted then perform_movement(v, boundary subgraph);
         3.a.2.a) If cost_movement ≥ 0 then add v to the tabu list;
         3.a.2.b) If current_solution ≤ best_solution then best_solution = current_solution;
      3.a.3) t = t * Tfactor; current_iteration = current_iteration + 1;
4) Return the best solution;
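One acceptance step of Algorithm 1 (Steps 3.a.2 to 3.a.2.b) might look as follows in a minimal sketch. The helper names and the representation of the current/best solutions as scalar costs are ours, chosen only to illustrate the interplay of the Metropolis rule and the tabu list.

```python
# Sketch of one rMSATS acceptance step: the Metropolis rule decides on the
# move, and worsening moves that are accepted enter the tabu list.
import math
import random

def metropolis_accept(delta_cost, t, rng=random.random):
    """Accept improving moves always; worsening moves with prob exp(-delta/t)."""
    if delta_cost < 0:
        return True
    return rng() < math.exp(-delta_cost / t)

def try_move(v, delta_cost, t, tabu, best, current):
    if v in tabu:                       # vertices in the tabu list cannot move
        return current, best
    if metropolis_accept(delta_cost, t):
        current = current + delta_cost
        if delta_cost >= 0:             # Step 3.a.2.a: worsening move -> tabu
            tabu.add(v)
        if current < best:              # Step 3.a.2.b: keep best solution found
            best = current
    return current, best
```

As t decreases, exp(-delta_cost/t) shrinks, so worsening moves become progressively less likely to be accepted, exactly as described above.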
3. rMSATS within a multilevel scheme: MultiLevel rMSATS (MLrMSATS)
The multilevel paradigm has become an effective strategy for the GPP. Currently, most graph partitioning algorithms use a multilevel paradigm in combination with local search procedures, usually variants of the Kernighan and Lin algorithm (KL) (Kernighan and Lin, 1970). These algorithms (Karypis and Kumar, 1998a; Walshaw and Cross, 2000b) define the current state of the art.

The multilevel paradigm consists of three different phases; figure 2 provides a visual description of this strategy. In the first phase, the vertices of the target graph are used to build clusters that become the vertices of a new graph. By repeating this process, graphs with fewer and fewer vertices are created, until a sufficiently small graph is obtained. The second phase consists of making a first partition of that graph by using any of the existing procedures. However, the quality of this first partition is low, since the coarsening phase intrinsically implies a loss of accuracy. This is the reason for a third phase, in which the graph is projected back towards its original configuration while a refinement algorithm is applied at each level. In the following, the main characteristics of the multilevel version of rMSATS, MLrMSATS (Baños et al., 2003), are detailed.
3.1. Coarsening phase
MLrMSATS uses a deterministic matching strategy to coarsen the graph. This heavy-edge matching (HEM) strategy consists of matching the vertices according to the weight of the edges that connect them: each vertex not yet matched at this level is matched with an unvisited neighbour such that their common edge has the highest weight. The advantage of this alternative is that the heaviest edges are hidden during this phase, and the resulting graph is therefore built of light edges. Thus, the cuts obtained after the graph partitioning will
Figure 2. Multilevel paradigm.
tend to decrease. The coarsening process finishes when the number of vertices becomes less than the given threshold of z·K, where z is a parameter with a properly selected value (z = 15 in our experiments). Some studies (Karypis and Kumar, 1998a) have determined that the HEM strategy obtains better solutions than others such as Random Matching (RM).

A problem, independent of the matching strategy used to coarsen the graph, occurs when a vertex does not have any free neighbour with which it can be matched. These vertices pass directly to the next level, which creates an imbalance in the weights of the vertices of the new graph. As the union of vertices is made by matching two vertices, this problem gets worse in the coarser levels. Thus, the weights of the vertices in level i, G_i, take values in the interval [1, 2^i]. Suppose that all the vertices of the initial graph have weights equal to one and that the graph is coarsened over 10 levels. In the lowest level, G_10, it is possible to have a vertex u with weight |u| = 1, and another v with weight |v| = 2^10. This makes it difficult to obtain balanced sub-graphs, mainly in the coarsest levels, since the movement of very heavy vertices between neighbouring partitions cannot be accepted by the objective function. In order to solve this problem, MLrMSATS selects the vertices to be matched in ascending order of weight: the algorithm first tries to match the vertices with lower weights, i.e., those that have been isolated in previous levels.
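A single coarsening level with this matching policy can be sketched as follows, assuming an edge-weighted adjacency map. The names are illustrative; in particular, visiting vertices in ascending weight order is the MLrMSATS refinement of HEM described above.

```python
# Sketch of one coarsening level with heavy-edge matching (HEM), visiting
# vertices in ascending order of weight so that vertices isolated at previous
# levels are matched first. Illustrative names only.
def hem_level(adj_w, vweight):
    """adj_w: {u: {v: edge_weight}}, vweight: {u: vertex_weight}.
    Returns a matching as {vertex: partner} (a vertex maps to itself
    when it has no free neighbour and passes to the next level)."""
    match = {}
    for u in sorted(adj_w, key=lambda v: vweight[v]):   # lightest vertices first
        if u in match:
            continue
        free = [(w, v) for v, w in adj_w[u].items() if v not in match]
        if free:
            _, v = max(free)            # heaviest incident edge wins
            match[u] = v
            match[v] = u
        else:
            match[u] = u                # isolated: passes directly to next level
    return match
```

Each matched pair then collapses into one vertex of the coarser graph, its weight being the sum of the two matched weights, which is why unmatched vertices accumulate a weight disadvantage over the levels.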
3.2. Initial partitioning phase
After the coarsening phase, GGA is applied to obtain the first partition of the coarsest graph. GGA gives a totally balanced initial partition, but the efficiency of the solution with reference
to the number of cuts is reduced. Thus, it is necessary to apply a final phase to improve the quality of the partition.
3.3. Un-coarsening phase
It is necessary to apply a refinement technique to improve the quality of the initial partition. As previously stated, most algorithms use local search methods, usually variations of the Kernighan and Lin (1970) procedure. MLrMSATS, however, applies rMSATS at all levels: in each one, the solution obtained at the previous level is optimised by using rMSATS with the initial annealing values. The final solution is obtained by repeating this process until the highest-level graph is reached.

Algorithm 2: MultiLevel rMSATS (MLrMSATS).
1) Input: graph, K, max_imb, Ti, Tfactor, R, z;
2) Coarsen graph N levels by using the HEM strategy as a function of z and K;
3) Obtain initial partition of graph by applying GGA;
4) For i = N to 1 repeat
   4.a) Re-initialise parameters using input values;
   4.b) While (current_iteration ≤ R) do
      4.b.1) Apply rMSATS(current_iteration, max_imb, t);
5) Return the best solution;

Algorithm 2 formally defines the behaviour of MLrMSATS. Step 2 corresponds to the coarsening phase, where HEM is applied. In Step 3, GGA is used to obtain the initial partition, while the refinement process is performed in Step 4 using the rMSATS heuristic. The results obtained by MLrMSATS (Baños et al., 2003) in most cases improve on rMSATS, and also on the METIS library algorithms (Karypis and Kumar, 1998a).
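The multilevel flow of Algorithm 2 (coarsen, partition, project, refine) can be sketched as follows. The coarsen, initial_partition and refine callables are stand-ins for HEM, GGA and rMSATS respectively, and all names are ours rather than the paper's.

```python
# Minimal sketch of the multilevel scheme of Algorithm 2.
def multilevel(graph, K, z, coarsen, initial_partition, refine):
    # coarsen(g) -> (coarser_graph, vmap), where vmap sends every vertex of g
    # to the vertex of the coarser graph that absorbs it
    levels, maps = [graph], []
    while len(levels[-1]) > z * K:          # coarsening threshold: z*K vertices
        coarser, vmap = coarsen(levels[-1])
        levels.append(coarser)
        maps.append(vmap)
    part = initial_partition(levels[-1], K)  # GGA on the coarsest graph
    part = refine(levels[-1], part, K)       # first refinement, coarsest level
    for g, vmap in zip(reversed(levels[:-1]), reversed(maps)):
        part = {v: part[vmap[v]] for v in g}  # project to the finer graph
        part = refine(g, part, K)             # rMSATS applied at every level
    return part
```

Because the projection simply copies each coarse vertex's sub-graph label to the vertices it absorbed, the refinement at each finer level starts from a feasible, already reasonable partition.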
4. Parallel multilevel metaheuristic: Parallel Multilevel Simulated Annealing and Tabu Search (PMSATS)
In the previous section, the main characteristics of MLrMSATS were described. This multilevel algorithm has been parallelised in order to explore the search space using several initial partitions and annealing parameters. Since graph partitioning is an NP-complete problem and the graphs used in realistic situations are large, parallel processing is a very useful technique. Metaheuristics for combinatorial optimisation problems often also need to be parallelised (Cung et al., 2001), and some of these parallel metaheuristics have been successfully applied to the GPP (Diekmann et al., 1996; Randall, 1999). On the other hand, some multilevel approaches have also been used in parallel (Karypis and Kumar, 1996; Walshaw and Cross, 2000b) and obtain very high quality partitions in a reasonable amount of time.
The new algorithm proposed in this paper is a parallelisation of the multilevel metaheuristic described in the previous section. Our parallelisation does not aim to reduce the runtime of the serial version (MLrMSATS), but rather to optimise, as much as possible, the quality of the partitions by taking advantage of the characteristics of both the multilevel paradigm and the metaheuristic rMSATS. The idea consists of building a set of p solutions, each of which applies MLrMSATS in a different way. The coarsening phase is performed using the HEM strategy. Then, each solution applies GGA, starting at a randomly chosen vertex; thus, each solution starts the refinement phase with its own initial partition. In the refinement phase, each of the p solutions uses its own annealing values, as we describe later. In some iterations of the refinement phase, an elitist selection mechanism is applied in order to continue the search with the best solutions of the set while discarding the worse ones.

Figure 3 provides a graphical comparison of rMSATS, MLrMSATS and PMSATS. Figure 3(a) corresponds to rMSATS, where only one solution is optimised, by first obtaining the initial partition using GGA and then applying rMSATS in the optimisation phase. Figure 3(b) corresponds to MLrMSATS, where again one solution is optimised, but this time using the multilevel paradigm. Finally, figure 3(c) corresponds to PMSATS, where a population of p solutions is optimised simultaneously using parallel processing. Each solution applies HEM in the same way. In each solution, the initial partitioning is performed using GGA starting from a different initial vertex, thus obtaining different initial partitions before the refinement phase.
Finally, the refinement phase is carried out independently for each solution by applying rMSATS with its own annealing values (different from the others), although in some iterations there are interactions between the solutions that enable an elitist selection. PMSATS thus continues improving the quality of the best solutions while discarding the worst ones. The main characteristics of PMSATS are detailed as follows:

• Each solution applies GGA to the coarsest graph starting from a vertex chosen by random interval selection: let p be the number of solutions; then solution Pi, i = 1...p, makes the first partition starting from a vertex randomly chosen in the interval [((i−1)/p)·n + 1, (i/p)·n + 1). This strategy assures diversity in the initial partitions, even in the case of irregular graphs.

• In order to accommodate the effect of modifying the annealing parameters, each solution uses a different initial temperature, determined by its identifier, i, as we explain below. An interval of initial temperatures [Ti_Min, Ti_Max] and a fixed number of iterations, R, are established. Solution P1 starts at Ti_Min, solution Pp starts at Ti_Max, and the others are equally distributed along this interval. Then, Tfactor is computed for each solution as a function of R and its initial temperature. Figure 4 shows a clear example of this strategy. Here, an interval of initial temperatures, Ti = [150, 50], and a fixed number of iterations, R = 1000, have been established. Solution A has the highest value of Ti and a low Tfactor, thus determining a fast decrease of the temperature. On the other hand, solution J has the lowest value of Ti and a high Tfactor, determining a slow decrease of the temperature. With this strategy, our algorithm also provides a fair distribution of the workload, avoiding the loss of efficiency that would occur if each process randomly chose the values of Ti and Tfactor.
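The paper gives no closed form for Tfactor, so the sketch below is one plausible reading: initial temperatures are spaced evenly over [Ti_Min, Ti_Max], and a geometric Tfactor is chosen so that every solution reaches the same final temperature after R iterations. Both the final temperature t_end and the geometric schedule are our assumptions.

```python
# Hedged sketch of the per-solution annealing set-up described above.
def annealing_params(i, p, ti_min, ti_max, R, t_end=0.1):
    """Solution P1 starts at ti_min, Pp at ti_max; Tfactor is chosen so that
    ti * Tfactor**R == t_end for every solution (geometric cooling)."""
    ti = ti_min + (i - 1) * (ti_max - ti_min) / (p - 1)
    tfactor = (t_end / ti) ** (1.0 / R)
    return ti, tfactor
```

Note that a larger Ti yields a smaller Tfactor (a faster relative decrease), matching the behaviour of solutions A and J in figure 4.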
Figure 3. Graphical description of rMSATS (a), MLrMSATS (b), and PMSATS (c).
• PMSATS uses an elitist selection mechanism to best utilise the computational resources in the optimisation of the best solutions obtained. In some iterations of the refinement phase, the quality of the solutions is competitively compared. In a given iteration, if w solutions (w ≤ p) are evaluated, the winner of the tournament sends its current solution to the others, which continue the refinement process using this new solution together with their own annealing values. By using this elitist strategy, the algorithm continues working on the best solutions with different annealing parameters, instead of exploring other less efficient solutions. To pursue this idea, we need to resolve two different questions regarding the selection method. The first question is to determine the best way to perform the migration amongst processors.
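A minimal sketch of the tournament step, with a data layout and names of our own choosing (the paper does not prescribe either):

```python
# Sketch of the elitist tournament: the best of the compared solutions
# replaces the partition of the others, which keep their own annealing values.
def elitist_tournament(states, cost):
    """states: list of dicts with keys 'partition', 'ti', 'tfactor', ...
    cost: callable mapping a partition to its cost (lower is better)."""
    winner = min(states, key=lambda s: cost(s['partition']))
    for s in states:
        if s is not winner:
            s['partition'] = dict(winner['partition'])  # adopt the best solution
    return winner
```

Keeping each loser's own Ti and Tfactor is the essential point: the same good partition then continues to be refined under several different annealing schedules at once.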
Figure 4. Parallel temperature variation using different values of Ti and Tfactor.

Figure 5. Migration strategies used by PMSATS: (a) STR1 and (b) STR2.
With this purpose in mind, we have used two different strategies. The first one, STR1 (figure 5(a)), is based on specific communication between solution Pi and its close neighbours, i.e., with Pi−1 and Pi+1 alternately. The second strategy, STR2 (figure 5(b)), is based on broadcasting the best solution of the set to the rest of the processors, which continue the search with this new solution. The second question is to determine the optimum migration frequency. Two different alternatives have been implemented, taking into account the characteristics of the heuristic within the multilevel scheme. The first one, M1, consists of migrating the solutions after each refinement step: after applying rMSATS in the current level and before it is projected to the upper
one, the solutions are selected and migrated by using one of the proposed strategies. The second alternative, M2, allows communication only at the highest level, as follows: let C be the number of communications and R the number of refinement iterations; then the communications are performed in the target graph only at the iterations {R/C, 2·(R/C), ..., C·(R/C)}. This strategy makes an independent refinement of the solutions possible at all levels except the final one, where the solutions are better. In order to compare this strategy with M1, the value of C is set equal to the number of refinement levels, so that both alternatives have the same amount of communication.
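Under M2, the communication iterations can be computed as in this small sketch (assuming, for simplicity, that C divides R; the function name is ours):

```python
# Sketch of the M2 schedule: with C communications and R refinement iterations
# at the finest level, communicate at iterations R/C, 2*(R/C), ..., C*(R/C).
def is_communication_iteration(it, R, C):
    step = R // C                      # assumes C divides R for simplicity
    return it > 0 and it % step == 0

# e.g. R = 1500, C = 3 -> communicate at iterations 500, 1000 and 1500
```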
Algorithm 3 presents the PMSATS procedure. In the first step, the input parameters are acquired. Step 2 consists of coarsening the target graph N levels by using the HEM strategy; N depends on z and K, as previously stated. Step 3 calculates the annealing parameters used in the optimisation process. Step 4 obtains an initial solution (initial partition) of the graph by applying GGA with random interval selection as a function of the solution's identifier. Step 5 is repeated in each of the un-coarsening iterations until the target graph (the graph of level 1) is finally optimised. In order to apply the new optimisation loop at the current level, Step 5.a initialises the annealing parameters with the calculations of Step 3. Step 5.b controls the number of iterations of the rMSATS refinement process. In each cycle of the loop, the rMSATS algorithm is applied with its own parameters and variables. The elitist selection is performed using the input migration strategy when the selected migration frequency is M2, the current level is 1 (the finest level), and the current iteration is one of the iterations where communication has been established. Once Step 5.b has finished, and only if the selected migration frequency is M1, Step 5.c allows the communication amongst the processors. Then, the graph is ungrouped (Step 5), unless the last optimisation was done over the graph of level 1, i.e., the target graph. In this case, all the solutions are sent to the master processor (Step 6), which returns the best of all the received solutions (Step 7).

Algorithm 3: Parallel Multilevel Simulated Annealing and Tabu Search (PMSATS).
1) Input: graph, K, max_imb, z, p, Ti_min, Ti_max, R, mig_strat, mig_freq;
2) Coarsen graph N levels by using HEM as a function of z and K;
3) Determine Ti and Tfactor for this solution; t = Ti;
4) Obtain initial partition of graph by applying GGA;
5) For i = N to 1 repeat
   5.a) Re-initialise parameters using input values;
   5.b) While (current_iteration ≤ R) do
      5.b.1) Apply rMSATS(current_iteration, max_imb, t);
      5.b.2) If mig_freq == M2 and current_level == 1 and current_iteration is a communication iteration
             then Elitist_Tournament_Selection(mig_strat);
   5.c) If mig_freq == M1 then Elitist_Tournament_Selection(mig_strat);
6) Send best solution to the master;
7) Return best solution received from all the slaves;
5. Experimental results
The PMSATS executions were performed on a cluster of twelve dual Intel Xeon 2.4 GHz processors. The test graphs used have different sizes and topologies, and belong to a public domain set frequently used to compare and evaluate graph partitioning algorithms. Table 1 briefly describes them: number of vertices, number of edges, minimum connectivity (min), maximum connectivity (max) (the number of neighbours of the vertex with the largest neighbourhood), average connectivity (avg) and file size.

Table 1. Set of test graphs used to evaluate the experimental results.
Graph        |V|      |E|       min  max  avg    File size (KB)
add20        2395     7462      1    123  6.23   63
data         2851     15093     3    17   10.59  140
3elt         4720     13722     3    9    5.81   136
uk           4824     6837      1    3    2.83   70
add32        4960     9462      1    31   3.82   90
whitaker3    9800     28989     3    8    5.92   294
crack        10240    30380     3    9    5.93   297
wing nodal   10937    75488     5    28   13.80  768
fe 4elt2     11143    32818     3    12   5.89   341
vibrobox     12328    165250    8    120  26.81  1679
bcsstk29     13992    302748    4    70   43.27  1679
4elt         15606    45878     3    10   5.88   501
fe sphere    16386    49152     4    6    6.00   540
cti          16840    48232     3    6    5.73   532
memplus      17758    54196     1    573  6.10   536
cs4          22499    43858     2    4    3.90   506
bcsstk30     28924    1007284   3    218  69.65  11403
bcsstk31     35588    572914    1    188  32.2   6547
bcsstk32     44609    985046    1    215  44.16  11368
t60k         60005    89440     2    3    2.98   1100
wing         62032    121544    2    4    3.92   1482
brack2       62631    366559    3    32   11.71  4358
finan512     74752    261120    2    54   6.99   3128
fe tooth     78136    452591    3    39   11.58  5413
fe rotor     99617    662431    5    125  13.3   7894
598a         110971   741934    5    26   13.37  9030
fe ocean     143437   409593    1    6    5.71   5242
wave         156317   1059331   3    44   13.55  13479
m14b         214765   1679018   4    40   15.64  21996
These test graphs, together with the best known solutions for them, can be found in Walshaw's Graph Partitioning Archive (Graph Partitioning Archive, 2003). These solutions give the number of cuts classified by levels of imbalance (0%, 1%, 3% and 5%). Thus, the reduction in the number of cuts is considered the objective, while the imbalance degree (less than 5% in our experiments) is considered a restriction. Under these conditions, the cost function described in (1) has parameters α = 1 and β = 0, with imbalance(k) ≤ 5 for all k in the interval [1, K].
5.1. Parameter setting
Figure 6 shows the results obtained by PMSATS when GGA is applied using 1, 5 and 20 different solutions, with the random interval selection for the initial vertex explained above. All the solutions use the same temperature values, Ti = 100 and Tfactor = 0.995, and the number of iterations of the algorithm is set to 1500. As can be seen, the use of different initial partitions offers a diversity that helps to improve on the solutions obtained with fewer partitions.

Figure 6. Effect of applying PMSATS with different number of initial partitions.

Having shown that using more solutions, with GGA applied from different initial vertices, often improves the quality of the solutions, we can analyse the performance of the algorithm when the number of solutions and iterations is modified. Figure 7 compares the application of PMSATS with p = 10, R = 3000; p = 20, R = 1500; and p = 40, R = 750. Here, the annealing values are Ti = 100 and Tfactor = 0.995. As can be seen in the figure, the best configuration corresponds to p = 20, R = 1500. The advantage of using more solutions comes from the improvement in the diversity of the search process. However, the high complexity of the search requires the application of rMSATS during many iterations. Thus, the selected parameters are p = 20 and R = 1500.

Figure 7. Effect of modifying the number of solutions and iterations.

The determination of the interval of initial temperatures in the annealing process poses a problem: neither the adequate size of the range nor its extreme values are known. The number of solutions (p) is also an important factor, because as p increases the range should presumably also be increased. Furthermore, the irregularity of the test graphs increases the difficulty of selecting adequate values for these parameters. For these reasons, the selection of an optimal interval becomes another hard optimisation problem. To resolve this issue, we have applied the algorithm using a recursive division strategy. The idea consists of selecting a very large initial range, for example Ti* = [500, 2]. This interval is recursively divided until none of the sub-intervals of a certain level improves the solutions of one of the larger sub-intervals, which is then selected as the adequate interval. Figure 8 shows the average number of cuts obtained for each interval after applying the algorithm over the test graphs for some values of K (K = {4, 16, 64}). For p = 20 solutions, none of the average numbers of cuts for the four smallest sub-intervals is lower than the one obtained in the interval Ti* = [500, 250], which obtains the best average result. Therefore, we selected this interval for the next experiments.

Besides the improvement obtained by using different initial vertices in GGA and different annealing parameters, we have thus found an adequate range of values for Ti. Next, we determine the performance of the algorithm with respect to communication by using the migration strategies described in Section 4. Figure 9 shows the results obtained for these migration strategies; the variation of the migration frequencies is also analysed.
In these executions, we used a population of p = 20 solutions, R = 1500 iterations, and the range of initial temperatures previously selected, Ti∗ = [500, 250].
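The recursive division strategy for choosing the temperature interval can be sketched as follows. Note that `evaluate_interval` is a hypothetical stand-in for a full run of the partitioner over the chosen range (here it returns a fake, deterministic score so the sketch is runnable); the bisection descends while some half of the current interval improves on its parent and stops otherwise:

```python
import random

def evaluate_interval(t_hi, t_lo, trials=5):
    # Hypothetical stand-in for a full PMSATS run: should return the
    # average number of cuts obtained when the p initial temperatures
    # are drawn from [t_lo, t_hi]. Faked here with a seeded random
    # score so the sketch runs without the partitioner.
    random.seed(int(t_hi * 1000 + t_lo))
    return sum(random.uniform(t_lo, t_hi) for _ in range(trials)) / trials

def select_interval(t_hi, t_lo, depth=0, max_depth=4):
    # Recursively bisect [t_lo, t_hi]; keep descending while one of the
    # halves improves on the parent interval, otherwise keep the parent.
    parent_score = evaluate_interval(t_hi, t_lo)
    if depth == max_depth:
        return (t_hi, t_lo), parent_score
    mid = (t_hi + t_lo) / 2
    halves = [(t_hi, mid), (mid, t_lo)]
    best = min(halves, key=lambda h: evaluate_interval(*h))
    if evaluate_interval(*best) < parent_score:  # a sub-interval improves
        return select_interval(best[0], best[1], depth + 1, max_depth)
    return (t_hi, t_lo), parent_score            # no half improves: stop

interval, score = select_interval(500.0, 2.0)    # start from Ti* = [500, 2]
```

With the paper's starting range this procedure would, for example, first compare [500, 251] and [251, 2] against [500, 2], matching the behaviour summarised in figure 8.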
Figure 8. Performance of PMSATS considering different temperature ranges.
Figure 9. Performance of PMSATS by using different migration strategies and migration frequencies.
The results obtained indicate that, in most cases, STR1 improves the average number of cuts with respect to STR2. The reason for this behaviour is that broadcasting (STR2) breaks the independence of the search carried out from each solution. With respect to the migration frequency, the results indicate that it is better to communicate only in the finest level rather than performing migration at each change of level. This behaviour is
similar to the effect of broadcast migration: migration at each change of level breaks the independence of the searching process.

5.2. PMSATS versus the previous approaches
Using Ti = 100 and Tfactor = 0.995 as annealing parameters, PMSATS obtains very high quality partitions in comparison with rMSATS and MLrMSATS. For the subset of test graphs used in the previous comparison, PMSATS obtains better solutions than rMSATS (Gil et al., 2002) in all executions. PMSATS is also more efficient than MLrMSATS (Baños et al., 2003): in 90% of executions PMSATS improves the results of MLrMSATS, while the same partition is obtained in 6% of them.

5.3. PMSATS versus other public domain packages
In this section, we compare PMSATS against other previously proposed algorithms. As noted above, PMSATS uses parallel processing to perform the multilevel algorithm (MLrMSATS). We therefore compare PMSATS with two versions of JOSTLE, a high quality multilevel graph-partitioning algorithm, and against ParMETIS, a powerful parallel graph partitioning library. The two JOSTLE versions considered are JOSTLE Evolutionary and iterated JOSTLE, both of which follow the multilevel paradigm. The principal characteristic of JOSTLE is that it applies a variant of the KL procedure (Kernighan and Lin, 1970) during the refinement phase, using two different bucket sorting structures to perform the movements of boundary vertices as a function of their gains. The basic idea of JOSTLE Evolutionary (JE) (Soper, Walshaw, and Cross, 2000) is that each vertex is assigned a bias greater than or equal to zero, depending on its position with respect to the boundary, and each edge is assigned a weight derived from the biases of its end vertices. With these values, in the coarsening phase of JOSTLE the edges with the highest weights are matched first (as in HEM), and in the refinement phase vertex gains are calculated using the biased edge weights. The effect is that vertices with a small bias are more likely than those with a large bias to appear at the boundary of a sub-domain, and edges with lower weights are more likely than those with higher weights to become cut edges. The evolutionary scheme is based on obtaining successive offspring from the evolutionary search, whose crossover and mutation operators depend on the biases of the individuals of each generation. On the other hand, we have also compared PMSATS against iterated JOSTLE (iJ) (Walshaw, 2001), which is based on the repeated application of the multilevel algorithm JOSTLE.
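JOSTLE's internal data structures are not detailed here, but the bucket-sorting idea it borrows from KL-style refinement can be sketched generically: boundary vertices are grouped by gain so that the best candidate move is found without rescanning all vertices. The class below is illustrative only, not JOSTLE's actual code, and keeps a single bucket structure rather than one per direction of movement:

```python
from collections import defaultdict

class GainBuckets:
    """Toy bucket structure for KL-style refinement: vertices are
    grouped by gain (the reduction in cut size their move would
    cause), and the maximum-gain vertex is retrieved cheaply."""

    def __init__(self):
        self.buckets = defaultdict(list)   # gain -> vertices with that gain
        self.max_gain = None

    def insert(self, vertex, gain):
        self.buckets[gain].append(vertex)
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def pop_best(self):
        # Remove and return (vertex, gain) of maximum gain, or None.
        while self.max_gain is not None:
            bucket = self.buckets.get(self.max_gain)
            if bucket:
                return bucket.pop(), self.max_gain
            del self.buckets[self.max_gain]          # drop empty bucket
            self.max_gain = max(self.buckets) if self.buckets else None
        return None

b = GainBuckets()
b.insert("v1", 2)
b.insert("v2", -1)
b.insert("v3", 2)
```

Repeated calls to `pop_best` then yield the gain-2 vertices before the gain of -1 vertex, which is the access pattern the refinement loop relies on.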
In each iteration of iJ, the multilevel process is performed using information from the previous iteration, improving the previous solution. The iterative process finishes when, after a given number of iterations, there is no further improvement in the quality of the solution. Finally, we also compare the results of PMSATS against ParMETIS. ParMETIS (ParMETIS, 2003) is an MPI-based parallel library which implements a variety of algorithms for partitioning unstructured graphs and meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS (Karypis and Kumar, 1998a) and includes routines especially suited for parallel AMR computations and large-scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph partitioning algorithms described in Karypis and Kumar (1996), the adaptive repartitioning algorithm described in Schloegel, Karypis, and Kumar (2000b), and the parallel multi-constraint algorithms described in Schloegel, Karypis, and Kumar (2000a). Thus, we compare PMSATS against the parallel multilevel k-way graph partitioning algorithm provided by ParMETIS v.3.1.0. Tables 2 and 3 show the best results obtained by PMSATS versus the ParMETIS library (ParMETIS, 2003), and also the best known solutions obtained by other algorithms (Graph Partitioning Archive, 2003), over all the test graphs included in Table 1 and with an imbalance of less than 5%. In comparison with ParMETIS, PMSATS obtains better results in 95% of the cases, whilst ParMETIS only proves to be better in 1% of executions; in the remaining 4% of the cases, PMSATS and ParMETIS obtain the same partition. On the other hand, PMSATS obtains better results than the previously best known solutions in 40% of executions, and equals the best known solutions in 12% of cases. Most of the best known solutions included in the Graph Partitioning Archive have been obtained by JE and iJ. The run-times of PMSATS executions can vary from a few seconds to several hours, depending on the graph. In comparison with ParMETIS, the run-times of PMSATS are larger by approximately two orders of magnitude. Nevertheless, the run-times of JE and iJ are larger than those of PMSATS (e.g. the run-times of JE for large graphs are of several days). Figure 10 shows the number of best known solutions obtained by each of the algorithms included in the Graph Partitioning Archive (2003), with an imbalance of less than 5%, over all the test graphs described in Table 1 for K = {2, 4, 8, 16, 32, 64}.
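For reference, the two quantities reported throughout these comparisons, the edge cut and the imbalance, can be computed as in the following sketch (unit edge weights are assumed, and `part`, a map from vertex to sub-domain, is a hypothetical example input):

```python
def cut_and_imbalance(edges, part, k):
    """Edge cut: number of edges whose endpoints lie in different
    sub-domains. Imbalance: largest part size relative to the ideal
    size |V|/k, so 0.05 corresponds to the 5% bound used here."""
    cut = sum(1 for u, v in edges if part[u] != part[v])
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    ideal = len(part) / k
    imbalance = max(sizes) / ideal - 1.0
    return cut, imbalance

# Toy 4-vertex cycle split into k = 2 parts of two vertices each.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
cut, imb = cut_and_imbalance(edges, part, 2)   # cut = 2, imbalance = 0.0
```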
Figure 10. Number of best known solutions found by each algorithm included in Walshaw's Graph Partitioning Archive.

If two or more different algorithms obtain the same number of cuts for a certain graph and the value of
Table 2. Comparison of PMSATS, ParMETIS and best known solutions with imbalance less than 5%.

Graph        Algorithm            (K=2) cuts  (K=4) cuts  (K=8) cuts  (K=16) cuts  (K=32) cuts  (K=64) cuts
add20        PMSATS                      638        1184        1709         2107         2583         3182
             ParMETIS                    778        1327        2053         3060         3790         3927
             Graph Part. Archive         618        1184        1705         2186         2785         3266
data         PMSATS                      189         391         681         1147         1815         2803
             ParMETIS                    225         468         871         1441         4415         7668
             Graph Part. Archive         196         378         702         1195         1922         2911
3elt         PMSATS                       87         199         336          567          956         1535
             ParMETIS                    108         266         448          884         3328         4178
             Graph Part. Archive          87         199         334          566          958         1552
uk           PMSATS                       21          47          93          166          288          466
             ParMETIS                     24          75         147          334          527          795
             Graph Part. Archive          18          41          82          154          265          436
add32        PMSATS                       10          36          67          150          246          597
             ParMETIS                     10          33         309          462          677         1130
             Graph Part. Archive          10          33          69          117          212          624
whitaker3    PMSATS                      126         381         658         1100         1698         2544
             ParMETIS                    132         489         862         1404         5425         6110
             Graph Part. Archive         126         380         658         1092         1686         2535
crack        PMSATS                      182         361         676         1083         1690         2540
             ParMETIS                    209         412         845         1296         1988         2883
             Graph Part. Archive         183         360         676         1082         1679         2590
wing nodal   PMSATS                     1669        3564        5378         8332        11814        15789
             ParMETIS                   1908        3887        5929         9164        12787        17374
             Graph Part. Archive        1970        3566        5387         8316        12024        16102
fe 4elt2     PMSATS                      130         349         601         1012         1641         2520
             ParMETIS                    130         392         710         1143         1831         2796
             Graph Part. Archive         130         349         597         1007         1651         2516
vibrobox     PMSATS                    10630       20050       24338        33460        41356        47149
             ParMETIS                  12802       21239       28701        37882        45311        51552
             Graph Part. Archive       10310       19245       24158        31695        41176        50757
bcsstk29     PMSATS                     2818        8388       15047        23235        34843        56120
             ParMETIS                   2958        9617       18840        27456        41680        60938
             Graph Part. Archive        2818        8088       15314        24706        36731        58108
4elt         PMSATS                      137         322         532          939         1554         2583
             ParMETIS                    163         387         652         1103         1835         2938
             Graph Part. Archive         137         319         527          916         1537         2581
fe sphere    PMSATS                      384         776        1193         1719         2575         3623
             ParMETIS                    458         906        1395         2081         2966         4142
             Graph Part. Archive         384         766        1152         1692         2477         3547
cti          PMSATS                      318         889        1727         2781         4034         5738
             ParMETIS                    459        1104        2212         3452         4963         6753
             Graph Part. Archive         318         917        1716         2778         4236         5907
Table 3. Comparison of PMSATS, ParMETIS and best known solutions with imbalance less than 5% (cont.).

Graph        Algorithm            (K=2) cuts  (K=4) cuts  (K=8) cuts  (K=16) cuts  (K=32) cuts  (K=64) cuts
memplus      PMSATS                     5333        9393       11883        13939        15380        16761
             ParMETIS                   6143       10511       12703        15348        19636        21338
             Graph Part. Archive        5353        9427       11939        13279        14384        17409
cs4          PMSATS                      361         979        1535         2236         3210         4317
             ParMETIS                    435        1207        1877         2704         3654         4915
             Graph Part. Archive         363         936        1472         2126         3080         4196
bcsstk30     PMSATS                     6251       16602       35626        80309       119945       176287
             ParMETIS                   6676       17376       39360        79481       126567       186779
             Graph Part. Archive        6251       16617       34559        70768       117232       177379
bcsstk31     PMSATS                     2676        8177       14791        26196        41106        59155
             ParMETIS                   2749        8139       15113        26086        42827        63383
             Graph Part. Archive        2676        7879       13561        24179        38572        60446
bcsstk32     PMSATS                     5049       12417       23456        40627        67475       101501
             ParMETIS                   6105       12171       26564        43157        71287       102281
             Graph Part. Archive        4667        9728       21307        38320        62961        96168
t60k         PMSATS                       69         211         483          889         1473         2322
             ParMETIS                     96         247         545         1021         1663         2507
             Graph Part. Archive          72         211         467          852         1420         2221
wing         PMSATS                      787        1703        2664         4170         5980         8328
             ParMETIS                    995        1989        3169         4975         7113         9536
             Graph Part. Archive         778        1636        2551         4015         6010         8161
brack2       PMSATS                      660        2749        7156        11858        18005        25929
             ParMETIS                    783        3284        7988        13440        20717        29677
             Graph Part. Archive         668        2808        7080        11958        17954        26944
finan512     PMSATS                      162         324         648         1620         2592        17681
             ParMETIS                    162         324         648         1296         2592        11956
             Graph Part. Archive         162         324         648         1296         2592        10821
fe tooth     PMSATS                     3839        6942       11568        17771        25528        34795
             ParMETIS                   4416        8383       13566        20510        28497        39591
             Graph Part. Archive        3982        7152       12646        18435        26016        36030
fe rotor     PMSATS                     1956        7757       13651        20674        32616        46366
             ParMETIS                   2238        8242       14838        23548        36769        52236
             Graph Part. Archive        1974        8097       13184        20773        33686        47852
598a         PMSATS                     2336        8024       15685        25775        39098        56883
             ParMETIS                   2555        8646       17441        28564        44099        63516
             Graph Part. Archive        2339        7978       16031        26257        40179        58307
fe ocean     PMSATS                      312        1805        4548         8060        13007        20709
             ParMETIS                    557        2504        6722        12371        19091        27264
             Graph Part. Archive         311        1704        4019         7838        12746        21784
wave         PMSATS                     8610       16681       29292        43029        62585        84419
             ParMETIS                   9847       19028       34945        48716        69916        94018
             Graph Part. Archive        8868       18058       30583        44625        63725        88383
m14b         PMSATS                     3842       13401       27468        43501        66942        97143
             ParMETIS                   4219       14607       28689        49184        74007       108724
             Graph Part. Archive        3866       14013       27711        44174        68468       101385
K, then the winner is the algorithm that has the most balanced partition. If two or more algorithms obtain the same number of cuts and the imbalance is also the same, then the first algorithm to find the solution is considered the winner. As we can see in figure 10, PMSATS is clearly the best algorithm when the number of best known solutions is considered. In some cases, PMSATS also provides the same best known partition as other algorithms. JE and iJ also obtain an important number of recognised solutions, while the rest of the algorithms obtain the best result in only a small number of cases.

6. Conclusions
In this paper, we present a new parallel multilevel metaheuristic algorithm for static graph partitioning. This parallel algorithm uses a multilevel scheme that includes a hybrid metaheuristic combining Simulated Annealing and Tabu Search throughout the search process. The inclusion of this hybrid metaheuristic within the multilevel scheme, in many cases, outperforms other multilevel approaches based on the KL algorithm or its variants. The parallel implementation focuses on improving the quality of the partitions as much as possible. For this purpose, the parallel algorithm simultaneously optimises several solutions, each of which evolves independently, applying the multilevel paradigm with different annealing parameters. Eventually, during the refinement phase, an elitist selection mechanism is used in order to concentrate the computational resources on the best solutions. The first conclusion derived from the results is that the diversity of the initial partitions is essential in the search process. The selection of the best parameters for the hybrid heuristic is made difficult by the characteristics of the problem, and requires different values to be used in parallel. The consequences of modifying the number of solutions and the number of iterations in the refinement phase of the multilevel algorithm have also been analysed. Furthermore, several migration strategies have been considered, using different migration frequencies. As a result of this analysis, we have designed a robust parallel multilevel metaheuristic algorithm for graph partitioning whose solutions, in most cases, improve on, or equal, those obtained by other previously proposed efficient algorithms.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful comments. This work was supported by project TIC2002-00228 (CICYT, Spain).

References

Alpert, C.J. and A. Kahng. (1995).
“Recent Developments in Netlist Partitioning: A Survey.” Integration: the VLSI Journal 19(1/2), 1–81.
Banerjee, P. (1994). Parallel Algorithms for VLSI Computer Aided Design. Englewood Cliffs, New Jersey: Prentice Hall.
Baños, R., C. Gil, J. Ortega, and F.G. Montoya. (2003). “Multilevel Heuristic Algorithm for Graph Partitioning.” In Proceedings of the Third European Workshop on Evolutionary Computation in Combinatorial Optimization. Springer-Verlag, LNCS 2611, pp. 143–153.
Cung, V.D., S.L. Martins, C.C. Ribeiro, and C. Roucairol. (2001). “Strategies for the Parallel Implementation of Metaheuristics.” In C.C. Ribeiro and P. Hansen (eds.), Essays and Surveys in Metaheuristics. Kluwer, pp. 263–308.
Diekmann, R., R. Luling, B. Monien, and C. Spraner. (1996). “Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem.” Parallel Algorithms and Applications 8, 61–84.
Garey, M.R. and D.S. Johnson. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman & Company.
Gil, C., J. Ortega, and M.G. Montoya. (2000). “Parallel VLSI Test in a Shared Memory Multiprocessors.” Concurrency: Practice and Experience 12(5), 311–326.
Gil, C., J. Ortega, M.G. Montoya, and R. Baños. (2002). “A Mixed Heuristic for Circuit Partitioning.” Computational Optimization and Applications Journal 23(3), 321–340.
Gilbert, J., G. Miller, and S. Teng. (1998). “Geometric Mesh Partitioning: Implementation and Experiments.” SIAM Journal on Scientific Computing 19(6), 2091–2110.
Glover, F. and M. Laguna. (1993). “Tabu Search.” In C.R. Reeves (ed.), Modern Heuristic Techniques for Combinatorial Problems. London: Blackwell, pp. 70–150.
Graph Partitioning Archive. (2003). http://www.gre.ac.uk/∼c.walshaw/partition/. Accessed August 31, 2003.
Guo, J., G. Trinidad, and N. Smith. (2000). “MOZART: A Multi-Objective Zoning and AggRegation Tool.” In Proceedings of the 1st Philippine Computing Science Congress, pp. 197–201.
Karypis, G. and V. Kumar. (1996). “Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs.” Technical Report TR 96-036, Dept. of Computer Science, University of Minnesota, Minneapolis.
Karypis, G. and V. Kumar. (1998a). “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.” SIAM Journal on Scientific Computing 20(1), 359–392.
Karypis, G. and V. Kumar. (1998b).
“A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering.” Journal of Parallel and Distributed Computing 48(1), 71–95.
Kernighan, B.W. and S. Lin. (1970). “An Efficient Heuristic Procedure for Partitioning Graphs.” Bell Systems Technical Journal 49(2), 291–307.
Kirkpatrick, S., C.D. Gelatt, and M.P. Vecchi. (1983). “Optimization by Simulated Annealing.” Science 220(4598), 671–680.
Klenke, R.H., R.D. Williams, and J.H. Aylor. (1992). “Parallel-Processing Techniques for Automatic Test Pattern Generation.” IEEE Computer 25(1), 71–84.
Mobasher, B., N. Jain, E.H. Han, and J. Srivastava. (1996). “Web Mining: Pattern Discovery from World Wide Web Transactions.” Technical Report TR-96-050, Department of Computer Science, University of Minnesota, Minneapolis.
ParMETIS. (2003). http://www-users.cs.umn.edu/∼karypis/metis/parmetis/index.html. Accessed September 1, 2003.
Randall, M. and A. Abramson. (1999). “A Parallel Tabu Search Algorithm for Combinatorial Optimisation Problems.” In Proceedings of the 6th Australasian Conference on Parallel and Real Time Systems. Springer-Verlag, pp. 68–79.
Schloegel, K., G. Karypis, and V. Kumar. (2000a). “Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning.” In Proceedings of the 6th International Euro-Par Conference. Springer-Verlag, LNCS 1900, pp. 296–310.
Schloegel, K., G. Karypis, and V. Kumar. (2000b). “A Unified Algorithm for Load-balancing Adaptive Scientific Simulations.” In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing.
Schloegel, K., G. Karypis, and V. Kumar. (2001). “Wavefront Diffusion and LMSR: Algorithms for Dynamic Repartitioning of Adaptive Meshes.” IEEE Transactions on Parallel and Distributed Systems 12(5), 451–466.
Shekhar, S. and D.-R. Liu. (1996). “Partitioning Similarity Graphs: A Framework for Declustering Problems.” Information Systems Journal 21(6), 475–496.
Simon, H.D. (1991).
“Partitioning of Unstructured Problems for Parallel Processing.” Computing Systems in Engineering 2(2/3), 135–148.
Soper, A.J., C. Walshaw, and M. Cross. (2000). “A Combined Evolutionary Search and Multilevel Optimisation Approach to Graph Partitioning.” In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 674–681.
Walshaw, C. and M. Cross. (2000). “Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm.” SIAM Journal on Scientific Computing 22(1), 63–80.
Walshaw, C. and M. Cross. (2000). “Parallel Optimisation Algorithms for Multilevel Mesh Partitioning.” Parallel Computing 26(12), 1635–1660.
Walshaw, C. (2001). “Multilevel Refinement for Combinatorial Optimisation Problems.” Technical Report 01/IM/73, Computing and Mathematical Sciences, University of Greenwich, London.