Parallel Computing 30 (2004) 785–801 www.elsevier.com/locate/parco
Solving the mesh-partitioning problem with an ant-colony algorithm ☆
Peter Korosec a, Borut Robic b, Jurij Silc a,*
a Computer Systems Department, Jozef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
b Faculty of Computer and Information Science, University of Ljubljana, Trzaska 25, SI-1000 Ljubljana, Slovenia
Received 10 November 2003; accepted 15 December 2003
Available online 14 May 2004
Abstract
Many real-world engineering problems can be expressed in terms of partial differential equations and solved by using the finite-element method, which is usually parallelised, i.e. the mesh is divided among several processors. To achieve high parallel efficiency it is important that the mesh is partitioned in such a way that workloads are well balanced and interprocessor communication is minimised. In this paper we present an enhancement of a technique that uses a nature-inspired metaheuristic approach to achieve higher-quality partitions. The so-called multilevel ant-colony algorithm, which is a relatively new metaheuristic search technique for solving optimisation problems, was applied and studied, and the possible parallelisation of this algorithm is discussed. The multilevel ant-colony algorithm performed very well and is superior to classical k-METIS and Chaco algorithms; it is even comparable with the combined evolutionary/multilevel scheme used in the JOSTLE evolutionary algorithm and returned solutions that are better than the currently available solutions in the Graph Partitioning Archive.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Finite-element method; Mesh partitioning; Ant-colony optimisation; Algorithms
☆ A preliminary version of this work was presented at the Sixth International Workshop on Nature Inspired Distributed Computing, Nice, France, April 2003.
* Corresponding author. Tel.: +386-1-477-33-63; fax: +386-1-477-38-82.
E-mail addresses: [email protected] (P. Korosec), [email protected] (J. Silc), [email protected] (B. Robic).
URLs: http://csd.ijs.si/korosec, http://csd.ijs.si/silc, http://lalg.fri.uni-lj.si/~borut/.
0167-8191/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.parco.2003.12.016
P. Korosec et al. / Parallel Computing 30 (2004) 785–801
1. Introduction
Various real-world engineering problems that arise in mechanical, civil, automobile and aerospace engineering can be expressed in terms of partial differential equations and solved by using the finite-element method. If a partial differential equation involves a function $f(x)$, then the purpose of the finite-element method is to determine an approximation to $f(x)$. To do this the domain is discretised into a set of geometrical elements consisting of nodes. This process is known as meshing. The value of $f(x)$ is then computed for each of these nodes, and the solutions for the other points are interpolated from these values [6]. In real-world engineering problems, however, this is a rather demanding task since meshes often have large numbers of elements (a mesh can have hundreds of thousands of nodes). As a result, the finite-element method is usually parallelised, i.e. the mesh is divided among several processors. To achieve high parallel efficiency it is important that the mesh is partitioned in such a way that workloads are well balanced and interprocessor communication is minimised.

An important component in mesh partitioning is the well-known graph-partitioning problem. Unfortunately, this is an NP-hard optimisation problem, so optimum solutions cannot be found in polynomial time unless P = NP. Consequently, heuristic approaches are normally used for mesh partitioning. Many mesh-partitioning algorithms have been described in the literature [10,26]. Some of these use only geometric information about the mesh and employ a fast, parallel sorting algorithm in their implementation [27]. Spectral partitioning methods [24] also make use of the mesh-connectivity information and solve an eigenvalue problem to compute the partition. A promising alternative is to use stochastic heuristics, some of which are based on various fundamental principles observed in nature.
A number of studies reported in the last few years have shown that such techniques have great potential for solving a wide range of problems [36]. Examples include tabu search [17], simulated annealing [31], neural networks [1] and genetic algorithms [34]. These methods are widely applicable and have proven to be very powerful in practice. Ant colonies [5,8] are yet another natural phenomenon that has recently given rise to new optimisation methods belonging to the group of "Heuristics from Nature". These methods have already been used in solving various combinatorial optimisation problems, such as the travelling-salesman problem [9], quadratic assignment [12], job-shop scheduling [32], vehicle routing [23] and network routing [28]. For the case of mesh partitioning, however, only a few attempts have been made [21,22]. In this paper we present a new ant-colony-based algorithm that outperforms several classical mesh-partitioning tools, such as Chaco, JOSTLE, and k-METIS.

The rest of the paper is organised as follows: in Section 2 we describe the mesh-partitioning problem and introduce the terminology used throughout the paper. Section 3 addresses the ant-colony optimisation metaheuristic. In Section 4 we introduce a multilevel ant-colony algorithm for solving the mesh-partitioning problem. The experimental results are discussed in Section 5. A possible parallelisation of our algorithm is given in Section 6, which is followed by the conclusion.
2. The mesh-partitioning problem
Because of its complexity we will put the mesh-generation problem [16,35] aside in this paper, and concentrate on the partitioning itself. The problem of finding a non-overlapping partition of an unstructured mesh can be formulated as a graph-partitioning problem. The mesh is associated with a graph $G(V,E)$ that consists of vertices together with the edges that connect them, where every vertex $v$ has weight 1 and corresponds to an element of the mesh. An edge between two vertices $v$ and $u$ indicates that the two corresponding elements are neighbours. There are several criteria for determining whether two elements of a mesh are neighbours [4]. The criterion used in this paper is as follows: two elements of a mesh are considered to be neighbours if they share a common side (in the case of a 3D mesh) or a common edge (in the case of a 2D mesh) (Fig. 1).
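The neighbour criterion for a 2D mesh can be illustrated with a minimal Python sketch. It assumes a triangular mesh in which each element is given as a tuple of three node indices, so that every pair of nodes of an element is one of its edges; the function name `dual_graph` and this input format are chosen for the example only and are not taken from the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph(elements):
    """Return the edge set of the graph induced by a 2D triangular mesh.

    `elements` is a list of 3-tuples of node indices (assumed format).
    Element i becomes vertex i; two vertices are connected when the
    corresponding triangles share a common mesh edge (a pair of nodes).
    """
    owners_of_edge = defaultdict(list)        # mesh edge -> elements using it
    for elem_id, nodes in enumerate(elements):
        for a, b in combinations(sorted(nodes), 2):
            owners_of_edge[(a, b)].append(elem_id)

    graph_edges = set()
    for owners in owners_of_edge.values():
        for u, v in combinations(owners, 2):  # owners are in increasing order
            graph_edges.add((u, v))
    return graph_edges


# Three triangles in a strip: 0 and 1 share edge (1, 2), 1 and 2 share (2, 3);
# triangles 0 and 2 touch only in node 2 and are therefore not neighbours.
mesh = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(sorted(dual_graph(mesh)))  # [(0, 1), (1, 2)]
```

For a 3D mesh the same idea would apply with shared faces instead of shared edges; for elements that are not simplices the set of sides would have to be listed explicitly.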
2.1. Problem definition
Let $G(V,E)$ be an undirected graph consisting of a non-empty set $V$ of vertices and a set $E \subseteq V \times V$ of edges. A $k$-partition $D$ of $G$ comprises $k$ mutually disjoint subsets $D_1, D_2, \ldots, D_k$ (domains) of $V$ whose union is $V$. The set of edges that connect the different domains of a partition $D$ is called an edge-cut. A partition $D$ is balanced if the sizes of the domains are roughly the same, i.e. if

$$b(D) = \max_{1 \le i \le k} |D_i| - \min_{1 \le i \le k} |D_i| \approx 0.$$

The graph-partitioning problem is to find a balanced partition with a minimum edge-cut, denoted by $f(D)$.

Fig. 1. Mesh partitioning: (a) sample mesh; (b) mesh with induced graph; (c) graph partitioning; (d) partitioned mesh.

2.2. Problem complexity
Assume that we need to find the $k$-partition of $G(V,E)$. Let us write $n = |V|$ and suppose that $n = pk$ for some integer $p$. There are $\binom{n}{p}$ ways to choose the first domain, $\binom{n-p}{p}$ ways to choose the second domain, etc. Hence, there are

$$\frac{1}{k!}\binom{n}{p}\binom{n-p}{p}\cdots\binom{2p}{p}\binom{p}{p} = \frac{n!}{k!\,(p!)^{k}}$$

different $k$-partitions of $G$. This expression may represent a very large number. For example, in a graph with 80 vertices there can be as many as $10^{61}$ balanced eight-partitions. Using Stirling's formula it is easy to see that this expression is bounded below by $\Omega(k^n)$. A systematic search for the best $k$-partition is, therefore, not practical. Nor can an efficient exact method be expected: the $k$-partitioning problem is known to be NP-hard [13], and because mesh partitioning is formulated as graph partitioning, it is NP-hard as well. Since the space of feasible solutions of the graph-partitioning problem is prohibitively large, we are forced to resort to heuristic approaches, which reduce or bound the space to be searched.

3. Ant-colony optimisation
Social insects (ants, bees, termites, and wasps) exhibit a collective problem-solving ability [7]. In particular, several ant species are capable of selecting the shortest pathway from among a set of alternative pathways that lead from their nest to a food source. Ants deploy a chemical trail (or pheromone trail) as they walk; this trail attracts other ants to take the path that has the most pheromone.
This reinforcement process results in the selection of the shortest path: the first ants coming back to the nest are those that took the shortest path twice (from the nest to the source and back to the nest), so that more pheromone is present on the shortest path than on the longer paths immediately after these ants have returned, stimulating nestmates to choose the shortest path (Fig. 2).

Taking advantage of this ant-based optimising principle, combined with pheromone evaporation to avoid early convergence to bad solutions, Colorni et al. [5] proposed an optimisation method, called ant-colony optimisation, which they applied to classical NP-hard combinatorial optimisation problems. This optimisation is based on the indirect communication of a colony of simple agents, called (artificial) ants, mediated by (artificial) pheromone trails. It implements a randomised construction heuristic that makes probabilistic decisions as a function of the pheromone intensity on the artificial trails and of the heuristic information based on the input data of the problem to be solved, if the latter is available.
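The interplay of reinforcement and evaporation on two alternative paths can be sketched with a small deterministic model in Python. This is only an illustration under stated assumptions, not the model used in the cited experiments: the fraction of ants on a branch is taken to be proportional to its pheromone, the deposit per step is assumed to be inversely proportional to the branch length (shorter round trips reinforce the trail more often), and `rho` is a hypothetical evaporation rate.

```python
def double_bridge(short_len=1.0, long_len=2.0, rho=0.1, steps=200):
    """Return the expected fraction of ants on the first branch.

    Mean-field sketch: pheromone evaporates by a factor (1 - rho) per
    step and each branch receives a deposit equal to the fraction of
    ants using it divided by its length (both rules are assumptions).
    """
    tau_short, tau_long = 1.0, 1.0            # equal initial pheromone
    for _ in range(steps):
        total = tau_short + tau_long
        p_short = tau_short / total           # choice proportional to pheromone
        p_long = tau_long / total
        tau_short = (1.0 - rho) * tau_short + p_short / short_len
        tau_long = (1.0 - rho) * tau_long + p_long / long_len
    return tau_short / (tau_short + tau_long)


share = double_bridge()
print(share > 0.99)  # True: almost all ants end up on the shorter branch
```

With two branches of equal length the model stays at an even split, which mirrors the observation that the length difference is what drives the selection.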
Fig. 2. The double-bridge experiment: (a) ants start exploring the double bridge; (b) eventually most of the ants choose the shortest path; (c) distribution of the percentage of ants that selected the shorter path (after Goss et al. [14]).
Current applications of ant-colony algorithms fall into two important problem classes: static and dynamic combinatorial optimisation problems. Static problems are those whose topology and cost do not change while the problems are being solved (e.g. the travelling-salesman problem), while in dynamic problems the topology and costs can change while the solutions are being built (e.g. network routing). From a high-level perspective these two classes are very similar, though there are many differences in the details.

3.1. Metaheuristics
The artificial ants used in the ant-colony algorithm are stochastic constructive procedures that build solutions probabilistically. While constructing solutions they use heuristic information about the problem as well as the pheromone trails, which change dynamically. The main concept of the metaheuristic (Fig. 3) is foraging and gathering food. This involves moving the ant through the graph from the starting vertex to one of the ending vertices. More specifically, ant $k$ in step $t$ moves from vertex $i$ to vertex $j$ with a probability given by:

$$p_{ij,k}(t) = \begin{cases} \dfrac{\tau_{ij}^{\alpha}(t)\,\eta_{ij}^{\beta}}{\sum_{l \in N_{i,k}} \tau_{il}^{\alpha}(t)\,\eta_{il}^{\beta}} & j \in N_{i,k} \\[2ex] 0 & j \notin N_{i,k} \end{cases}$$

where $\eta_{ij}$ is a priori available heuristic information, $\alpha$ and $\beta$ are two parameters that determine the relative influence of the pheromone trail $\tau_{ij}(t)$ and the heuristic information, respectively, and $N_{i,k}$ is the feasible neighbourhood of vertex $i$. If $\alpha = 0$, then only heuristic information is considered. Similarly, if $\beta = 0$, then only pheromone information is at work.

Fig. 3. An ant-colony optimisation metaheuristic.

Once an ant builds a solution, or while a solution is being built, the pheromone is deposited (on nodes or connections) according to the evaluation of a (partial) solution. The solution construction ends when an ant comes to the ending vertex (where food is located).

Besides the main concept of foraging and gathering food, we also have two activities called pheromone-trail evaporation and daemon action (the latter is optional). Pheromone-trail evaporation is a procedure that simulates the reduction of pheromone intensity. It is needed in order to avoid too rapid a convergence of the algorithm to a sub-optimal solution. Daemon actions can be used to implement centralised actions that cannot be performed by ants. As we can see from the pseudo code, the Schedule_Activities construct does not specify how the three included activities should be scheduled or synchronised. This means it is up to the programmer to specify how these procedures will interact (in parallel or independently).
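The transition rule and the evaporation step described in this section can be written down as a minimal Python sketch. The nested-dictionary layout of `tau` and `eta`, the default values of `alpha` and `beta`, and the multiplicative evaporation with rate `rho` are assumptions made for illustration; the text does not prescribe them.

```python
def transition_probabilities(i, feasible, tau, eta, alpha=1.0, beta=2.0):
    """Probability of moving from vertex i to each vertex j in `feasible`.

    tau[i][j] is the pheromone on connection (i, j), eta[i][j] the
    heuristic information. Vertices outside the feasible neighbourhood
    are simply absent from the result, i.e. they have probability 0.
    """
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in feasible}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def evaporate(tau, rho):
    """Reduce every pheromone value by the factor (1 - rho) (assumed rule)."""
    for row in tau.values():
        for j in row:
            row[j] *= (1.0 - rho)


tau = {0: {1: 1.0, 2: 3.0}}
eta = {0: {1: 1.0, 2: 1.0}}
# With beta = 0 only the pheromone matters: 1/(1+3) and 3/(1+3).
print(transition_probabilities(0, [1, 2], tau, eta, alpha=1.0, beta=0.0))
# {1: 0.25, 2: 0.75}
```

An ant would then draw its next vertex from this distribution, for example with `random.choices` from the standard library.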
4. The multilevel ant-colony algorithm

4.1. The basic ant-colony algorithm
The main idea of the ant-colony algorithm for k-way partitioning is very simple and was recently proposed by Langham and Grant [22]: we have k colonies of ants that are competing for food, which in this case represents the vertices of the graph. At the end the ants have gathered the food into their nests, i.e. they have partitioned the mesh into k submeshes. More precisely, the algorithm proceeds as follows. First we map the graph onto the grid, which represents the ants' habitat (a place where the ants can move). There are many possibilities as to how a graph can be mapped, but for our example we will consider a random mapping. The ants are placed in their nest locus, from where they start their foraging and gathering of food.
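A random mapping of the graph onto the grid can be sketched in a few lines of Python. The grid dimensions, the dictionary representation of the cells and the function name `random_mapping` are assumptions for this example; the placement of the nests is not shown because the text does not specify it.

```python
import random


def random_mapping(vertices, rows, cols, seed=0):
    """Place every vertex (piece of food) in a uniformly random grid cell.

    Returns a dict mapping (row, col) to the list of vertices in that
    cell; several vertices may share one cell.
    """
    rng = random.Random(seed)                 # seeded for reproducibility
    grid = {}
    for v in vertices:
        cell = (rng.randrange(rows), rng.randrange(cols))
        grid.setdefault(cell, []).append(v)
    return grid


grid = random_mapping(range(20), rows=4, cols=5)
print(sum(len(food) for food in grid.values()))  # 20: every vertex is placed once
```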
Fig. 4. The basic ant-colony algorithm.
The ants on the grid can move in three directions (forward, left and right). The decision as to which direction an ant will move is made by using the probability of movement. A cumulative probability distribution is used to decide which direction is to be chosen. When an ant tries to move off the grid it is forced to move left or right with equal probability. When an ant finds food it tries to pick it up. First, it checks whether the quantity of the temporarily gathered food in its nest is at the limit (the capacity of storage is limited due to the problem’s constraints). If the limit has not been reached, then the weight of the food is calculated from the number of cut edges created by assigning the selected vertex to the partition associated with the nest of the current ant; otherwise the ant moves in a randomly selected direction. If the food is too heavy for one ant to pick it up (and not too heavy for a few ants to lift it up) then an ant sends a help signal within a radius of a few cells. So, if other ants are in the neighbourhood, they will help this ant to carry the food to the nest locus. On the way back to the nest locus an ant deposits pheromones on the trail that it is making, so the other ants can follow its trail and gather more food from that, or a nearby, cell. When an ant reaches the nest locus it drops the food in the first possible place around the nest (in a clockwise direction). After an ant has dropped its food it starts a new round of foraging. Of course ants can also gather food from other nests. When an ant tries to pick up food from other nests it performs the same procedure as when foraging for food, with the exception that when the food is too heavy to be picked up, the ant moves on instead of sending a help signal. In this way we significantly improve our temporary solution. As we have already mentioned, there are some constraints that are imposed on our algorithm. 
The first is the colony's storage-capacity constraint, which is implemented so that no single colony can gather all the food into its nest and so that the appropriate balance between domains is maintained. The second constraint ensures that when the pheromone intensity of a certain cell drops below a fixed value, that cell's pheromone intensity is restored to the initial value. With this we maintain a high exploration level. Other constraints are as follows: only a limited number of vertices can be put in a single cell; each ant can carry only a limited number of pieces of food; the food that is being brought back to the nest is a kind of tabu, i.e. it is not available to other ants. A short tabu list consisting of the last m pieces of food that were moved helps the algorithm to escape from local minima.

We carried out several experiments to see how successful the ant-colony algorithm is at solving the graph-partitioning problem [20]. During the experiments we noticed that the algorithm only performs well on smaller graphs ($n < 500$). As a result of this we enhanced our basic ant-colony algorithm with a multilevel technique [33].

4.2. The multilevel paradigm
One already-established way to speed up and globally improve the partitioning method is the use of multilevel techniques. Here, the basic idea is to group vertices together to form clusters that define a new graph. The next step is to recursively iterate this procedure until the graph size falls below a certain threshold. This is followed by a successive refinement of these coarser graphs. This procedure is known as the multilevel paradigm. The multilevel idea (see Fig. 5) was first proposed by Barnard and Simon [3] as a method of speeding up spectral bisection, and improved by Hendrickson and Leland [15], who generalised it to encompass local refinement algorithms.

4.2.1. Implementation
The implementation consists of two parts: graph contraction and partition expansion. In the first part we create a coarser graph $G_{\ell+1}(V_{\ell+1}, E_{\ell+1})$ from $G_{\ell}(V_{\ell}, E_{\ell})$ by finding the largest independent subset of graph edges and then collapsing them.
Each selected edge is collapsed and the vertices $u_1, u_2 \in V_{\ell}$ that are at either end of it are merged into the new vertex $v \in V_{\ell+1}$ with weight $|v| = |u_1| + |u_2|$. The edges that have not been collapsed are inherited by the new graph $G_{\ell+1}$ and the edges that have become duplicated are merged and their weights summed. Because of inheritance the total weight of the graph remains the same and the total edge weight is reduced by an amount equal to the weight of the collapsed edges, which have no impact on the graph balance or the edge cut.

In the second part we expand the partition of graph $G_{\ell}$ that has already been optimised with the ant-colony algorithm. The optimised partition must be interpolated onto its parent graph $G_{\ell-1}$. Because of the simplicity of the coarsening in the first part, the interpolation itself is trivial. So, if vertex $v \in V_{\ell}$ belongs to domain $D_i$, then after refinement the matched pair $u_1, u_2 \in V_{\ell-1}$ that represents $v$ will also be in $D_i$. In this way we expand the graph to its original size, and on every level $\ell$ of our expansion we run our basic ant-colony algorithm. We refer to this as the multilevel ant-colony algorithm (MACA) approach.

Fig. 5. The three phases of multilevel k-way graph partitioning.

Due to the large graphs and the increased number of levels, the number of vertices in a single cell increases rapidly. To overcome this problem we introduced a method
called bucket sort that accelerates and improves the algorithm's convergence by choosing the most "promising" vertex from the cell.

4.3. Bucket sorting
The bucket sort, which was first introduced by Fiduccia and Mattheyses [11], has become an essential tool for the efficient and rapid sorting and adjustment of vertices in terms of their gain. The basic idea is that all the vertices with a particular gain $g$ are put together in a "bucket" ranked $g$. In this way the problem of finding a vertex with maximum gain is converted into finding the non-empty bucket with the highest rank and then picking a vertex from it. If a chosen vertex migrates from one domain to another, only its gain and the gains of all its neighbours have to be recalculated, and these vertices are put back into the appropriate buckets.

In our implementation each bucket is represented by a doubly linked list of vertices. Because of the multilevel process, it often happens that the potential gain values are dispersed over a wide range. For this reason we have introduced the 2–3 tree. With this we avoid large and sparse arrays of pointers. We store the non-empty buckets in the 2–3 tree, so each leaf in the tree represents a bucket. For even faster searching we have made one 2–3 tree for each colony on every cell that has vertices on it (see Fig. 6). With this we have increased the speed of the search, add, and delete operations.

Fig. 6. Bucket sorting. (Diagram labels: grid with food (vertices); bucket ranked 6; doubly linked list of vertices.)

5. The performance of the multilevel ant-colony algorithm
In this section we present and discuss the results of the experimental evaluation of the MACA in comparison to other well-known partitioning programs.
Fig. 7. MACA mesh-partitioning visualisation.
Fig. 8. MACA user interface: (a) initial stage; (b) final stage.
5.1. Experimental environment
The MACA was implemented in Borland® Delphi™. The experiments were made on a computer with an AMD Athlon™ XP 1800+ processor running the Microsoft® Windows® XP operating system.
The implementation also includes a visualisation tool to assist the user in selecting the appropriate parameters of the algorithm (see Figs. 7 and 8).

5.2. Benchmark suite
The benchmark graphs used in our experiment were taken from the Graph Partitioning Archive (www.gre.ac.uk/~c.walshaw/partition/) and are described in Table 1.

Table 1
Benchmark graphs

Graph G(V,E)     |V|      |E|
add20           2395     7462
data            2851    15093
3elt            4720    13722
uk              4824     6837
add32           4960     9462
bcsstk33        8738   291583
whitaker3       9800    28989
crack          10240    30380
wing_nodal     10937    75488
fe_4elt2       11143    32818
4elt           15606    45878
fe_sphere      16386    49152
cti            16840    48232
cs4            22499    43858

5.3. Results
We partitioned each of the graphs into two and four domains ($k = 2$, $k = 4$). Each score is described with the edge cut $f(D)$ and the balance $b(D)$. Balance is defined as the difference (in the number of vertices) between the largest and the smallest domain. We decided to restrict the balance with $b(D) \le 0.002\,|V|/k$. We ran the MACA procedure 30 times on each graph, and from the solutions obtained we chose the one with minimum $f(D)$ as the final result for that graph.

The results of our experiment are shown in Tables 2 and 3, where our algorithm is compared with k-METIS 4.0 [18], Chaco 2.0 [15], and the new mixed simulated annealing and tabu search algorithm MLSATS [2]. The MACA performed very well: on most of the graphs it obtained a smaller edge cut than the classical k-METIS and Chaco algorithms. Notice also that MLSATS produced some results that have better $f(D)$ but with much higher $b(D)$ than our MACA. The MACA also returned some solutions that are better than the solutions currently available (end of summer 2003) in the Graph Partitioning Archive (Table 4). Furthermore, the MACA is even comparable to the combined evolutionary/multilevel scheme used in the JOSTLE Evolutionary algorithm [29], which is currently the most promising mesh-partitioning algorithm.

Table 2
Experimental results (k = 2)

               k-METIS 4.0   Chaco 2.0     MLSATS        MACA
Graph          f(D)   b(D)   f(D)   b(D)   f(D)   b(D)   f(D)   b(D)
add20           774     23    630      1    696    119    601      5
data            253     67    210      1    196    131    199      1
3elt            148     72    124      0     87    108     90      0
uk               33    126     23      0     54    186     20      2
add32            12     62     11      0     10      2     10      2
bcsstk33      13393    139  10199      0  10064    100  10222     12
whitaker3       135     96    135      0    130      0    126     16
crack           221    248    209      0    184     62    184      0
wing_nodal     1892    315   1747      1   1670    543   1711     23
fe_4elt2        132    297    130      1    130      1    130      1
4elt            149    104    242      0    138    300    139      8
fe_sphere       444    418    422      0    384    576    402      4
cti             401    300    369      0    318    120    332     24
cs4             397    653    418      1    418    907    397      1

Table 3
Experimental results (k = 4)

               k-METIS 4.0   Chaco 2.0     MLSATS        MACA
Graph          f(D)   b(D)   f(D)   b(D)   f(D)   b(D)   f(D)   b(D)
add20          1214     45   1242      1   1193     57   1196      3
data            486     42    444      1    395     67    424      1
3elt            250     63    258      0    199     45    225      2
uk               50     47     60      0    261     74     50      2
add32            38     59     53      0     35     66     40      5
bcsstk33      22909    129  25529      1  22442    219  22632     14
whitaker3       407     97    439      0    448    198    386      6
crack           478    106    457      0    401    216    381      8
wing_nodal     3898    163   3817      1   3596    273   3619     11
fe_4elt2        400     87    378      1    350    113    350      2
4elt            385    197    416      1    367    369    389     13
fe_sphere       828     46    835      1    786    329    810     15
cti            1104    156   1000      0    966    360   1045     14
cs4            1102    312   1135      1   1103    403   1038     21
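The two scores reported for every partition, the edge cut f(D) and the balance b(D), together with the selection of the best of several runs, can be computed as in the following minimal Python sketch. It assumes unit vertex and edge weights, an edge list of vertex pairs and a dictionary `domain_of` that maps each vertex to a domain index 0..k-1; these names and representations are illustrative and not taken from the MACA implementation.

```python
def edge_cut(edges, domain_of):
    """Number of edges whose end points lie in different domains."""
    return sum(1 for u, v in edges if domain_of[u] != domain_of[v])


def balance(domain_of, k):
    """Difference in vertex count between the largest and smallest domain."""
    sizes = [0] * k
    for d in domain_of.values():
        sizes[d] += 1
    return max(sizes) - min(sizes)


def best_run(scores):
    """Pick the (f, b) pair with the minimum edge cut from several runs."""
    return min(scores, key=lambda score: score[0])


# A cycle of six vertices split into two halves: two cut edges, perfect balance.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
domain_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(edge_cut(edges, domain_of), balance(domain_of, 2))  # 2 0
```

When the scores of repeated runs are collected as `(f, b)` pairs, `best_run` returns the one that would be reported in the result tables under the selection rule described above.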
Table 4
Results from the Graph Partitioning Archive

Graph        k   Algorithm              f(D) when b(D) <= 0.01 |V|/k
add20        2   MACA                     601
add20        4   MACA                    1186
data         2   GrPart                   196
data         4   MACA                     410
3elt         2   GrPart, MACA              89
3elt         4   JE, MLSATS               199
uk           2   MACA                      19
uk           4   JE                        44
add32        2   J2.2, MLSATS, MACA        10
add32        4   JE                        33
bcsstk33     2   GrPart                 10109
bcsstk33     4   iJ                     21685
whitaker3    2   JE, MACA                 126
whitaker3    4   JE                       380
crack        2   MACA                     183
crack        4   JE                       368
wing_nodal   2   MACA                    1697
wing_nodal   4   JE                      3572
fe_4elt2     2   MRSB, MLSATS, MACA       130
fe_4elt2     4   JE                       349
4elt         2   GrPart                   138
4elt         4   JE                       327
fe_sphere    2   JE                       386
fe_sphere    4   N.A.                     768
cti          2   JE, MLSATS, MACA         318
cti          4   MACA                     963
cs4          2   JE                       367
cs4          4   JE                       940

GrPart: A. Kozhushkin's implementation of iterative multilevel Kernighan–Lin.
iJ: Iterated JOSTLE, iterated multilevel Kernighan–Lin (k-way) [33].
J2.2: JOSTLE, multilevel Kernighan–Lin (k-way), version 2.2 [33].
JE: JOSTLE Evolutionary, combined evolutionary/multilevel scheme [29].
MLSATS: Multilevel refined mixed Simulated Annealing and Tabu Search [2].
MRSB: Multilevel Recursive Spectral Bisection [3].
MACA: Multilevel Ant-Colony Algorithm.

6. Discussion on parallelisation of the multilevel ant-colony algorithm
The ant-colony optimisation has a high degree of inherent parallelism. In contrast to the recently proposed parallel implementations [25,30], whereby parallelisation is on the ant level, we propose parallelisation on the colony level. The idea is shown in Fig. 9. Each colony of ants is assigned its own processor, and each processor executes the ant activities as described in the Ant_Colony_Algorithm (see Fig. 4). The only communication needed is for the position of the food on the grid (e.g. when the food is taken from or placed somewhere on the grid). When the partitioning is completed we can then solve the partitioned problem on the existing architecture.

Fig. 9 represents a model where a MIMD architecture with distributed memory, or a cluster, is used. Of course the model can easily be used on a shared-memory architecture. The only difference is that the shared information is stored in a global memory, which all processors can access.
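The communication pattern of the shared-memory variant can be sketched in Python with one thread per colony. This is a strongly simplified illustration under stated assumptions: the ants' activities are replaced by a placeholder that tries to pick up the next free piece of food, the shared information is reduced to a dictionary recording which colony holds each vertex, and the names `run_colonies` and `capacity` are hypothetical. Python threads are used here only to show the synchronisation, not to obtain a speed-up.

```python
import threading


def run_colonies(vertices, k, capacity):
    """Let k colony threads collect food under a storage-capacity limit."""
    owner = {v: None for v in vertices}       # shared information: who holds v
    lock = threading.Lock()                   # guards every update of `owner`

    def colony_process(colony_id):
        stored = 0
        for v in vertices:                    # placeholder for ants' activities
            if stored >= capacity:            # storage-capacity constraint
                break
            with lock:                        # publish the changed food position
                if owner[v] is None:
                    owner[v] = colony_id
                    stored += 1

    threads = [threading.Thread(target=colony_process, args=(c,))
               for c in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner


owner = run_colonies(list(range(10)), k=3, capacity=4)
print(all(c is not None for c in owner.values()))  # True: every vertex is stored
```

In a distributed-memory setting the lock-protected dictionary would be replaced by messages that carry the changed food positions between the colony processes.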
Fig. 9. Model for possible MACA parallelisation. (Diagram: a SHARED INFORMATION block holding the food position and the best found solution, connected to k COLONY PROCESS blocks, Colony 1 to Colony k; each colony process receives the changed food position, performs the ants' activities, and sends the new position of the moved food.)
7. Conclusions
The graph-partitioning problem is an important component of mesh partitioning in the domain-decomposition method. The ant-colony-optimisation method uses a metaheuristic approach for solving hard combinatorial optimisation problems. The goal of this paper was to investigate an ant-colony algorithm for mesh partitioning, to suggest modifications to improve this algorithm, and to evaluate them experimentally.

The experimental results show that the basic ant-colony algorithm performed very well on small or medium-sized graphs ($n < 500$). With larger graphs, which are often encountered in mesh partitioning, we had to use a multilevel method to produce results that were competitive with the results given by other algorithms.

The MACA is clearly a very promising method that needs to be thoroughly investigated. There are many possibilities for improving our algorithm. One possibility is in the mapping of the graph onto the grid: with a proper mapping, both the convergence and the results could be improved. The use of a load-balancing method between levels would also be a promising direction. The next possibility is in determining which and how many vertices from the cell will be picked and with what probability. Here, the Kernighan–Lin gain method [19] might be used. We could also add some daemon actions, like the min-cut algorithm, to improve solutions during the crossing from one level to another. And, finally, we could change the way the pheromone is evaporated, deposited and restored.

There is a wide range of possibilities to be considered in the future. Among the most appealing are a merger of the MACA with some other method through daemon actions and a parallel implementation of the MACA.
References [1] A. Bahreininejad, B.H.V. Topping, A.I. Khan, Finite element mesh partitioning using neural networks, Advances in Engineering Software 27 (1–2) (1996) 103–115. [2] R. Banos, C. Gil, J. Ortega, F.G. Montoya, Multilevel heuristic algorithm for graph partitioning, in: G.R. Raidl et al. (Eds.), Applications of Evolutionary Computing, Lecture Notes in Computer Science, 2611, Springer, 2003, pp. 143–153. [3] S.T. Barnard, H.D. Simon, A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, Concurrency: Practice and Experience 6 (2) (1994) 101– 117. [4] N. Bouhmala, X. Cai, Partition of unstructured finite element meshes by a multilevel approach, in: T. Sørevik et al. (Eds.), Applied Parallel Computing––New Paradigms for HPC in Industry and Academia, Lecture Notes in Computer Science, 1947, Springer, 2001, pp. 187–195. [5] A. Colorni, M. Dorigo, V. Maniezzo, Distributed optimization by ant colonies, in: Proceedings of the First European Conference on Artificial Life, Paris, France, 1991, pp. 134–142. [6] R.D. Cook, D.S. Malkus, M.E. Plesha, R.J. Witt, Concepts and Applications of Finite Element Analysis, fourth ed., John Wiley & Sons, New York, 2001. [7] J.-L. Deneubourg, S. Goss, Collective paterns and decision making, Ethology Ecology and Evolution 1 (4) (1989) 295–311. [8] M. Dorigo, V. Maniezzo, A. Colorni, The ant system: optimization by a colony of cooperating agents, IEEE Transactions on Systems, Man, and Cybernetics––Part B: Cybernetics 26 (1) (1996) 29– 46. [9] M. Dorigo, L.M. Gambardella, Ant colony system: a cooperative learning approach to the traveling salesman problem, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 53–66. [10] C. Farhat, M. Lesoinne, Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics, International Journal for Numerical Methods in Engineering 36 (5) (1993) 745–764. [11] C.M. Fiduccia, R.M. 
Mattheyses, A linear time heuristic for improving network partitions, in: Proceedings of the 19th IEEE Design Automation Conference, 1982, pp. 175–181. [12] L.M. Gambardella, E. Taillard, M. Dorigo, Ant colonies for the quadratic assignment problem, Journal of the Operational Research Society 50 (2) (1999) 167–176. [13] M.R. Garey, D.S. Johnson, Computers and Intractability, A Guide to the Theory of NPCompleteness, W.H. Freeman and Company, 1979. [14] S. Goss, S. Aron, J.L. Deneubourg, J.M. Pasteels, Self-organized shortcuts in the Argentine ants, Naturwissenschaften 76 (12) (1989) 579–581. [15] B. Hendrickson, R. Leland, A multilevel algorithm for partitioning graphs, in: Proceedings of the Supercomputing ’95, 1995. [16] K. Ho-Le, Finite element mesh generation methods: a review and classification, Computer Aided Design 20 (1) (1988) 27–38. [17] P. Kadłuczka, K. Wala, Tabu search and genetic algorithms for the generalized graph partitioning problem, Control and Cybernetics 24 (4) (1995) 459–476. [18] G. Karypis, V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing 48 (1) (1998) 96–129. [19] B.W. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graph, The Bell System Technical Journal 49 (2) (1970) 291–307. [20] P. Korosec, J. Silc, B. Robic, An ant-colony-optimization approach to the mesh-partitioning problem, in: R. Trobec et al. (Eds.), Parallel Numerics ’02, University of Salzburg and Jozef Stefan Institute, 2002, pp. 123–132. [21] P. Korosec, J. Silc, B. Robic, A multilevel ant-colony-optimization algorithm for mesh partitioning, International Journal of Pure and Applied Mathematics 5 (2) (2003) 143–159. [22] A.E. Langham, P.W. Grant, Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies, in: D. Floreano et al. (Eds.), Advances in Artificial Life, Lecture Notes in Computer Science, 1674, Springer, 1999, pp. 621–625.
[23] R. Montemanni, L.M. Gambardella, A.E. Rizzoli, A.V. Donati, A new algorithm for a dynamic vehicle routing problem based on ant colony system, in: Proceedings of the Second International Workshop on Freight Transportation and Logistics, Palermo, Italy, 2003.
[24] A. Pothen, H.D. Simon, K.P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications 11 (3) (1990) 430–452.
[25] M. Randall, A. Lewis, A parallel implementation of ant colony optimization, Journal of Parallel and Distributed Computing 62 (9) (2002) 1421–1432.
[26] K. Schloegel, G. Karypis, V. Kumar, Parallel static and dynamic multi-constraint graph partitioning, Concurrency and Computation: Practice and Experience 14 (3) (2002) 219–240.
[27] M.S. Shephard, J.E. Flaherty, H.L. DeCougny, C. Özturan, C.L. Bottasso, M.W. Beall, Parallel automated adaptive procedures for unstructured meshes, in: Special Course on Parallel Computing in Computational Fluid Dynamics, AGARD-R-807, AGARD, Neuilly-sur-Seine, France, 1995, pp. 6/1–49.
[28] K.M. Sim, W.H. Sun, Multiple ant-colony optimization for network routing, in: Proceedings of the First International Symposium on Cyber Worlds, Tokyo, Japan, 2002, pp. 277–281.
[29] A.J. Soper, C. Walshaw, M. Cross, A combined evolutionary search and multilevel approach to graph partitioning, in: Proceedings of the Genetic and Evolutionary Computation Conference, Las Vegas, NV, 2000, pp. 674–681.
[30] E.-G. Talbi, O. Roux, C. Fonlupt, D. Robillard, Parallel ant colonies for the quadratic assignment problem, Future Generation Computer Systems 17 (4) (2001) 441–449.
[31] L. Tao, Y.C. Zhao, K. Thulasiraman, M.N.S. Swamy, Simulated annealing and tabu search algorithms for multiway graph partition, Journal of Circuits, Systems and Computers 2 (2) (1992) 159–185.
[32] T. Teich, M. Fischer, A. Vogel, J. Fischer, A new ant colony algorithm for the job shop scheduling problem, in: Proceedings of the Genetic and Evolutionary Computation Conference, San Francisco, CA, 2001, p. 803.
[33] C. Walshaw, M. Cross, Mesh partitioning: a multilevel balancing and refinement algorithm, SIAM Journal on Scientific Computing 22 (1) (2001) 63–80.
[34] J. Żola, R. Wyrzykowski, Application of genetic algorithm for mesh partitioning, in: Proceedings of the Workshop on Parallel Numerics, Bratislava, Slovakia, 2000, pp. 209–217.
[35] J. Żola, R. Wyrzykowski, J. Silc, G. Papa, Genetic algorithms as a method for finite element mesh smoothing, in: Proceedings of the Third International Conference on Parallel Processing and Applied Mathematics, Kazimierz Dolny, Poland, 1999, pp. 637–644.
[36] A.Y. Zomaya, J.A. Anderson, D.B. Fogel, G.J. Milburn, G. Rozenberg, Nonconventional computing paradigms in the new millennium, Computing in Science and Engineering 3 (6) (2001) 82–99.