Parallel Simulated Annealing for the Set-partitioning Problem

Zbigniew J. Czech
Institute of Computer Science, Silesia University of Technology, Gliwice, Poland
e-mail:
[email protected]
Abstract. A delivery problem which reduces to the NP-complete set-partitioning problem is investigated. Sequential and parallel simulated annealing algorithms to solve the delivery problem are discussed. The objective is to improve the quality of solutions to the problem by applying parallelism.

Key words: delivery problem, set-partitioning problem, heuristic algorithms, parallel algorithms, parallel simulated annealing
1. Introduction

A delivery problem (DP) which arose in a transportation company is investigated. It can be formulated as follows. There is a central depot of goods and n customers (nodes) located at specified distances from the depot. The goods have to be delivered to each customer by using vehicles. The number of vehicles which can be used for delivery is unlimited; however, in each trip, which is effected during an eight-hour day, a vehicle crew can visit at most k customers, where k is a small constant. Let k = 3. Then on a single trip the crew starts from the depot, visits one, two or three customers and returns to the depot. A set of routes which guarantees the delivery of goods to all customers is sought. Furthermore, the cost, defined as the total length of the routes in the set, should be minimized.

If a strong constraint is imposed on the magnitude of k (e.g. k = 3), then the DP reduces to the set-partitioning problem (SPP), which is NP-complete. Let N = {1, 2, ..., n} be the set of customers, and let S = {S_1, S_2, ..., S_q}, q = \binom{n}{1} + \binom{n}{2} + \binom{n}{3}, be the set of all subsets of N of size at most 3, i.e. S_i ⊆ N and |S_i| ≤ 3, i ∈ M, where M = {1, 2, ..., q}. Every S_i represents a possible tour of a solution to the DP.
This research was supported in part by the State Committee for Scientific Research grant BK-245-RAu2-99.
Let c_i be the minimum cost (length) of the tour S_i. To obtain the solution to the DP we need to solve the SPP, which consists in finding a collection {S_l}, l ∈ M, of minimum total cost such that every customer j, j ∈ N, is covered by the subsets in the collection exactly once. In other words, the intersection of any pair of subsets in {S_l} is empty.
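As an illustration (this fragment is ours, not the paper's; the coordinate container and the Euclidean metric are assumptions), the tour cost c_i of a subset S_i can be computed by brute force, since a tour visits at most three customers:

from itertools import permutations
from math import hypot

def tour_cost(depot, subset, coords):
    """Minimum length c_i of the tour depot -> S_i -> depot; for |S_i| <= 3
    every visiting order can be checked directly."""
    def d(a, b):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        return hypot(xa - xb, ya - yb)
    return min(
        d(depot, order[0])
        + sum(d(order[t], order[t + 1]) for t in range(len(order) - 1))
        + d(order[-1], depot)
        for order in permutations(subset)
    )

# Example with assumed coordinates: a depot at (30, 30) and two customers.
# coords = {'depot': (30, 30), 1: (10, 40), 2: (50, 20)}
# tour_cost('depot', (1, 2), coords)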
In this work we investigate sequential and parallel versions of a simulated annealing algorithm to solve the DP. The objective is to improve the quality of solutions to the problem by applying parallelism. Over the last years several approaches to the parallelization of simulated annealing have been proposed; for overviews we refer to Aarts and Korst [1], Greening [10], Azencott [3], Boissin and Lutton [5], and Verhoeven and Aarts [17]. To the best of our knowledge no previous work has been devoted to a parallel simulated annealing algorithm for solving the delivery problem as defined in this paper.

In section 2 the solution space for the DP is discussed. Section 3 describes the sequential annealing algorithm. In section 4 the parallel annealing algorithms are presented. Section 5 concludes the paper.
2. Solution space

In order to find the solution to the DP we have to consider the sets of all possible routes for n customers. Given k = 3, it can be shown that the number of those sets is determined by the expression [6]:
\[
n! \sum_{i=0}^{\lfloor n/3 \rfloor} \frac{1}{6^i\, i!} \sum_{j=0}^{\lfloor (n-3i)/2 \rfloor} \frac{1}{(n - 3i - 2j)!\; 2^j\, j!} \tag{1}
\]
This means that for the numbers of customers from n = 40 to n = 100, which we consider, we have to "sieve" from 8 × 10^21 to 1.15 × 10^103 potential solutions, respectively. Surely, it is not possible in practice to search exhaustively through such enormous solution spaces.
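Expression (1) can be evaluated exactly, for instance with the following Python sketch (ours, not the paper's; exact rational arithmetic is used so that the result neither overflows nor loses precision for n = 100):

from fractions import Fraction
from math import factorial

def solution_space_size(n):
    """Number of ways to partition n customers into routes of size at most 3,
    i.e. a direct evaluation of expression (1)."""
    total = Fraction(0)
    for i in range(n // 3 + 1):
        for j in range((n - 3 * i) // 2 + 1):
            total += Fraction(1, 6**i * factorial(i)
                                 * 2**j * factorial(j)
                                 * factorial(n - 3 * i - 2 * j))
    return int(factorial(n) * total)

# solution_space_size(3) == 5; solution_space_size(40) is about 8e21,
# and solution_space_size(100) about 1.15e103, matching the text.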
In Figures 1 to 4 we show the suspected optimal solutions to the DP for n = 100, 90, 80 and 70 customers. These were obtained by multiple runs of the simulated annealing algorithms.
Figure 1. The suspected optimum solution to the DP for n = 100 uniformly distributed customers in the 60 × 60 square with the depot in the center; cost = 1.83535 × 10^3 (the Euclidean distances between nodes are used).
3. Sequential annealing

The technique of simulated annealing, which can be regarded as a variant of local search, was first introduced by Metropolis et al. [13], and then applied to optimization problems by Kirkpatrick et al. [11] and Černý [7]. A comprehensive introduction to the subject can be found in [15]. The application of simulated annealing to solve the DP is as follows. The initial solution to the problem is a set of routes of size 3 (the last route may have fewer than 3 customers); the customers are grouped into the routes randomly. In every step a neighbor solution is determined either by moving a randomly chosen customer from one route to another (perhaps empty) route, or by exchanging the places of two randomly chosen customers between their routes. The neighbor solutions of lower cost obtained in this way are always accepted, whereas the solutions of higher cost are accepted with the probability
\[
Pr = \frac{T_i}{T_i + \delta} \tag{2}
\]
where δ = cost(new solution) − cost(old solution) is the increase in the solution cost (cf. line 10 of the annealing_step procedure), and T_i, i = 0, 1, ..., is the temperature of annealing, which drops from the value T_0 = cost(initial solution)/10^3 according to the formula T_{i+1} = βT_i, where β < 1. Eq. (2) implies that large increases in solution cost, so called uphill moves, are more likely to be accepted when T_i is high. As T_i approaches zero most uphill moves are rejected. The algorithm stops when equilibrium is encountered. We define that equilibrium is reached if 20 stages of temperature reduction fail to improve the best solution. Contrary to the classical approach, in which the solution to the problem is taken as the last solution obtained in the annealing process, we memorize the best solution found during the whole annealing process (cf. lines 14–17 of the annealing_step procedure).

Summing up, the annealing algorithm performs a local search by sampling the neighborhood randomly. It attempts to avoid becoming prematurely trapped in a local optimum by sometimes accepting an inferior solution. The level of this acceptance depends on the magnitude of the increase in solution cost and on the search time to date. The cost of a single iteration of the repeat statement (lines 5–11) is proportional to n², as the steps in line 7 and lines 9–10 are executed in constant time and there are n² repetitions of the for loop (lines 6–8). It is difficult to establish analytically the number of stages, a, in the cooling schedule, i.e. the number of iterations of the repeat statement. In our experiments we found that for the numbers of customers n = 40...100 and β = 0.95 the number of stages did not exceed 200. Therefore the worst case time complexity of the sequential annealing algorithm is T(n) ≤ 200n² = O(n²).

The test results of 100 executions of the sequential annealing algorithm are shown in Table 1. These were obtained for subsets of the test set of n = 100 uniformly distributed customers in the 60 × 60 square with the depot in the center. The columns of the table contain the cardinalities of the subsets (n), the suspected optimum solution value to the DP (Opt.), the best solution value found by the algorithm (Best), the average value of the solutions over 100 executions (Avg.), the standard deviation of the results (σ), the quality of the solution, expressed in per cent and measured by the ratio of the best (approximate) solution value obtained by the algorithm to the optimum or suspected optimum value (Q), and the number of times the optimum solution was obtained in 100 executions (H). It can be seen from Table 1 that the probability of finding the optimum solution for larger n is low. For example, the minimum solution to the DP was found only once over 100 executions for the set of n = 90 customers, and it was not found at all for n = 100.
n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    823.9    11.17   100.00   49
50    982.2    982.2    987.9    11.11   100.00   28
60    1173.4   1173.4   1179.4   10.31   100.00   3
70    1364.7   1364.7   1376.8   13.14   100.00   6
80    1502.9   1502.9   1515.0   20.98   100.00   5
90    1691.1   1691.1   1701.3   6.20    100.00   1
100   1835.4   1836.8   1850.6   20.78   100.08   0

Table 1. Performance of the sequential annealing algorithm.
Figure 2. The suspected optimum solution to the DP for n = 90 uniformly distributed customers in the 60 × 60 square with the depot in the center; cost = 1.69108 × 10^3.
Sequential annealing algorithm

1   Create an initial, old_solution as the set of random routes of size 3;
2   best_solution := old_solution;
3   equilibrium_counter := 0;  {set the equilibrium counter}
4   T := cost(best_solution)/1000;  {initial temperature of annealing}
5   repeat
6     for iteration_counter := 1 to n² do
7       annealing_step(old_solution, best_solution);
8     end for;
9     T := βT;  {temperature reduction}
10    equilibrium_counter := equilibrium_counter + 1;
11  until equilibrium_counter > 20;

procedure annealing_step(old_solution, best_solution);
1   Select randomly a customer;
2   Select randomly a route (distinct from the customer's route selected in
3     the previous line) from the set containing the routes of the old_solution and an empty route;
4   if the route size is less than 3 then
5     Create the new_solution by moving the customer into the chosen route;
6   else
7     Select randomly a customer in the route;
8     Create the new_solution by exchanging the selected customers between their routes;
9   end if;
10  δ := cost(new_solution) − cost(old_solution);
11  Generate random x uniformly in the range (0, 1);
12  if (δ < 0) or (x < T/(T + δ)) then
13    old_solution := new_solution;
14    if cost(new_solution) < cost(best_solution) then
15      best_solution := new_solution;
16      equilibrium_counter := 0;
17    end if;
18  end if;
19  end annealing_step;
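For readers who prefer running code, the two listings above can be rendered compactly in Python as the following sketch (ours, not the paper's: the route representation as lists of customer ids, the externally supplied cost function and the default RNG are all assumptions):

import random

def anneal(customers, cost, beta=0.95, k=3):
    """Sequential simulated annealing for the DP. `cost(routes)` is assumed
    to return the total length of a set of routes (e.g. built on tour_cost)."""
    customers = list(customers)
    random.shuffle(customers)
    old = [customers[i:i + k] for i in range(0, len(customers), k)]  # random routes of size <= k
    best = [r[:] for r in old]
    T = cost(best) / 1000.0                     # initial temperature of annealing
    equilibrium = 0
    while equilibrium <= 20:
        for _ in range(len(customers) ** 2):
            new = [r[:] for r in old] + [[]]    # the old routes plus one empty route
            src = random.choice([r for r in new if r])
            customer = random.choice(src)
            dst = random.choice([r for r in new if r is not src])
            if len(dst) < k:                    # move the customer into the chosen route
                src.remove(customer)
                dst.append(customer)
            else:                               # exchange two customers between routes
                other = random.choice(dst)
                src[src.index(customer)] = other
                dst[dst.index(other)] = customer
            new = [r for r in new if r]         # drop routes that became empty
            delta = cost(new) - cost(old)
            if delta < 0 or random.random() < T / (T + delta):  # acceptance rule (2)
                old = new
                if cost(new) < cost(best):      # memorize the best solution found
                    best = [r[:] for r in new]
                    equilibrium = 0
        T *= beta                               # temperature reduction
        equilibrium += 1                        # 20 unproductive stages = equilibrium
    return best

With the helper of section 1, one could use, for example, cost = lambda routes: sum(tour_cost('depot', tuple(r), coords) for r in routes).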
4. Parallel annealing

Let us assume that p identical processors are available and each of them is capable of generating its own sequential annealing sequence. The processors can be used either to speed up the sequential annealing algorithm or to achieve a higher quality of solutions to a problem. In this work we are interested in the latter goal. There are several methods of parallelization of the sequential simulated annealing algorithm [3]; among them are simultaneous independent searches and simultaneous periodically interacting searches. We discuss these approaches in the following subsections.
4.1. Simultaneous independent searches

The method of simultaneous independent searches consists in executing p independent annealing sequences and selecting the best solution among the solutions obtained by the processors.
Figure 3. The suspected optimum solution to the DP for n = 80 uniformly distributed customers in the 60 × 60 square with the depot in the center; cost = 1.50293 × 10^3.
Figure 4. The suspected optimum solution to the DP for n = 70 uniformly distributed customers in the 60 × 60 square with the depot in the center; cost = 1.36471 × 10^3.
Let E be the set of all solutions to the DP. A solution X ∈ E is said to be a global minimum if

\[
cost(X) = \inf_{j \in E} \{ cost(j) \}.
\]

Denote by E_min the set of all global minima. Let r be the total available time (expressed in annealing steps), and let E_p = {X_r^{(1)}, X_r^{(2)}, ..., X_r^{(p)}} be the set of p best solutions obtained by the processors after having performed r sequential annealing steps. The final solution to the problem is selected as the best (minimum cost) solution in the set E_p, which we denote

\[
Y_r = \min \{ X_r^{(1)}, X_r^{(2)}, \ldots, X_r^{(p)} \},
\]

the minimum being taken with respect to cost. Clearly we have

\[
\Pr(Y_r \notin E_{min}) = \prod_{1 \le j \le p} \Pr(X_r^{(j)} \notin E_{min}). \tag{3}
\]
Equation (3) indicates that simultaneous independent searches improve the quality of the final solution, since the probability that Y_r does not belong to the set of optimum solutions is smaller than the corresponding probability for an annealing sequence performed by a single processor. The open question, which we investigate in this work, is how much smaller this probability is.
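For a rough illustration with assumed numbers: if a single annealing sequence of length r misses the set of optima with probability \Pr(X_r^{(j)} \notin E_{min}) = 0.8, then by (3) five independent sequences all miss it with probability

\[
\Pr(Y_r \notin E_{min}) = 0.8^5 \approx 0.33,
\]

so the chance of hitting an optimum rises from 20% to about 67%.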
A simultaneous independent searches algorithm written for a CREW PRAM model of computation is given below. The processors P_j, j = 1, 2, ..., p, carry out independent annealing searches using the same initial solution and cooling schedule as in the sequential algorithm (see section 3). At each temperature processor P_j executes n² annealing steps (lines 10–12) and then tests whether to update the best_solution with its best_local_solution_j (lines 13–19). If such an update is made, the variable equilibrium_counter is set to 0. The searches stop when equilibrium is reached. The worst case time complexity of the parallel algorithm is T_p(n) ≤ a(n² + p), since the cooling schedule gives at most a iterations of the repeat statement (lines 9–24) and the cost of every iteration is n² + p. That cost includes n² iterations of the for statement (lines 10–12) and p updates of the best_solution in the shared memory (lines 13–19). Note that the complexity of the parallel algorithm, T_p(n), is only slightly worse (by the term p, p ≪ n²) than the complexity of the sequential algorithm.
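A minimal concurrent sketch of the idea (ours; Python threads stand in for Ada tasks or PRAM processors, and the shared best_solution bookkeeping of the listing is simplified to taking the best of the p final results):

from concurrent.futures import ThreadPoolExecutor

def independent_searches(p, customers, cost, anneal):
    """Run p independent annealing sequences and return the best final solution."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        futures = [pool.submit(anneal, customers, cost) for _ in range(p)]
        solutions = [f.result() for f in futures]
    return min(solutions, key=cost)   # Y_r: the best of the p solutions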
The simultaneous independent searches algorithm was implemented in Ada95 and run on a single-processor computer, a Sun Enterprise 450 (250 MHz). In the concurrent implementation, a processor P_j performing an annealing sequence was represented by a separate Ada task. The tasks were executed via interleaving. The test results of 100 executions of the concurrent algorithm for p = 2..5 tasks are shown in Table 2. The same data sets as in the sequential experiments were used. The columns have similar meanings to those in Table 1. It can be seen from Table 2 that the quality of the obtained solutions increases with the number of tasks. When two or more tasks were executed the quality measure Q achieved 100% for every n.
a) p = 2

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    822.0    9.18    100.00   73
50    982.2    982.2    986.1    11.79   100.00   59
60    1173.4   1173.4   1176.7   7.58    100.00   8
70    1364.7   1364.7   1370.5   6.52    100.00   12
80    1502.9   1502.9   1508.7   6.38    100.00   7
90    1691.1   1691.1   1698.5   5.81    100.00   7
100   1835.4   1835.4   1842.7   4.16    100.00   1

b) p = 3

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    819.4    1.41    100.00   85
50    982.2    982.2    983.5    5.66    100.00   79
60    1173.4   1173.4   1176.1   11.29   100.00   10
70    1364.7   1364.7   1370.2   11.95   100.00   15
80    1502.9   1502.9   1507.4   4.88    100.00   16
90    1691.1   1691.1   1696.6   4.54    100.00   11
100   1835.4   1835.4   1842.2   3.98    100.00   1

c) p = 4

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    820.3    5.14    100.00   82
50    982.2    982.2    983.4    4.70    100.00   75
60    1173.4   1173.4   1175.6   8.97    100.00   14
70    1364.7   1364.7   1368.0   4.43    100.00   26
80    1502.9   1502.9   1505.8   3.04    100.00   21
90    1691.1   1691.1   1696.3   5.51    100.00   7
100   1835.4   1835.4   1841.2   4.02    100.00   1

d) p = 5

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    819.7    3.49    100.00   91
50    982.2    982.2    983.0    2.99    100.00   82
60    1173.4   1173.4   1175.1   5.69    100.00   20
70    1364.7   1364.7   1367.5   4.18    100.00   22
80    1502.9   1502.9   1506.2   6.48    100.00   21
90    1691.1   1691.1   1696.3   4.18    100.00   13
100   1835.4   1835.4   1841.0   5.34    100.00   2

Table 2. Performance of the simultaneous independent searches for p = 2..5 tasks.
4.2. Simultaneous periodically interacting searches

As before, we assume that p identical processors are available and that each generates its own sequential annealing sequence. In this case, however, the processors interact with each other every s steps, passing their best solutions found so far.
Suppose for a moment that the temperature T is fixed. Let X_r^{(j)}(T), j = 1, 2, ..., p, r = 1, 2, ..., be the Markov chains for each of the processors, let P_T(X) be the realization of one step of the chain at temperature T and with starting point X, and let X̂_r^{(j)} be the best solution found by processor j, j = 1, 2, ..., p, so far, i.e. between step 1 and r. We assume the following scheme of interaction:
\[
\begin{aligned}
X_{r+1}^{(1)} &= P_T(X_r^{(1)}),\\
X_{r+1}^{(j)} &= P_T(X_r^{(j)}) && \text{for } j \ne 1 \text{ and if } r + 1 \ne is,\\
X_{is}^{(j)}  &= P_T(X_{is-1}^{(j)}) && \text{if } cost(P_T(X_{is-1}^{(j)})) \le cost(\hat{X}_{is}^{(j-1)}),\\
X_{is}^{(j)}  &= \hat{X}_{is}^{(j-1)} && \text{otherwise.}
\end{aligned}
\]
In this scheme the processors interact at steps is, i = 1, 2, ..., where each step consists in a single realization of the Markov chain. The chain for the first processor (j = 1) is completely independent. The chain for the second processor is updated at steps is to the better of the best solution found by the first processor, X̂_{is}^{(1)}, and the realization of the last step of the second processor, P_T(X_{is−1}^{(2)}). Similarly, the third processor chooses as the next point in its chain the better of X̂_{is}^{(2)} and P_T(X_{is−1}^{(3)}). Clearly, the best solution found by the l-th processor is propagated for further exploration to processors m, m > l.

Now note that the temperature of annealing decreases according to the formula T_{i+1} = βT_i for i = 0, 1, 2, .... There are two possibilities in establishing the points at which the temperature drops and the processors interact. Namely, we may assume that the processors interact frequently during each of the temperature plateaus, or that the temperature drops several times before an interaction takes place. In this paper the former approach is adopted.

The simultaneous periodically interacting searches algorithm written for a CREW PRAM model is shown below. The computations are similar to those in the simultaneous independent searches algorithm; however, now the tasks interact every n steps in order to pass their best solutions. The worst case time complexity of the algorithm is T_p(n) ≤ a(n² + (p − 1)n² + p). As compared to the previous algorithm the complexity is higher by the cost of communication. At each execution of the repeat statement there are n points of communication (cf. lines 12–21). The cost of a single communication is proportional to (p − 1)n, as p − 1 messages of length O(n) are sent and received.

Simultaneous periodically interacting searches

1   parfor P_j, j = 1, 2, ..., p do
2     if j = 1 then
3       Create an initial, best_solution as the set of random routes of size 3;
4       equilibrium_counter := 0;  {set the equilibrium counter}
5       T := cost(best_solution)/1000;  {initial temperature of annealing}
6     end if;
7     old_solution_j := best_solution;
8     best_local_solution_j := best_solution;
9     repeat
10      for iteration_counter_j := 1 to n² do
11        annealing_step(old_solution_j, best_local_solution_j);  {as in the sequential algorithm}
12        if the number of annealing steps executed so far is a multiple of n then  {interact}
13          if j = 1 then
14            send best_local_solution_1 to processor P_2;
15          else  {j > 1}
16            receive best_local_solution_{j−1} from processor P_{j−1};
17            if cost(best_local_solution_{j−1}) < cost(best_local_solution_j) then
18              best_local_solution_j := best_local_solution_{j−1};
19            end if;
20            if j < p then send best_local_solution_j to processor P_{j+1}; end if;
21          end if;
22        end if;
23      end for;
24      lock(best_solution); lock(equilibrium_counter);
25      if cost(best_local_solution_j) < cost(best_solution) then
26        best_solution := best_local_solution_j;
27        equilibrium_counter := 0;
28      end if;
29      unlock(best_solution); unlock(equilibrium_counter);
30      if j = 1 then
31        T := βT;  {temperature reduction}
32        equilibrium_counter := equilibrium_counter + 1;
33      end if;
34    until equilibrium_counter > 20;
35  end parfor;
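The interaction of lines 12–21 can be sketched as follows (a hypothetical fragment of ours; recv and send stand for whatever channel primitives connect P_{j−1}, P_j and P_{j+1}, e.g. queue.Queue objects in Python):

def interact(best_local, cost, j, p, recv, send):
    """One interaction point of processor j: keep the better of the own best
    solution and the incoming best of P_{j-1}, then pass it on to P_{j+1}."""
    if j > 1:
        incoming = recv()                  # best solution received from P_{j-1}
        if cost(incoming) < cost(best_local):
            best_local = incoming
    if j < p:
        send(best_local)                   # propagate to P_{j+1}
    return best_local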
The simultaneous periodically interacting searches algorithm was implemented in Ada95 and run for p = 2..5 tasks. The test results of 100 executions of the concurrent algorithm are shown in Table 3. It can be seen that for p ≥ 4 the simultaneous periodically interacting searches algorithm gives a lower number of hits into the minimum for n = 40..80 and a higher number of hits for n = 90..100 (Table 3), as compared to the corresponding values for the simultaneous independent searches algorithm (Table 2). Thus we conclude that the interaction of tasks improves the search in larger solution spaces.

In [9] C. Graffigne investigated a slightly different scheme of simultaneous periodically interacting searches. She conjectured that under some assumptions there is no need to add interaction between processors before the last interaction. Her conjecture was supported by some experimental observations. The results in Table 3 show that this conjecture does not hold for our scheme of interaction.
5. Conclusions

As compared to the sequential algorithm, the application of parallelism increases the quality of the obtained solutions to the DP. For the number of tasks p ≥ 3, both the simultaneous independent searches and the simultaneous periodically interacting searches achieved the maximum quality measure Q = 100%. This means that over 100 executions of the algorithms the optimum solution to the DP was found at least once. The experimental results show that the simultaneous periodically interacting searches exhibit better performance in terms of finding the optimum solution for larger solution spaces compared with the simultaneous independent searches.
References

[1] E. H. L. Aarts and J. H. M. Korst. Simulated Annealing and Boltzmann Machines. J. Wiley, Chichester, 1989.
[2] E. H. L. Aarts and P. J. M. van Laarhoven. Simulated Annealing: Theory and Applications. J. Wiley, New York, 1987.
[3] R. Azencott. Parallel simulated annealing: An overview of basic techniques. In R. Azencott, editor, Simulated Annealing. Parallelization Techniques, pages 37–46. J. Wiley, New York, 1992.
[4] R. Azencott and C. Graffigne. Parallel annealing by periodically interacting multiple searches: Acceleration rates. In R. Azencott, editor, Simulated Annealing. Parallelization Techniques, pages 81–90. J. Wiley, New York, 1992.
[5] N. Boissin and J. L. Lutton. A parallel simulated annealing algorithm. Parallel Computing, 19:859–872, 1993.
[6] S. Cichoński. An application of simulated annealing to the set-partitioning problem. Technical report, Silesia University of Technology, 1998. (In print.)
[7] V. Černý. A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm. J. of Optimization Theory and Applic., 45:41–55, 1985.
[8] F. Glover and M. Laguna. Tabu search. In C. R. Reeves, editor, Modern Heuristic Techniques for Combinatorial Problems, pages 70–150. McGraw-Hill, London, 1995.
a) p = 2

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    821.6    7.35    100.00   67
50    982.2    982.2    986.5    11.33   100.00   57
60    1173.4   1173.4   1181.1   18.55   100.00   7
70    1364.7   1364.7   1374.1   20.44   100.00   14
80    1502.9   1502.9   1508.3   7.77    100.00   17
90    1691.1   1693.5   1708.3   8.77    100.15   0
100   1835.4   1839.1   1857.7   10.61   100.21   0

b) p = 3

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    821.3    6.46    100.00   70
50    982.2    982.2    987.1    10.20   100.00   65
60    1173.4   1173.4   1178.4   15.53   100.00   13
70    1364.7   1364.7   1375.4   28.16   100.00   16
80    1502.9   1502.9   1508.8   9.46    100.00   22
90    1691.1   1691.1   1699.1   10.35   100.00   7
100   1835.4   1835.4   1845.9   16.34   100.00   1

c) p = 4

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    822.6    9.17    100.00   62
50    982.2    982.2    986.8    12.89   100.00   64
60    1173.4   1173.4   1177.0   10.20   100.00   16
70    1364.7   1364.7   1370.6   10.59   100.00   22
80    1502.9   1502.9   1510.2   27.47   100.00   26
90    1691.1   1692.3   1697.6   7.90    100.00   10
100   1835.4   1835.4   1842.7   13.24   100.00   8

d) p = 5

n     Opt.     Best     Avg.     σ       Q (%)    H
40    819.0    819.0    822.1    8.51    100.00   71
50    982.2    982.2    988.0    15.97   100.00   69
60    1173.4   1173.4   1176.8   9.68    100.00   17
70    1364.7   1364.7   1374.0   22.83   100.00   28
80    1502.9   1502.9   1509.0   13.14   100.00   20
90    1691.1   1691.1   1697.2   8.38    100.00   16
100   1835.4   1835.4   1843.4   13.29   100.00   7

Table 3. Performance of the simultaneous periodically interacting searches for p = 2..5 tasks.
[9] C. Graffigne. Parallel annealing by periodically interacting multiple searches: An experimental study. In R. Azencott, editor, Simulated Annealing. Parallelization Techniques, pages 47–79. J. Wiley, New York, 1992.
[10] D. R. Greening. Parallel simulated annealing techniques. Physica D, 42:293–306, 1990.
[11] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[12] R. Korf. Optimal path-finding algorithms. In L. Kanal and V. Kumar, editors, Search in Artificial Intelligence, pages 223–267. Springer-Verlag, 1988.
[13] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculation by fast computing machines. Journ. of Chem. Phys., 21:1087–1091, 1953.
[14] J. Pearl. Heuristics. Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, Mass., 1984.
[15] C. R. Reeves, editor. Modern Heuristic Techniques for Combinatorial Problems. McGraw-Hill, London, 1995.
[16] M. M. Sysło, N. Deo, and J. S. Kowalik. Discrete Optimization Algorithms with Pascal Programs. Prentice-Hall, 1983.
[17] M. G. A. Verhoeven and E. H. L. Aarts. Parallel local search techniques. Journal of Heuristics, 1:43–65, 1996.