A Method to Minimize Distributed PSO Algorithm Execution Time in Grid Computer Environment

F. Parra, S. García Galán, A.J. Yuste, R.P. Prado, and J.E. Muñoz
Universidad de Jaén, Dept. of Telecommunication Engineering, Alfonso X el Sabio, 28, 23200 Linares, Spain
{fparra,sgalan,ajyuste,rperez,jemunoz}@ujaen.es

Abstract. This paper introduces a method to minimize distributed PSO algorithm execution time in a grid computer environment, based on a reduction in the information interchanged among the demes involved in the process of finding the best global fitness solution. Demes usually interchange the best global fitness solution they have found at each iteration. Instead, we propose to interchange this information only after a specified number of iterations has elapsed. By applying this technique, a very significant reduction in execution time is obtained without any loss of solution quality.

1 Introduction

Particle Swarm Optimization (PSO) is an evolutionary computational technique developed by Dr. Eberhart and Dr. Kennedy in 1995 to solve optimization problems. It is based on a simulation of the social behavior of birds within a flock [1], [2]. The initial concept of PSO was to graphically simulate the choreography of a bird flock, with the aim of discovering the patterns that govern the birds' ability to fly synchronously and to change direction while regrouping in an optimal formation. From this initial objective, the concept evolved toward an efficient optimization algorithm. The algorithm can be implemented on a monoprocessor system or on a grid system (this article's environment). Of course, the execution time on such a system is reduced by a factor that depends on the computational power of each node and on the method used to implement the distributed algorithm [4], [7], [9]. In this paper, the particle population is uniformly distributed over n processes to create n demes, each of which executes the standard PSO algorithm over a reduced population. Each deme on the grid needs to communicate its best local fitness to its neighbors in order to obtain the overall best global fitness. This communication may happen at each iteration of the algorithm, or only every k iterations. In this paper, the second approach is explored in order to test whether it achieves at least the same results as the first one while reducing the total computational time.

J. Mira et al. (Eds.): IWINAC 2009, Part II, LNCS 5602, pp. 478–487, 2009. © Springer-Verlag Berlin Heidelberg 2009


The rest of this paper is organized as follows: Section 2 describes the fundamental background of the standard PSO algorithm and a taxonomy of the existing ways to implement the algorithm on parallel computer systems. Section 3 presents a new scheme for implementing the parallel algorithm that reduces the total execution time by minimizing the information flow among the different demes. Section 4 presents test simulations of the effectiveness of the proposed algorithm against the standard algorithm; in particular, the improvement in total execution time is shown. Conclusions are drawn in Section 5.

2 Background

2.1 The Standard PSO Algorithm

In the original concept of Dr. Eberhart and Dr. Kennedy [3], each single solution in the N-dimensional search space is a "bird" or "particle". Each solution or particle has a fitness value that is evaluated by a fitness function. The objective of the algorithm is to find the solution that maximizes (or minimizes) the fitness function. The system is initialized with a population of M particles, called a swarm, drawn from a uniform random distribution over the N-dimensional search space R^N. Assume the fitness function to be a real function f whose domain is the N-dimensional search space:

f : R^N → R     (1)

The M particles are represented by their position x and velocity v in the space:

x_i ∈ R^N  ∀i ∈ {1, 2, ..., M}     (2)

v_i ∈ R^N  ∀i ∈ {1, 2, ..., M}     (3)

The algorithm is iterative; the position and velocity of the ith particle at the kth iteration are denoted x_i^k and v_i^k. The (k+1)th iteration is calculated as:

v_i^(k+1) = w v_i^k + c1 r1 (Pb_i^k − x_i^k) + c2 r2 (Gb^k − x_i^k)     (4)

x_i^(k+1) = x_i^k + v_i^(k+1)     (5)

where
◦ r1 and r2 are real random variables uniformly distributed in the interval (0, 1)
◦ c1 and c2 are real learning factors; usually both have the same value, 2
◦ Pb_i^k is the best position found by the ith particle up to the kth iteration
◦ Gb^k is the best position found by the swarm; "best position" means the position whose fitness is the maximum (or minimum) one
◦ w is a scalar inertia weight that controls the maximum particle velocity [1], [8]

The system thus evolves to find the best solution, and the PSO algorithm finishes when it reaches a maximum number of iterations or when the best particle position cannot be improved further after a sufficient number of iterations.
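As a sketch, Eqs. (4) and (5) map directly onto a vectorized single-iteration update. The code below is illustrative only (it is not the authors' implementation), with parameter defaults taken from the list above:

```python
import numpy as np

def pso_step(x, v, pb, gb, w=0.7, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration: Eqs. (4) and (5) applied to all M particles.

    x, v, pb are (M, N) arrays (positions, velocities, personal bests
    Pb_i^k); gb is the (N,) global best position Gb^k.
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(x.shape)  # r1, r2 ~ U(0, 1), drawn per component
    r2 = rng.random(x.shape)
    v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)  # Eq. (4)
    x_next = x + v_next                                       # Eq. (5)
    return x_next, v_next
```

With c1 = c2 = 0 and w = 1 the update degenerates to straight-line motion, a convenient sanity check of the implementation.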

2.2 Distributed PSO Algorithm

The PSO algorithm can demand large computational resources, depending on the fitness function, the particle population and the total number of iterations the system needs to find the optimal solution. For this kind of algorithm it is possible to implement parallelization mechanisms that overcome this complexity by dividing the main task into subtasks that run concurrently on a computer cluster or grid. Four paradigms can be considered for implementing a distributed algorithm: master-slave (also called global parallelization), island model (coarse-grained parallelization), diffusion model (fine-grained parallelization) and, finally, the hybrid model. In more detail [6], [9], [10]:

◦ Master-slave model: The tasks (fitness computation, archiving, etc.) are distributed among the slave nodes, while the master takes care of other activities such as synchronization, workload distribution, etc.

◦ Island model: The total particle population is separated into several subpopulations or demes, where each subpopulation looks for a solution independently. Every n iterations, where n is an integer greater than zero, the best solution migrates across subpopulations. The communication topology among nodes is an important issue in this model, and the communication cost depends on it [5].

◦ Diffusion model: A paradigm very similar to the master-slave model, but each node holds only a few particles (in the limit, only one). In this model the communication cost is very high and strongly depends on the chosen topology. It is very suitable for massively parallel computing machines.

◦ Hybrid model: It combines the previous paradigms.

3 Proposed Schema

In this paper, a hybrid model is used: synchronous master-slave and island models in a complete mesh topology, as shown in Figure 1. The server executes the main algorithm, whose function consists of distributing the tasks to each peer or slave and, at the end of the execution, retrieving the results calculated independently by each slave and processing them to obtain the final results. The total population M is divided into K demes (Figure 2), where K is the number of processes. Assuming that each node is identical, for a uniform distribution each deme holds M/K particles. Obviously, a synchronization mechanism is necessary to migrate the best solutions across the processes. This mechanism can be implemented in several different ways:

Fig. 1. Complete mesh topology

Fig. 2. Mesh demes interconnection

◦ Synchronization at each iteration: This is the standard parallel PSO algorithm. At every iteration, each process obtains its local best solution, stops and waits for all the other processes to finish. Then all of them share their own best solutions, compare them and select the best overall. After that, the process that owns the best solution shares the best position with all the others. Of course, this mechanism guarantees that every process holds the same global best position Gb^k and best solution. Each process then restarts its execution. The communication cost is CC = αH, where H is the total number of iterations and α is a proportionality constant.

◦ Synchronization every k iterations: This is the proposed parallel PSO algorithm. The synchronization mechanism starts every k iterations instead of each


one. The total communication cost is therefore reduced by a factor of k. Let CC′ be the communication cost of this second approach: CC′ = αH/k, so CC′ = CC/k.

The first case is nothing more than a complete parallelization of the classical PSO algorithm, so we can expect results of identical quality to the monoprocessor case, apart from an improvement in total execution time. In the second case, however, a modification of the algorithm itself is introduced that could affect the quality of the results. In this paper, both methods are compared using several well-established benchmark functions in the multidimensional search space (Table 1).

Table 1. Benchmark Functions

Equation                                                   Name                      D    Bounds
f1 = Σ_{i=1}^{D-1} {100(x_{i+1} − x_i²)² + (x_i − 1)²}     Generalized Rosenbrock    30   (−30, 30)^D
f2 = Σ_{i=1}^{D} x_i sin(√x_i)                             Generalized Schwefel 2.6  30   (0, 500)^D
f3 = Σ_{i=1}^{D} {x_i² − 10 cos(2πx_i) + 10}               Generalized Rastrigin     30   (−5.12, 5.12)^D
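For reference, the three benchmark functions of Table 1 can be written down directly. This is a sketch; f2 follows the form given in the table, where the (0, 500)^D bounds keep the square-root argument non-negative:

```python
import numpy as np

def rosenbrock(x):  # f1, Generalized Rosenbrock, bounds (-30, 30)^D
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (x[:-1] - 1.0) ** 2))

def schwefel26(x):  # f2, Generalized Schwefel 2.6, bounds (0, 500)^D
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * np.sin(np.sqrt(x))))

def rastrigin(x):   # f3, Generalized Rastrigin, bounds (-5.12, 5.12)^D
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))
```

For minimization, rosenbrock has its global minimum 0 at x = (1, ..., 1) and rastrigin has its global minimum 0 at the origin, which is a useful check.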

3.1 Simulation Scenarios

Figure 3 shows the proposed architecture schema, based on the Matlab Parallel Computing Toolbox. There are three main components or modules:

◦ Client: Used to define the job and its tasks. It usually runs on a user's desktop.

◦ Job Manager/Scheduler: The part of the server software that coordinates the execution of the jobs and the evaluation of their tasks. It can run on the same desktop as the Client.

◦ Worker: Executes the tasks. It runs on the cluster of machines dedicated to that purpose.

A job is a large operation to complete and can be broken into segments called tasks. Each task is executed on a Worker.
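The proposed scheme (K demes running standard PSO and exchanging the global best every k iterations) can be sketched in a few lines. The sketch below is ours and only illustrative: it simulates the demes sequentially in a single process, whereas the paper's actual implementation distributes them as Matlab Parallel Computing Toolbox tasks over the workers; class and function names are placeholders.

```python
import numpy as np

class Deme:
    """One subpopulation running the standard PSO update of Section 2.1."""
    def __init__(self, f, dim, m, bounds, rng):
        self.f, self.rng = f, rng
        lo, hi = bounds
        self.x = rng.uniform(lo, hi, (m, dim))      # positions
        self.v = np.zeros((m, dim))                 # velocities
        self.pb = self.x.copy()                     # personal bests Pb_i
        self.pb_val = np.array([f(p) for p in self.x])
        i = int(np.argmin(self.pb_val))
        self.gb, self.gb_val = self.pb[i].copy(), float(self.pb_val[i])

    def step(self, w=0.729, c1=1.494, c2=1.494):
        # Constriction-style coefficients chosen for stable convergence in
        # this sketch; the paper uses c1 = c2 = 2 with an inertia weight w.
        m, dim = self.x.shape
        r1, r2 = self.rng.random((m, dim)), self.rng.random((m, dim))
        self.v = (w * self.v + c1 * r1 * (self.pb - self.x)
                  + c2 * r2 * (self.gb - self.x))
        self.x = self.x + self.v
        val = np.array([self.f(p) for p in self.x])
        imp = val < self.pb_val
        self.pb[imp], self.pb_val[imp] = self.x[imp], val[imp]
        i = int(np.argmin(self.pb_val))
        if self.pb_val[i] < self.gb_val:
            self.gb, self.gb_val = self.pb[i].copy(), float(self.pb_val[i])

def distributed_pso(f, dim, n_demes=4, m=25, n_iters=500, k=10,
                    bounds=(-5.12, 5.12), seed=0):
    """Island-model PSO (minimization): demes iterate independently and
    exchange their global best every k iterations; k = 1 reproduces the
    standard synchronous parallel algorithm."""
    rng = np.random.default_rng(seed)
    demes = [Deme(f, dim, m, bounds, rng) for _ in range(n_demes)]
    for it in range(1, n_iters + 1):
        for d in demes:
            d.step()
        if it % k == 0:                 # synchronization point
            best = min(demes, key=lambda d: d.gb_val)
            gb, gb_val = best.gb.copy(), best.gb_val
            for d in demes:             # broadcast the best Gb to all demes
                d.gb, d.gb_val = gb.copy(), gb_val
    best = min(demes, key=lambda d: d.gb_val)
    return best.gb, best.gb_val
```

In a real grid deployment, the body of the `if it % k == 0` block is where the inter-process communication happens, so increasing k directly divides the number of synchronization rounds.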

4 Experimental Results

The simulation runs on a grid system composed of five computers, each equipped with an Intel monoprocessor at a 1 GHz clock rate and 1 GB of RAM, connected to a 100 Mbps Ethernet local area network. The grid system is configured as follows: four computers act as workers, so each one executes the same algorithm but on a different subpopulation or deme. The fifth computer acts as Job Manager and Client. For the comparison, the parallel algorithm is implemented in two different versions:


Fig. 3. Schematic Parallel Architecture

Standard parallel algorithm:
◦ 500 iterations
◦ 100 particles
◦ All the processes are synchronized at each iteration to obtain the best global fitness
◦ A total of 30 experiments are executed for each benchmark function

Proposed parallel algorithm:
◦ 500 iterations
◦ 100 particles
◦ All the processes are synchronized every k iterations to obtain the best global fitness
◦ A total of 30 experiments are executed for each benchmark function

Table 1 shows the three well-known benchmark functions used [1]. Each particle is a real vector in the D-dimensional search space R^D, where D = 30. The bounds are the limits of the search space. The quality of the results (fitness accuracy) and the time gain of the proposed algorithm are tested against the standard algorithm.

4.1 Quality Results

Figures 4, 5 and 6 represent the evolution of the best fitness over a total of 500 iterations for the three benchmark functions (generalized Rosenbrock, generalized Rastrigin and generalized Schwefel 2.6), where each value on the Y axis represents the mean over 30 experiments. For the graphics, the upper (k = 50) and lower (k = 1) limits and two intermediate values (k = 10, 25) were chosen for the variable k.

Fig. 4. Best Fitness for the Generalized Rosenbrock Benchmark Function

Fig. 5. Best Fitness for the Generalized Rastrigin Benchmark Function

Fig. 6. Best Fitness for the Generalized Schwefel 2.6 Benchmark Function


◦ Synchro at each iteration: The standard algorithm. All the processes (each of them representing a deme) are synchronized at each iteration to obtain the global fitness solution among the best local ones.

◦ Synchro every 10 iterations: The processes are synchronized only when each of them completes 10 iterations.

◦ Synchro every 25 iterations: The processes are synchronized only when each of them completes 25 iterations.

◦ Synchro every 50 iterations: The same case, but every 50 iterations.

The four cases converge to the same solution (of course, in cases 2, 3 and 4 we obtain a function with steps every k iterations).

4.2 Timing

A time gain can be expected when the demes interchange information every k iterations instead of at each one. To test this experimentally, the proposed algorithm is executed 30 times on the architecture for each k, where k varies from 1 to 50, for each of the three benchmark functions, and the mean execution time is calculated. This yields the timing graphic of Figure 7. The Y axis represents the mean execution time (over 30 experiments) per process or deme, and the X axis represents the number of iterations after which the demes synchronize to obtain the global best fitness. The Relative Gain is defined as:

RG = 100(1 − T_k / T)     (6)

where T is the mean execution time of the standard parallel PSO algorithm and T_k is the mean execution time of the proposed algorithm with synchronization every k iterations. The result is Table 2, which shows the execution times of the previous experiment for each benchmark function, their mean, and the relative gain with respect to the standard algorithm.

Fig. 7. Timing: mean execution time in seconds (per process) vs. number of iterations to synchronize, for the three benchmark functions


Table 2. Execution time comparison among the three benchmark functions, mean value and relative gain (Ra = Rastrigin, S = Schwefel 2.6, Ro = Rosenbrock; times in seconds)

k    RaT_k    ST_k     RoT_k    Mean     RG
1    3.9970   4.1940   4.0150   4.0687   0.00%
5    1.1080   1.2870   1.0330   1.1427   71.92%
10   0.7162   0.9109   0.6729   0.7667   81.16%
15   0.5849   0.7802   0.5427   0.6359   84.37%
20   0.5308   0.7328   0.4703   0.5780   85.79%
25   0.4792   0.6771   0.4349   0.5304   86.96%
30   0.4563   0.6516   0.4182   0.5087   87.50%
35   0.4386   0.6615   0.3974   0.4992   87.73%
40   0.4344   0.6287   0.3818   0.4816   88.16%
45   0.4193   0.6177   0.3729   0.4700   88.45%
50   0.4193   0.6104   0.3703   0.4667   88.53%
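Eq. (6) can be checked against the Mean column of Table 2; a quick sketch (any small discrepancies with the table's RG column come from rounding of the tabulated means):

```python
def relative_gain(t_k, t_standard):
    """Relative gain of Eq. (6): RG = 100 * (1 - T_k / T)."""
    return 100.0 * (1.0 - t_k / t_standard)

# A few mean execution times (seconds per process) from Table 2
mean_times = {1: 4.0687, 5: 1.1427, 10: 0.7667, 25: 0.5304, 50: 0.4667}
T = mean_times[1]  # k = 1 is the standard parallel algorithm
gains = {k: relative_gain(t, T) for k, t in mean_times.items()}
```

For k = 50 this gives a gain of about 88.53%, matching the last row of the table.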

Fig. 8. Mean Relative Gain Function (mean relative gain in % vs. number of iterations to synchronize)

5 Conclusion

As can be seen in Figures 4, 5 and 6, for all three benchmark fitness functions the simulations with k = 10, 25, 50 reach the same final results as with k = 1 (the standard parallel PSO algorithm). For the cases where k > 1, the graphics correspond to step functions, because the best global result is only updated every k iterations. The number of iterations needed for the fitness functions to converge depends on the function: in our case, the simulations converge after 470 iterations for the generalized Rosenbrock function, after 220 iterations for the generalized Rastrigin function, and after 70 iterations for the generalized Schwefel 2.6 function. Nevertheless, all the simulations reach the same results, so the proposed algorithm achieves at least the same results as the standard PSO.

The proposed algorithm is of interest if an execution time gain with respect to the standard one is achieved. Figure 7 shows the execution time (in seconds) as a function of k. The execution time depends, of course, on the benchmark fitness function used in the experiments, but in every case the shapes of the time gain graphics are very similar. As can be observed in Table 2, the proposed algorithm achieves an execution time gain that increases with k, from 0 for k = 1 to 88.53% for k = 50. Figure 8 shows the mean relative gain graphically.

References

1. Bratton, D., Kennedy, J.: Defining a Standard for Particle Swarm Optimization. In: Proceedings of the 2007 IEEE Swarm Intelligence Symposium (SIS 2007) (2007)
2. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: Proceedings of the 1995 IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942–1948. IEEE Service Center, Piscataway (1995)
3. Kennedy, J., Eberhart, R.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco (2001)
4. Kennedy, J.: Stereotyping: improving particle swarm performance with cluster analysis. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 1507–1512 (2000)
5. Kennedy, J., Mendes, R.: Neighborhood topologies in fully informed and best-of-neighborhood particle swarms. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 36(4), 515–519 (2006)
6. Liu, D.S., Tan, K.C., Ho, W.K.: A Distributed Co-evolutionary Particle Swarm Optimization Algorithm. In: 2007 IEEE Congress on Evolutionary Computation (CEC 2007) (2007)
7. Guha, T., Ludwig, S.A.: Comparison of Service Selection Algorithms for Grid Services: Multiple Objective Particle Swarm Optimization and Constraint Satisfaction Based Service Selection. In: Proceedings - International Conference on Tools with Artificial Intelligence (ICTAI 1, art. no. 4669686), pp. 172–179 (2008)
8. Jiao, B., Lian, Z., Gu, X.: A dynamic inertia weight particle swarm optimization algorithm. Chaos, Solitons and Fractals 37, 698–705 (2008)
9. Scriven, I., Lewis, A., Ireland, D., Junwei, L.: Decentralised Distributed Multiple Objective Particle Swarm Optimisation Using Peer to Peer Networks. In: IEEE Congress on Evolutionary Computation, CEC 2008, art. no. 4631191, pp. 2925–2928 (2008)
10. Burak Atat, S., Gazi, V.: Decentralized Asynchronous Particle Swarm Optimization. In: IEEE Swarm Intelligence Symposium, SIS 2008, art. no. 4668304 (2008)
