Procedia Engineering 192 (2017) 313 – 317
TRANSCOM 2017: International scientific conference on sustainable, modern and safe transport
Parallel genetic algorithm for capacitated p-median problem

Miloš Herda*
University of Žilina, Faculty of Management Science and Informatics, Univerzitná 1, 010 26 Žilina, Slovak Republic
Abstract

This paper presents an implementation of a specific genetic algorithm on a high performance computing cluster. The algorithm is designed to approximately solve the capacitated p-median problem. Since the p-median problem has been proven to be NP-hard, exact algorithms cannot solve large instances in reasonable time. It is therefore helpful to use metaheuristics such as genetic algorithms. In an endeavor to obtain the best possible solution, even for large instances, we look for the best ways to use all the computing power at our disposal. The obvious way to achieve that is to design a parallel algorithm and run it on a high performance computing cluster.

© 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of TRANSCOM 2017: International scientific conference on sustainable, modern and safe transport.

Keywords: Capacitated p-median problem; Genetic algorithm; Parallel computing; HPC cluster; Heuristic
1. Introduction

The capacitated p-median problem was originally formulated as a mathematical description of problems arising especially in the public sector. It describes situations in which we want to locate a given number of facilities that provide a service throughout a region. A good example is an emergency service, as described in [1]. We can locate only a limited number of stations, and it is important that they are located close to the people who may need their help. Each facility also has a limited capacity of people it can safely serve, so more emergency stations are needed near densely populated places. If we were able to solve the capacitated p-median problem exactly for any given number of customers and facilities, it could save a great amount of resources or even lives.
* Corresponding author. Tel.: +421 902 486 987. E-mail address: [email protected]
doi:10.1016/j.proeng.2017.06.054
Unfortunately, the p-median problem has been proven to be NP-hard [2]. This means that obtaining an optimal solution for larger instances (e.g. thousands of customers and hundreds of facilities) would take an enormous amount of time (years or even centuries), even with the most efficient exact algorithm and high performance computers. For practical use we do not always need the optimal solution. It is sufficient to know an approximation of the optimal solution, provided it is obtained within reasonable time. Such solutions are calculated by methods called heuristics or metaheuristics. These methods work much faster, at the cost of guaranteed optimality. They may happen to return an optimal solution, but that is only a coincidence; when using these methods we cannot tell whether the obtained solution is optimal or not.

1.1. Mathematical model of the capacitated p-median problem

We now describe the problem more precisely in the formal terms of mathematical integer programming. Let $G(V, E)$ be a weighted graph with vertex set $V$ and edge set $E$. Customers in the region are represented by vertices $v_i \in V$, and every customer is a candidate location for a facility. The demand of a customer is represented by the weight of the corresponding vertex. Connections between customers are represented by edges $e \in E$, whose weights stand for the distances between the customers they connect. To solve the capacitated p-median problem optimally means to mark exactly p vertices as medians and then assign all other vertices to them so that the sum of distances between each vertex and its assigned median is minimal; the capacities have to be respected during the assignment. One way to construct a mathematical model of the capacitated p-median problem is as follows. Let there be:

$V$ – set of vertices in the graph (set of customers),
$n$ – number of vertices in the graph, $|V| = n$,
$p$ – number of medians (number of facilities that need to be located),
$c_i$ – demand of vertex $v_i$, where $i = 1, 2, \ldots, n$,
$d_{ij}$ – the shortest distance between vertices $v_i$ and $v_j$, where $i, j = 1, 2, \ldots, n$,
$k_j$ – capacity of a facility located in $v_j$ (usually the capacity of all facilities is the same).

Next we define the variables:

$x_j = 1$ if a facility is located in vertex $v_j$, $x_j = 0$ otherwise,
$y_{ij} = 1$ if the demand of vertex $v_i$ is satisfied from the facility located in vertex $v_j$, $y_{ij} = 0$ otherwise.

The capacitated p-median problem is then defined by the following model:
minimize

$\sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}\, y_{ij}$    (1)

subject to

$\sum_{j=1}^{n} y_{ij} = 1, \quad i = 1, 2, \ldots, n$    (2)

$\sum_{j=1}^{n} x_{j} = p$    (3)

$y_{ij} \le x_{j}, \quad i, j = 1, 2, \ldots, n$    (4)

$\sum_{i=1}^{n} c_{i}\, y_{ij} \le k_{j}, \quad j = 1, 2, \ldots, n$    (5)

$x_{j}, y_{ij} \in \{0, 1\}, \quad i, j = 1, 2, \ldots, n$    (6)

The objective function (1) sums the distances between customers and their assigned facilities. Constraints (2) ensure that every customer is assigned to exactly one facility. Constraint (3) enforces that exactly p facilities are located. Constraints (4) state that a customer cannot be served from a place where no facility was located. Constraints (5) ensure that the capacities of the located facilities are respected. Finally, the obligatory constraints (6) define the domains of the variables.
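To make the model concrete, the sketch below states formulation (1)–(6) with the open-source PuLP modelling library. The tiny instance (distance matrix d, demands c, common capacity k and the number of medians p) is an illustrative placeholder, not data from the paper; as discussed above, such exact formulations are only practical for small instances.

```python
# Sketch of the capacitated p-median model (1)-(6) with PuLP.
# The instance data below are illustrative placeholders.
import pulp

d = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]   # shortest distances d_ij
c = [4, 3, 5]                            # demands c_i
k = 8                                    # common facility capacity k_j
p = 2                                    # number of medians
n = len(d)

model = pulp.LpProblem("capacitated_p_median", pulp.LpMinimize)
x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(n)]
y = [[pulp.LpVariable(f"y_{i}_{j}", cat="Binary") for j in range(n)] for i in range(n)]

# (1) minimize total assignment distance
model += pulp.lpSum(d[i][j] * y[i][j] for i in range(n) for j in range(n))
# (2) every customer is assigned to exactly one facility
for i in range(n):
    model += pulp.lpSum(y[i][j] for j in range(n)) == 1
# (3) exactly p facilities are located
model += pulp.lpSum(x) == p
# (4) customers can only be assigned to located facilities
for i in range(n):
    for j in range(n):
        model += y[i][j] <= x[j]
# (5) facility capacities are respected
for j in range(n):
    model += pulp.lpSum(c[i] * y[i][j] for i in range(n)) <= k

model.solve()
print("objective:", pulp.value(model.objective))
print("medians:", [j for j in range(n) if x[j].value() == 1])
```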
1.2. Genetic algorithm

One of the well performing methods for solving the capacitated p-median problem is the genetic algorithm. It belongs to the so called evolutionary algorithms, which are inspired by processes occurring in nature. Its main feature is that it does not try to improve only one solution; it works with a whole population of feasible solutions, and the population evolves through the principles of natural selection. Good solutions in the population are allowed to reproduce and combine their good attributes with other good solutions in order to create even better individuals, while solutions with a bad objective function die out. In short, the iterative process of the genetic algorithm works as follows. At the beginning, an initial population of feasible solutions is generated randomly. Every solution is assigned a so called fitness value (in most cases it is the same as the objective function). Then so called genetic operators such as crossover and mutation are applied to create new individuals, which replace individuals with worse fitness in the population, and the process repeats until a certain stopping criterion is met. Every individual is represented by a vector, a sort of genetic code, in which all relevant information about a solution is uniquely encoded. The crossover operator is used most often. Its input is a pair of individuals, which are more likely to be selected if they have better fitness. During the crossover the individuals exchange part of their genetic information, and the output is two children, each with a different combination of segments from the original individuals. Another important operator is mutation; it is applied less frequently in the iterative process, but it helps to escape from a local minimum when the population becomes too homogeneous. Other methods for the capacitated p-median problem are described in [3] and [4].
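The paper does not spell out the exact encoding and operators it uses, so the following is only an illustrative sketch of the ideas described above. It assumes a common encoding for the p-median problem in which an individual is the set of p median indices; the greedy capacity-aware assignment used for the fitness value and the set-based crossover and mutation are hypothetical simplifications, not the author's implementation.

```python
# Illustrative sketch of a genetic encoding and operators for the
# capacitated p-median problem (not the paper's exact implementation).
import random

def fitness(medians, d, c, k):
    """Greedy capacity-aware assignment: each customer is assigned to the
    nearest chosen median that still has free capacity. A large penalty is
    added for customers that cannot be assigned."""
    free = {m: k for m in medians}
    total = 0.0
    # assign customers with larger demand first (simple heuristic)
    for i in sorted(range(len(d)), key=lambda v: -c[v]):
        for m in sorted(medians, key=lambda m: d[i][m]):
            if free[m] >= c[i]:
                free[m] -= c[i]
                total += d[i][m]
                break
        else:
            total += 1e6              # infeasible assignment penalty
    return total

def crossover(parent_a, parent_b, p):
    """Child keeps the medians common to both parents and fills the rest
    randomly from the union of the parents' medians."""
    common = set(parent_a) & set(parent_b)
    pool = list((set(parent_a) | set(parent_b)) - common)
    random.shuffle(pool)
    child = set(common)
    while len(child) < p:
        child.add(pool.pop())
    return sorted(child)

def mutate(individual, n):
    """Replace one randomly chosen median by a random non-median vertex."""
    child = list(individual)
    idx = random.randrange(len(child))
    child[idx] = random.choice([v for v in range(n) if v not in individual])
    return sorted(child)
```

A complete genetic algorithm would select parents with probability increasing in their fitness, apply these operators repeatedly, and replace the worst individuals in the population, as described above.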
2. Proposed parallel genetic algorithm

In order to make the computation as fast as possible we decided to implement the genetic algorithm in a parallel way on a high performance computing (HPC) cluster. The main goal, however, is to get results very close to the optimum within reasonable time. In our previous work [5], [6] we experimented with a sequential genetic algorithm and with a parallel genetic algorithm for a single multiprocessor PC using OpenMP. Building on this experience we took the next step and proposed a parallel algorithm using MPI that is executed on an HPC cluster. In this paper we do not go into the details of the individual segments of the algorithm. Instead we focus on the main scheme and explain how the parallelism is implemented.

2.1. GAklas05

First, we took the known genetic algorithm and ran it in isolation on the cluster for one hour. This is effectively the same as running it on a single PC several times. We did this because we wanted to compare these results with a more sophisticated parallel algorithm and to find out to what degree it is useful to construct more cooperative parallel algorithms. The brief scheme of the proposed genetic algorithm, which we labeled GAklas05, is as follows. On every computational node a random initial population of 100 individuals is generated. Every node uses a different random seed and the populations are isolated within the computational nodes. At the beginning an exchange heuristic is applied to a few individuals to produce some good starting solutions. Then the evolutionary iterative process begins. If the iterative process gets stuck in a local minimum (the best found solution has not been updated for more than 600 iterations), the population is restarted with different randomly generated individuals. This process repeats until one hour has passed. At the end, every computational node applies the exchange heuristic to its best found solution in an attempt to improve it further. Finally, all best found solutions from the isolated populations are collected on one node, where they are compared and the best one is returned as the final solution.
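This isolated-populations scheme maps naturally onto MPI. The sketch below, using the mpi4py library, shows the overall structure only: run_isolated_ga is a hypothetical placeholder for the genetic algorithm with restarts and the exchange heuristic described above, and the one-hour budget is simply passed in as a parameter.

```python
# Sketch of the GAklas05 scheme: an independent GA run on every MPI rank,
# followed by a single gather of the per-node best solutions (assumes mpi4py).
import random
import time

from mpi4py import MPI

def run_isolated_ga(seed, time_budget_s):
    """Placeholder for the GA with restarts and the exchange heuristic;
    returns (best_cost, best_solution)."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    deadline = time.time() + time_budget_s
    while time.time() < deadline:
        candidate = (rng.random(), None)   # stand-in for one evolutionary run
        best = min(best, candidate, key=lambda s: s[0])
    return best

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# every rank evolves its own isolated population with its own random seed
result = run_isolated_ga(seed=rank, time_budget_s=3600.0)

# collect the per-node best solutions on rank 0 and keep the overall best
results = comm.gather(result, root=0)
if rank == 0:
    best_cost, best_solution = min(results, key=lambda s: s[0])
    print("best objective found:", best_cost)
```

Because the populations never communicate, the ranks synchronize only once, in the final gather.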
2.2. GAklas1

As the next step we proposed a parallel genetic algorithm, GAklas1, in which the computing nodes communicate with each other and the final solution is the result of their cooperation. The genetic part of this algorithm is basically the same as in the previous case and the rest works as follows. At initialization one of the computing nodes is marked as the master node; it manages the communication with all the other worker nodes and executes several tasks. One of its threads receives solutions from the other nodes at regular intervals. The remaining free threads of the master node apply the exchange heuristic to these collected solutions in order to improve them, and the improved solutions are then distributed back among the worker nodes. Each worker node starts with a different initial population; during the evolutionary process it sends its best found solution to the master node every 100th iteration and receives an improved solution from the master node every 600th iteration, which may originate from a different worker. After one hour of computing the algorithm stops and the master node returns the final solution.
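The following is a minimal sketch of this master/worker communication pattern with mpi4py. The helpers improve_with_exchange_heuristic and one_ga_iteration are hypothetical placeholders, the multithreading on the master node and the termination handshake are omitted for brevity, and the message layout is an assumption rather than the paper's actual implementation.

```python
# Sketch of the GAklas1 master/worker scheme (assumes mpi4py; the GA step
# and the exchange heuristic are hypothetical placeholders).
import time
from mpi4py import MPI

REPORT, REQUEST, REPLY = 1, 2, 3           # message tags
MASTER, TIME_BUDGET = 0, 3600.0

def improve_with_exchange_heuristic(solution):
    return solution                         # placeholder

def one_ga_iteration(population, best):
    return population, best                 # placeholder: (population, (cost, solution))

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
deadline = time.time() + TIME_BUDGET

if rank == MASTER:
    pool = [(float("inf"), None)]           # improved solutions collected so far
    while time.time() < deadline:
        status = MPI.Status()
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == REPORT:      # a worker reports its best solution
            pool.append(improve_with_exchange_heuristic(msg))
        else:                               # a worker asks for an improved solution
            comm.send(min(pool, key=lambda s: s[0]),
                      dest=status.Get_source(), tag=REPLY)
else:
    population, best, it = [], (float("inf"), None), 0
    while time.time() < deadline:
        it += 1
        population, best = one_ga_iteration(population, best)
        if it % 100 == 0:                   # every 100th iteration: report best
            comm.send(best, dest=MASTER, tag=REPORT)
        if it % 600 == 0:                   # every 600th iteration: fetch improved one
            comm.send(None, dest=MASTER, tag=REQUEST)
            best = min(best, comm.recv(source=MASTER, tag=REPLY), key=lambda s: s[0])
```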
3. Experimental results

The algorithms GAklas05 and GAklas1 were launched on the HPC cluster of the University of Zilina with the following configuration: 46x compute node, 2 x 6-core Intel(R) Xeon(R) CPU L5640 @ 2.27 GHz, 96 GB RAM, OS Scientific Linux 6.3. The algorithms were run using only 17 compute nodes, because the cluster is shared by other researchers from the university who ran their experiments on a portion of the cluster nodes. The algorithm GAklas05 therefore had 17 isolated populations and the algorithm GAklas1 had one master node and 16 worker nodes. The algorithms were tested on three sjc instances proposed in [7]. For these instances the optimal solutions are already known; they can be obtained by present-day exact solvers based on the branch and bound method within several minutes. For every instance each program was launched ten times. The results are summarized in Table 1.

Table 1. Experimental results

Inst.    n     p    GAklas05 Average       GAklas05 Best    GAklas1 Average        GAklas1 Best
sjc3b    300   30   1.1315% ± 0.0967%      0.8121%          0.1692% ± 0.0623%      0.0000%
sjc4a    402   30   1.7838% ± 0.0929%      1.5569%          0.3435% ± 0.1377%      0.0329%
sjc4b    402   40   1.2927% ± 0.0890%      1.1268%          0.2794% ± 0.0620%      0.0716%
The first column gives the name of the instance. The next two columns give the number of customers n and the number of medians p for the given instance. The remaining columns describe the performance of the algorithms on the given instance. The columns labeled Average contain a confidence interval of the percentage difference between the objective function of the optimal solution and the values returned by the algorithm in the ten trials; the confidence level is 95%. From these we can read the average value and how stable the obtained solutions are. The columns labeled Best contain the percentage difference between the objective function of the optimal solution and the best solution obtained by the algorithm in the ten trials. In all cases the algorithm GAklas1 performs better. The difference between the optimal and the average solution is always under 0.5%, the best found solutions were even under 0.1%, and in one case we were able to find an optimal solution. Although the results of the algorithm GAklas05 are also quite acceptable, they are much worse compared to GAklas1. Looking at the confidence intervals we cannot say that one algorithm is generally more stable and reliable than the other; there is no big difference between the individual obtained solutions.
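For clarity, the sketch below shows one way such a percentage gap and a 95% confidence interval over ten runs can be computed; the use of a Student-t interval and the example values are assumptions, since the paper does not state the exact procedure behind the intervals reported in Table 1.

```python
# Percentage gap to the optimum and a 95% confidence interval over repeated
# runs (a Student-t interval is assumed; the values below are illustrative).
from statistics import mean, stdev
from scipy import stats

optimal = 1000.0                                   # illustrative optimal objective
runs = [1010.2, 1008.7, 1012.5, 1009.9, 1011.0,
        1013.1, 1007.8, 1010.6, 1009.3, 1012.0]    # illustrative objectives from 10 trials

gaps = [100.0 * (value - optimal) / optimal for value in runs]   # percentage differences
m, s, n = mean(gaps), stdev(gaps), len(gaps)
half_width = stats.t.ppf(0.975, df=n - 1) * s / n ** 0.5         # 95% two-sided interval

print(f"{m:.4f}% ± {half_width:.4f}%")
```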
4. Conclusions

In this paper we proposed two parallel genetic algorithms for the capacitated p-median problem and implemented them on an HPC cluster. The algorithm GAklas05 had isolated populations, while in the algorithm GAklas1 the compute nodes communicated with each other. When we compared the results obtained from these algorithms we came to the conclusion that the algorithm GAklas1 performs much better, even though a part of its computational time was consumed by communication operations. The average difference between the solutions of these algorithms was 1.14%, which is quite significant for this kind of problem. It is therefore useful to make the compute nodes cooperate with each other in order to obtain better solutions. Compared to the approaches used in [8] or [9] we were not able to reach results as good as theirs, but we confirmed the principle that in parallel programming it is more convenient to use sophisticated methods, at the small price of more frequent communication, than to rely on the brute computing force of an HPC cluster. Based on this knowledge we intend to continue our research, and our goal is to propose a parallel genetic algorithm that could rival the best known methods both in quality of solutions and in computational time.

Acknowledgements

This work was supported by the research grants VEGA 1/0518/15 "Resilient rescue systems with uncertain accessibility of service" and APVV-15-0179 "Reliability of emergency systems on infrastructure with uncertain functionality of critical elements".

References

[1] M. Kvet, Designing of large emergency service systems using IP-solvers, in: TRANSCOM 2015: 11th European conference of young researchers and scientists, June 22-24, 2015, Zilina, Slovak Republic, Section 1: Transport and communications technology, University of Zilina, 2015, ISBN 978-80-554-1043-2, pp. 75-80.
[2] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.
[3] I. Landa-Torres, J. Del Ser, S. Salcedo-Sanz, S. Gil-Lopez, J.A. Portilla-Figueras, O. Alonso-Garrido, A comparative study of two hybrid grouping evolutionary techniques for the capacitated P-median problem, Computers & Operations Research 39 (2012) 2214-2222.
[4] M.E. Amrani, Y. Benadada, B. Gendron, Generalization of capacitated p-median location problem: Modeling and resolution, in: 2016 3rd International Conference on Logistics Operations Management (GOL), May 23-25, 2016, Fez, Morocco, IEEE, ISBN 978-1-4673-8571-8.
[5] M. Herda, Combined genetic algorithm for capacitated p-median problem, in: CINTI 2015: 16th IEEE international symposium on Computational intelligence and informatics, November 19-21, 2015, Budapest, IEEE, 2015, ISBN 978-1-4673-8520-6, pp. 151-154.
[6] M. Herda, Parallel genetic algorithm for capacitated p-median problem using openMP protocol, in: CINTI 2016: 17th IEEE international symposium on Computational intelligence and informatics, November 17-19, 2016, Budapest, IEEE, 2016, ISBN 978-1-5090-3909-8, pp. 347-351.
[7] L.A.N. Lorena, E.L.F. Senne, A column generation approach to capacitated p-median problems, Computers & Operations Research 31 (6) (2004) 863-876.
[8] F. Stefanello, O.C.B. de Araújo, F.M. Müller, Matheuristics for the capacitated p-median problem, International Transactions in Operational Research 22 (2015) 149-167, DOI: 10.1111/itor.12103.
[9] L. Janosikova, M. Haviar, Imperialist competitive algorithm in combinatorial optimization, in: Quantitative methods in economics: multiple criteria decision making XVIII, proceedings of the international scientific conference, May 25-27, 2016, Vrátna, Slovakia, Letra Interactive, 2016, ISBN 978-80-972328-0-1, pp. 196-201.