
High performance parallel evolutionary algorithm model based on MapReduce framework

Xin Du*, Youcong Ni, Zhiqiang Yao and Ruliang Xiao
Faculty of Software, Fujian Normal University, Fuzhou, Fujian, 350108, China
E-mail: [email protected]
E-mail: [email protected]
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Datong Xie
Department of Information Management Engineering, Fujian Commercial College, Fuzhou, Fujian, 350012, China
E-mail: [email protected]

Abstract: Evolutionary algorithms (EAs) are increasingly being applied to large-scale problems. MapReduce is a powerful abstraction proposed by Google for building scalable and fault-tolerant applications. However, how to design a high performance parallel EA based on MapReduce (MR-PEA) is still an open problem. In this paper, a parallel evolutionary algorithm model based on MapReduce is proposed by improving traditional parallel evolutionary algorithm models. The MR-PEA model is suited to large populations and large datasets, and is highly scalable and efficient. To justify the effectiveness of the MR-PEA model, we propose a parallel gene expression programming algorithm based on MapReduce (MR-GEP) and use it to solve symbolic regression.

Keywords: evolutionary algorithms; EAs; parallel evolutionary algorithms; MapReduce.

Reference to this paper should be made as follows: Du, X., Ni, Y., Yao, Z., Xiao, R. and Xie, D. (2013) 'High performance parallel evolutionary algorithm model based on MapReduce framework', Int. J. Computer Applications in Technology, Vol. 46, No. 3, pp.290–295.

Biographical notes: Xin Du received her PhD in Computer Software and Theory from Wuhan University in 2010. She is currently an Associate Professor at the Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China. Her current research interests include parallel algorithms, evolutionary algorithms and cloud computing.

Youcong Ni is a Lecturer at the Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China. He received his PhD in Computer Software and Theory from Wuhan University in 2010. His current research focuses on cloud computing, parallel algorithms and software engineering.

Zhiqiang Yao is a Professor at the Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China. His current research interests include cloud computing and document engineering.

Ruliang Xiao received his PhD in Computer Software and Theory from Wuhan University in 2007. He is currently a Professor at the Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China. His current research interests include parallel algorithms and social networks.

Datong Xie is a PhD student at Wuhan University. He is currently a Lecturer at the Department of Information Management Engineering, Fujian Commercial College, Fuzhou, Fujian, China. His current research interests include evolutionary algorithms and parallel algorithms.

Copyright © 2013 Inderscience Enterprises Ltd.


1  Introduction

Parallel computing and evolutionary computation are both modern techniques for optimising complex problems. Combining the two can not only reduce run time and improve solution quality, but also increase the available computing power (Xu and Zeng, 2005; Kaariainen and Valimaki, 2011; Yuan et al., 2012; Dzafic et al., 2012). In recent years, the application of parallel evolutionary algorithms (EAs) has produced many valuable results: researchers have used the message passing interface (MPI) framework to design and implement several parallel EA models, such as master-slave, coarse-grained, fine-grained and hybrid models. However, the MPI framework lacks sufficient flexibility, and it is difficult to handle issues such as heterogeneity, load balancing and fault tolerance. As the problem space and the size of the data handled by a parallel EA grow, the cost of inter-process communication and the volume of message data rise sharply, the shortcomings of the MPI framework become more apparent, and the performance of the algorithm is directly affected.

The MapReduce programming framework (Dean and Ghemawat, 2008, 2010; Lammel, 2008) provides a possible way to solve these problems. MapReduce was proposed by Google for easily harnessing a large number of resources in data centres to process data-intensive applications, and has been proposed to form the basis of a 'data centre computer'. The model allows users to benefit from advanced features of distributed computing without worrying about the difficulty of coordinating the execution of parallel tasks in distributed environments. It provides a parallel design pattern that simplifies application development in distributed environments: a large problem space is split into small pieces, and the execution of the small tasks on the smaller spaces is parallelised automatically. MapReduce has now become a mainstream programming model of the cloud computing era.

However, the MapReduce framework differs significantly from the MPI framework, and existing results on parallel EAs are difficult to reuse directly, so designing a high performance MR-PEA algorithm still poses many problems and challenges. In particular, an MR-PEA model must take full advantage of the parallelism of the MapReduce framework, fully exploit the inherent parallelism of EAs, and account for factors such as the population size and the size of the data processed, so that it can be applied to different classes of problems.

In this paper, we propose a high performance MR-PEA model that weighs the advantages and disadvantages of the various existing parallel evolutionary computation models against the characteristics of the map and reduce functions. The proposed MR-PEA model is a general parallel EA model in which many operations, such as fitness computation, crossover, mutation and selection, are parallelised.

The remainder of this paper is organised as follows. Section 2 reviews work on traditional parallel EAs and on MR-PEA. Section 3 presents the proposed MR-PEA model. Section 4 gives an example in which gene expression programming (GEP) based on the MapReduce framework is used to solve a symbolic regression problem. Section 5 concludes the paper with pointers to future work.

2  Related works

Parallelising EAs has received much attention from researchers, and a large number of valuable results on parallel EAs have been produced. The major works in this area are as follows. Kang et al. (2001) proposed a new and efficient asynchronous parallel EA for function optimisation using PVM as the parallel development platform. Du et al. (2010) proposed a new asynchronous distributed parallel GEP based on an estimation of distribution algorithm (EDA) in a message passing interface (MPI) environment, which improves searching ability and reduces computation time; the experimental results showed that it can approach linear speedup and has a better ability to find optimal solutions and higher stability than the sequential algorithm. Sun et al. (2007) proposed a chaotic parallel genetic algorithm (PGA) with a feedback mechanism: chaotic mapping is embedded to maintain good population diversity, and Baldwin-effect-based posterior reinforcement learning, which can handle the feedback information from the evolutionary system, is integrated to speed up the evolution along the right direction. Traditional parallel EAs have been applied in many fields, including data mining, function modelling, time series prediction, combinatorial optimisation and route layout. Dong and Li (2002) proposed the concept of evolutionary simulation optimisation (ESO) together with its formal language description; according to the character of simulation optimisation, two kinds of parallel models for ESO, single-population master/slave models and coarse-grained multi-population island models, were provided and realised on the PVM parallel development platform. Villegas (2007) developed a PGA optimisation tool for the synthesis of arbitrarily shaped beam coverage areas using planar 2D phased-array antennas. Yussof et al. (2009) proposed a coarse-grained PGA for the shortest path routing problem on an MPI cluster, with the aim of reducing computation time, and studied the effect of migration on the algorithm and its performance compared with its serial counterpart. Tsai et al. (2011) presented a parallel elite genetic algorithm (PEGA) applied to global path planning for autonomous mobile robots navigating in structured environments; simulations and experiments show the merit of the proposed PEGA path planner and smoother. Tantar et al. (2007) proposed a bicriterion parallel hybrid genetic algorithm (GA) to deal efficiently with protein structure prediction on a computational grid; two molecular complexes, the tryptophan-cage protein (Brookhaven Protein Data Bank ID 1L2Y) and cyclodextrin, were considered, and the experimental results obtained on a computational grid show the effectiveness of the approach. Sarkar et al. (2011) proposed a rule-based knowledge discovery model combining C4.5 (a decision-tree-based rule induction algorithm) and a new PGA based on the idea of massive parallelism.

At present, MapReduce has become the mainstream programming framework of cloud computing platforms, and parallel EAs based on the MapReduce framework are becoming a research hotspot. Research in this area focuses mainly on improving the MapReduce framework to make it more suitable for parallel EAs, and on parallelising typical EAs. Jin et al. (2008) presented an extension to the MapReduce model featuring a hierarchical reduction phase; the model, called MRPGA (MapReduce for parallel GAs), can automatically parallelise GAs. However, it has several shortcomings. Firstly, the map function performs the fitness evaluation and the 'ReduceReduce' phase does the local and global selection, but the bulk of the work (mutation, crossover, evaluation of the convergence criteria and scheduling) is carried out by a single coordinator. Secondly, the 'extension' that they propose can readily be implemented within the traditional MapReduce model: the local reduce is equivalent to, and can be implemented within, a combiner. Finally, their mapper, reducer and final reducer functions emit 'default key' and 1 as their values, so they do not use the defining characteristics of the MapReduce model, grouping by keys and shuffling; the mappers and reducers might as well be independently executing processes communicating only with the coordinator. Mcnabb et al. (2007) presented MapReduce particle swarm optimisation (MRPSO), a particle swarm optimisation (PSO) implementation based on the MapReduce parallel programming model. Zhou (2010) designed a parallelisation of the differential evolution algorithm using MapReduce; however, only the fitness evaluation is parallelised in this method. To utilise recent multi-core processors efficiently, Tagawa and Ishimizu (2010) proposed a concurrent implementation of DE, named concurrent DE (CDE), based on MapReduce. Verma et al. (2009) proposed PGAs and parallel compact genetic algorithms (CGAs) based on MapReduce, and Verma et al. (2010) further proposed parallel compact and extended CGAs on two parallel platforms, Hadoop and MongoDB; the results show that both are good choices for dealing with large-scale problems. Li and Peng (2011) proposed a double-fitness genetic algorithm (DFGA) for the programming framework of cloud computing; with this algorithm, the improved task scheduling not only shortens the total task completion time but also yields a shorter average completion time. Xiong et al. (2011) presented a framework that characterises the QoS properties of replica services and established its mathematical model by introducing quantification methods; to deal with the QoS constraints and to perceive grid users' QoS preferences accurately, they proposed a QoS preference acquisition algorithm based on the analytic hierarchy process (AHP) and designed an effective and efficient PGA based on the MapReduce paradigm for optimising the objective function that corresponds to the optimal replica.

In summary, traditional parallel EAs have achieved fruitful results, but the MPI framework lacks flexibility, which makes it difficult to deal with issues such as heterogeneity, load balancing and fault tolerance. When the problem space and the data size processed by a parallel EA increase, the cost of process communication and the volume of message data rise sharply, the shortcomings of the MPI framework become more apparent, and the performance of the algorithm is directly affected. Current research on parallel EAs based on the MapReduce framework mostly focuses on porting classic parallel EAs, such as PSO and differential evolution, to MapReduce. Work on how to make full use of the parallel mechanism of the MapReduce framework, comprehensively mine the inherent parallelism of EAs, synthetically consider the population size, the size of the data processed and other factors, and make the MR-PEA model suitable for solving different problems is still lacking. This situation influences and restricts the development of MR-PEA itself to a certain extent.

3  MR-PEA model

Based on the parallelism of the map and reduce functions of the MapReduce framework, and weighing the advantages and disadvantages of the various existing parallel evolutionary computation models, we study the parallelisation of fitness evaluation, crossover, mutation and selection for large populations and large datasets in order to design a scalable and efficient MR-PEA model. The proposed MR-PEA model is a hybrid model: the upper layer uses a coarse-grained computational model, while the lower layer uses one of the coarse-grained, fine-grained or master-slave models. The MR-PEA model is shown in Figure 1. In the following, we give the MR-PEA algorithm based on this model.

Figure 1  MR-PEA model (see online version for colours)

3.1 For large population

•	The dataset processed by the algorithm, the initial population and some parameters, including the migration rate and the migration interval, are stored in the distributed file system (DFS).

•	The initial population is divided into n sub-populations. Each sub-population executes a map function and calculates the fitness of its individuals in parallel; to improve efficiency, the fitness values are written into local shared storage (such as the shared memory of a node or of the cluster).

•	Whether migration is needed is decided according to the migration strategy. If it is needed, the emigrating individuals and the individuals to be replaced by immigrants are evaluated and selected; otherwise, the selection operation is performed.

•	Crossover is performed based on a certain strategy.

•	Mutation is performed based on a certain strategy.

•	The above steps are repeated until the termination criteria are met, and the results are then written into the DFS.

The above four operations can be realised by the reduce function. The lower-layer computation model is determined by configuring the number of reduce tasks executed in parallel. For example, if the task number is always 1, the master-slave model is used; if the task number is greater than 1 and far less than the sub-population size, the coarse-grained model is used; otherwise, the fine-grained model is used. A minimal sketch of this map/reduce structure is given below.
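To make the steps above concrete, the following is a minimal, self-contained sketch of one generation of the large-population scheme, written in Python and simulating the map and reduce phases in memory rather than on an actual MapReduce cluster. The map phase keys every individual by its sub-population and attaches a fitness value, and the reduce phase performs selection, crossover and mutation for one sub-population. The toy sphere fitness, the tournament/one-point/Gaussian operators and all function names are illustrative assumptions rather than part of the MR-PEA specification, and migration is omitted for brevity.

```python
import random
from collections import defaultdict

def evaluate(individual):
    """Toy fitness: minimise the sphere function (illustrative choice only)."""
    return sum(x * x for x in individual)

def fitness_map(record):
    """Map phase: key each individual by its sub-population and attach its fitness."""
    subpop_id, individual = record
    return subpop_id, (individual, evaluate(individual))

def evolve_reduce(subpop_id, evaluated):
    """Reduce phase: selection, crossover and mutation for one sub-population."""
    def tournament():
        a, b = random.sample(evaluated, 2)
        return a[0] if a[1] < b[1] else b[0]          # lower sphere value is better

    offspring = []
    while len(offspring) < len(evaluated):
        p1, p2 = tournament(), tournament()
        cut = random.randrange(1, len(p1))            # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [x + random.gauss(0, 0.1) if random.random() < 0.1 else x
                 for x in child]                      # per-gene Gaussian mutation
        offspring.append(child)
    return subpop_id, offspring

def one_generation(subpopulations):
    """Simulate one MapReduce job: map, shuffle by key, then reduce per sub-population."""
    mapped = [fitness_map((sid, ind))
              for sid, subpop in subpopulations.items() for ind in subpop]
    groups = defaultdict(list)                        # the 'shuffle': group values by key
    for sid, value in mapped:
        groups[sid].append(value)
    return dict(evolve_reduce(sid, values) for sid, values in groups.items())

# Usage: 4 sub-populations (islands) of 20 individuals with 5 genes each.
pops = {sid: [[random.uniform(-5, 5) for _ in range(5)] for _ in range(20)]
        for sid in range(4)}
for _ in range(10):
    pops = one_generation(pops)
```

In a real deployment each call to evolve_reduce would correspond to one reduce task, so the number of reduce tasks configured for the job is what selects the master-slave, coarse-grained or fine-grained lower-layer model described above.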

3.2 For large datasets

For large datasets, it is the dataset rather than the population that is partitioned; the other parts are similar to the large-population case. A minimal sketch of this variant is given below.
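The sketch below illustrates the large-dataset variant under the same assumptions as before (Python, in-memory simulation, illustrative candidate models and error measure): the dataset is partitioned into chunks, each map task scores every individual against its own chunk and emits a partial error keyed by the individual, and the reduce phase sums the partial errors into a total fitness per individual.

```python
from collections import defaultdict

def partial_fitness_map(chunk, individuals):
    """Map phase: score every individual against one data chunk only."""
    for ind_id, model in enumerate(individuals):
        error = sum((model(x) - y) ** 2 for x, y in chunk)
        yield ind_id, error                         # key = individual, value = partial error

def fitness_reduce(ind_id, partial_errors):
    """Reduce phase: total error of one individual across all chunks."""
    return ind_id, sum(partial_errors)

# Usage: three hand-written candidate models evaluated over a partitioned dataset.
individuals = [lambda x: 2 * x, lambda x: x * x, lambda x: x + 1]   # toy candidates
dataset = [(x, x * x) for x in range(100)]                          # target: y = x^2
chunks = [dataset[i:i + 25] for i in range(0, len(dataset), 25)]    # dataset partitions

grouped = defaultdict(list)                                          # the 'shuffle' step
for chunk in chunks:
    for ind_id, err in partial_fitness_map(chunk, individuals):
        grouped[ind_id].append(err)

fitness = dict(fitness_reduce(i, errs) for i, errs in grouped.items())
print(fitness)   # candidate 1 (x*x) matches the target exactly, so its error is 0
```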

4  Experiment

To justify the effectiveness of the proposed MR-PEA model, our experiments involve both a serial and a parallel implementation of GEP, using a symbolic regression example as the test problem. We use speedup as the measure of scalability. Speedup is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processing elements. The speedup with p processors, denoted Sp, is given by formula (1):

Sp = T1 / Tp    (1)

where T1 is the average computation time of the sequential algorithm and Tp is the average computation time of the parallel algorithm based on the MR-PEA model.
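As a small worked illustration of formula (1), the helper below averages the recorded run times and computes Sp; the timing values in the example call are placeholders, not measurements from the paper.

```python
def speedup(serial_times, parallel_times):
    """S_p = T_1 / T_p, where T_1 and T_p are averages over independent runs."""
    t1 = sum(serial_times) / len(serial_times)
    tp = sum(parallel_times) / len(parallel_times)
    return t1 / tp

# Placeholder timings in seconds, for illustration only.
print(speedup([120.0, 118.5, 121.2], [16.1, 15.8, 16.4]))   # about 7.4
```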


The definition of speedup is ambiguous as to what constitutes the best sequential algorithm. Since GEP based on the MR-PEA model (MR-GEP) is a reformulation of GEP that performs the same operations, we use our standard single-processor GEP implementation as the best sequential algorithm. This implementation and the MR-GEP implementation are written in the same language, share common code, and run on the same hardware.
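As an illustration of the work each fitness evaluation performs in MR-GEP, the snippet below (in Python for brevity; the paper's implementation is in C#) scores a candidate model on the training samples of a symbolic regression task and maps the error to a fitness value. Representing the candidate as a plain callable and using the fitness 1000/(1 + MAE) are simplifying assumptions for this sketch, not details taken from the paper.

```python
def regression_fitness(candidate, samples):
    """Fitness of one candidate model on the training samples (higher is better)."""
    mae = sum(abs(candidate(x) - y) for x, y in samples) / len(samples)
    return 1000.0 / (1.0 + mae)        # illustrative mapping from error to fitness

# Usage with a toy target y = 3x + 2 and two hand-written candidates.
samples = [(x, 3 * x + 2) for x in range(20)]
print(regression_fitness(lambda x: 3 * x + 2, samples))   # perfect fit -> 1000.0
print(regression_fitness(lambda x: 2 * x, samples))       # poorer fit -> lower score
```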

4.1 Experimental setting

We implemented MR-GEP on the .NET platform using the C# language. All experiments were conducted on a cluster in which each node has a single Pentium 3 processor. We conducted an experiment to evaluate the performance of our framework. The experimental data are the coalface gas emission measurements used in Xin et al. (2010). Each experiment was run 50 times independently, and the average computation time and Sp were then computed. To compare the parallel and sequential algorithms, the sequential algorithm was first run 50 times. The parameter setting is shown in Table 1.

Table 1  Parameter setting

Population size                  800
Head length of genes             1
Body length of genes             10
Head length of homeotic gene     6
Body length of homeotic gene     20
Numbers of genes
Mutation rate                    0.044
One-point recombination rate     0.3
Two-point recombination rate     0.3
Gene recombination rate          0.1
IS transposition rate            0.1
RIS transposition rate           0.1
Gene transposition rate          0.1

4.2 Test of speedup

We separately record the computation time taken by the sequential algorithm and by the parallel algorithm to reach the assigned generation, and then compute the speedup. When solving the same problem, MR-GEP takes less time than the sequential GEP, with both running on computers of the same configuration. The resulting speedup is shown in Figure 2.

Figure 2  Speedup for MR-GEP (see online version for colours): measured speedup versus the number of processors (4, 8, 16, 24, 32 and 64), shown against the linear speedup line

5  Concluding remarks and future work

In this paper, a novel MR-PEA model is proposed by comprehensively mining the inherent parallelism of EAs and by synthetically considering the population size, the size of the data processed and other factors, so that the model is suitable for solving different classes of problems. To justify the effectiveness of the proposed MR-PEA model, a parallel GEP based on MapReduce (MR-GEP) for solving symbolic regression is given.

Acknowledgements

The work published in this paper was funded by the Fujian Provincial Natural Science Fund under Grants nos. 2011J05146, 2012J01248 and 2012J01250, the project of the preeminent youth fund of Fujian Province (Fujian teaching department [2011] 29), the Science Research Foundation of the Fujian Provincial Department of Education under Grants nos. JB11029 and JB11028, the project of the Jiangxi Education Department under Grant no. GJJ12307, the technology support project of Jiangxi Province under Grant no. 20112BBE50026 and the outstanding young teacher training fund of Fujian Normal University (no. fjsdjk2012083).

References

Dean, J. and Ghemawat, S. (2008) 'MapReduce: simplified data processing on large clusters', Communications of the ACM, January, Vol. 51, No. 1, pp.107–113.

Dean, J. and Ghemawat, S. (2010) 'MapReduce: a flexible data processing tool', Communications of the ACM, Vol. 53, No. 1, pp.72–77.

Dong, W.Y. and Li, Y.X. (2002) 'The application and parallel realization of evolutionary optimization of simulation', Chinese Journal of Computers, Vol. 25, No. 11, pp.1236–1242.

Du, X., Ding, L.X., Xie, C.W. and Chen, L. (2010) 'Parallel gene expression programming based on EDA', Computer Science, Vol. 37, No. 2, pp.196–199.

Dzafic, I., Neisius, H.T. and Mohapatra, P. (2012) 'High performance power flow algorithm for symmetrical distribution networks with unbalanced loading', International Journal of Computer Applications in Technology, Vol. 43, No. 2, pp.179–187.

Jin, C., Vecchiola, C. and Buyya, R. (2008) 'MRPGA: an extension of MapReduce for parallelizing genetic algorithms', in IEEE Fourth International Conference on eScience, pp.214–221.

Kaariainen, J. and Valimaki, A. (2011) 'Get a grip on your distributed software development with application lifecycle management', International Journal of Computer Applications in Technology, Vol. 40, No. 3, pp.181–190.

Kang, L.S., Liu, P. and Chen, Y.P. (2001) 'Asynchronous parallel evolutionary algorithm for function optimization', Journal of Computer Research and Development, Vol. 38, No. 11, pp.1381–1386.

Lammel, R. (2008) 'Google's mapreduce programming model-revisited', Science of Computer Programming, Vol. 70, No. 1, pp.1–30.

Li, J. and Peng, J. (2011) 'Task scheduling algorithm based on improved genetic algorithm in cloud computing environment', Journal of Computer Applications, Vol. 31, No. 1, pp.184–186.

Mcnabb, A.W., Monson, C.K. and Seppi, K.D. (2007) 'Parallel PSO using mapreduce', in IEEE Congress on Evolutionary Computation, pp.7–14.

Sarkar, B.K., Sana, S.S. and Chaudhuri, K. (2011) 'Selecting informative rules with parallel genetic algorithm in classification problem', Applied Mathematics and Computation, Vol. 218, No. 7, pp.3247–3264.

Sun, Y.F., Zhang, C.K., Gao, J.G. and Deng, F.Q. (2007) 'For constrained non-linear programming: chaotic parallel genetic algorithm with feedback', Chinese Journal of Computers, Vol. 30, No. 3, pp.424–430.

Tagawa, K. and Ishimizu, T. (2010) 'Concurrent differential evolution based on MapReduce', International Journal of Computers, Vol. 4, No. 4, pp.161–168.

Tantar, A.A., Melab, N., Talbi, E.G. et al. (2007) 'A parallel hybrid genetic algorithm for protein structure prediction on the computational grid', Future Generation Computer Systems, Vol. 23, No. 3, pp.398–409.

Tsai, C.C., Huang, H.C. and Chan, C.K. (2011) 'Parallel elite genetic algorithm and its application to global path planning for autonomous robot navigation', IEEE Transactions on Industrial Electronics, Vol. 58, No. 10, pp.4813–4821.

Verma, A., Llora, X., Goldberg, D.E. et al. (2009) 'Scaling genetic algorithms using mapreduce', in Ninth International Conference on Intelligent Systems Design and Applications (ISDA'09), pp.13–18.

Verma, A., Llora, X., Venkataraman, S. et al. (2010) 'Scaling eCGA model building via data-intensive computing', in 2010 IEEE Congress on Evolutionary Computation (CEC 2010), pp.1–8.

Villegas, F.J. (2007) 'Parallel genetic-algorithm optimization of shaped beam coverage areas using planar 2-D phased arrays', IEEE Transactions on Antennas and Propagation, Vol. 55, No. 6, pp.1745–1753.

Xin, D., Ding, L.X. and Jia, L.Y. (2010) 'Asynchronous distributed parallel gene expression programming based on estimation of distribution algorithm', in Fourth International Conference on Natural Computation, pp.433–437.

Xiong, R., Luo, J., Song, A. et al. (2011) 'QoS preference-aware replica selection strategy using MapReduce-based PGA in data grids', in 2011 International Conference on Parallel Processing (ICPP 2011), pp.394–403.

Xu, Y.Z. and Zeng, W.H. (2005) 'The development of parallel evolutionary algorithms', Pattern Recognition and Artificial Intelligence, May, Vol. 18, No. 2, pp.183–192.

Yuan, Y., Yuan, J.L., Du, H.F. and Li, L. (2012) 'An improved multi-objective ant colony algorithm for building life cycle energy consumption optimisation', International Journal of Computer Applications in Technology, Vol. 43, No. 1, pp.60–66.

Yussof, S., Razali, R.A., See, O.H. et al. (2009) 'A coarse-grained parallel genetic algorithm with migration for shortest path routing problem', in HPCC'09, pp.615–621.

Zhou, C. (2010) 'Fast parallelization of differential evolution algorithm using MapReduce', in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pp.1113–1114.
