2013 IEEE Conference on Systems, Process & Control (ICSPC2013), 13 - 15 December 2013, Kuala Lumpur, Malaysia
Adapting MapReduce Framework for Genetic Algorithm with Large Population

Noor Elaiza Abd Khalid, Ahmad Firdaus Ahmad Fadzil, Mazani Manaf
Faculty of Computer and Mathematical Science, MARA University of Technology (UiTM)
Shah Alam, Malaysia
[email protected],
[email protected],
[email protected]
Abstract— The genetic algorithm (GA) is an algorithm that draws inspiration from natural evolution to solve complex problems. GA is renowned for its ability to optimize different types of problems. However, GA requires data- and process-intensive computing when incorporating a large population. This research proposes and evaluates the performance of GA adapted to MapReduce (MR), a parallel processing framework introduced by Google that utilizes commodity hardware. The algorithm is executed with population sizes of up to 10 million. Performance scalability is tested using 1, 2, 3, and 4 node configurations. The travelling salesman problem (TSP) is chosen as the case study, while performance improvement, speedup, and efficiency are employed for performance benchmarking. This research reveals that MR can be naturally adapted for GA. It is also discovered that MR can accommodate GA with a large population while providing good performance and scalability.
Index Terms— MapReduce, Genetic Algorithm, Population, Travelling Salesman Problem

I. INTRODUCTION

Inspired by the model of natural evolution, the genetic algorithm (GA) has been used ubiquitously to optimize a wide array of problems [1]. GA operates on individuals in a population, called chromosomes, which provide a basket of candidate solutions. A large population expands the solution search space, increasing the diversity and the accuracy of the algorithm [2]. However, executing a large population often requires data- and process-intensive computing and can be a tedious and slow process. In view of this requirement, attempts have been made to parallelize GA using various frameworks and techniques [19]. The MapReduce (MR) parallel processing framework is one such alternative. Created by Google to address ever-growing data and processing demands [3], the MR framework utilizes a collection of commodity hardware parallelized through the “Map and Reduce” functional abstraction. This abstraction hides the complex parallelization process from the user while offering an automated parallel processing facility [3; 4]. Consequently, more computing workload can be handled with MR than with serial execution. The MR framework is applicable to most real-world applications and has been deployed in areas such as high-performance computing [5], bioinformatics [6; 7], data mining [8] and others.

Previous research has applied MR-based parallel GA [24; 25; 26]. However, not much work has been done on straightforward benchmark problems such as the travelling salesman problem (TSP), which has been used to assess many types of optimization algorithms [20; 21; 22; 23]. This research proposes to adapt the task- and process-intensive execution of GA with a large population to the MR parallel processing framework. The dynamicity of TSP makes it a suitable problem for exploring the performance of GA via MR (MR-GA). A preliminary test is done on an MR cluster consisting of four processing nodes [17]. The proposed algorithm is evaluated with the performance improvement index, speedup, and efficiency [18].

II. RELATED WORKS

A. Genetic Algorithm and Population

GA relates the Darwinian principle of evolution to automated problem solving [1]. The basis of GA involves the processes of reproduction, random variation, competition, and selection of contending individuals in a population [9]. Example applications of GA include machine learning [11] and various NP-Hard/NP-Complete optimization problems [12; 13; 14]. The flow of GA begins by randomly creating the initial population. The process then proceeds to selection, which considers each and every individual in the population; the selected individuals usually have high fitness values. The favored individuals are then carried over to the recombination process to produce new offspring. The algorithm stops either when the problem is solved or when the population reaches convergence, where most individuals carry the same fitness value [1; 15].

The population of individuals is one of the crucial factors determining the accuracy of GA [2]. A small population size often leads to genetic drift, which causes loss of genetic diversity [9]. A large population also creates a greater chance that the initial state of the population contains the optimal solution [2]. However, heavy computing workload is required to accommodate GA with a large population: executing a large-population GA serially results in prolonged execution time due to inadequate computing resources.
B. Parallel Genetic Algorithm

The intensive computing requirement of GA has led to various parallelization techniques to improve its performance [19]. Through parallelization, the performance of GA can be improved by employing more than one computing resource to process the individuals in GA’s population. Techniques for parallelizing GA include master-slave, fine-grained, and multiple-population/deme (island) parallelization. Master-slave parallel GA employs the usual form of parallelization, where the master distributes tasks to different processing nodes. Fine-grained parallel GA, by contrast, uses a spatially-structured population that allows frequent interaction between different individuals. The last technique, island parallelization, incorporates multiple populations that are distributed across different computing resources. This technique is most efficient when employed in distributed computing environments such as Beowulf clusters [27]: its infrequent, coarse-grained interaction reduces the communication overhead, which is an eminent drawback of distributed computing. In this research, the MR parallel algorithm is executed based on this technique.
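To make the island idea concrete, the following minimal Java sketch splits one large population into per-node demes that evolve independently between infrequent exchanges. This is our illustration under stated assumptions (the names IslandSplit and splitIntoDemes are hypothetical), not code from the paper:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of island-model partitioning: the population is
    // divided into independent demes, one per processing node, that evolve
    // separately and interact only infrequently (coarse-grained).
    public class IslandSplit {
        static <T> List<List<T>> splitIntoDemes(List<T> population, int nodes) {
            List<List<T>> demes = new ArrayList<>();
            int demeSize = (int) Math.ceil(population.size() / (double) nodes);
            for (int i = 0; i < population.size(); i += demeSize) {
                demes.add(new ArrayList<>(
                    population.subList(i, Math.min(i + demeSize, population.size()))));
            }
            return demes; // each deme evolves on its own node between exchanges
        }
    }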
C. The Travelling Salesman Problem

This research implements TSP as the case study to test the algorithm. The main idea behind this problem is finding the shortest path that visits a number of cities, where each city can be visited only once in the entire tour. Despite the simplistic description, the number of candidate solutions is too large for any computing resource to process exhaustively (candidate solutions = n!, where n = number of cities). Table I illustrates the number of candidate solutions for different numbers of cities.

TABLE I. TSP AND NUMBER OF CANDIDATE SOLUTIONS
Number of Cities | Number of Possible Solutions
10 | 3,628,800
11 | 39,916,800
12 | 479,001,600
13 | 6,227,020,800
14 | 87,178,291,200
15 | 1,307,674,368,000

As depicted above, a larger number of cities results in a far greater number of candidates, so no polynomial-time bound on finding the optimal solution can be attained via a brute-force approach. TSP has therefore been classified as an optimization problem of the NP-Hard/NP-Complete category [10]. Evolutionary algorithms such as the genetic algorithm are among the preferred approaches for optimizing problems similar to TSP [20; 21; 22; 23].
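As a quick sanity check on Table I, the candidate-solution counts can be reproduced in a few lines of Java using BigInteger; this is an illustrative snippet, not part of the original study:

    import java.math.BigInteger;

    // Illustrative check of Table I: the number of candidate TSP tours is n!.
    public class TourCount {
        public static void main(String[] args) {
            for (int n = 10; n <= 15; n++) {
                BigInteger count = BigInteger.ONE;
                for (int i = 2; i <= n; i++) {
                    count = count.multiply(BigInteger.valueOf(i));
                }
                System.out.printf("%d cities -> %s tours%n", n, count);
            }
        }
    }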
III. METHODOLOGY

A. Hardware and Software Configuration

This research utilizes a Hadoop Beowulf cluster that runs on four (4) off-the-shelf PC nodes with identical configuration [17]. The nodes are connected via a hub/router. Table II illustrates the hardware specification.

TABLE II. HARDWARE SPECIFICATION
Hardware | Specification
CPU | Intel® Core™2 Duo E7500 @ 2.93 GHz
RAM | 3.46 GB DDR-2 RAM
Storage | 500 GB HDD
Ethernet Adapter | 100/1000 Mbps Gigabit
Ethernet Cable | Cat-6 cable
Every CPU in this configuration is a dual-core part, which allows the execution of more than one computing task at a time [16]. The Gigabit Ethernet adapters and cables used in this research allow faster data transfer than conventional 10/100 Mbps Fast Ethernet. A router with Gigabit Ethernet capability is also preferred in order to allow reliable and fast data transfer between PC nodes. In terms of software, the Hadoop MR framework is installed to enable MapReduce programming. It runs on top of the Ubuntu 10 operating system, complemented with an OpenSSH Secure Shell (SSH) server and OpenJDK Java JDK 6. Hadoop MapReduce is configured with 2 map tasks for each PC node. Table III describes the configuration of Hadoop MR.
TABLE III. HADOOP MR CONFIGURATION
Hadoop MapReduce Configuration | Value
No. of Map Tasks per Node | 2
No. of Reduce Tasks per Node | 1
Number of Master Nodes | 1
Number of Slave Nodes | 4 (including Master)
The configuration is as follows: two map tasks per node, because the CPU used is dual-core, allowing core-level parallelism by running 2 map tasks concurrently. The number of slave nodes is set to four to get the best out of the Beowulf cluster; one node acts as both master and slave to achieve this. The number of reduce tasks per node is set to one because each node is equipped with a single hardware storage device; increasing the number of reduce tasks per node would create I/O overhead and subsequently affect the performance of MR.
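For illustration, the per-job side of such a configuration could be expressed with the classic (0.20/1.x-era) Hadoop JobConf API as sketched below; the class name and values mirror Table III, but this is our assumption rather than the authors' actual configuration files:

    import org.apache.hadoop.mapred.JobConf;

    // Illustrative job configuration mirroring Table III (assumed, not the
    // authors' exact setup). Cluster-wide per-node task slots are normally
    // fixed in mapred-site.xml (mapred.tasktracker.map.tasks.maximum).
    public class TspJobConfig {
        public static JobConf configure(JobConf conf) {
            conf.setNumMapTasks(8);     // hint: 2 map tasks x 4 nodes
            conf.setNumReduceTasks(4);  // 1 reduce task x 4 nodes
            return conf;
        }
    }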
B. Serial Algorithm Design

The GA employed in this work follows the fundamental flow of GA described in the earlier section. Fig. 1 illustrates the algorithm flow for optimizing TSP via GA.

Fig.1. Serial TSP via GA (TSPvGA) Algorithm Flow
The chromosomes in the algorithm represent tours that are initialized randomly during the early phase of the algorithm. Each tour is then evaluated to find its total distance; this phase represents the fitness function of the GA. The tours are then sorted from shortest to longest to rank them and preserve elitism. In the next phase, the upper half of the tours enters the crossover phase to produce offspring, replacing the lower half of the tours. There is also a fixed percentage chance that a tour mutates, producing more variation among the tours/chromosomes. The process continues to the next generation until the maximum number of generations is met or the algorithm converges.
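The following Java sketch outlines one generation of this serial flow under our own simplifying assumptions (tours encoded as permutations of city indices 0..n-1, order crossover, swap mutation, a fixed mutation rate); it illustrates the described flow and is not the authors' code:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Illustrative single generation of the serial TSPvGA flow (a sketch;
    // the representation and operators are assumptions).
    public class SerialTspGa {
        static final Random RNG = new Random();
        static final double MUTATION_RATE = 0.05; // assumed fixed percentage

        // Rank tours shortest-first, keep the upper half (elitism), refill
        // the lower half with crossover offspring, and occasionally mutate.
        static List<int[]> nextGeneration(List<int[]> tours, double[][] dist) {
            tours.sort(Comparator.comparingDouble(t -> tourLength(t, dist)));
            int half = tours.size() / 2;
            List<int[]> next = new ArrayList<>(tours.subList(0, half));
            while (next.size() < tours.size()) {
                int[] child = orderCrossover(next.get(RNG.nextInt(half)),
                                             next.get(RNG.nextInt(half)));
                if (RNG.nextDouble() < MUTATION_RATE) swapMutate(child);
                next.add(child);
            }
            return next;
        }

        // Fitness function: total distance of the closed tour.
        static double tourLength(int[] t, double[][] dist) {
            double len = 0;
            for (int i = 0; i < t.length; i++) {
                len += dist[t[i]][t[(i + 1) % t.length]];
            }
            return len;
        }

        // Order crossover: copy a slice from parent a, fill the rest from b.
        static int[] orderCrossover(int[] a, int[] b) {
            int n = a.length;
            int[] child = new int[n];
            boolean[] used = new boolean[n];
            int cut1 = RNG.nextInt(n), cut2 = cut1 + RNG.nextInt(n - cut1);
            for (int i = cut1; i <= cut2; i++) { child[i] = a[i]; used[a[i]] = true; }
            int idx = 0;
            for (int i = 0; i < n; i++) {
                if (i >= cut1 && i <= cut2) continue;
                while (used[b[idx]]) idx++;
                child[i] = b[idx];
                used[b[idx]] = true;
            }
            return child;
        }

        // Mutation: swap two randomly chosen cities in the tour.
        static void swapMutate(int[] t) {
            int i = RNG.nextInt(t.length), j = RNG.nextInt(t.length);
            int tmp = t[i]; t[i] = t[j]; t[j] = tmp;
        }
    }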
C. MapReduce Parallel Algorithm Design

In order to adapt GA natively to the MapReduce framework, a number of parameters have to be considered. The Hadoop MR framework necessitates the use of an input file to automatically distribute computing tasks. For this reason, every tour/chromosome is encoded as a single line of text; each line is decoded and processed during the map phase. The MR framework also utilizes (key, value) pairs as the building block to link data between the map and reduce phases. Accordingly, the “key” is represented by the fitness of a tour, while the “value” contains the sequence of cities visited during the tour; the “value” also carries essential GA data such as crossover-partner information. The native Hadoop MR implementation sorts, shuffles, and groups the (key, value) pairs. This is favorable for this algorithm, which requires ranking selection to preserve elitism: every tour/chromosome is automatically ranked between the map and reduce phases.
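A minimal sketch of this (key, value) design using the classic Hadoop mapred API, with the tour's fitness as the key so the framework's sort provides the ranking; the input line format and the helper evaluateTour are our assumptions:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative mapper: each input line encodes one tour (assumed format:
    // comma-separated city indices). The emitted key is the tour's fitness
    // (total distance), so Hadoop's sort ranks tours shortest-first before
    // the reduce phase performs selection and repopulation. A real value
    // line would also carry GA metadata such as the crossover partner.
    public class TourFitnessMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, DoubleWritable, Text> {

        public void map(LongWritable offset, Text line,
                        OutputCollector<DoubleWritable, Text> out,
                        Reporter reporter) throws IOException {
            String[] cities = line.toString().split(",");
            double fitness = evaluateTour(cities);
            out.collect(new DoubleWritable(fitness), line);
        }

        private double evaluateTour(String[] cities) {
            // Placeholder: a real implementation would sum inter-city
            // distances from a table distributed to each node.
            return cities.length; // hypothetical stand-in
        }
    }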
Fig. 2 depicts the algorithm flow for TSP via MR-GA (TSPvMR-GA).

Fig.2. TSP via MR-GA (TSPvMR-GA) Algorithm Flow

The execution of the above algorithm is based on the island parallelization technique discussed in the earlier section. Map tasks are spawned according to the number of processors in a computing resource, and each computing resource then produces a new set of populations/demes to be processed in the next generation. Crossover/mutation and fitness evaluation are handled during the map phase, while selection and repopulation are done during the reduce and sorting/grouping/shuffling phases. The algorithm continues to run, using multiple input/output files containing populations of tours/chromosomes, until it converges or a specified maximum number of generations is reached.

D. Performance Evaluation Method

The performance of the proposed algorithm is measured with respect to speedup, performance improvement, and efficiency, with reference to the time taken for sequential and parallel processing [17]. Speedup measures how much faster a parallel algorithm is than the corresponding sequential algorithm. With $T_s$ the sequential execution time and $T_p$ the parallel execution time, speedup is calculated based on Equation 1:

$$S = \frac{T_s}{T_p} \qquad (1)$$

The performance improvement index depicts the relative improvement that the parallel system attains over the sequential process. It is measured based on Equation 2:

$$I = \frac{T_s - T_p}{T_s} = 1 - \frac{1}{S} \qquad (2)$$

Efficiency estimates how well-utilized the processors are in solving the problem, compared to how much effort is wasted in communication and synchronization. With $N$ the number of processing resources used, efficiency is calculated based on Equation 3:

$$E = \frac{S}{N} \qquad (3)$$
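As a worked example of these metrics (our arithmetic, using the best speedup reported in Section IV and reading $N$ as the 8 map slots of the 4-node cluster, which is our interpretation of the setup):

$$I = 1 - \frac{1}{5.506} \approx 0.818, \qquad E = \frac{5.506}{8} \approx 0.688$$

The 0.818 value matches the maximum performance improvement index reported in Section IV.B.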
IV. PERFORMANCE EVALUATION

The performance evaluation is discussed with respect to the proposed performance evaluation method. The raw result, comprising the execution time per generation of the algorithm, is depicted in Fig. 3.

Fig.3. TSPvMR-GA Execution Time

The test is executed using 50 cities with population sizes of 1k, 10k, 100k, 1m, 5m, and 10m. The execution times are comparable when executing GA with populations of 1k, 10k, and 100k; the time obtained is largely the overhead of initializing the Hadoop MR framework, and for these population sizes serial execution (1 map/1 reduce) is already sufficient. This diverges when the population reaches 1m: 1-node execution (2 map/1 reduce) trumps serial execution significantly, while further increasing the number of nodes for the 1m population returns only an insignificant improvement. The execution times for 2 nodes (4 map/2 reduce), 3 nodes (6 map/3 reduce), and 4 nodes (8 map/4 reduce) display a significant difference once the population count reaches 5m and 10m.

A. Speedup

Speedup is calculated based on Equation (1) in the earlier section. Fig. 4 illustrates the speedup obtained by the TSPvMR-GA algorithm as the number of nodes is increased.
Fig.4. TSPvMR-GA Speedup

The speedup values obtained by the algorithm only show an improvement when executing with more than a 1m population. As mentioned earlier, the execution time for populations of 1k, 10k, and 100k is essentially the overhead of initializing the Hadoop MR framework. The speedup starts to change when the population count reaches 1m: execution on 1 node returns a healthy value of 1.632, a significant improvement over serial execution. Despite this, using more than 1 node shows only a trivial further improvement, suggesting that the 1m population requires only 1 node for an optimal result. When executing the 5m population, however, each increase in the number of nodes further improves the speedup: the values of 2.753, 4.044, and 5.506 indicate that the algorithm scales well when more processing power is made available. Despite that, execution with the 10m population shows a decline in speedup compared to the 5m execution, suggesting that more processing power is needed to improve the speedup further.
B. Performance Improvement Index

Equation (2) in the earlier section gives the calculation of the performance improvement index employed in this research. Fig. 5 shows the performance improvement index for TSPvMR-GA using different numbers of nodes.

Fig.5. TSPvMR-GA Performance Improvement Index

The information in Fig. 5 indicates that the algorithm improves significantly when the population is larger than 1m. This confirms that the performance of the algorithm improves once the serial execution requires more processing resources. The performance improvement index from the 1m to the 5m execution shows a significant increase, with the maximum recorded at 0.818, a strong improvement over serial execution. This reiterates that execution on the MR cluster benefits from more processing resources. Execution with the 10m population, however, requires more processing nodes in order to improve the performance further.
C. Efficiency

The efficiency factor differs from speedup and the performance improvement index in that it takes into account the number of processors used. Fig. 6 illustrates the efficiency figures obtained from Equation (3).

Fig.6. TSPvMR-GA Efficiency

Theoretically, the efficiency value should be higher when fewer processing resources are used, since it is easier to coordinate the distribution of work across fewer resources. This is apparent when executing the algorithm with the 1k, 10k, 100k, and 1m populations: execution on 1 node returns the highest efficiency, because a single node uses only one storage device and thus diminishes the data transfer overhead. However, the efficiency of the 5m execution shows a different pattern: 4-node execution trumps the 3-node and 2-node executions despite having more data transfer overhead. This suggests that the MR framework can handle its resources efficiently when executing in an optimal configuration. The 10m execution, however, returns reduced efficiency, as it requires a higher magnitude of processing resources; this can be addressed by further increasing the number of nodes.
V. CONCLUSION AND RECOMMENDATION

This research proposes a native MR framework implementation for GA with large populations. The performance of the proposed algorithm suggests that MR is an efficient parallel framework that allows performance improvement over serial GA. The proposed algorithm sustains performance improvement up to a 5m population, a feat for any computing resource, and the performance pattern suggests that a larger population can be accommodated when more processing resources are made available. However, the iterative nature of GA requires the MR framework to initialize repeatedly for each generation, which creates an execution-time overhead. The fact that the Hadoop MR framework requires input data also creates an I/O overhead for the proposed algorithm. The use of an iterative MR framework such as HaLoop [28] may allow even better overall performance.
REFERENCES
[1] Eiben, A. E., & Smith, J. E. (2003). Introduction. Introduction to Evolutionary Computing (pp. 1-14). New York: Springer.
[2] Gotshall, S., & Rylander, B. (2002). Optimal population size and the genetic algorithm. Population, 100(400), 900.
[3] Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
[4] Dean, J., & Ghemawat, S. (2010). MapReduce: a flexible data processing tool. Communications of the ACM, 53(1), 72-77.
[5] Fadika, Z., Dede, E., Govindaraju, M., & Ramakrishnan, L. (2011, September). MARIANE: MapReduce implementation adapted for HPC environments. In Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on (pp. 82-89). IEEE.
[6] Matsunaga, A., Tsugawa, M., & Fortes, J. (2008, December). CloudBLAST: Combining MapReduce and virtualization on distributed resources for bioinformatics applications. In eScience, 2008. eScience '08. IEEE Fourth International Conference on (pp. 222-229). IEEE.
[7] Qiu, X., Ekanayake, J., Beason, S., Gunarathne, T., Fox, G., Barga, R., & Gannon, D. (2009, November). Cloud technologies for bioinformatics applications. In Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (p. 6). ACM.
[8] Papadimitriou, S., & Sun, J. (2008, December). DisCo: Distributed co-clustering with Map-Reduce: A case study towards petabyte-scale end-to-end mining. In Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on (pp. 512-521). IEEE.
[9] De Jong, K., Fogel, D., & Schwefel, H. P. (1997). Handbook of Evolutionary Computation. IOP Publishing Ltd and Oxford University Press.
[10] Papadimitriou, C. H. (1977). The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science, 4(3), 237-244.
[11] Goldberg, D. E., & Holland, J. H. (1988). Genetic algorithms and machine learning. Machine Learning, 3(2), 95-99.
[12] Gupta, S., Agarwal, G., & Kumar, V. (2010, February). Task scheduling in multiprocessor system using genetic algorithm. In Machine Learning and Computing (ICMLC), 2010 Second International Conference on (pp. 267-271). IEEE.
[13] Hou, E. S., Ansari, N., & Ren, H. (1994). A genetic algorithm for multiprocessor scheduling. Parallel and Distributed Systems, IEEE Transactions on, 5(2), 113-120.
[14] Grefenstette, J. J., Gopal, R., Rosmaita, B. J., & Gucht, D. V. (1985, July). Genetic algorithms for the traveling salesman problem. In Proceedings of the 1st International Conference on Genetic Algorithms (pp. 160-168). L. Erlbaum Associates Inc.
[15] Blickle, T., & Thiele, L. (1995). A comparison of selection schemes used in genetic algorithms.
[16] Chai, L., Gao, Q., & Panda, D. K. (2007, May). Understanding the impact of multi-core architecture in cluster computing: A case study with Intel dual-core system. In Cluster Computing and the Grid, 2007. CCGRID 2007. Seventh IEEE International Symposium on (pp. 471-478). IEEE.
[17] Fadzil, A. F. A., Khalid, N. E. A., & Manaf, M. Performance of scalable off-the-shelf hardware for data-intensive parallel processing using MapReduce. In Computing and Convergence Technology (ICCCT), 2012 7th International Conference on (pp. 379-384).
[18] Rosli, M. H., Khalid, N. E. A., Abidin, S. Z. Z., & Manaf, M. Heuristic testing on task partitioning for heterogeneous cluster. In Computing and Convergence Technology (ICCCT), 2012 7th International Conference on (pp. 385-390).
[19] Cantú-Paz, E. (1998). A survey of parallel genetic algorithms. Calculateurs Parallèles, Réseaux et Systèmes Répartis, 10(2), 141-171.
[20] Jog, P., Suh, J. Y., & Gucht, D. V. (1989, June). The effects of population size, heuristic crossover and local improvement on a genetic algorithm for the traveling salesman problem. In Proceedings of the 3rd International Conference on Genetic Algorithms (pp. 110-115). Morgan Kaufmann Publishers Inc.
[21] Grefenstette, J. J., Gopal, R., Rosmaita, B. J., & Gucht, D. V. (1985, July). Genetic algorithms for the traveling salesman problem. In Proceedings of the 1st International Conference on Genetic Algorithms (pp. 160-168). L. Erlbaum Associates Inc.
[22] Birattari, M., Stützle, T., Paquete, L., & Varrentrapp, K. (2002, July). A racing algorithm for configuring metaheuristics. In GECCO (Vol. 2, pp. 11-18).
[23] Dorigo, M., & Gambardella, L. M. (1997). Ant colony system: A cooperative learning approach to the traveling salesman problem. Evolutionary Computation, IEEE Transactions on, 1(1), 53-66.
[24] Jin, C., Vecchiola, C., & Buyya, R. (2008, December). MRPGA: An extension of MapReduce for parallelizing genetic algorithms. In eScience, 2008. eScience '08. IEEE Fourth International Conference on (pp. 214-221). IEEE.
[25] Di Geronimo, L., Ferrucci, F., Murolo, A., & Sarro, F. (2012, April). A parallel genetic algorithm based on Hadoop MapReduce for the automatic generation of JUnit test suites. In Software Testing, Verification and Validation (ICST), 2012 IEEE Fifth International Conference on (pp. 785-793). IEEE.
[26] Huang, D. W., & Lin, J. (2010, November). Scaling populations of a genetic algorithm for job shop scheduling problems using MapReduce. In Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on (pp. 780-785). IEEE.
[27] Bennett III, F. H., Koza, J. R., Shipman, J., & Stiffelman, O. (1999, July). Building a parallel computer system for $18,000 that performs a half peta-flop per day. In GECCO (pp. 1484-1490).
[28] Bu, Y., Howe, B., Balazinska, M., & Ernst, M. D. (2010). HaLoop: Efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment, 3(1-2), 285-296.