International Conference on Emerging Trends in Electrical, Electronics and Communication Technologies-ICECIT, 2012
Solution to Traveling Tournament Problem using Genetic Algorithm on Hadoop
Laxmi Thakare, Dr. Jayant Umale, Bhushan Thakare*
Assistant Professor, STES’s NBN Sinhgad School of Engineering, Pune, 411041, India
Professor, Pimpri Chinchwad College of Engineering, Pune, 411044, India
Assistant Professor, STES’s Sinhgad Academy of Engineering, Pune, 411048, India
Abstract
Heuristic methods are commonly used to solve optimization problems arising in scientific and real-life settings. Problems such as the Travelling Salesman Problem (TSP) demand accurate solutions, and many optimization methods have been applied to them. Research in this direction shows genetic algorithms (GAs) to be a promising candidate, often preferred over other optimization methods. Because a GA relies on a large population and a large number of iterations, it tends to be accurate but inefficient with respect to computation time. In this paper a variant of the GA is formulated for an optimization problem called the travelling tournament problem (TTP), and experiments are carried out with Hadoop MapReduce to reduce the execution time.
Keywords: Travelling Tournament Problem; Optimization; Genetic algorithm; MapReduce.
1. Introduction
Optimization is the process of making something better; it is the mathematical tool we rely on to obtain answers to such problems. In mathematics and computer science, an optimization problem is the problem of finding the best solution from all feasible solutions [16]. The term “best” implies that there is more than one solution and that the solutions are not of equal value. Algorithmic research on optimization problems from scientific and real-life domains uses classical methods, heuristic methods and nature-based methods to compute solutions. For hard problems that lack efficient exact algorithms, heuristic approaches, including genetic algorithms (GAs), simulated annealing (SA), and tabu search (TS), have become popular alternatives. Because of their intrinsic parallelism, GAs can be parallelized easily to scale their computing ability, and hence offer great potential for solving hard problems. We implement GAs on Hadoop, which is increasingly becoming the de facto standard MapReduce implementation. MapReduce is a programming model that enables users to easily develop large-scale distributed applications, and Hadoop is an open source implementation of the MapReduce model. In this paper we present an algorithm for solving an optimization problem called the Travelling Tournament Problem using a genetic algorithm on Hadoop.
* Corresponding author. Tel.: 919970081289, 919322940253, 919665613951. E-mail address: [email protected], [email protected], [email protected]
© 2012. Published by Elsevier Ltd.
1.1. Travelling Tournament Problem
The Travelling Tournament Problem is a sports timetabling problem that abstracts the key issues in creating timetables where team travel matters. Professional sports leagues exist all over the world. Popular leagues are often of huge economic importance because of the enormous revenues generated by selling tickets and broadcasting rights for the games. Hence, the planning of these leagues is of major importance. An important aspect is the generation of a timetable that specifies the order in which the teams play each other during the season and the venue of each game. Given the number of teams and the pairwise distances between their home venues, the TTP asks for a timetable of a double round robin tournament that minimizes the sum of the distances travelled by the teams during the season.
Formally, the TTP can be stated as follows. Let T be a set of teams, where |T| = n ≥ 4 is even. An (n × n) distance matrix D = (d_ij) specifies the distances between the home venues of the teams, i.e., d_ij ≥ 0 is the distance between the home venues of teams i and j. The distances are assumed to be symmetric (d_ij = d_ji for all i, j), to satisfy d_ii = 0 for all i, and to satisfy the triangle inequality (d_ij + d_jk ≥ d_ik for all i, j, k). A game is an ordered pair of teams, where the first team is the home team and the second the away team. A sequence of consecutive away games of a team is called a road trip, and a sequence of consecutive home games is called a home stand. A double round robin tournament is a collection of games in which every team plays every other team once at home and once away (i.e., at the other team’s home venue). Hence, exactly 2n − 2 time slots are necessary for a double round robin tournament. Before the tournament, each team is assumed to be at its home venue, and it must return there after the tournament if its last game is an away game.
Between two consecutive away games, a team travels directly from the venue of the first opponent to the venue of the second opponent.
Problem Formulation:
• N (even) teams take part in a tournament
• Each team has its own stadium in its home city
• Distances between the stadiums are known
Constraints:
• Double round-robin constraint: Each pair of teams, say A and B, plays exactly twice: once at A’s home site (B@A) and once at B’s home site (A@B). Thus there are 2(n-1) rounds, and in each round n/2 games are played.
• Consecutive constraint: No team may play more than 3 consecutive home games or more than 3 consecutive away games.
• No-repeater constraint: For any pair of teams, say A and B, A@B cannot follow B@A immediately in the next round.
1.2. Genetic Algorithms
Genetic algorithms are stochastic search methods introduced by J. Holland in the 1970s and inspired by the biological evolution of living beings. They belong to the larger class of evolutionary algorithms (EA), which solve optimization problems using techniques borrowed from natural evolution, such as selection, crossover and mutation. In a genetic algorithm, a population of strings called chromosomes, which encode candidate solutions (called individuals or phenotypes) to an optimization problem, evolves towards better solutions. In nature, "survival of the fittest" causes well-adapted species to be selected and to produce subsequent generations. The average fitness of each new generation is expected to improve over that of the previous generation, because each generation inherits good features from its predecessor. Taken to its limit, this would produce a population composed of identical individuals. However, mutation introduces diversity into the population and prevents this homogeneous situation. Analogously, a GA solves optimization problems by simulating nature's generation paradigm. A typical GA starts with an initial population consisting of random solutions, or individual chromosomes. The fitness of each chromosome is evaluated by an appropriate measure, such as the objective function value of the optimization problem. Chromosomes mate according to their fitness in the population. Mating produces an offspring (a new solution) that inherits features from each parent's chromosome. The bits within this chromosome mutate with a small probability to ensure diversity. The offspring replace older chromosomes in the population that are less fit. This procedure is carried out either for a predefined number of generations or until the desired solution is reached. An abstraction of a typical GA is given in Fig. 1.
Algorithm 1: Algorithm for Generic Genetic Algorithm
Generate initial population, G(0);
Evaluate G(0); (apply fitness function)
t=1;
Repeat
    Generate G(t) using G(t-1); (apply operators)
    Evaluate (decode (G(t)));
    t=t+1;
Until solution is found or termination.
Fig. 1. Algorithm for Generic Genetic Algorithm
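The loop of Fig. 1 can be sketched in Python as follows. This is a minimal illustration on a toy problem (maximizing the number of 1-bits in a bit string), not the TTP algorithm of this paper. The function name `run_ga`, the pairwise tournament used to pick parents, the one-point crossover and all parameter values are assumptions chosen for brevity.

```python
import random

def run_ga(length=20, pop_size=30, generations=200, p_mut=0.02, seed=0):
    """Toy GA following Fig. 1: maximise the number of 1-bits in a bit string."""
    rng = random.Random(seed)
    fitness = sum  # objective: count of 1-bits (assumed toy objective)

    # Generate and evaluate the initial population G(0).
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):              # Repeat ...
        if fitness(best) == length:           # ... until a solution is found
            break
        children = []
        while len(children) < pop_size:
            # Selection: the fitter of two random individuals becomes a parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, length)    # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            children.append(child)
        # Replacement: keep the best individual seen so far in G(t).
        pop = children[:-1] + [best]
        best = max(pop, key=fitness)
    return best
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next; a TTP version would replace the bit string with a match ordering and the toy objective with the travel distance.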
1.3. Hadoop MapReduce
Hadoop is an open source framework for writing and running distributed applications that process large amounts of data. It is a fault-tolerant, highly scalable distributed system for data storage, and it provides base components on top of which new distributed computing subprojects can be implemented. Its scalability results from a self-healing, high-bandwidth clustered storage layer, known as HDFS (Hadoop Distributed File System), and a fault-tolerant distributed processing layer, known as MapReduce. MapReduce [9] is a programming model that enables users to easily develop large-scale distributed applications. In this model, the computation takes a set of input key/value pairs and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce. Hadoop is the most popular open-source implementation of the MapReduce framework. It is based on the principle that moving computation to the data is cheaper than moving large data blocks to the computation. MapReduce divides applications into many small blocks of work that can be executed in parallel. HDFS creates multiple replicas of data blocks for reliability and places them on compute nodes around the cluster, so that MapReduce can process the data where it is located.
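To make the roles of Map and Reduce concrete, the sketch below imitates the data flow in plain Python, without Hadoop: a map function emits key/value pairs, the pairs are grouped by key, and a reduce function folds each group. The helper names (`map_reduce`, `mapper`, `reducer`), the record layout and the cost values are made up for illustration; this is not the Hadoop job used in the experiments.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process imitation of the MapReduce data flow."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase: one call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical input: (sub-population id, chromosome cost) records.
records = [(0, 50), (1, 42), (0, 47), (1, 60)]

def mapper(record):
    group, cost = record
    yield group, cost                       # key = sub-population, value = cost

def reducer(group, costs):
    return min(costs)                       # lowest cost found in that group

best_per_group = map_reduce(records, mapper, reducer)
```

In a real cluster the map calls run on the nodes that hold the input blocks and the grouping step moves data across the network; here both happen in one process.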
2. Related Work
The GA was developed by John Holland (1975) over the course of the 1960s and 1970s and was popularized by one of his students, David Goldberg, who solved a difficult problem involving the control of gas-pipeline transmission for his dissertation (Goldberg, 1989). Holland’s original work was summarized in his book, and he was the first to try to develop a theoretical basis for GAs through his schema theorem. Goldberg has probably contributed the most fuel to the GA fire with his successful applications and excellent book (1989). Since then, many versions of evolutionary programming have been tried with varying degrees of success. A brief literature survey of genetic algorithms, their application to problems and some details of their parallel implementation can be found in [12].
The travelling tournament problem (TTP) was introduced by Easton et al. [6]. It is a challenging combinatorial optimization problem in sports scheduling that abstracts the most important aspects of creating timetables where team travel is an important issue. The TTP is interesting not just because it models issues relevant to real sports leagues. First, the problem combines issues of feasibility (with regard to home/away patterns) and optimality (with regard to the distance travelled). Second, even small instances seem to be difficult. The TTP can also be viewed as a variant of the well-known Travelling Salesman Problem (TSP), asking for a distance-optimal schedule linking venues that are close to one another, and the TTP itself has been shown to be NP-hard [13].
The initial approaches to this optimization problem were mainly centered on constraint programming and integer programming. Variations of models using these approaches can be found in [4], [7], [8], [10], and [14]. These approaches create models with variables for each opponent pairing and each home or away venue, and usually other variables that represent the constraints being considered. Solutions to TTP instances were often found only after weeks of computation, even on high performance machines using parallel computing; the first solution to NL6 required over fifteen minutes of computation time on twenty parallel machines [4]. Even when the number of teams grows slowly, the number of these variables grows exponentially, which significantly reduces the effectiveness of these approaches for instances with more than 12 teams. Because the search space of these problems grows exponentially, research has focused on heuristic rather than exact methods to find near-optimal solutions in reasonable computation times. Three approaches have been used with considerable success: tabu search [2], simulated annealing [1][11] and genetic algorithms [3]. Tabu search and simulated annealing are both iterative local search procedures in which solutions are examined one at a time: the next solution is generated by making a small change to the current solution. By applying certain rules at each step, these methods guide the search toward a solution that best satisfies some criterion, i.e., a solution with a sufficiently low (or high) objective function value. Genetic algorithms follow a similar rule of sticking to what is known to be good when searching for improved solutions. However, there are some differences between this approach and the previous methods. The underlying notion is that a GA rests on the premise that a good solution contains several good subparts [5]. A genetic algorithm therefore builds on what is already known to be good by taking parts of two solutions and merging them to create a new solution.
3. Proposed Approach
3.1. Genetic Algorithm as a Method to Solve TTP
• Selection of parameters: Using a genetic algorithm to obtain an optimal solution for the TTP first requires identifying all the relevant parameters and their encoding in terms of chromosomes, genes, and populations. For the TTP the parameters are the number of teams and the possible number of matches between teams.
• Parameter Representation: For the TTP, a chromosome is an ordered list of matches in which each possible match appears exactly once, and the order of the matches determines the total distance travelled by each team. Since a chromosome is a collection of genes, a gene for the TTP is an object representing a match (match number, Team x and Team y).
        A     B     C     D
A       -     0     1     2
B       3     -     4     5
C       6     7     -     8
D       9     10    11    -
Fig. 2. Parameter Representation for TTP Problem
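The numbering in Fig. 2 can be reproduced with a few lines of Python. The sketch assumes that match numbers are assigned to ordered team pairs in row-major order and that a chromosome is cut into rounds of N/2 consecutive genes; the helper names `match_table` and `to_rounds` are hypothetical, and which team of a pair plays at home is not fixed by the figure.

```python
from itertools import permutations

def match_table(teams):
    """Map match number -> ordered pair (row team, column team), row by row."""
    return dict(enumerate(permutations(teams, 2)))

def to_rounds(chromosome, n_teams):
    """Cut a chromosome into rounds (slots) of n_teams / 2 games each."""
    per_round = n_teams // 2
    return [chromosome[i:i + per_round]
            for i in range(0, len(chromosome), per_round)]

genes = match_table("ABCD")               # 12 genes for N = 4
rounds = to_rounds(list(range(12)), 4)    # 6 rounds of 2 games
```

Decoding a chromosome this way does not guarantee a feasible schedule: a round may contain the same team twice, which is why the operators in this section check the constraints after every change.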
• Fitness Evaluation Function: For the TTP the fitness of a chromosome is calculated by traversing the genes in order and summing the distances travelled by each team, which gives the total schedule distance. The chromosome with the lowest fitness score is the fittest, as it represents the schedule with the minimum travelling distance.
• Selection: The goal of selection is to choose parents that are capable of creating fitter children while maintaining genetic diversity. Our algorithm combines rank-based and random selection: the first parent is chosen by rank, while the second is chosen randomly from the population. In rank-based selection all individuals are sorted according to the objective function; the best individual is given the largest share of mating opportunities, while the worst individual is given the smallest share.
• Crossover: Our algorithm uses 2-point crossover. During crossover, constraint checking is performed to verify that the offspring are valid. Offspring that do not satisfy the constraints are discarded.
Parent 1: [0 1 2 3 | 4 5 6 7 | 8 9 10 11]
Parent 2: [11 10 9 8 | 7 6 5 4 | 3 2 1 0]
Offspring 1: [0 1 2 3 | 7 6 5 4 | 8 9 10 11]
Offspring 2: [11 10 9 8 | 4 5 6 7 | 3 2 1 0]
• Mutation: Mutation is similar to crossover, except that only a single chromosome is considered and changes are made to that single individual. For the TTP we define mutation as the swapping of two slots.
Offspring: [0 1 | 2 3 | 4 5 | 6 7 | 8 9 | 10 11]
Mutated offspring: [0 1 | 8 9 | 4 5 | 6 7 | 2 3 | 10 11]
If the mutated offspring satisfies the given constraints it is accepted; otherwise it is rejected.
• Insertion: After crossover and mutation, the children are inserted into the population and the selection, crossover, and mutation process begins again. To preserve the strongest chromosomes within the population, we use elitism: the n most elite (best) chromosomes are protected (n = population size/2 in our TTP example), and the rest of the population is replaced with the newly created children.
• Convergence: In our approach, convergence is determined either by the number of generations or by the number of times the same schedule recurs as the best solution (count>100).
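The crossover and mutation examples above can be reproduced with the following sketch. It assumes cut points after genes 4 and 8 and slots of two genes, which is what the printed examples suggest; the function names are hypothetical, and `is_permutation` covers only one necessary condition (each match appears exactly once), not the full TTP constraint check.

```python
def two_point_crossover(p1, p2, a, b):
    """Exchange the segment [a, b) between two parent chromosomes."""
    child1 = p1[:a] + p2[a:b] + p1[b:]
    child2 = p2[:a] + p1[a:b] + p2[b:]
    return child1, child2

def swap_slots(chromosome, i, j, slot_size):
    """Mutation: exchange slot i with slot j (each slot holds slot_size genes)."""
    c = list(chromosome)
    si, sj = i * slot_size, j * slot_size
    c[si:si + slot_size], c[sj:sj + slot_size] = (
        c[sj:sj + slot_size], c[si:si + slot_size])
    return c

def is_permutation(chromosome):
    """Necessary condition only: every match number appears exactly once."""
    return sorted(chromosome) == list(range(len(chromosome)))

parent1 = list(range(12))
parent2 = parent1[::-1]
off1, off2 = two_point_crossover(parent1, parent2, 4, 8)
mutated = swap_slots(parent1, 1, 4, slot_size=2)
```

In this example both middle segments happen to contain the same genes, so the offspring stay permutations; for arbitrary parents a plain segment exchange can duplicate matches, which is one reason invalid offspring have to be discarded.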
3.2. Proposed Algorithm
The intractability of the scheduling problem has led to the use of heuristics for obtaining sub-optimal solutions in a reasonable amount of time. We therefore propose a genetic algorithm approach for solving a real-world optimization problem such as the TTP.
Algorithm 2: Proposed GA for TTP
1. Read the number of teams and the distance matrix.
2. Give each match event a unique number; these numbers define the chromosome structure and length, i.e., for N=4, chromosome length=12 [(2N-2)*N/2] with integer coding.
3. Generate the initial population.
    3.1 Using the first chromosome, generate more feasible chromosomes (solutions) by applying the fitness function.
    3.2 Calculate the cost of each chromosome.
4. Sort the chromosomes in ascending order of cost and save the first one as the best solution of the generation.
5. Repeat
    5.1 Rank-based selection of parents from the population.
    5.2 Perform crossover and mutation.
    5.3 Add the offspring to the current population.
    5.4 Calculate costs and sort the chromosomes in ascending order.
    5.5 Save the first chromosome as the best solution of the generation; if it is the same sequence as before, increment count.
   Until (no. of generations reached or count > 100).
If termination is by count > 100, the corresponding chromosome is displayed as the best solution; if termination is by reaching the number of generations, the first stored chromosome is displayed as the best solution to the problem.
Fig. 3. Proposed algorithm for Solution to TTP using GA.
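Two pieces of Algorithm 2 are simple enough to check directly: the chromosome length formula of step 2 and the stopping rule of step 5. The sketch below is illustrative only; the function names are hypothetical, and resetting the counter when the best schedule changes is an assumption, since the algorithm only states that the count is incremented when the same sequence recurs.

```python
def chromosome_length(n_teams):
    """Games in a double round robin: (2N - 2) rounds of N / 2 games each."""
    return (2 * n_teams - 2) * n_teams // 2

def update_count(best, previous_best, count):
    """Increment when the best schedule repeats; reset otherwise (assumed)."""
    return count + 1 if best == previous_best else 0

def should_stop(generation, max_generations, count, count_limit=100):
    """Stop on the generation budget or when the same best schedule recurs too often."""
    return generation >= max_generations or count > count_limit

lengths = {n: chromosome_length(n) for n in (4, 6, 8)}
```

For N = 4, 6 and 8 the formula gives chromosome lengths of 12, 30 and 56.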
3.3. Solution Representation
In this paper, a schedule is represented by a table indicating the opponents of the teams. Each row corresponds to a round and each column corresponds to a team. The opponent of team Ti in round rk is given by the entry in row rk and column Ti. If the entry carries the symbol @, the game takes place at the opponent's home; otherwise it takes place at Ti’s home. Consider for example the schedule S for 6 teams (and thus 10 rounds) shown in Fig. 4. Schedule S specifies that team T1 successively plays against T6 at home, T2 away, T4 at home, T3 at home, T5 away, T4 away, T3 away, T5 at home, T2 at home and T6 away. Thus the travel cost of team T1 is: d_12 + d_21 + d_15 + d_54 + d_43 + d_31 + d_16 + d_61.
R\T    1     2     3     4     5     6
1      6     5     @4    3     @2    @1
2      @2    1     5     6     @3    @4
3      4     @3    2     @1    6     @5
4      3     @6    @1    @5    4     2
5      @5    4     6     @2    1     @3
6      @4    3     @2    1     @6    5
7      @3    6     1     5     @4    @2
8      5     @4    @6    2     @1    3
9      2     @1    @5    @6    3     4
10     @6    @5    4     @3    2     1
Fig. 4. Solution Representation for Proposed Algorithm
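The travel cost of a team and the consecutive and no-repeater constraints can be checked mechanically on schedule S. In the sketch below an away game "@j" is encoded as the negative number -j, and distances are supplied by a caller-provided function; this encoding, the function names and the toy distance |i - j| used in the check are assumptions made for illustration and are not the NL6 distances.

```python
# Schedule S of Fig. 4: rows are rounds, columns are teams T1..T6.
# A positive entry j means "home against Tj"; -j means "away at Tj".
S = [
    [ 6,  5, -4,  3, -2, -1],
    [-2,  1,  5,  6, -3, -4],
    [ 4, -3,  2, -1,  6, -5],
    [ 3, -6, -1, -5,  4,  2],
    [-5,  4,  6, -2,  1, -3],
    [-4,  3, -2,  1, -6,  5],
    [-3,  6,  1,  5, -4, -2],
    [ 5, -4, -6,  2, -1,  3],
    [ 2, -1, -5, -6,  3,  4],
    [-6, -5,  4, -3,  2,  1],
]

def travel_legs(schedule, team):
    """List the (from, to) venue pairs travelled by `team` (1-based)."""
    legs, here = [], team                  # every team starts at home
    for row in schedule:
        entry = row[team - 1]
        venue = -entry if entry < 0 else team
        if venue != here:
            legs.append((here, venue))
            here = venue
    if here != team:                       # return home after a final away game
        legs.append((here, team))
    return legs

def travel_cost(schedule, team, dist):
    """Sum dist(i, j) over all legs travelled by `team`."""
    return sum(dist(i, j) for i, j in travel_legs(schedule, team))

def streaks_ok(schedule, team, limit=3):
    """Consecutive constraint: at most `limit` home or away games in a row."""
    run, prev = 0, None
    for row in schedule:
        away = row[team - 1] < 0
        run = run + 1 if away == prev else 1
        prev = away
        if run > limit:
            return False
    return True

def no_repeater_ok(schedule, team):
    """No-repeater constraint: never the same opponent in consecutive rounds."""
    opponents = [abs(row[team - 1]) for row in schedule]
    return all(a != b for a, b in zip(opponents, opponents[1:]))
```

Calling `travel_legs(S, 1)` yields the eight legs that make up the travel cost of T1 given above, and summing `travel_cost` over all teams gives the quantity the fitness function minimizes.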
3.4. Experimental Results
For the performance analysis of the GA, we implemented the proposed algorithm in 4 ways: sequentially, in parallel (multithreaded), on multiple nodes (2 nodes) and finally on Hadoop using MapReduce programming. We designed a variant of the GA, PGA, that works on a divided population: the population is divided, and a GA operates on each of the resulting sub-populations, so a number of GA instances run concurrently. We considered 3 instances of the NL dataset [15], N=4, N=6 and N=8, for both the Java and the MapReduce (MR) implementations, and results were taken while varying two parameters: the population size and the number of iterations. With the number of iterations held constant at 1000, the population size was varied from 1000 to 8000 and the results were recorded for all 3 instances. Similarly, with the population size held constant at 2000, the number of iterations was varied from 300 to 2000 and the results were recorded for all 3 instances. The results obtained are summarized in Table 1, which gives the Chromosome Length (CL), the Total Time (TT) for the completion of the schedule and the Total Cost of Travelling (TCT).
Table 1. Computational Results
Instance   CL   TT (Java)   TT (MR)    TCT Feasible Range [15]   TCT Actual Result (Java)   TCT Actual Result (MR)   Observation
NL-4       12   00:17       00:15      8276                      8276                       8276                     Optimal
NL-6       30   04:54       03:37      22969-23916               24349                      23861                    Optimal
NL-8       56   04:39:47    01:46:49   39721-41505               40972                      41285                    Optimal
Table 2. Comparative result table for TTP w.r.t Population size for N=6
Population Size   Execution Time (Seq)   Execution Time (multithread)   Execution Time (multinode)   Execution Time (MR)
1000              01:16                  00:57                          02:07                        00:53
2000              02:22                  01:40                          03:16                        01:34
3000              03:38                  02:34                          06:18                        01:58
4000              04:42                  03:17                          06:47                        02:25
5000              05:55                  04:11                          11:07                        03:05
6000              07:09                  04:54                          13:37                        03:37
7000              08:22                  05:43                          12:22                        04:15
8000              09:32                  06:32                          14:00                        05:04
Fig 5: Comparative graph for TTP w.r.t Population size for N=6
Table 3. Comparative result table for TTP w.r.t No. of iterations for N=6
No. of Iterations   Execution Time (seq)   Execution Time (multithread)   Execution Time (multinode)   Execution Time (MR)
300                 01:19                  01:05                          01:32                        00:58
600                 01:47                  01:18                          02:08                        01:04
900                 02:10                  02:00                          02:47                        02:04
1200                02:40                  02:17                          03:32                        02:15
1500                03:06                  02:28                          03:59                        02:34
1800                03:38                  03:13                          04:18                        02:43
2000                03:54                  03:35                          04:42                        03:15
Fig 6: Comparative graph for TTP w.r.t No. of iterations for N=6
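The execution times in Tables 1 and 2 can be turned into speedup ratios with a small helper. The sketch assumes that the times are given as mm:ss (or hh:mm:ss for NL-8), which the tables do not state explicitly; the function names are hypothetical and the rows are copied from Table 2.

```python
def to_seconds(stamp):
    """Convert 'mm:ss' or 'hh:mm:ss' to seconds (assumed time format)."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def speedup(baseline, candidate):
    """Ratio of baseline time to candidate time; above 1 means faster."""
    return to_seconds(baseline) / to_seconds(candidate)

# Rows from Table 2 (N=6): population size -> (sequential, MapReduce).
table2 = {
    1000: ("01:16", "00:53"),
    4000: ("04:42", "02:25"),
    8000: ("09:32", "05:04"),
}
speedups = {pop: round(speedup(seq, mr), 2) for pop, (seq, mr) in table2.items()}
```

Under this reading MapReduce is roughly 1.4 to 1.9 times faster than the sequential run for the selected rows of Table 2, and about 2.6 times faster for NL-8 in Table 1.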
3.4.1 Observations
The tables and graphs show that the execution time of the TTP model increases for all instances as the population size and the number of iterations increase. For the NL6 and NL8 instances, the possibility of obtaining an optimum solution also increases with the population size and the number of iterations. The execution time for MapReduce is lower than for the sequential and multinode executions, and comparable to that of the multithreaded execution. A larger population tends not only to find a better solution but also to converge in fewer generations.
4. Conclusion and Future Work
Genetic algorithms appear to find good solutions for the travelling tournament problem; however, the outcome depends very much on how the problem is encoded and on which crossover and mutation methods are used. The use of a genetic algorithm is by no means an exact science and relies largely on trial and error in determining an appropriate encoding of the problem being optimized and in selecting the parameters that control the genetic algorithm. Overall, genetic algorithms have proved suitable for solving the travelling tournament problem. The biggest difficulty with the genetic algorithm devised for the travelling tournament problem is maintaining structure from the parent chromosomes while still ending up with a legal schedule in the child chromosomes. We have shown that GA performance can be increased by working with a variant of the GA. The results show that executing the GA on a MapReduce system is beneficial in terms of execution time. As future work the following can be carried out.
• Implementing MoHPBGA (Multi-objective Hierarchical Population Balanced Genetic Algorithm) using Hadoop MapReduce [12] for the TTP.
Acknowledgements
Heartfelt gratitude to Ms. Madhuri Aher, Ms. Swati Dhopte, Mr. Bhushan Bhokse, Ankush Thakare.
References
1. Anagnostopoulos, A., Michel, L., Van Hentenryck, P., Vergados, Y. (2003) A simulated annealing approach to the traveling tournament problem, Proceedings CPAIOR'03, Montreal.
2. Cardemil. Optimizacion de fixtures deportivos: Estado del arte y un algoritmo tabu search para el traveling tournament problem. Master's thesis, Universidad de Buenos Aires, Departamento de Computacion, Buenos Aires, 2002.
3. Dr. Nitin S. Choubey, “A Novel Encoding Scheme for Traveling Tournament Problem using Genetic Algorithm”, IJCA Special Issue on “Evolutionary Computation for Optimization Techniques” ECOT, 2010.
4. Easton, K.; Nemhauser, G.; and Trick, M. 2002. Solving the travelling tournament problem: A combined integer programming and constraint programming approach. Proceedings of the 4th International Conference on the Practice and Theory of Automated Timetabling 319–330.
5. Emanuel Falkenauer, “Applying GA to real world problems”, Springer, 1999.
6. Kelly Easton, George L. Nemhauser, and Michael A. Trick. The Travelling Tournament Problem Description and Benchmarks. In: Proceedings of the 7th International Conference on Principles and Practice of Constraint Programming (CP). Volume 2239 of LNCS. (2001) 580–584.
7. Henz, M. (1999) Constraint-based round robin tournament planning, in: D. De Schreye (ed.), Proceedings of the International Conference on Logic Programming, Las Cruces, New Mexico, MIT Press, 545-557.
8. Henz, M. (2004) Playing with constraint programming and large neighborhood search for traveling tournaments, Proceedings PATAT 2004, Pittsburgh, USA.
9. J. Dean and S. Ghemawat. “MapReduce: Simplified data processing on large clusters”, Commun. ACM, 51(1):107–113, 2008.
10. Leong, G. (2003). Constraint programming for the traveling tournament problem. www.comp.nus.edu.sg/henz/students/gan_tiaw_leong.pdf
11. Lim, A., Zhang, X. (2003) Integer programming and simulated annealing for scheduling sports competition on multiple venues, Proceedings MIC 2003.
12. Poka Laxmi, Jayant Umale, Sunita Mahajan, “MoHPBGA: Multi-objective Hierarchical Population Balanced Genetic Algorithm using MapReduce”, International Journal of Computer Applications (0975 – 8887) Volume 40– No.2, February 2012.
13. Thielen, C., and Westphal, S. 2011. Complexity of the traveling tournament problem. Theoretical Computer Science 412:345–351.
14. Trick, M.A. (2003) Integer and constraint programming approaches for round-robin tournament scheduling, in: E. Burke and P. De Causmaecker (eds.), PATAT 2002, Lecture Notes in Computer Science 2740, Springer, 63-77.
15. Trick M. Challenge Traveling Tournament Instances, August 23, 2010, from http://mat.gsia.cmu.edu/TOURN/
16. http://en.wikipedia.org/wiki/Optimization_problem.