A Hybrid Evolutionary Heuristic for Job Scheduling on Computational Grids*

Fatos Xhafa

Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord - Ed. Omega, C/Jordi Girona Salgado 1-3, 08034 Barcelona, Spain. [email protected]

Abstract. In this chapter we present a hybrid evolutionary meta-heuristic based on memetic algorithms (MAs) and several local search algorithms. The memetic algorithm is used as the principal heuristic that guides the search and can use any of 16 local search algorithms during the search process. The local search algorithms used in combination with the MA are obtained by fixing either the type of the neighborhood or the type of the move; they include swap/move-based search, hill climbing, variable neighborhood search and tabu search. The proposed hybrid meta-heuristic is implemented in C++ using a generic approach based on a skeleton for memetic algorithms. The implementation has been extensively tested in order to identify a set of appropriate values for the MA and local search parameters. We have comparatively studied the combination of the MA with different local search algorithms in order to identify the best hybridization. Results are compared with the best known results for the problem in the evolutionary computing literature, namely on the benchmark of Braun et al. 2001, which is known to be the most difficult benchmark for static instances of the problem. Our experimental study shows that the MA + TS hybridization outperforms the combinations of the MA with the other local search algorithms considered in this work and also improves the results of Braun et al. for all considered instances. We also discuss some issues related to the fine tuning and experimental evaluation of meta-heuristics in a dynamic environment.

1 Introduction

In this chapter we present a new hybrid evolutionary meta-heuristic for the problem of job scheduling on computational grids. The problem of job scheduling on computational grids consists in efficiently assigning user/application jobs to grid resources. This is a multi-objective optimization problem, the two most important objectives being the minimization of the makespan and of the flowtime of the system. The problem is much more complex than its version on traditional distributed systems (e.g. LAN environments) due to its dynamic nature and the high degree of heterogeneity of resources. Moreover, due to the large number of resources and the large number of jobs submitted by different applications, job scheduling on computational grids is a large-scale optimization problem.

Since the introduction of computational grids by Foster et al. [13, 14, 12], this problem has been receiving increasing attention from researchers due to the use of grid infrastructures in solving complex problems from many fields of interest such as optimization, simulation, etc. Moreover, the continuous construction of grid infrastructures is making possible the development of large-scale applications that use the large amount of computing resources offered by computational grids. However, scheduling the jobs submitted by users and/or applications to grid resources is a main issue, since it is crucial to achieving high performance. Indeed, given the dynamic nature of computational grids, the high heterogeneity of resources and the different characteristics of jobs, an efficient assignment of jobs to resources is necessary in order to benefit from the computing power of the grid. Any grid scheduler must produce an assignment in a very short time and must be robust enough to adapt itself to changes in the grid. At present, however, using grid infrastructures is very complex due to the lack of efficient and robust grid resource schedulers. From a computational complexity perspective, job scheduling on computational grids is computationally hard.

* Research supported by ASCE Project TIN2005-09198-C02-02 and Project FP6-2004-IST-FETPI (AEOLUS).
Therefore, the use of heuristics is the de facto approach for coping with its difficulty in practice, all the more so because in grids it is critical to generate schedules in a very short time due to the dynamic nature of the system. Thus, the evolutionary computing research community has already started to examine this problem, e.g. Abraham et al. [1], Buyya et al. [8], Martino and Mililotti [21] and Zomaya and Teh [36]. Memetic algorithms and hybrid heuristics, however, have not yet been proposed for the problem, to the best of our knowledge.

In this work we present a hybrid evolutionary meta-heuristic based on memetic algorithms (MAs) [22] and several local search algorithms. The memetic algorithm is used as the principal heuristic that guides the search and can use any of 16 local search algorithms during the search process. This set of local search algorithms, used in combination with the MA, is obtained by fixing either the type of the neighborhood or the type of the move, and can be grouped as follows:

• swap/move-based search
• hill climbing, variable neighborhood search
• tabu search

The proposed hybrid meta-heuristic is implemented in C++ using a generic approach based on a skeleton for memetic algorithms [5]. The implementation has been extensively tested, on the one hand, to identify a set of appropriate values for the parameters that conduct the search and, on the other, to compare the results with the best known results for the problem in the evolutionary computing literature. To this end we have used the benchmark of Braun et al. [6], which is known to be the most difficult benchmark for static instances of the problem; it consists of instances that try to capture the high degree of heterogeneity of grid resources and the workload of tasks. Our experimental study shows that the results obtained by the MA + TS hybridization outperform the results of a GA by Braun et al. [6] and those of a GA by Carretero & Xhafa [9] for all considered instances.

Despite the very good performance of the scheduler on static instances, our ultimate goal is for the grid scheduler to be efficient and effective in a realistic grid environment; thus, the experimenting should be done in such environments. To this end we have developed a prototype grid simulator that extends the HyperSim open source package [27] and enables the experimental study of the performance of meta-heuristics for job scheduling on computational grids. We have used the simulator as a tool for testing our implementation by plugging the MA implementations into it. It should be mentioned that experimenting in a dynamic environment raises issues not addressed so far in the meta-heuristics research community.

The remainder of the chapter is organized as follows. We give in Section 2 an introduction to computational grids. The problem of job scheduling is described and formally introduced in Section 3. We present in Section 4 some related work in which different heuristics have been applied to the problem. Memetic algorithms and their particularization for the problem are given in Section 5. The local search algorithms used for hybridization are given in Section 6.
Next, we give some implementation issues in Section 7 and an extensive experimental study in Section 8. We discuss in Section 9 some issues related to experimenting with the presented heuristics in a dynamic setting; in Section 10 we summarize the most important aspects of this work and envisage directions for further work.

2 Computational Grids

The present state of computing systems is, in some respects, analogous to that of electricity systems at the beginning of the 20th century. At that time, the generation of electricity was possible, but to use it one still had to have a generator available. The true revolution that permitted its establishment was the emergence of new technologies, namely the networks for the distribution and transmission of electricity. These made it possible to provide a reliable, low-price service, and thus electricity became universally accessible. By analogy, the term grid is adopted in computational grid to designate a computational infrastructure of distributed resources, highly heterogeneous

(as regards their computing power and their architecture), interconnected by heterogeneous communication networks and by a middleware that offers reliable, simple, transparent, efficient, global and low-price access to their computational potential.

One of the first questions raised by this emerging technology concerns its utility, that is, the need for computational grids. On the one hand, computational approaches have shown great success in virtually every field of human activity. Driven by the increasing complexity of real-life problems, and prompted by the increasing capacity of the technology, human activity (whether scientific, engineering, business, personal, etc.) relies heavily on computation. Computers are very often used to model and simulate complex problems, for diagnosis, plant control, weather forecasting, and in many other fields of interest. Even so, there exist many problems that challenge or exceed our ability to solve them, typically because they require processing a large quantity of operations or data. In spite of the fact that the capacity of computers continues to improve, computational resources do not meet the continuous demand for more computational power. For instance, a personal computer of the year 2005 is as "powerful" as a supercomputer of the year 1995; but whereas at that time biologists (for example) were pleased with being able to compute a single molecular structure, today they investigate complex interactions among macro-molecules, which requires much more computational capacity.

On the other hand, statistical data show that computers are usually underutilized. Most computers in academia, industry, administration, etc. are idle most of the time, or are used for basic tasks that do not require their whole computational power. Several statistical studies point out, however, that a considerable amount of money is spent on the acquisition of these resources.
One of the main objectives of grid technology is, therefore, to benefit from the existence of many computational resources through sharing. As pointed out by Foster & Kesselman, "the sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources...". Moreover, the sharing of resources in a computational grid may be subject to economic policies as well as to local policies of the owners of the resources.

Another important function of computational grids is to provide load balancing of the resources. Indeed, an organization or company could occasionally have unexpected peaks of activity that require a larger capacity of computational resources. If their applications were grid-enabled, they could be migrated to machines with low utilization during those peaks.

The potential for a massive capacity of parallel computation is one of the most attractive characteristics of computational grids. Aside from purely scientific needs, this computational power is causing changes in important industries such as the biomedical one, oil exploration, digital animation, aviation, the financial field, and many others. The common characteristic of these uses is that the applications are written so that they can be partitioned into almost independent parts. For instance, a CPU-intensive application can be thought of as an application composed of sub-tasks, each one capable of being executed on a different machine of the computational grid.

Although many types of resources can be shared and used in a computational grid, normally they are accessed through an application running in the grid. Normally, an application is used to define the piece of work of highest level in the grid. A typical grid scenario is the following: an application generates several sub-tasks in order to be solved; the grid system is responsible for sending each sub-task to a resource to be completed. In a simpler grid scenario, it is the user who selects the most adequate machine to execute his program or sub-tasks. In general, however, grid systems must be equipped with schedulers that automatically and efficiently find the most appropriate machines to execute an assembly of tasks.

In simple computational grids, such as United Devices, a policy of "scavenging" is applied. This means that each time a machine remains idle, it reports its state to the grid node responsible for the management and planning of the resources. Then, this node usually assigns to the idle machine the next pending task that can be executed on that machine. Scavenging normally penalizes the owner of the application, since in the event that the idle machine changes its state to busy with tasks not coming from the grid system, the application is suspended or delayed. This situation would create unpredictable completion times for grid-based applications. With the objective of having a predictable behavior, the resources participating in the grid are often dedicated resources (for exclusive use in the grid) that do not suffer preemptions caused by external work.
Moreover, this allows the tools associated with the schedulers (generally known as profilers) to compute the approximate completion time for an assembly of tasks whose characteristics are known in advance.

For the majority of grid systems, scheduling is a very important mechanism. In the simplest of cases, scheduling of jobs can be done in a blind way, by simply assigning tasks to compatible resources according to their availability, or by a round-robin policy. Nevertheless, it is far more profitable to use more advanced and sophisticated schedulers. Besides, schedulers generally react to the dynamics of the grid system, typically by evaluating the present load of the resources and by being notified when new resources join or drop from the system. Additionally, schedulers can be organized hierarchically or can be distributed in order to deal with the large scale of the grid. For example, a scheduler can send a task to a cluster or to a lower-level scheduler instead of sending it to a concrete resource.

Furthermore, the job scheduling problem in grid environments is gaining importance, especially due to the large-scale applications based on grid infrastructures that need efficient schedulers, e.g. optimization (e.g. Casanova et al. [10], Goux et al. [17], Wright [33], Wright et al. [19]), collaborative/eScience computing (e.g. Newman et al. [24], Paniagua et al. [26] and many applications arising from concrete types of eScience grids such as Science Grids, Access Grids, Knowledge Grids), and data-intensive computing (e.g. Beynon et al. [4] and many applications arising from concrete types of data grids such as Data Grids, Bio Grids, the Grid Physics Network).

We give in Table 1 some examples of computational grids and applications summarized from the literature. These are classified into systems that directly integrate grid technology (Integrated Grid Systems), projects related to the development of middleware for distributed infrastructures (Core Middleware), projects that provide middleware for the development of grid-based applications (User Level Middleware), which in turn divide into those oriented toward the planning of resources and those that provide programming environments, and, finally, projects for the development of real applications that use grid technology (Applications and Driven Efforts), among which stands out the great number of projects related to the experimental sciences, currently one of the fields that benefits most from the grid computing paradigm.

3 Job Scheduling on Computational Grids

As mentioned in the previous section, central to grid-based applications is the planning of jobs on grid resources. The scheduling problem in distributed systems is not new at all; as a matter of fact it is one of the most studied problems in the optimization research community. However, in the grid setting there are several characteristics that make the problem different from its traditional version on conventional distributed systems. Some of these characteristics are the following:

• the dynamic structure of the computational grid
• highly heterogeneous resources and jobs
• the existence of local schedulers in different organizations or resources
• the existence of local policies on resources
• restrictions on the jobs to be scheduled (restrictions on the resources needed to complete a job, transmission cost, etc.)

Moreover, the scheduling problem in grid systems is large-scale: the number of jobs to be scheduled and the number of resources to be managed can both be very large, and thus we have to deal with a large-scale optimization problem. It is important to emphasize that finding an efficient schedule is a key issue in computational grids in order to ensure a good use of the grid resources, and thus to increase their productivity.

Basically, in a grid system, the scheduler assigns tasks to machines and establishes the order of execution of the jobs on each machine. The scheduling process is carried out in a dynamic form, that is, it is carried out while tasks enter the system and while the resources may vary their availability. Likewise, it is carried out at run time in order to take advantage of the properties as well as the dynamics of the system that are not

Table 1. Projects and applications based on grid technology (Category; Project; Organization; Purpose)

Integrated Grid Systems
– SmartNet (DARPA/ITO): system for the management and planning of heterogeneous distributed resources.
– MetaNEOS (Argonne N.L.): distributed computation environment for combinatorial optimization.
– NetSolve (Tennessee U.): programming & runtime system for transparent access to libraries and high-performance resources.
– Ninf (AIST, Japan): functionality similar to NetSolve.
– ST-ORM (CEPBA, UPC, Barcelona): scheduler for distributed batch systems.
– MOL (Paderborn U.): scheduler for distributed batch systems.
– Albatross (Vrije U.): object-oriented programming for grid systems.
– PUNCH (Purdue U.): computational and service environment for applications.
– Javelin (UCSB): Java-based programming & runtime system.
– Xtremweb (Paris-Sud U.): web-based global computational environment.
– WebSubmit (NIST): management of remote applications and resources.
– MILAN (Arizona and NY): transparent management of end-to-end services of network resources.
– DISCWorld (U. of Adelaide): distributed environment for information processing.
– Unicore (Germany): Java-based environment for accessing remote supercomputers.

Core Middleware
– Cosm (Mithral): toolkit for the development of P2P applications.
– Globus (ANL and ISI): environment for uniform and secure access to remote computational and storage resources.
– JXTA (Sun Microsystems): Java-based infrastructure & framework for P2P computations.
– Legion (U. of Virginia): operating system on grid for transparent access to distributed resources.
– P2P Accelerator (Intel): infrastructure for the development of P2P applications based on .NET.

User Level Middleware: schedulers
– AppLeS (UCSD): application-specific scheduler.
– Condor-G (U. of Wisconsin): system for job management in large-scale systems.
– Nimrod-G (Monash U.): resource broker based on economic models.

User Level Middleware: programming environments
– MPICH-G (N. Illinois U.): implementation of MPI for Globus.
– MetaMPICH (RWTH, Aachen): MPI programming & runtime system.
– Cactus (Max Planck Institute): framework for parallel applications, based on MPICH-G and Globus.
– GrADS (Rice U.): development toolkits for grid-based applications.
– GridPort (SDSC): development toolkits for computational sites.
– Grid Superscalar (CEPBA): programming model for parallel applications on grids.

Applications and driven efforts
– Data Grid (CERN): high energy physics, natural phenomena, biology.
– GriPhyN (UCF and ANL): high energy physics.
– NEESGrid (NCSA): earthquake engineering.
– Geodise (Southampton): aero-spatial design optimization.
– Fusion Grid (Princeton/ANL): magnetic fusion.
– IPG (NASA): aero-spatial.
– Earth System Grid (LLNL, ANL, NCAR): climate modelling.
– Virtual Instruments (UCSD): neuro-science.

Abbreviations: AIST: Institute of Advanced Industrial Science and Technology; UPC: Universitat Politècnica de Catalunya; CEPBA: Centro Europeo de Paralelismo de Barcelona; UCSB: University of California, Santa Barbara; ANL: Argonne National Laboratory; NIST: National Institute of Standards and Technology; ISI: University of Southern California, Information Sciences Institute; UCSD: University of California, San Diego; RWTH: RWTH Aachen University; SDSC: San Diego Supercomputer Center; CERN: European Organization for Nuclear Research; UCF: University of Central Florida; NCSA: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; LLNL: Lawrence Livermore National Laboratory, University of California; NCAR: National Center for Atmospheric Research, USA.

known beforehand. Dynamic schedulers are therefore more useful for real distributed systems than static schedulers. This type of scheduling, however, imposes critical restrictions on time efficiency and, therefore, on performance. Needless to say, job scheduling on computational grids is computationally hard; it has been shown that the problem of finding optimal schedules in heterogeneous systems is in general NP-hard [15].

We will consider here a version of the problem that does not take into account restrictions on task interdependencies, data transmission or economic aspects, but which is nonetheless applicable to many grid-based applications. We consider thus the following scenario: the tasks submitted to the grid are independent and are not preemptive, that is, they cannot change the resource they have been assigned to once their execution has started, unless the resource is dropped from the grid. Examples of this scenario in real-life grid applications arise typically when independent users send their tasks to the grid, or in the case of applications that can be split into independent tasks. Such applications are frequently encountered in scientific and academic environments. They also appear in intensive computing applications and data-intensive computing, data mining and massive processing of data, etc. In such a scheduling scenario, our objective is the study and development of powerful hybrid heuristics that generate schedules dynamically and efficiently.

Problem formulation

To model the problem we need an estimation or prediction of the computational load of each task, the computing capacity of each resource, and an estimation of the prior load of each resource. This is the ETC (Expected Time to Compute) matrix model (see e.g. Braun et al. [6]).
Thus we make the usual assumption that we know the computing capacity of each resource, an estimation or prediction of the computational needs (workload) of each task, and the prior load of each resource. Having the computing capacity of the resources and the workload of the tasks, an Expected Time to Compute matrix ETC can be built, where each position ETC[t][m] indicates the expected time to compute task t on resource m. The entries ETC[t][m] can be computed by dividing the workload of task t by the computing capacity of resource m, or in more complex ways, e.g. by including the cost of migrating task t to resource m. This formulation is feasible, since it is easy to know the computing capacity of each resource, while the computational requirements of the tasks can be known from specifications provided by the user, from historical data, or from predictions by means of specific tools.

At first glance, from the definition of the ETC matrix, one could think that this model is only able to describe consistent environments in which any task can be executed on any machine. However, the ETC model makes it quite easy to introduce possible inconsistencies in the grid system or, more generally,

restrictions as regards the execution of tasks on resources. This can be done by introducing the cost of such restrictions into the ETC values or by means of penalties; for instance, one could give a value of +∞ to ETC[t][m] to indicate that task t is incompatible with resource m.

Now we can formally define an instance of the problem as follows:

– A number of independent (user/application) tasks that must be scheduled. Any task has to be processed entirely on a unique resource.
– A number of heterogeneous machines that are candidates to participate in the planning.
– The workload (in millions of instructions) of each task.
– The computing capacity of each machine (in mips).
– The time ready_m at which machine m will have finished the previously assigned tasks. This parameter measures the prior workload of a machine.
– The expected time to compute, ETC, a matrix of size number_tasks × number_machines, where position ETC[t][m] indicates the expected execution time of task t on machine m. This matrix is either computed from the information on workload and mips or is explicitly provided.

Fitness

Several optimization criteria can be considered for this problem; certainly, the problem is multiobjective. The fundamental criterion is that of minimizing the makespan, that is, the finishing time of the latest task. A secondary criterion is to minimize the flowtime of the grid system, that is, the sum of the finishing times of all the tasks:

• minimization of makespan: min_{S ∈ Sched} { max_{j ∈ Jobs} F_j }, and
• minimization of flowtime: min_{S ∈ Sched} { Σ_{j ∈ Jobs} F_j },

where F_j denotes the time when task j finishes, Sched is the set of all possible schedules and Jobs the set of all jobs to be scheduled. Note that the makespan is not affected by the particular execution order of the tasks on a concrete resource, whereas in order to minimize the flowtime of a resource, the tasks assigned to it should be executed in ascending order of their expected time to compute.
It should also be noted that makespan and flowtime are contradictory objectives. It is worth observing that the makespan can also be expressed in terms of completion times. Let completion be a vector of size number_machines, where completion[m] indicates the time at which machine m will finalize the processing of the previously assigned tasks as well as of the tasks already planned for the machine. The value of completion[m] is calculated as follows:

completion[m] = ready_times[m] + Σ_{j ∈ Tasks : schedule[j] = m} ETC[j][m]

Then, the makespan can be expressed as:

makespan = max{ completion[i] | i ∈ Machines }.

Note that the makespan is an indicator of the general productivity of the grid system: small values of makespan mean that the scheduler is providing a good and efficient planning of tasks to resources. The flowtime, on the other hand, refers to the response time to user requests for task executions; therefore, minimizing the value of flowtime means reducing the average response time of the grid system. Essentially, we want to maximize the productivity (throughput) of the grid environment through intelligent load balancing and, at the same time, we want to obtain plannings that offer an acceptable quality of service (QoS) to the users.

Makespan and flowtime are both indicators of the grid system; their relation is not trivial at all. In fact they are contradictory in the sense that trying to minimize one of them may not suit the other, especially for plannings close to the optimal ones.

Though makespan and flowtime are the main objectives, other objectives can be defined. Thus, a third optimization criterion is to maximize the resource utilization of the grid system; resource utilization indicates the quality of a solution with respect to the utilization of the resources involved in the schedule. One possible definition of this parameter considers the average utilization, as follows:

max_avg_utilization = max_{S ∈ Sched} { ( Σ_{i ∈ Machines} completion[i] ) / ( makespan · number_machines ) }

These criteria can be integrated in several ways to reflect the priority that we wish to establish among them. One can adopt either a hierarchical or a simultaneous approach. In the former, the criteria are sorted by their importance, in such a way that if a criterion c_i is of smaller importance than criterion c_j, the value of criterion c_j cannot be varied while optimizing according to c_i.
In the latter approach, an optimal planning is one in which any improvement with respect to one criterion causes a deterioration with respect to another criterion. Here we consider both approaches: in the hierarchical approach the criterion with higher priority is the makespan and the second criterion is the flowtime; in the simultaneous approach, makespan and flowtime are minimized simultaneously. When minimizing both values simultaneously we have to take into account that, even though makespan and flowtime are measured in the same unit (seconds), the values they can take lie in incomparable ranges, because flowtime has a higher order of magnitude than makespan, and the difference increases as more jobs and machines are considered. For this reason, the value of the mean flowtime, flowtime/m (m being the number of machines), is used to evaluate the flowtime. Additionally, both values are weighted in order to balance their importance.

The fitness value is thus calculated as:

fitness = λ · makespan + (1 − λ) · mean_flowtime,

where λ has been fixed a priori to λ = 0.75, that is, more priority is given to the makespan as the most important parameter. We have used this fitness function for the global search (conducted by the MA) and the hierarchical approach in the local search procedures combined with the MA. Moreover, the same weights for makespan and flowtime were used in a GA for the problem by Carretero & Xhafa [9], which was shown to outperform a GA by Braun et al. [6]; hence we kept the same weights in order to establish the comparison with our MA implementations.

4 Related work

Job scheduling on computational grids is attracting considerable effort from many researchers; as a matter of fact, most major conferences on grid computing include scheduling and resource allocation in grid systems among their topics. On the one hand, there are meta-heuristic approaches, which explore the solution space and try to overcome locally optimal solutions. Most of these approaches are based on a single heuristic method such as local search (Ritchie and Levine [29]), genetic algorithms (GAs) (Braun et al. [6], Zomaya and Teh [36], Martino and Mililotti [21], Abraham et al. [1], Page and Naughton [25] and Carretero and Xhafa [9]), simulated annealing (Yarkhan and Dongarra [35], Abraham et al. [1]) or tabu search (Abraham et al. [1]).

Hybrid meta-heuristics have been less explored for the problem, although research on hybrid meta-heuristics has shown that hybrid combinations can outperform single heuristics. Abraham et al. [1] addressed the hybridization of GA, SA and TS heuristics for dynamic job scheduling on large-scale distributed systems. The authors claim that the GA+SA hybrid has better convergence than pure GA search and that the GA+TS hybrid improves the efficiency of GA. In these hybridizations, a heuristic capable of dealing with a population of solutions, such as GA, is combined with two other heuristics, TS and SA, which are local search procedures that deal with only one solution at a time. Another hybrid approach for the problem is due to Ritchie and Levine [28, 30], who combine an ACO algorithm with a TS algorithm.

One interesting characteristic of hybrid approaches is that they can provide high-quality solutions in a very short time compared to single heuristics, which may require a longer "start up" to reach good solutions. This is particularly interesting to explore for job scheduling on grids, because in grids it is critical to generate schedules in a minimal amount of time due to their dynamic nature. This characteristic is exploited in our hybridization approach, as shown in the next sections.

12

Fatos Xhafa

On the other hand, though not directly related to our approach, there are economy-based schedulers, which include the cost of resources in the objectives of the problem (e.g. Buyya et al. [8], Abraham et al. [2] and Buyya [7]).

5 Memetic algorithm for Job Scheduling on Computational Grids

Memetic algorithms (MAs) are a subset of evolutionary algorithms that present several singularities distinguishing them from other evolutionary algorithms. MAs combine the concepts of evolutionary search with those of local search, taking advantage of the good characteristics of both; in this sense MAs can be considered hybrid evolutionary algorithms. MAs arose at the end of the eighties as an attempt to combine concepts and strategies of different meta-heuristics; they were introduced for the first time by Moscato [22]. The denomination "memetic" comes from the term meme, introduced by R. Dawkins as an analogy to the gene but in a context of cultural evolution, referring to what we could call a "fashion" or group of common characteristics in a population, not at a genetic level but at a higher level. These characteristics are transmitted through the population by means of an imitation process, in a wide sense of the term. The term Lamarckian Evolution is also often used; in this case the local search simulates the individual acquisition of characteristics that are coded in the genotype and later inherited by the descendants.

In this sense the objective of MAs is to search for good solutions based on the local improvement of individuals inside a selective environment; therefore, many versions of MAs can be implemented by using different local search methods. However, as we will see later, the good behavior of the algorithm depends on establishing a good balance between the local search mechanism and the evolutionary process that conducts the global search. In other terms, the local search has an intensification function in the global search: it efficiently improves each solution toward a good local one within a region of the search space. An excessive protagonism of the local search could damage the global search, and an excessively sophisticated local search algorithm could nullify the work carried out by the rest of the algorithm, with a corresponding penalty in execution time.

We give next the description of an MA template. Then, we particularize the template for the job scheduling problem and implement it using an MA skeleton, that is, a C++ version of the template given in [5]. Regarding the instantiation of the skeleton for job scheduling, we show how the MA operators are implemented, as well as the local search methods that are combined with the MA. It is worth mentioning that MAs can work with unstructured or structured populations; in the former there is no relation at all among the individuals of the population, while in the latter the individuals are in some way


related among them, typically by defining a neighborhood structure (Cellular MAs are examples of such MAs). We have used an unstructured MA for the purposes of this work; thus, all the individuals can be selected for recombination, which is not the case in structured MAs, where an individual is crossed only with its neighbors.

5.1 Outline of MA for Job Scheduling

Our starting point is a generic template for MAs. Besides the advantages of reuse, the generic approach is flexible enough to implement different hybridizations of the MA with local search methods. The memetic algorithm template considered here is shown in Fig. 1.

Local-Search-based Memetic Algorithm
begin
  initializePopulation Pop;
  foreach i ∈ Pop do i := Local-Search-Engine(i);
  foreach i ∈ Pop do Evaluate(i);
  repeat /* generations loop */
    for j := 1 to #recombinations do
      selectToCombine a set Spar ⊆ Pop;
      offspring := Recombine(Spar, x);
      offspring := Local-Search-Engine(offspring);
      Evaluate(offspring);
      addInPopulation offspring to Pop;
    endfor;
    for j := 1 to #mutations do
      selectToMutate i ∈ Pop;
      im := Mutate(i);
      im := Local-Search-Engine(im);
      Evaluate(im);
      addInPopulation im to Pop;
    endfor;
    Pop := SelectPop(Pop);
    if Pop meetsPopConvCriteria then Pop := RestartPop(Pop);
  until termination-condition = True;
end;

Fig. 1. Memetic Algorithm Template

As we can observe from Fig. 1, the MA conserves the common characteristics of an evolutionary algorithm, although it presents important differences that directly affect the implementation of the operators. One of the most important differences is that an MA does not work with an intermediate population, as is done in genetic and other evolutionary algorithms. Rather, all the recombinations and mutations are done using the individuals of the global population; the resulting solutions are added to it, and within the same generation they can take part in other recombinations or be mutated again. The population thus grows during each iteration, and therefore the selection process, instead of choosing individuals from an intermediate population, selects the individuals of the next generation by reducing the


size of the resulting population to the initial population size, thus preventing it from growing indefinitely. There are also differences regarding the selective pressure of the algorithm; more precisely, in an MA the inherent selective pressure of evolutionary algorithms is "broken" into different parts of the algorithm. In GAs the selective pressure is basically applied in the selection of the intermediate population and in the substitution of old individuals by new ones. In MAs, the memetic information, which is intended to last for many generations, is determined: (a) by the way the individuals are selected to be recombined; (b) by the way the individuals are selected to be mutated; (c) when adding a new solution (fruit of a recombination or of a mutation) to the population; (d) by the selection mechanism for the individuals of the next generation; and, finally, (e) by the definition of the recombination operator.

Solution representation

The encoding of the individuals (also known as chromosomes) of the population is a key issue in evolutionary-like algorithms. Note that for a combinatorial optimization problem, an individual represents a solution of the problem. Encodings determine the type of operators that can be used to ensure the evolution of the individuals. A desirable representation is one that respects the structure of the search space according to the properties of the problem; such a representation is usually known as a direct representation. For Job Scheduling on Computational Grids a direct representation is obtained as follows. Feasible solutions are encoded in a vector, called schedule, of size N (the number of tasks), where schedule[i] indicates the machine to which task i is assigned by the schedule. Thus, the values of this vector are natural numbers in the range [1, M] (M is the number of machines). We call this representation the vector of assignments task-machine. Note that in this representation a machine number can appear more than once.

Initialization

The initialization process consists in obtaining the first population. Typically, the initial solutions are generated randomly; besides a random method, for job scheduling other methods [20, 34, 6, 1] can be used to generate solutions, among them the ad hoc heuristics Opportunistic Load Balancing (OLB), Minimum Completion Time (MCT), Minimum Execution Time (MET), Switching Algorithm (Switch), K-percent Best (KPB), Min-min, Max-min, Sufferage, Relative-cost and Longest Job to Fastest Resource-Shortest Job to Fastest Resource (LJFR-SJFR). Clearly, we are interested in obtaining a diverse population. For this reason, the individuals are generated via one of the ad hoc methods mentioned above and all but one individual are randomly modified. This modification consists in the reassignment of a subset of tasks (roughly 15% of the tasks) to randomly chosen resources.


Recombination

The recombination consists in mixing the information contained in two or more solutions to produce a new solution, which is added to the population. The different operators that have been implemented are distinguished simply by the way they select the genes of the parent individuals to form the descendant. Clearly, no new information is added to the descendant beyond that contained in the parent solutions.

One-Point Recombination: This is a multi-parent version of the One-Point crossover. For each individual participating in the recombination, a cutting point splits the vector of assignments task-resource into two parts. Starting from the beginning of the vector, the elements that will be part of the new individual are selected from the first individual until its cutting point is reached; then, starting from the position next to that cutting point, the elements are selected from the second individual until its cutting point is reached, and so on, until the vector of the new individual is completely filled. In order to include in the new individual at least one element from each individual participating in the recombination, the cutting points are defined at non-coincident positions and in increasing order (see Fig. 2 for an example); consequently, at most as many individuals as there are tasks to schedule can be combined, since otherwise not enough cuts could be defined.

Fig. 2. An example of One-Point Recombination

Uniform Recombination: This follows the same mechanism as the Uniform Crossover but considers two or more individuals. To generate the new individual, a mask is randomly built which indicates, for each position of the new individual's vector, the individual from which the value for that position will be copied (the same position of the old individual's vector is taken), following a uniform probability distribution. Therefore, the mask is no longer binary; rather, each position takes a value between 0 and nb_solutions_to_recombine - 1 (see Fig. 3 for an example).

Fitness-based Recombination: This recombination is based on the fitness value of the solutions to recombine. As in the case of the Uniform Recombination, a mask is built to indicate, for each position of the new individual, the individual whose value at the same position will be copied, but the probability of choosing an individual for each position of the mask is proportional to the quality of the individuals, so that the genes of the fittest individuals tend to be copied (see Fig. 4 for an example).



Fig. 3. An example of Uniform Recombination

The probability that the gene of an individual i with fitness f_i is copied for any given position of the new individual's vector is computed as follows:

    p_i = (1 - f_i / Σ_{k=0}^{N-1} f_k) / (N - 1),

where N denotes the number of solutions being recombined. Since our optimization criterion is minimization, this assigns larger probabilities to individuals with smaller fitness values, and the probabilities p_i sum up to one.

(In the Fig. 4 example, four individuals with fitness values 5, 20, 50 and 25 yield gene-copy probabilities of 0.32, 0.27, 0.16 and 0.25, respectively.)

Fig. 4. An example of Fitness-based Recombination

Mutation

The mutation process consists in randomly perturbing the individuals of the population and is applied with a certain probability pm. Several mutation operators have been considered; these are the operators move, swap, move&swap and re-balance. Note, however, that the mutation in MAs does not modify the existing solutions; rather, it adds to the population a new solution resulting from the mutation of another one.

Mutation Move: This operator moves a task from one resource to another one, so that the machine assigned to the task changes. Note that it is possible to reach any solution from any given solution by applying successive moves.


Mutation swap: Considering movements of tasks between machines is effective, but it often turns out to be more useful to interchange the allocations of two tasks. Clearly, this operator should be applied to two tasks assigned to different machines.

Mutation move&swap: The mutation by swap has a drawback: the number of jobs assigned to each processor remains unaltered by the mutation. A combined operator avoids this problem in the following way: with a probability pm', a mutation move is applied; otherwise a mutation swap is applied. The value of pm' depends on the difference in behavior between the swap and move mutation operators.

Re-balancing: The idea is to first improve the solution somehow (by re-balancing the machine loads) and then mutate it. Re-balancing is done in two steps. In the first, a machine m among the most overloaded resources is chosen at random; then we identify two tasks t and t', with t assigned to another machine m', such that the ETC of t on machine m is less than or equal to the ETC of task t' assigned to m; tasks t and t' are then interchanged (swap). In the second step, in case re-balancing by swap was not possible, we try to re-balance by move. After this, if re-balancing was still not possible, a mutation (move or swap, each with probability 0.5) is applied.

Addition

Each newly created solution, whether from a recombination or from a mutation, may be added to the population. Two different criteria can be followed for adding new solutions: either always add the new solution, or add it only when its fitness is better than that of the worst solution in the current population.

Selection mechanisms

Several standard selection mechanisms have been considered: Random Selection, Best Selection, Linear Ranking Selection, Exponential Ranking Selection, Binary Tournament Selection and N-Tournament Selection. These operators, however, have been adapted in order to select the group of solutions for recombination, to select the solution to mutate, as well as to select the individuals that will form the next generation. Therefore, the degree of selective pressure of the algorithm is determined by how these three selections are combined.

Selection to recombine: This consists in selecting a certain number of individuals of the population (determined by the parameter nb_solutions_to_recombine), which will be combined to create a new individual. This selection determines the memetic information that will be replicated, through recombination, in the new solution, which will then be improved by the local search mechanism. The selection operator used is determined by the parameters recombine_selection and rec_selection_extra of the Setup class.

Selection to mutate: In each iteration of the mutation phase, a solution is selected and mutated to create a new solution. The selection of the solution


to mutate determines the memetic information that will be replicated, through mutation, in the new solution, to which the local search will also be applied. The selection operator used is determined by the parameters mutate_selection and mut_selection_extra of the Setup class.

Selection of the population: Once all the recombinations and mutations of the current generation are done, the individuals that will form the next generation are selected. After the recombinations and mutations, the population size has temporarily increased, since new solutions were added; the selection operator therefore has the function of reducing the population size back to a constant value determined by the population_size parameter.

Notice that the MA template includes the possibility to restart the search from a new population when the current population has lost its diversity. However, because Job Scheduling on Computational Grids requires solutions in a small amount of time (almost in real time), it has not been necessary to restart the search: within the little search time available, it is preferable to intensify the search rather than to start again from a new population.

6 Local search procedures

The objective of the local search procedure in MAs is to improve each solution before it enters the population. When implementing this mechanism in an MA, it is necessary to keep in mind that what matters is not so much the quality of the local search algorithm by itself, but rather the quality that results from the cooperation between this mechanism and the global/evolutionary search. For this reason, special attention must be paid to the trade-off between the two mechanisms in the search process. If the local search were predominant, it would by itself intensify too much the improvement of each solution and would disable the evolutionary mechanism of the global search.

Several local search algorithms have been considered for this work. The basic local search algorithm adopted here is Hill Climbing, in which the exploration moves from a solution to a neighboring solution, accepting only those movements that improve the preceding solution according to the optimization criteria. The algorithm is shown in Fig. 5.

procedure local_search(s)
  while not termination-condition do
    s' := generateNeighbor(s);
    if improvement(s, s') then s := s';
end;

Fig. 5. Basic Local Search Algorithm


Note that different local search algorithms can be defined according to the definition of the neighborhood of a solution (i.e., the generateNeighbor() procedure). The termination condition of the search is established as a maximum number of iterations to be run (fixed by the parameter nb_local_search_iterations). However, as the MA advances in the search process, the encountered solutions become better and better, and thus it becomes harder and harder to improve them locally. Due to this phenomenon, the local search could execute many iterations without improvement, implying a useless expense of time. For this reason, we define a parameter (nb_ls_not_improving_iterations) that establishes the maximum allowed number of iterations without improvement.

6.1 Neighborhood exploration

The neighborhood exploration is the key to any local search procedure. Given a solution s, the neighborhood of s is the set of solutions s' to which we could jump in a single movement or local perturbation of s. Clearly, the neighborhood relation depends on the solution representation and on the type of local movement to be applied. Thus, in spite of the simplicity of the algorithm, there are as many versions of Hill Climbing as neighborhood relationships can be defined, and the quality of its behavior is directly conditioned by the way a neighboring solution is generated from an initial solution. Different variants can also be defined according to the order and the way in which neighboring solutions are visited (see e.g. [32]). First, if in each iteration the best neighboring solution is accepted, we speak of steepest descent (2). In this case, it is necessary to keep in mind that the iterations are more expensive, since they imply verifying the improvement for each possible neighboring solution and identifying the best local movement.
If, instead, given a solution we take as the new solution the first neighboring solution that improves it, we speak of next descent. In this last case, according to the order in which the neighboring solutions are visited, two types of movement can be distinguished: deterministic movement and random movement. The deterministic movement visits the neighboring solutions in a fixed order, in such a way that whenever applied to a solution it always carries out the same movements and, for a given number of iterations, always ends up at the same result. The random movement, on the other hand, consists in visiting the neighboring solutions in a random order, in such a way that two searches starting from the same solution will very probably arrive at different results. This type of movement is usually more effective. Following this outline, sixteen search mechanisms have been implemented according to the neighborhood relationship established and/or the type of movement. The implemented local search mechanisms are detailed next.

(2) Recall that our optimization criterion is minimization; otherwise, we would speak of steepest ascent.


Local Move (LM): This is based on the mutation operator Move. Two solutions are neighbors if they differ in only one position of their vector of assignments task-resource. Therefore, this mechanism simply moves a randomly chosen task from the resource it was assigned to, to another randomly chosen resource, thus generating a neighboring solution.

Steepest Local Move (SLM): This is similar to Local Move, but now the chosen task is not assigned to a randomly chosen resource; instead, the movement yielding the greatest improvement is carried out. In the generation of a neighboring solution, all the solutions obtainable by moving the chosen task are generated in search of the best move. For this reason, the iterations of Hill Climbing will be very expensive in this case, since for each generated solution the makespan and flowtime must be evaluated again, which is time consuming, but perhaps it is also more effective. Note that the task to be moved is chosen randomly.

Local Swap (LS): This is based on the mutation operator Swap. A neighboring solution is generated by exchanging the resources of two randomly chosen tasks assigned to different resources.

Steepest Local Swap (SLS): This consists in randomly choosing a task and applying the swap movement that yields the greatest improvement. As in the case of Steepest Local Move, it is a blind and exhaustive search where all the possible neighboring solutions are computed, requiring the calculation of their makespan and flowtime; therefore the iterations will be quite expensive.

Local Rebalance (LR): This is based on the mutation operator Rebalance. The movement from a solution to a neighboring solution is done by means of re-balancing the most loaded resources.

Deep Local Rebalance (DLR): This is a modification of Local Rebalance; it tries to apply the movement with the largest improvement in re-balancing. Initially, it takes a resource r1 randomly among the most loaded resources, that is, those with completion time equal to the local makespan. Next, a task t1 assigned to another resource r2, with which resource r1 obtains the largest drop in the value of the ETC matrix, is chosen (contrary to Local Rebalance, where any resource was chosen). In the same


way, we search for a task t2 among the tasks assigned to r1 with the largest drop in the value of the ETC matrix for resource r2. In case the exchange of tasks t2 and t1 between resources r1 and r2 reduces the completion time of r1, the change is accepted. Moreover, with a certain probability, it is also required that the new completion time of resource r2 not be greater than the local makespan of the solution. If the exchange of the tasks is not accepted, or in the remote case that all the tasks are assigned to r1, then the Steepest Local Move is applied to some of the tasks of this same resource.

Local MCT Move (LMCTM): This is based on the MCT (Minimum Completion Time) heuristic. More precisely, a chosen task is moved to the resource yielding the smallest completion time among all the resources.

Local MCT Swap (LMCTS): This is a variation of the previous mechanism in which, given a randomly chosen task t1, we search for a task t2 assigned to a different resource such that the maximum completion time of the two implied resources is the smallest over all possible exchanges. Another variant of this heuristic could be obtained by minimizing the sum of the completion times of the two implied resources.

Local MFT Move (LMFTM): Up to now, the presented local search mechanisms have centered on reducing the completion time of the resources in order to reduce the makespan of the solution. Another possibility for improvement is to try to reduce the flowtime of the solution. The Local MFT (Minimum Flowtime) Move mechanism carries out the movement of a randomly chosen task that yields the largest reduction in the flowtime.
Local MFT Swap (LMFTS): Given a randomly chosen task, we search among the tasks assigned to a different resource for the one whose exchange with the chosen task yields the largest reduction in the value of the flowtime; that is, we search for the exchange that minimizes the sum of the flowtime contributed by each implied resource.

Local Flowtime Rebalance (LFR): This mechanism is based on Local Rebalance, but now the most loaded resource is considered to be the one with the largest flowtime value. Similarly, we search for an exchange between one of the tasks of the most loaded resource and a task of another resource that reduces the value of the flowtime contributed by the most loaded resource. Moreover, with a certain probability, it is required that


the new solution does not obtain a flowtime value larger than the current one. In case such an exchange is not possible, one of the tasks of the most loaded resource is moved to another resource so that the largest reduction in the value of the flowtime is obtained.

Local Short Hop (LSH): This mechanism is based on the Short Hop process used as local search in the Tabu Search algorithm implemented in Braun et al. [6]. In that process, for each possible pair of tasks, each pair of possible resource allocations is evaluated, always adopting the new solution in the event of improvement. The version we have implemented applies this process by visiting each pair of tasks and each pair of resources, but only considering pairs of resources with one resource from the subset of the most loaded resources and the other from the subset of the less loaded resources, together with the subset of tasks assigned to these resources. In this way a certain balancing of the loads is promoted. Once the resources are sorted according to their completion time, the subsets of resources are taken as follows: the 0.1 · M most loaded resources and the 0.75 · M less loaded resources. Subsequently, the tasks assigned to the most loaded resources and those of the less loaded resources are visited in search of possible exchanges that yield improvement (the order in which the tasks and resources are visited is random). In each iteration (hop), we evaluate the exchange of a task of a most loaded resource with a task assigned to a less loaded resource. The exchange is accepted if the assignment reduces the value of the completion time of the implied resources. We give in Fig. 6 the pseudo-code of the Short Hop procedure. Note that the pairs are visited randomly in order to avoid the repetition of the same combinations along the execution of this process.
It is necessary, however, to find a good balance between the number of iterations of the Hill Climbing (nb local search iterations) and the number of hops in each iteration, keeping in mind that the more hops are performed, the more different exchanges are evaluated, at the risk of spending more time in calculations.
Emptiest Resource Rebalance (ERR) The aim is to balance the workload of the resources, but now starting from the less loaded resource. Let r1 be the resource with the smallest completion time; we search for an exchange between one of the tasks of this resource and a task of another resource r2 that decreases the completion time of r2. Also, with a certain probability, it is required that the new completion time of r1 is not worsened. In case r1 is empty, it is assigned any task.
Emptiest Resource Flowtime Rebalance (ERFR) This is similar to the previous mechanism, but now the less loaded resource is taken to be the one that contributes the smallest flowtime. Let r1 be such

An Evolutionary Heuristic for Job Scheduling on Computational Grids

23

Function generateNeighbor(s: Solution) return Solution
  s_neighbor := s
  machines1 := permutation of most overloaded resources
  machines2 := permutation of less overloaded resources
  tasks1 := permutation of tasks assigned to resources in machines1
  tasks2 := permutation of tasks assigned to resources in machines2
  NHops := 0
  tasks1.begin()
  while NHops < lsearch_extra_parameter and not tasks1.end() do
    t1 := tasks1.next()
    machines1.begin()
    while NHops < lsearch_extra_parameter and not machines1.end() do
      r1 := machines1.next()
      tasks2.begin()
      while NHops < lsearch_extra_parameter and not tasks2.end() do
        t2 := tasks2.next()
        machines2.begin()
        while NHops < lsearch_extra_parameter and not machines2.end() do
          r2 := machines2.next()
          s_neighbor' := s_neighbor
          Evaluate solution s_neighbor' with task t1 assigned to resource r2
            and task t2 assigned to resource r1
          if improvement(s_neighbor, s_neighbor') then
            s_neighbor := s_neighbor'
          NHops := NHops + 1
        endwhile
      endwhile
    endwhile
  endwhile
  return s_neighbor
end

Fig. 6. Local Short Hop Procedure
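Assuming the vector-of-assignments representation used throughout the chapter and an ETC matrix etc[t][m] (expected time to compute task t on machine m), the hop loop of Fig. 6 can be sketched in C++ for a single pair of machines; all names below are illustrative, not taken from the actual implementation:

```cpp
#include <vector>
#include <algorithm>

// Illustrative sketch of the Short Hop loop of Fig. 6: m1 is an overloaded
// machine, m2 a lightly loaded one; a swap of tasks between them is accepted
// (a "hop") when it reduces the larger of their two completion times.
using Matrix = std::vector<std::vector<double>>;

std::vector<int> shortHop(std::vector<int> assign, const Matrix& etc,
                          int m1, int m2, int maxHops) {
    int n = static_cast<int>(assign.size());
    std::vector<double> load(etc[0].size(), 0.0);   // cached completion times
    for (int t = 0; t < n; ++t) load[assign[t]] += etc[t][assign[t]];
    int hops = 0;
    for (int t1 = 0; t1 < n && hops < maxHops; ++t1) {
        if (assign[t1] != m1) continue;             // t1 must be on the overloaded machine
        for (int t2 = 0; t2 < n && hops < maxHops; ++t2) {
            if (assign[t2] != m2) continue;         // t2 must be on the light machine
            double c1p = load[m1] - etc[t1][m1] + etc[t2][m1];
            double c2p = load[m2] - etc[t2][m2] + etc[t1][m2];
            if (std::max(c1p, c2p) < std::max(load[m1], load[m2])) {
                std::swap(assign[t1], assign[t2]);  // accept the hop
                load[m1] = c1p; load[m2] = c2p;
                ++hops;
                break;                              // t1 has moved; take next t1
            }
        }
    }
    return assign;
}
```

Note that only the two implied machines are re-evaluated per hop, in line with the gain-based evaluation discussed in Subsect. 6.3.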

resource; we search for an exchange between one of the tasks of this resource and a task of another resource r2 that decreases the flowtime contributed by r2. Again, with a certain probability, it is required that the new completion time of r1 is not worsened. In case r1 is empty, it is assigned any task.
Variable Neighborhood Search (VNS) The neighborhood relationship is defined in a variable way, that is, two solutions are considered neighbors if they differ in n positions of their vectors of task-resource assignments, where n is a parameter. Therefore, to generate a neighboring solution it suffices to modify n randomly chosen positions of the assignment vector of the starting solution. Clearly, n must be smaller than N (the number of tasks). Note that for n = 1 this mechanism is just the Local Move procedure.

6.2 Tabu search: Local Tabu Hop

We pay special attention to the movement based on the Tabu Search (TS) algorithm, that is, the use of TS for improving the solution. TS [16] is a metaheuristic that tries to avoid falling into local optima by means of an intelligent
mechanism based on an adaptive memory and a responsive exploration of the search space. The adaptive memory is the capacity of the algorithm to remember certain aspects of the search, such as promising regions of the search space, the frequency of certain solution characteristics, or the recently performed movements, in such a way that the search process is much more effective. On the other hand, the responsive exploration consists in an exploration based on more intelligent decisions than simple blind search, for instance the exploitation of the characteristics of the best solutions or the temporary acceptance of worse solutions in order to escape from local optima and aspire to better solutions. It has been amply proven that TS is one of the most effective search procedures for combinatorial optimization problems.

TS, in general terms, is based on different phases that can be classified into exploration phases and diversification phases. The exploration phases consist in improving the current solution through movements to neighboring solutions. The diversification phases, on the other hand, serve to move away from the region being explored toward a new region, and in this way avoid falling into local optima. In order to avoid cycling among already visited solutions, which, in spite of moving away from the current region, inevitably happens, inverse movements of already applied movements are prohibited by giving them the tabu status via a short-term memory (also called the tabu list).

It is known that some MAs have a good synergy with TS when they use it as individual steps for diversification and local optimization by the agents [5]. However, any hybridization with TS requires a careful design so that TS is advantageous for the evolutionary search. Therefore, it is necessary to carefully adjust the balance between the global search and the search carried out by TS.
In fact, TS, in spite of being a local search algorithm, is far superior to all the previous local search procedures; therefore we are dealing here with a proper hybridization between the MA and TS algorithms (MA+TS).

begin
  Compute an initial solution s; ŝ ← s;
  Reset the tabu and aspiration conditions.
  while not termination-condition do
    Generate a subset N∗(s) ⊆ N(s) of solutions such that either none of the
      tabu conditions is violated or the aspiration criterion holds.
    Choose the best s′ ∈ N∗(s) with respect to the objective function.
    s ← s′;
    if improvement(s′, ŝ) then ŝ ← s′;
    Update the recency and frequency.
    if (intensification condition) then Perform intensification procedure.
    if (diversification condition) then Perform diversification procedures.
  endwhile
  return ŝ;
end

Fig. 7. Basic Tabu Search algorithm
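The short-term memory of the algorithm in Fig. 7 can be sketched as a hash table mapping a movement to the iteration at which its tabu status expires, in line with the chapter's remark that the tabu list is implemented as a hash table; the key encoding and class name below are illustrative assumptions:

```cpp
#include <unordered_map>

// Sketch of a short-term memory for (task, machine) movements: a movement
// stays tabu for maxTabuStatus iterations after being forbidden. Aspiration
// tests (overriding the tabu status) are not shown here.
class TabuList {
    std::unordered_map<long long, int> expires;  // move key -> expiry iteration
    int tenure;
public:
    explicit TabuList(int maxTabuStatus) : tenure(maxTabuStatus) {}
    static long long key(int task, int machine) {
        // illustrative encoding; assumes fewer than 10^6 machines
        return static_cast<long long>(task) * 1000000LL + machine;
    }
    void forbid(int task, int machine, int iter) {
        expires[key(task, machine)] = iter + tenure;
    }
    bool isTabu(int task, int machine, int iter) const {
        auto it = expires.find(key(task, machine));
        return it != expires.end() && iter < it->second;
    }
};
```

A hash table keeps both the forbid and the lookup operations at constant expected cost per TS iteration.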


We give in Fig. 7 a basic TS algorithm. The initial solution is basically the starting solution of the Hill Climbing. The exploration process is based on a neighborhood exploration of the steepest descent-mildest ascent type, where the chosen neighboring solution is the one that most improves the objective function. In case no better solution is found, the solution that least worsens the objective function is accepted. If the best resulting solution is repeatedly encountered along a certain number of iterations, the search is intensified in the region of the current solution (intensification). The intensification process implies a more exhaustive exploration of that region of the search space by means of different techniques, for instance by enlarging the neighborhood, by modifying the structure of the neighborhood, or via the exploitation of the most frequent solution characteristics observed during the exploration. In case it has not been possible to improve the solution during a certain number of iterations, the exploration process has stagnated in a local optimum, and therefore the mechanism of soft diversification is activated, moving the search to a new region “close” to the current one. The soft diversification we have implemented is based on penalizing the most frequent solution characteristics observed during the search and promoting the less frequent ones. The strong diversification (also called escape) is applied when neither the intensification nor the soft diversification has been able to improve the current solution and, therefore, the search has very possibly definitely stagnated. The strong diversification thus consists in a greater perturbation of the solution in order to launch the search from a new region far away from the current one. The neighborhood relationship implemented in the TS for job scheduling is based on the idea of load balancing.
The neighborhood of a solution consists of all those solutions that can be reached via a swap between a task of an overloaded resource and a task of one of the less overloaded resources, or via a move of a task of an overloaded resource to one of the less overloaded resources. Note that TS is executed for a certain number of iterations, called a phase, in such a way that for each iteration of the Hill Climbing the specified number of TS iterations is carried out. Additionally, the TS stops if no improvement is obtained during a specified number of iterations. Another important element of the hybridization is the dynamic setting of the TS parameters, described next:
(a) tabu size: the short-term memory parameter (the maximum number of already visited solutions that the algorithm is able to identify);
(b) max tabu status: the parameter that indicates the maximum number of iterations for which a movement is held tabu;
(c) max repetitions: the number of iterations the algorithm can execute without any improvement; after that, the intensification is activated;
(d) nb diversifications: the number of iterations of the soft diversification process. An excessive diversification is not of interest, since the TS mechanism would not be able to execute enough iterations for each solution, and avoiding local optima then ceases to be the priority;
(e) nb intensifications: the number of iterations of the intensification process. Its behavior is similar to that of the previous parameter;
(f) elite size: the number of best encountered solutions that form the long-term memory;
(g) aspiration value: the parameter that establishes the aspiration level of a solution;
(h) max load interval: the maximum deviation with respect to the makespan for a resource to be considered overloaded. A value of one would mean that the overloaded resources are only those with completion time equal to the makespan; and
(i) percent min load: the percentage of resources that are considered less overloaded.

6.3 Movement evaluation

In order to achieve a good performance of the MA + local search algorithm, an efficient generation of neighboring solutions is necessary. This makes possible a good trade-off between the intensification of the local search and the exploration of the global search. One of the most expensive operations along the algorithm is, for any neighboring solution, the evaluation of its objective function value (which implies the calculation of makespan and flowtime). It is worth emphasizing that the evaluation to which we refer is not the one made by the Hill Climbing to accept or not a neighboring solution, but the one carried out in the method generateNeighbor() to evaluate and then choose or reject the solutions that form the neighborhood of the solution at hand. In the case of steepest ascent methods, for instance, this evaluation has to be carried out for all neighboring solutions in order to identify the best of them all. However, as observed in [18] (and in many other examples in the literature), we just need to compute the gain of the new solution with respect to the solution whose neighborhood is being generated; clearly, this is computationally less expensive. The sixteen local search mechanisms presented here use different approaches when evaluating the gain of a movement, in accordance with the heuristic they follow.
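The gain-based evaluation can be illustrated with a minimal sketch: assuming cached per-machine completion times, the gain of moving a task between two machines is computed from those two machines only, instead of re-evaluating the whole schedule (all names are illustrative):

```cpp
#include <vector>
#include <algorithm>

// Sketch of incremental move evaluation: moving task t from machine 'from'
// to machine 'to' only affects two completion times, so the gain is an O(1)
// computation over cached loads. etc[t][m] is the ETC value of task t on
// machine m; load[m] is the cached completion time of machine m.
using Matrix = std::vector<std::vector<double>>;

double moveGain(const std::vector<double>& load, const Matrix& etc,
                int t, int from, int to) {
    double before = std::max(load[from], load[to]);
    double after  = std::max(load[from] - etc[t][from], load[to] + etc[t][to]);
    return before - after;  // positive: the move reduces the pair's completion time
}
```

Compared with recomputing makespan and flowtime over all N tasks, this is the cheap evaluation that makes an intensive neighborhood exploration affordable.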
The evaluation approaches used when computing the gain of a movement, except for LM, LS and VNS, which are based on a purely random exploration of the neighborhood, can be classified as follows:
• Use of the same optimization criterion as the Hill Climbing: this is the case of SLM and SLS, which carry out the movement (move or swap, resp.) that minimizes the criterion followed by the local search. The evaluation criterion is the same as that of the Hill Climbing, for which different alternatives have been tried (see later). However, in both the hierarchic and the simultaneous optimization of makespan and flowtime, the evaluation is quite expensive, since it implies the calculation of these values.
• Evaluation of the completion time of a resource: the gain of a movement is evaluated from the reduction of the completion time of a concrete resource. In the case of LR and DLR, the reduction of the completion time is evaluated for one of the most overloaded resources (those with completion time equal to the makespan). For LMCTM the computation is done differently: it computes the reduction yielded by moving a task to the resource for which the least completion time is obtained. Similarly, the mechanism of ERR
carries out the same balancing as LR but reducing the completion time of any randomly chosen resource.
• Evaluation of the maximum completion time obtained by the resources involved in a movement: the gain of a movement is evaluated according to the maximum reduction of the completion time of the resources involved in that movement. As a matter of fact, this approach has given the best results in the TS implementation.
The heuristics based on the completion time of the resources obtain the best results, since they have a direct implication in the reduction of the makespan value, the most important optimization parameter of the grid system. However, we have also designed different heuristics based on the flowtime which, in spite of obtaining reductions not as good as those of the makespan, are very interesting in view of a generic memetic algorithm that would allow the minimization not only of the makespan but also of the flowtime in the planning of the grid, by adaptively changing the intensification in the reduction of the makespan as well as of the flowtime. In this sense the approaches we have used are the following:
• Evaluation of the flowtime of a resource: this is the case of LMFTM, where a task is moved to the resource yielding the smallest value of (local) flowtime. In the LFR mechanism, the reduction of the flowtime of the resource with the greatest flowtime value is evaluated; in the ERFR mechanism, the reduction of the flowtime of any randomly chosen resource is evaluated.
• Evaluation of the sum of the flowtimes of the resources involved in a movement: this is the case of LMFTS, where a swap movement that minimizes the sum of flowtimes is carried out. Notice that the use of this approach impacts directly on the reduction of the total flowtime value of the solution.
The evaluation for the LTH mechanism, which is in fact a hybridization of MA and TS, is considered separately, since it follows its own mechanism of exploration of the neighborhood.
More precisely, the TS has been configured so that it follows a hierarchic optimization of the makespan and flowtime values, for which the TS has shown a better behavior in the reduction of the makespan.

6.4 Optimization criterion of the local search

A very important aspect when designing an MA is the optimization criterion that the local search should follow. Note that the optimization criterion of the local search does not necessarily have to be the same as that of the global search; in fact, if they are the same it could damage the overall search. In our case, we have experimented with three alternatives as regards the optimization criterion followed by the Hill Climbing:
a) The same criterion as the global search: in this case, a neighboring solution is accepted only if it improves the value of the fitness function.
b) Hierarchic approach: priority is given to the improvement of the makespan over the flowtime. In case the neighboring solution has the same makespan value, the solution is accepted if it improves the flowtime value.
c) Relaxed Hill Climbing: in this case, it is the heuristic used to generate the neighboring solution that evaluates the improvement obtained by the new solution. To this aim, the function generateNeighbor() has been modified to return a value q ∈ [−1, 1] quantifying the improvement of the neighboring solution. A positive value indicates an improvement, and therefore the Hill Climbing will accept the solution; a negative value indicates a worsening, and in this case the solution will be rejected. It should be noticed that q quantifies the improvement, and therefore we could adjust the Hill Climbing to be more or less tolerant, accepting solutions only above a certain value (in our case the acceptance threshold has been set to q = 0). Each mechanism provides this value according to its own approach to evaluating the movements (as explained in Subsect. 6.3):
• For the mechanisms based on the completion time (LM, LS, LR, DLR, LMCTM, LMCTS, ERR, LSH and VNS): letting cmax be the maximum completion time of the resources that participate in the movement in the current solution, and c′max the maximum completion time of the same resources in the neighboring solution, q is computed as q = (cmax − c′max)/max{cmax, c′max}.
• For the mechanisms SLM, SLS and LTH, the improvement is computed according to a hierarchic approach (first makespan, then flowtime). Letting m and m′ be the makespan of the source solution and of the neighboring solution, resp., and f and f′ the corresponding flowtimes: if m ≠ m′, then q = (m − m′)/max{m, m′}; otherwise q = (f − f′)/max{f, f′}.
• For the mechanisms based on the flowtime value (LMFTM, LMFTS, LFR, ERFR): letting sf and s′f be the sums of the flowtimes of the resources that participate in the movement for the source and the neighboring solution, resp., q = (sf − s′f)/max{sf, s′f}.
After preliminary experiments, alternative b) was selected as the evaluation approach, since it showed a coherent and effective behavior.
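The hierarchic variant of q (makespan first, flowtime as tie-breaker, as used by SLM, SLS and LTH) can be sketched directly from the formulas above; the function name is illustrative:

```cpp
#include <algorithm>

// Improvement quantifier q in [-1, 1] returned by generateNeighbor() in the
// relaxed Hill Climbing: positive means the neighboring solution improves.
// m, f: makespan and flowtime of the source solution;
// mPrime, fPrime: the same values for the neighboring solution.
double qHierarchic(double m, double mPrime, double f, double fPrime) {
    if (m != mPrime) return (m - mPrime) / std::max(m, mPrime);
    return (f - fPrime) / std::max(f, fPrime);
}
```

With the acceptance threshold set to q = 0, the Hill Climbing accepts exactly the solutions that improve the makespan, or that keep it while improving the flowtime.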

7 Implementation issues

In order to implement the MA algorithm and its hybridization with the local search procedures and the Tabu Search, our starting point was an algorithmic skeleton for MA, a first version of which appeared in [5], which has been adapted in order to easily manage more than one local search engine. The local search mechanisms presented here, except for the TS procedure, have been implemented as methods; the TS procedure is taken from a generic
skeleton for TS instantiated for job scheduling on computational grids, as we show next.

Implementation of the MA template

We can easily observe from Fig. 1 that the MA template defines the main method of the memetic algorithm and uses other methods and entities that are either problem-dependent or problem-independent. In this sense, we can see the MA template as an algorithmic skeleton, some parts of which need to be “filled in” for any concrete problem to be solved through MA. Clearly, the skeleton implementation offers a separation of concerns: the problem-independent part is provided by the skeleton, while the problem-dependent part is required, i.e. the user has to provide it when instantiating the skeleton for a concrete problem. In order for the two parts to communicate, the skeleton fixes the interface of both problem-independent and problem-dependent methods/entities in such a way that the problem-independent part uses the problem-dependent methods/entities through their signature/interface and, vice versa, the problem-dependent methods can be implemented without knowing the implementation of the MA template itself (see Fig. 8).

[Class diagram: the Solver class is connected to the Problem, Solution, Population, Local Search Engine and Setup classes; Solver and Setup are provided by the skeleton, while the remaining classes are required.]

Fig. 8. Class diagram of MA

In the diagram, the class Solver is in charge of the main method of the MA and uses the other methods and entities. Thus the implementation of the MA template consists in the fully implemented classes Solver and Setup and fixed interfaces for the rest of the classes, whose implementation will be later completed by the user for a concrete problem, in our case job scheduling on grid environments. The implementation is done in C++ by translating the methods and entities mentioned above into methods and classes. Classes labelled as Provided are completely implemented and thus provided by the skeleton implementation, whereas classes labelled as Required contain just their interface and are later completed by the programmer according to the concrete problem through the instantiation process. Thus, the implementation of the MA for job scheduling was completed once the required classes and methods were implemented.
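The Provided/Required split can be illustrated with a minimal, self-contained sketch; the class names follow the diagram, but every member signature below is an illustrative assumption, not the skeleton's actual API:

```cpp
#include <vector>
#include <algorithm>

// Required entity: completed by the user for the concrete problem.
struct Solution {
    std::vector<double> completion;     // per-machine completion times
    double fitness() const {            // makespan to minimize
        return *std::max_element(completion.begin(), completion.end());
    }
};

// Required interface: one of the 16 local search mechanisms.
struct LocalSearchEngine {
    virtual Solution improve(const Solution& s) const = 0;
    virtual ~LocalSearchEngine() = default;
};

// Toy engine completing the interface: move one load unit from the most
// loaded machine to the least loaded one.
struct RebalanceEngine : LocalSearchEngine {
    Solution improve(const Solution& s) const override {
        Solution t = s;
        auto mx = std::max_element(t.completion.begin(), t.completion.end());
        auto mn = std::min_element(t.completion.begin(), t.completion.end());
        *mx -= 1.0; *mn += 1.0;
        return t;
    }
};

// Provided entity: the generic main loop, which only sees the interfaces.
struct Solver {
    const LocalSearchEngine& ls;
    explicit Solver(const LocalSearchEngine& e) : ls(e) {}
    Solution step(const Solution& s) const {   // one (sketched) MA generation
        Solution t = ls.improve(s);            // hybridization hook
        return t.fitness() < s.fitness() ? t : s;
    }
};
```

The Solver compiles against the abstract LocalSearchEngine only, which is what lets the skeleton swap among the local search engines without changing the problem-independent part.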


Implementation of the Tabu Search

Again, for the implementation of TS we have used the idea of the algorithmic skeleton defined in [3]. The main observation to make here is that the Solver class for TS has been designed in such a way that it offers the possibility of executing not only the whole TS method but also just a phase of TS made up of several iterations. This functionality of the TS Solver is precisely what we needed to implement the MA+TS hybridization.

8 Experimental study

In this section the objective is to experimentally set up the MA implementations for job scheduling, trying to further improve the results obtained with existing GA and other meta-heuristic implementations for the problem in the literature. To this end, an extensive experimental study was performed to adjust the large group of parameters of the MA, paying special attention to the tuning of the local search procedures and the TS method in order to identify the best hybridization.

8.1 Fine tuning

MAs are characterized by a large number of parameters that determine their behavior. It is well known that setting the parameters to appropriate values is a tedious and complex task. Most of the parameters are mutually related; therefore, in order to obtain a good performance of the algorithm, the setting of the parameters has to be established in a certain order, decided by the importance of the parameter in the MA. In other words, the approach we have adopted for the setting of parameters is to successively fix their values, starting from the most important parameter to the least important one. On the other hand, it is important to use generic instances of the problem so that the values of the parameters will not be highly dependent on the type of the input. These instances have been generated using the ETC matrix model (see Sect. 3).

Hardware/software characteristics and general setting

The experiments are done on the machines of the MALLBA cluster³ (AMD K6(tm) 3D 450 MHz and 256 Mb of RAM under Linux/Debian OS). In order to avoid casual values for the parameters, 20 executions are performed. The execution time limit has been set to 90 secs, which is a very limited time; this time limit is taken as a reference from Braun et al. [6]. Note that in a grid system the response time of the scheduler should be as small as possible in order for the system to adjust itself to possible dynamic changes.

³ http://www.lsi.upc.edu/∼mallba/public/clusters/BA-Cluster/
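Since the benchmark instances follow the ETC matrix model, a minimal generator sketch may help; the range-based construction below is a common way of producing task and machine heterogeneity and is an assumption for illustration, not the exact generator used for these experiments:

```cpp
#include <vector>
#include <random>

// Sketch of a range-based ETC generator: etc[t][m] is a task-heterogeneity
// factor in [1, Rtask] scaled by a machine-heterogeneity factor in [1, Rmach].
using Matrix = std::vector<std::vector<double>>;

Matrix makeEtc(int nTasks, int nMachines, double Rtask, double Rmach,
               unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> ut(1.0, Rtask), um(1.0, Rmach);
    Matrix etc(nTasks, std::vector<double>(nMachines));
    for (int t = 0; t < nTasks; ++t) {
        double base = ut(gen);                  // task heterogeneity
        for (int m = 0; m < nMachines; ++m)
            etc[t][m] = base * um(gen);         // machine heterogeneity
    }
    return etc;
}
```

Large values of Rtask and Rmach give highly heterogeneous (and thus harder) instances; consistency between machines can additionally be imposed by sorting each row, a variant not shown here.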


Tuning of the local search procedures

The tuning of the local search procedures is critical to the MA behavior. Indeed, the contribution of the local search will depend on how much time (how many iterations) is devoted to it; note that the local search procedures presented here have different time complexities, so, depending on the time allotted to the local search, some procedures will be preferable to others. Most importantly, the contribution of the local search will depend on how fast or slowly the population converges. For instance, if the selective pressure of the MA were high, the population would rapidly converge to good solutions, and thus the local search could hardly yield any improvement; alternatively, we should seek the appropriate local search that is able to improve even high-quality solutions. On the other hand, in case of less selective pressure, a local search able to improve not so good solutions very quickly will be preferable.

The executions reported in this section have been divided into two groups: in the first, the number of iterations for the local search is set to 70, and in the second the local search is limited to 5 iterations. For the LSH and LTH procedures, just one iteration of 70 hops is done in the first case, and of 5 hops in the second. The rest of the parameter values⁴ are shown in Table 2. It is worth observing the large number of recombinations in spite of a rather small population size (20 recombinations for a population of 30 individuals), as well as the use of a single mutation in each iteration. On the other hand, observe that the local search is not limited to a maximum number of iterations without improvement.

Table 2. Values of the parameters for tuning the local search procedures

  nb generations                  (max 90s)
  population size                 30
  nb solutions to recombine       2
  nb recombinations               20
  nb mutations                    1
  start choice                    StartLJFRSJFR
  select choice                   Best Selection
  recombine choice                One point recombination
  recombine selection             Random Selection
  mutate choice                   Mutate Move
  mutate selection                Random Selection
  nb ls not improving iterations  +∞
  add only if better              false

⁴ The values of the main TS parameters have been set as follows: max tabu status = 1.5·M (M = number of machines), nb diversifications = log₂(N) (N = number of tasks), nb intensifications = log₂(N), elite size = 10; the tabu list is implemented as a hash table.

[Figure: makespan (vertical axis) vs. execution time in seconds (horizontal axis) for the MA combined with each of the 16 local search mechanisms]

Fig. 9. Reduction of makespan obtained from MA+local search procedures; each local search is run for 5 Hill Climbing iterations

The behavior of the MA combined with the different search mechanisms, which⁵ are given a low priority with respect to the global search, is shown in Fig. 9. In this case, each newly generated individual is slightly improved by the local search without producing significant changes. MA + LMCTS followed by MA + LMCTM obtain the best results as well as the fastest reduction in the makespan. One explanation of this could be that both LMCTS and LMCTM don't try to reduce the makespan of the solution in a direct way but rather simply distribute the tasks, making the best decision in a local way, without in many cases reducing the makespan value.

Next, we increased the protagonism of the local search with respect to the global search by executing 70 Hill Climbing iterations. We observed that the number of generations that the algorithm has been able to process for each configuration within 90 seconds drastically dropped as compared to the previous case; this is explained by the fact that executing more local search iterations for each new individual takes more time. We give the comparison results in Fig. 10.

It is straightforward to see from Fig. 10 the great improvement obtained by the LTH, which clearly benefits from an intensification of the local search to the detriment of the global search. As a matter of fact, TS has shown to be very powerful for solving resource planning problems [31] and by itself is able to achieve good makespan values. For this reason it is important to carefully establish the trade-off between the work of the MA and that of the TS in order to obtain the best reduction of the makespan. The diverse experiments we have

⁵ Only makespan values are reported, though the flowtime has been computed as well as part of the objective function.

[Figure: makespan (vertical axis) vs. execution time in seconds (horizontal axis) for the MA combined with each of the 16 local search mechanisms]

Fig. 10. Reduction of makespan obtained from MA+local search procedures; each local search is run for 70 Hill Climbing iterations

performed showed that indeed the results obtained improved considerably when giving more priority, i.e. running time, to the TS; however, a point is reached where the selective pressure and the diversification provided by the MA are necessary.

Concerning the rest of the mechanisms, the makespan values for most of the configurations do not vary too much, and therefore a certain constancy is shown in the behavior of the algorithm in spite of changing the priority given to the local search within the MA. There is a slight tendency toward worsening the results as the local search is given more priority (except for the LTH) since, in fact, the MA benefits more from small improvements of each individual than from heavy intensifications, which at the same time imply a high risk of stagnation in a local optimum. Finally, note that the LMCTS and LMCTM mechanisms show again a very good performance; as a result, LMCTS, being a cheap and efficient mechanism per iteration, is used in the sequel to adjust and compare the different operators and parameters, while the LTH mechanism is tuned individually in order to exploit as much as possible the joint capacities of the TS and the MA.

We consider now the tuning of the MA operators.

Mutation operator

The Mutation operator plays a key role in the population’s evolution, in spite L’operador de mutació s’especifica a través del paràmetre mutate_choice de la classe Setup. of using thede local search to per individually solutions; mutation La resta paràmetres utilitzats l’ajustament esimprove mostren enthe la taula 4.2. El percentatge de is good mutacions to introduce certain diversification components and to try to extend en cada generació s’ha augmentat considerablement per sobre el de the search to other of theelssearch space. However, recombinacions per regions tal de potenciar efectes de l’operador a ajustar.the S’hamechanism establert el of cerca localhave en unmany valor aspects mitjà de 30 a 10 el local nombre searchd’iteracions and that deof lamutation in fixant common; in nombre fact, one màxim sense millora en laas cerca d’aquesta s’obté unaThe configuració couldd’iteracions even consider the mutation anlocal, iteration of manera local search. difference moltin més eficient evitantiniteracions inútils of en local un òptim local.the S’haaccepted establert lasolutions política de are consists the fact that the process search those that contribute an improvement with respect to the previous one, while 97


Fatos Xhafa

in a mutation, whatever operator is used, any modification is valid and the new solution is always accepted (unless the add_only_if_better policy is active), diversifying the search toward other regions. The mutation operator is specified through the mutate_choice parameter of the Setup class. The percentage of mutations per generation has been increased considerably above that of recombinations in order to enhance the effects of the operator being tuned, and the local search effort is kept moderate (at most 30 iterations, stopping after 10 without improvement) to avoid useless iterations around a local optimum. The values of the parameters used for the adjustment of this operator are shown in Table 3.

Table 3. Values of the parameters used in tuning the mutation operator

  nb generations                  (max 90s)
  population size                 80
  nb solutions to recombine       2
  nb recombinations               30
  nb mutations                    30
  start choice                    StartLJFRSJFR
  select choice                   Best Selection
  recombine choice                One point recombination
  recombine selection             Random Selection
  mutate selection                Random Selection
  local search choice             LMCTS
  nb local search iterations      30
  nb ls not improving iterations  10
  add only if better              false

We give in Fig. 11 the behavior of the different mutation operators we have implemented. As can be observed, the behavior of the mutation operators is pretty much the same (for the considered configuration). Nevertheless, the best reduction (a clear superiority in the behavior) is obtained by the Rebalance mutation operator.
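On the vector-of-assignments representation (schedule[t] holds the machine assigned to task t), the four mutation operators compared in Fig. 11 can be sketched roughly as follows. This is an illustrative sketch, not the skeleton's actual interface; in particular, in the real Rebalance operator the "load" of a machine would be its completion time, whereas here it is simplified to its task count.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Illustrative sketch of the mutation operators on the task-machine
// assignment vector; names and signatures are hypothetical.
using Schedule = std::vector<int>;  // schedule[t] = machine of task t

// Move: reassign one random task to a random machine.
void mutateMove(Schedule& s, int nbMachines) {
    s[std::rand() % s.size()] = std::rand() % nbMachines;
}

// Swap: exchange the machines of two random tasks.
void mutateSwap(Schedule& s) {
    std::size_t t1 = std::rand() % s.size(), t2 = std::rand() % s.size();
    std::swap(s[t1], s[t2]);
}

// Both: apply Move with probability p, Swap otherwise
// (Fig. 11 uses p = 25%, 50% and 75%).
void mutateBoth(Schedule& s, int nbMachines, double p) {
    if (std::rand() / (RAND_MAX + 1.0) < p) mutateMove(s, nbMachines);
    else                                    mutateSwap(s);
}

// Rebalance: move one task away from the most loaded machine to the
// least loaded one (load simplified to task count in this sketch).
void mutateRebalance(Schedule& s, int nbMachines) {
    std::vector<int> load(nbMachines, 0);
    for (int m : s) ++load[m];
    int worst = 0, best = 0;
    for (int m = 1; m < nbMachines; ++m) {
        if (load[m] > load[worst]) worst = m;
        if (load[m] < load[best])  best = m;
    }
    for (int& m : s)
        if (m == worst) { m = best; break; }  // move a single task
}
```

Note that Swap never changes the load distribution over machines (it only permutes tasks), which is why combining it with Move or with Rebalance is needed to escape unbalanced assignments.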


Fig. 11. Reduction of makespan for different mutation operators

An Evolutionary Heuristic for Job Scheduling on Computational Grids


Recombination operator

The mission of the recombination operator is to create new solutions from the genetic information of solutions already existing in the population. One of its most important aspects is its capacity to promote genetic information that, contained in a single solution, might carry no weight but, combined with that of other solutions, can yield notable improvements without adding any new genetic information. In this sense, the symbiosis between the recombination operator and the local search mechanism is very important: recombination extends to other individuals the changes made by the local search, while the local search improves the individuals produced by recombination. The recombination operator is specified through the recombine_choice parameter of the Setup class. We give in Table 4 the values of the parameters we have used to study the behavior of the different recombination operators; the addition policy is kept tolerant (add only if better = false) in order to better observe the effects of recombination.

Table 4. Values of the parameters used in tuning the recombination operator

  nb generations                  (max 90s)
  population size                 70
  nb recombinations               60
  nb mutations                    10
  start choice                    StartLJFRSJFR
  select choice                   Best Selection
  recombine selection             Random Selection
  mutate choice                   Rebalance
  mutate extra parameter          0,75
  mutate selection                Random Selection
  local search choice             LMCTS
  nb local search iterations      15
  nb ls not improving iterations  10
  add only if better              false

Observe that we have considerably increased the percentage of recombinations in order to see its effect on the behavior. The number of solutions to recombine is also decisive for the behavior of this operator; we have considered 2, 4 and 6 solutions to be recombined for each implemented recombination operator. The behavior of the One-Point recombination operator for different numbers of solutions to recombine is shown in Fig. 12. The best reduction in the makespan among the implemented recombination operators is obtained when combining two solutions; the results worsen as the number of solutions to recombine is increased, since mixing too many solutions tends to destroy the schemata they contain and reduces the chances of obtaining a better solution. This behavior is observed for the three implemented recombination operators (One-Point, Uniform and Fitness Based). We give in Fig. 13 the comparison of the recombination operators using 2 solutions to recombine. As can be observed, the One-Point operator obtains the best results, although it is not far ahead of the rest. It is interesting to observe that the behavior of the Uniform and Fitness Based operators is almost identical, evidencing that distinguishing the solutions by their fitness when combining them does not help: the fitness values are not sufficiently far apart to distinguish the Fitness Based policy from the Uniform one, which takes genes from each individual with equal probability.
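On the same representation, the One-Point recombination can be sketched as follows (an illustrative sketch, not the skeleton's actual operator): a random cut point is chosen and the child inherits the assignments of the first parent before the cut and of the second parent after it.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

using Schedule = std::vector<int>;  // schedule[t] = machine of task t

// One-point recombination of two parents: the child inherits the
// assignments [0, cut) from a and [cut, n) from b. Illustrative only;
// requires both parents to have the same size, at least 2.
Schedule onePointRecombine(const Schedule& a, const Schedule& b) {
    Schedule child(a.size());
    std::size_t cut = 1 + std::rand() % (a.size() - 1);  // cut in [1, n-1]
    for (std::size_t t = 0; t < a.size(); ++t)
        child[t] = (t < cut) ? a[t] : b[t];
    return child;
}
```

Recombining k > 2 solutions amounts to cutting the vector into k segments, one per parent; the more segments, the more the schemata carried by each parent are broken up.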


Fig. 12. Comparison of the behavior of the One-Point recombination operator according to the number of solutions to recombine

Fig. 13. Reduction of the makespan obtained by the different recombination operators (two solutions are recombined)


Selection operator for recombination

This operator is in charge of selecting the solutions to be recombined in each recombination. Indirectly, this selection process also determines to a large extent which genetic information is reproduced and thus has more chances of surviving. The operator is specified through the recombine_selection parameter of the Setup class. When tuning it, it is necessary to keep in mind that it is not the only selection mechanism: it has to act in synchrony with the selection of the solutions to mutate and with the selection of the next generation. The values of the parameters used for tuning this operator are given in Table 5; since the recombination process is under study, a high percentage of recombinations is kept, and the number of solutions combined in each recombination is set to 3 in order to emphasize the effects of the selection.

Table 5. Values of the parameters used in tuning the selection operator for recombination

  nb generations                  (max 90s)
  population size                 50
  nb solutions to recombine       3
  nb recombinations               40
  nb mutations                    10
  start choice                    StartLJFRSJFR
  select choice                   Best Selection
  recombine choice                One point recombination
  mutate choice                   Rebalance
  mutate selection                Random Selection
  local search choice             LMCTS
  nb local search iterations      15
  nb ls not improving iterations  5
  add only if better              false

We give in Fig. 14 the behavior of the different selection operators for recombination. The figure clearly shows the distinction in the behavior of the MA according to the selection mechanism for recombination. The exponential selection obtained the worst results, and the random selection also performed quite poorly. The rest of the selection operators, from Binary Tournament to Linear Ranking, all show a very similar behavior, reducing the makespan slowly but reaching considerable reductions. The best makespan reductions, however, are obtained by the different configurations of the N-Tournament operator, which, as evidenced in Fig. 14, improves its behavior as more individuals compete in the selection. Note, however, that it is the Best Selection operator that obtains the steepest initial reduction of the makespan, and it would therefore be the best option if the available computation time were very short; it stagnates quickly, though, and is outperformed by the N-Tournament with N = 7, which definitely obtains the best results.

Selection operator for mutation

The selection operator for mutation is in charge of selecting an individual to which the mutation operator will be applied. The configuration of parameters used for tuning this operator is given in Table 6, and the resulting behavior of the different operators is given in Fig. 15. The behavior is almost the same as for the selection operator for recombination, except for the Best Selection.

It should be added that it is not advisable to establish a too selective policy a priori, even though the most demanding operators obtain the best results: the remaining selection operators are still to be tuned, and the selection pressure should not be determined by a single operator but by the whole set. For this reason, the N-Tournament operator with N = 3 has been chosen as the selector for recombination when tuning the rest of the operators.
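As an illustration of the selection mechanisms compared here, an N-Tournament selection can be sketched as follows (hypothetical code, minimizing the makespan): N candidates are drawn at random and the fittest among them is returned, so a larger N means a higher selective pressure.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Illustrative N-Tournament selection: draw N random candidates and
// return the index of the one with the smallest makespan (we minimize).
int nTournament(const std::vector<double>& makespan, int N) {
    int best = static_cast<int>(std::rand() % makespan.size());
    for (int i = 1; i < N; ++i) {
        int c = static_cast<int>(std::rand() % makespan.size());
        if (makespan[c] < makespan[best]) best = c;
    }
    return best;
}
```

With N = 2 this is the Binary Tournament; Best Selection corresponds to deterministically taking the fittest individual, which explains both its fast initial reduction and its early stagnation.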


Fig. 14. Reduction of the makespan obtained by the different selection operators for recombination


Fig. 15. Reduction of the makespan obtained by the different selection operators for mutation



Table 6. Values of the parameters used in tuning the selection operator for mutation

  nb generations                  (max 90s)
  population size                 50
  nb solutions to recombine       3
  nb recombinations               20
  nb mutations                    40
  start choice                    StartLJFRSJFR
  select choice                   Best Selection
  recombine choice                One point recombination
  recombine selection             N Tournament
  mutate choice                   Rebalance
  local search choice             LMCTS
  nb local search iterations      15
  nb ls not improving iterations  5
  add only if better              false

The Best Selection operator ends up obtaining the best makespan reduction here, despite a very steep initial reduction followed by a pronounced stagnation.

The population's selection

This operator reduces the size of the current population to the initially established size by selecting the individuals that will form the next generation. Its tuning depends on how the selection operators for mutation and for recombination have been tuned, since together they determine the degree of selective pressure of the memetic algorithm and must therefore work in harmony. The configuration of parameters used for tuning this operator is given in Table 7, and the resulting behavior of the different operators (except the exponential selection, which performs very badly) is given in Fig. 16.

Table 7. Values of the parameters used in tuning the population's selection operator

  nb generations                  (max 90s)
  population size                 70
  nb solutions to recombine       3
  nb recombinations               56
  nb mutations                    15
  start choice                    StartLJFRSJFR
  recombine choice                One point recombination
  recombine selection             N-Tournament
  mutate choice                   Rebalance
  mutate selection                Best Selection
  local search choice             LMCTS
  nb local search iterations      15
  nb ls not improving iterations  5
  add only if better              false

Addition policy

Another aspect of the MA algorithm is the addition policy, that is, the criterion for adding a new individual to the population and the maximum size to which the population may grow (i.e. the proportion of new individuals added in each evolutionary step with respect to the initial population size).

A more tolerant selection leaves room for exploring other regions of the search space and therefore obtains lower makespan values than the more selective operators. Indeed, it is the Linear Ranking and Binary Tournament operators that obtain the best results, together with a fast reduction of the makespan value over time. To some extent, the fact that the least selective operators obtain the best results can be explained by the selection policy established so far, both for mutation (Best Selection) and for recombination (N-Tournament), already being elitist enough; a better behavior is thus achieved by compensating it with a more tolerant selection of the next generation's population, such as simple random selection. In any case, the Binary Tournament operator is adopted for the subsequent tunings.


Fig. 16. Reduction of the makespan obtained by the different population's selection operators

Let λ be the number of new solutions created in each generation and µ the initial population size. We have considered the following strategies:

• Lower (λ < µ), where λ = 0.5 · population size, with nb recombinations = 0.3 · population size and nb mutations = 0.2 · population size.
• Equal (λ = µ), where λ = population size, with nb recombinations = 0.6 · population size and nb mutations = 0.4 · population size.
• Greater (λ > µ), where λ = 1.4 · population size, with nb recombinations = 0.8 · population size and nb mutations = 0.6 · population size.

These strategies were evaluated for population sizes 30, 50 and 80. The values of the rest of the parameters are given in Table 8.

Table 8. Values of parameters for tuning the addition policy

  nb generations                  (max 90s)
  nb solutions to recombine       3
  start choice                    StartLJFRSJFR
  select choice                   Binary Tournament
  recombine choice                One point recombination
  recombine selection             N Tournament
  mutate choice                   Rebalance
  mutate selection                Best Selection
  local search choice             LMCTS
  nb local search iterations      15
  nb ls not improving iterations  5
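The arithmetic of the three strategies can be made explicit with a small hypothetical helper (integer arithmetic is used so that the fractions are exact for population sizes that are multiples of 10, such as the 30, 50 and 80 used here):

```cpp
#include <cassert>

// Hypothetical helper computing the offspring counts for the three
// addition policies (Lower, Equal, Greater) described above.
struct Offspring { int recombinations, mutations; };

enum class Policy { Lower, Equal, Greater };

Offspring offspringFor(Policy p, int populationSize) {
    switch (p) {
        case Policy::Lower:   // lambda = 0.5 * mu
            return { populationSize * 3 / 10, populationSize * 2 / 10 };
        case Policy::Equal:   // lambda = mu
            return { populationSize * 6 / 10, populationSize * 4 / 10 };
        default:              // Greater: lambda = 1.4 * mu
            return { populationSize * 8 / 10, populationSize * 6 / 10 };
    }
}
```

For instance, with a population of 50 the Equal policy produces 30 recombinations and 20 mutations per generation, i.e. λ = 50 = µ.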

The results of the different experiments showed a certain irregularity with respect to the addition policy, which makes it difficult to draw firm conclusions; most likely, the absolute values of the number of mutations and the number of recombinations, together with the population size, play a much more relevant role than the ratio between the number of new and old individuals. However, as shown in Fig. 17, the best makespan values are usually obtained by the Equal strategy, in which the number of new individuals equals the initial population size. It should be added that the computation time grows in proportion to the number of new individuals created in each generation together with the population size; a Greater policy is thus always more expensive than an Equal one, which in turn is more expensive than a Lower one, so working with more individuals reduces the number of generations that can be executed. For this reason, one should always choose the strategy that obtains the best quality with respect to time, not with respect to the number of generations.


Fig. 17. Reduction of the makespan obtained by the different addition policies (add only if better = false)

Regarding the add_only_if_better parameter, it can be affirmed that the results to which the different configurations converge are usually better when it takes the value false. The selection policy of the configuration under study is quite strict, leaving little margin for the exploration of new regions of the search space since it intensifies on the best solutions. Setting add_only_if_better to false helps to compensate this phenomenon by temporarily admitting new, worse individuals, which may lead to other regions of the search space where much better makespan values are obtained.

Summary of final setting of parameters

We summarize in Tables 9 and 10 the values of the parameters that have been used for the evaluation of MA and MA+TS, and compare the results obtained with the best known results for the problem in the meta-heuristic literature.

Table 9. Values of parameters used in MA evaluation

  nb generations                  (max 90s)
  population size                 50
  nb solutions to recombine       3
  nb recombinations               10
  nb mutations                    40
  start choice                    MCT and LJFR-SJFR
  select choice                   Random Selection
  recombine choice                One point recombination
  recombine selection             Binary Tournament
  mutate choice                   Rebalance
  mutate selection                Best Selection
  local search choice             LMCTS
  nb local search iterations      10
  nb ls not improving iterations  4
  add only if better              false

Table 10. Values of parameters used in MA+TS evaluation

  nb generations                  (max 90s)
  population size                 65
  nb solutions to recombine       3
  nb recombinations               0.2 · population size
  nb mutations                    0.8 · population size
  start choice                    MCT and LJFR-SJFR
  select choice                   Random Selection
  recombine choice                One point recombination
  recombine selection             Binary Tournament
  mutate choice                   Rebalance
  mutate selection                Best Selection
  local search choice             LTH
  nb local search iterations      1
  nb ls not improving iterations  ∞
  add only if better              false

8.2 Computational results: MA evaluation

In this section we give the evaluation of the MA and MA+TS using the benchmark instances of Braun et al. [6], comparing with the results of the GA implementation of Braun et al. and of the GA implemented by Carretero & Xhafa [9].⁶ These instances are classified into 12 different types of ETC matrices, each of them consisting of 100 instances, according to three metrics: task heterogeneity, machine heterogeneity and consistency. The notation u_x_yyzz.0 reads as follows:
- u means uniform distribution (used in generating the matrix).
- x means the type of consistency (c: consistent, i: inconsistent and s: semi-consistent).
- yy indicates the heterogeneity of the tasks (hi means high and lo means low).
- zz indicates the heterogeneity of the resources (hi means high and lo means low).
It should be noted that this benchmark is considered the most difficult one for the scheduling problem in heterogeneous environments and is the main reference in the literature. Note that for all the instances the number of tasks is 512 and the number of machines is 16. Again, the executions have been carried out on AMD K6(tm) 3D machines at 450 MHz with 256 Mb, with the search time limited to 90 seconds (similar to Braun et al.); the results for the makespan,⁷ summarized in Table 11, are averaged over 10 executions for each instance.

⁶ We have not considered the comparison with Ritchie's ACO+TS, though our implementations improve on it for many instances, since it reports results for an execution time of 12792 sec. (more than 3.5 hours) with tabu searches run for 1,000,000 iterations, which is incomparable to our execution time of 90 sec.
⁷ Braun et al. [6] report only makespan values.


Table 11. Makespan values obtained by Braun's GA, MA and MA+TS

  Instance     GA Braun    GA Carretero&Xhafa  MA           MA+TS
  u_c_hihi.0   8050844,5   7610176,737         7669920,455  7530020,18
  u_c_hilo.0   156249,2    155251,196          154631,167   153917,167
  u_c_lohi.0   258756,77   248466,775          249950,882   245288,936
  u_c_lolo.0   5272,25     5226,972            5213,076     5173,722
  u_i_hihi.0   3104762,5   3077705,818         3058785,6    3058474,9
  u_i_hilo.0   75816,13    75924,023           74939,851    75108,495
  u_i_lohi.0   107500,72   106069,101          107038,839   105808,582
  u_i_lolo.0   2614,39     2613,110            2598,441     2596,569
  u_s_hihi.0   4566206     4359312,628         4327249,706  4321015,44
  u_s_hilo.0   98519,4     98334,640           97804,756    97177,291
  u_s_lohi.0   130616,53   127641,889          127648,919   127633,024
  u_s_lolo.0   3583,44     3515,526            3510,017     3484,077

As can be seen from this table, MA+TS obtains the best makespan value for all the instances used, outperforming the two GAs and the MA (using LMCTS, the best local search procedure). On the other hand, the MA performs better than Braun et al.'s GA for all instances and better than Carretero & Xhafa's GA for 75% of the instances. These results show that MA+TS is a very good alternative for solving Job Scheduling in Computational Grids when more priority is given to the reduction of the makespan. Regarding the flowtime, Braun et al. do not optimize the flowtime value; hence we compare in Table 12 the values obtained by the two versions of the MA and by the GA of Carretero & Xhafa. Surprisingly, the non-hybrid version of the MA is able to obtain the smallest flowtime values for the great majority of instances, outperforming the MA+TS hybridization. These results show that the MA is a very good alternative for solving Job Scheduling in Computational Grids when more priority is given to the reduction of the flowtime. Also note that both MA implementations perform better than Carretero & Xhafa's GA for all considered instances.

9 Job Scheduling in a dynamic setting

Unlike traditional scheduling, in which benchmarks of static instances are used for experimenting, for tuning parameters and for comparison studies, Grid scheduling is dynamic. Clearly, any grid scheduler could also be used in traditional computing environments; therefore, the static instances of Braun et al. were a useful source for an initial experimental study of the MA implementations. However, the ultimate goal is for the grid scheduler to be efficient and effective in a realistic grid environment, and thus the experimenting should be done in such environments. To this end, one has basically two alternatives: either use a real grid environment or use a simulation package that


Table 12. Flowtime values obtained by Carretero & Xhafa GA, MA and MA+TS

  Instance     Carretero & Xhafa GA  MA           MA+TS
  u_c_hihi.0   1073774996,08         1040368372   1046309158
  u_c_hilo.0   28314677,91           27544564,6   27659089,9
  u_c_lohi.0   35677170,78           34578401,7   34787262,8
  u_c_lolo.0   942076,61             915435,213   920222,33
  u_i_hihi.0   376800875,08          359702293    368332234
  u_i_hilo.0   12902561,31           12608201,9   12757607,2
  u_i_lohi.0   13108727,88           12622347,8   12912987,9
  u_i_lolo.0   453399,32             439215,411   444764,936
  u_s_hihi.0   541570911,12          514981326    532319945
  u_s_hilo.0   17007775,22           16380845,3   16616505,4
  u_s_lohi.0   15992229,82           15174875,4   15743720
  u_s_lolo.0   616542,78             597062,233   604519,127

simulates as much as possible a real grid environment. The first alternative does not seem appropriate at the present time due to its cost and the lack of flexibility in changing the configuration of the grid nodes. Hence, we have considered the second alternative, that is, using a grid simulation package. Unfortunately, and to the best of our knowledge, the existing simulation packages in the literature cannot be used in a straightforward way for experimenting with meta-heuristic implementations. We have therefore developed a grid simulator using the HyperSim open source package [11], which is intended to act as an application grid resource broker: it is informed about the system dynamics in order to dynamically schedule and adapt to the environment. It should be mentioned that, while the experimental study of meta-heuristics is known to be a tedious and complex task, it is even more complicated in a dynamic environment. Traditionally, meta-heuristic performance is experimentally studied using benchmarks of static instances. How to study the performance of a meta-heuristic in a dynamic environment is a key issue, which requires proper experimenting criteria and models. Indeed, unlike traditional experimenting with static instances, in a dynamic setting the scheduler will run for as long as the grid system exists; therefore, the concept of running the scheduler for a certain amount of time is no longer valid. Models for experimenting with meta-heuristics in dynamic environments need stochastic models that permit studying the scheduler behavior during (long) periods of time and drawing conclusions about its performance. Given these observations, we have used the simulator to generate several grid scenarios that try to capture the dynamics of the grid. Moreover, our scenarios also reflect the large-scale characteristics of the grid.

Grid scenarios

The grid scenarios are obtained using the configuration given in Fig. 18. Four scenarios have been defined according to the grid size (small: 32 hosts and 512 tasks; medium:


64 hosts and 1024 tasks; large: 128 hosts and 2048 tasks; and very large: 256 hosts and 4096 tasks).

                 Small                   Medium             Large              Very Large
  Init. hosts    32                      64                 128                256
  Max. hosts     37                      70                 135                264
  Min. hosts     27                      58                 121                248
  Mips           N(1000, 175)
  Add host       N(562500, 84375)        N(500000, 75000)   N(625000, 93750)   N(437500, 65625)
  Delete host    N(625000, 93750)
  Total tasks    512                     1024               2048               4096
  Init. tasks    384                     768                1536               3072
  Workload       N(250000000, 43750000)
  Interarrival   E(7812.5)               E(3906.25)         E(1953.125)        E(976.5625)
  Activation     resource_and_time_interval(250000)
  Reschedule     true
  Host select    all
  Task select    all
  Local policy   sptf
  Number runs    15

Fig. 18. Configuration used for generating the four grid scenarios

The meaning of the parameters in the configuration is as follows.

Init. hosts: number of resources initially in the environment. Max. hosts: maximum number of resources in the grid system. - Init. hosts: nombre de recursos hi haurà in inicialment l’entorn. Min. hosts: minimum number of the que resources the gridensystem. Mips: normal distribution modelling computing capacity of resources. Add host: distribution modelling time interval resources are added to the system. - normal Max. hosts: nombre màxim de recursos que hi podennew haver en el sistema. Delete host: normal distribution modelling time interval resources are dropped from the system. - Min. hosts: nombre mínim descheduled. recursos que hi poden haver en el sistema. Total tasks: number of tasks to be Init. tasks: initial number of tasks in the system to be scheduled. Workload: distribution modelling the de workload of dels tasks. - normal Mips: distribució que segueix la capacitat computació diferents recursos. Interarrival: exponential distribution modelling the time interval of arrivals of tasks to the system (it is assured that each time the simulator is activated, there will be at least one new Add host: distribució que segueix l’interval de temps entre que dos recursos es donen task per -resource). Activation:d’alta establishes the activation policy according to an exponential distribution. en el sistema. Reschedule: when the scheduler is activated, this parameter indicates whether the already assigned tasks, which has not yet started their execution, will be rescheduled. - Deleteselection host: distribució segueix l’interval de temps que dos recursos Host selection: policy que of resources (all means thatentre all resources of the es system are de baixapurposes). en el sistema. selected for donen scheduling Task selection: selection policy of tasks. (all means that all tasks in the system must be scheduled). - Total tasks: nombre total de tasques a planificar. Local policy: the policy of scheduling tasks to resources. 
One such policy is SPTF (Shortest Processing Time First Policy), that is, in each resource will be executed first the task of Init. tasks: nombre de tasquesthis inicial ja existents en el sistema i per planificar. smallest -completion time. Clearly, policy minimizes the flowtime of the resource. Number runs: number of simulations done with the same parameters. Reported results are then averaged over this number.


Computational results for different grid scenarios

Once the configuration for generating the grid scenarios was established, we connected the MA implementations to the simulator and measured the


Fatos Xhafa

makespan and flowtime values. Again, we used the same machines of the Mallba cluster and limited the search time to at most 90 sec. Tables 13, 14, 15 and 16 give the mean values of makespan and flowtime, together with the confidence interval (at the usual 95% level). As can be seen from these tables, MA+TS again obtains better makespan results than MA, but worse results for flowtime.

Table 13. Values of makespan and flowtime with confidence intervals for the small size grid scenario

                        Makespan                            Flowtime
Algorithm   Value        %C.I.(0.95)  Best Dev.(%)   Value           %C.I.(0.95)  Best Dev.(%)
MA          4161118.81   1.47%        0.34%          1045280118.16   0.93%        0.15%
MA+TS       4157307.74   1.31%        0.25%          1045797293.10   0.93%        0.20%

Table 14. Values of makespan and flowtime with confidence intervals for the medium size grid scenario

                        Makespan                            Flowtime
Algorithm   Value        %C.I.(0.95)  Best Dev.(%)   Value           %C.I.(0.95)  Best Dev.(%)
MA          4096566.76   0.94%        0.32%          2077936674.17   0.61%        0.07%
MA+TS       4083956.30   0.70%        0.01%          2080903152.40   0.62%        0.22%

Table 15. Values of makespan and flowtime with confidence intervals for the large size grid scenario

                        Makespan                            Flowtime
Algorithm   Value        %C.I.(0.95)  Best Dev.(%)   Value           %C.I.(0.95)  Best Dev.(%)
MA          4074842.81   0.69%        0.29%          4146872566.09   0.54%        0.02%
MA+TS       4067825.95   0.77%        0.12%          4153455636.89   0.53%        0.18%

Table 16. Values of makespan and flowtime with confidence intervals for the very large size grid scenario

                        Makespan                            Flowtime
Algorithm   Value        %C.I.(0.95)  Best Dev.(%)   Value           %C.I.(0.95)  Best Dev.(%)
MA          4140542.54   0.80%        0.82%          8328971557.96   0.35%        0.00%
MA+TS       4106945.59   0.74%        0.00%          8341662800.11   0.35%        0.15%

10 Conclusions and further work

In this chapter we have presented a hybrid evolutionary algorithm, based on Memetic Algorithms, for the problem of Job Scheduling on Computational Grids.

An Evolutionary Heuristic for Job Scheduling on Computational Grids


MAs have been shown to perform better than other evolutionary algorithms for many optimization problems. One of the main differences between MAs and other evolutionary algorithms is the use of a local search mechanism that is applied to the new solutions generated in each evolutionary step. In this work we have precisely exploited this characteristic: the evolutionary search can benefit from efficient local search procedures that intensify the search in promising regions. We have therefore implemented a set of sixteen local search procedures (fourteen pure local search procedures, Variable Neighborhood Search and Tabu Search). By using an algorithmic skeleton for MAs we have been able to implement the MA+local search and MA+TS hybridizations. We have made a considerable effort to tune the large set of parameters of the local search procedures and those of the MA and TS in order to identify the best hybridization for the problem.

The experimental results show that MA+TS outperforms the rest of the MA+local search combinations in minimizing makespan, though not flowtime. On the other hand, both the MA and MA+TS algorithms outperform the results of the GA by Braun et al. [6] and those of the GA by Carretero & Xhafa [9] for all considered instances. The MA+TS hybridization thus proves to be a very good option for job scheduling in grid environments, given that it obtains substantial reductions of makespan, which is very important in such environments.

An important issue addressed in this work is the need for models for experimenting with and tuning meta-heuristics in dynamic environments such as grid systems. The fact that a grid scheduler must run for as long as the grid system exists, as opposed to traditional schedulers, makes it necessary to develop stochastic models that allow the behavior and performance of grid schedulers to be studied.
We have developed a grid simulator based on the HyperSim package and have used it to generate the different grid scenarios employed to evaluate the MA and MA+TS implementations. We plan to extend this work with respect to experimentation in dynamic environments using the simulator, and to use the MA and MA+TS schedulers as part of real grid-based applications. We would also like to implement other meta-heuristics and extend the comparative study of this work. Additionally, implementing and evaluating parallel versions of MAs for the problem would be very interesting, given the parallel structure of MAs.

Acknowledgement

I am grateful to Prof. Nath of the CS Department, University of Melbourne, for sending me a copy of his paper [23].

References

1. A. Abraham, R. Buyya, and B. Nath. Nature's heuristics for scheduling jobs on computational grids. In The 8th IEEE International Conference on Advanced


Computing and Communications (ADCOM 2000), India, 2000.
2. D. Abramson, R. Buyya, and J. Giddy. A computational economy for grid computing and its implementation in the Nimrod-G resource broker. Future Generation Computer Systems, 18(8):1061–1074, 2002.
3. E. Alba, F. Almeida, M. Blesa, J. Cabeza, C. Cotta, M. Díaz, I. Dorta, J. Gabarró, C. León, J. Luna, L. Moreno, C. Pablos, J. Petit, A. Rojas, and F. Xhafa. MALLBA: A library of skeletons for combinatorial optimisation. Volume 2400 of LNCS, pages 927–932. Springer, 2002.
4. M.D. Beynon, A. Sussman, U. Catalyurek, T. Kure, and J. Saltz. Optimization for data intensive grid applications. In Third Annual International Workshop on Active Middleware Services, pages 97–106, California, 2001.
5. M.J. Blesa, P. Moscato, and F. Xhafa. A memetic algorithm for the minimum weighted k-cardinality tree subgraph problem. In J. Pinho de Sousa, editor, Metaheuristics International Conference, volume 1, pages 85–91, 2001.
6. T.D. Braun, H.J. Siegel, N. Beck, L.L. Boloni, M. Maheswaran, A.I. Reuther, J.P. Robertson, M.D. Theys, and B. Yao. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 61(6):810–837, 2001.
7. R. Buyya. Economic-based Distributed Resource Management and Scheduling for Grid Computing. PhD thesis, Monash University, Melbourne, Australia, 2002.
8. R. Buyya, D. Abramson, and J. Giddy. Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid. In The 4th International Conference on High Performance Computing, Asia-Pacific Region, China, 2000.
9. J. Carretero and F. Xhafa. Using genetic algorithms for scheduling jobs in large scale grid applications. In Workshop of the European Chapter on Metaheuristics (EUME 2005), Metaheuristics and Large Scale Optimization, Vilnius, Lithuania, May 19–21, 2005.
10. H. Casanova and J. Dongarra. NetSolve: Network enabled solvers. IEEE Computational Science and Engineering, 5(3):57–67, 1998.
11. High Performance Computing and Networking Center. HyperSim library. http://hpcnc.cpe.ku.ac.th/moin/HyperSim.
12. I. Foster. What is the grid? A three point checklist. White Paper, July 2002.
13. I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 1998.
14. I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid. International Journal of Supercomputer Applications, 15(3), 2001.
15. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.
16. F. Glover. Future paths for integer programming and links to artificial intelligence. Computers and Operations Research, 5:533–549, 1986.
17. J.P. Goux, S. Kulkarni, J. Linderoth, and M. Yoder. An enabling framework for master-worker applications on the computational grid. In 9th IEEE International Symposium on High Performance Distributed Computing (HPDC'00). IEEE Computer Society, 2000.
18. H. Ishibuchi, T. Yoshida, and T. Murata. Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Transactions on Evolutionary Computation, 7(2):204–223, 2003.


19. J. Linderoth and S.J. Wright. Decomposition algorithms for stochastic programming on a computational grid. Computational Optimization and Applications (special issue on stochastic programming), 24:207–250, 2003.
20. M. Maheswaran, S. Ali, H.J. Siegel, D. Hensgen, and R.F. Freund. Dynamic mapping of a class of independent tasks onto heterogeneous computing systems. Journal of Parallel and Distributed Computing, 59(2):107–131, 1999.
21. V. Di Martino and M. Mililotti. Sub optimal scheduling in a grid using genetic algorithms. Parallel Computing, 30:553–565, 2004.
22. P. Moscato. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Technical Report 826, Caltech Concurrent Computation Program, California Institute of Technology, USA, 1989.
23. B. Nath, S. Lim, and R. Bignall. A genetic algorithm for scheduling independent jobs on uniform machines with multiple objectives. In H. Selvaraj and B. Verma, editors, Proceedings of the International Conference on Computational Intelligence and Multimedia Applications, pages 67–74, Australia, 1998. Kluwer Academic Publishers.
24. H.B. Newman, M.H. Ellisman, and J.A. Orcutt. Data-intensive e-science frontier research. Communications of the ACM, 46(11):68–77, 2003.
25. J. Page and J. Naughton. Framework for task scheduling in heterogeneous distributed computing using genetic algorithms. Artificial Intelligence Review, 24:415–429, 2005.
26. C. Paniagua, F. Xhafa, S. Caballé, and T. Daradoumis. A parallel grid-based implementation for real time processing of event log data in collaborative applications. In Parallel and Distributed Processing Techniques (PDPT2005), pages 1177–1183, Las Vegas, USA, June 2005.
27. S. Phatanapherom and V. Kachitvichyanukul. Fast simulation model for grid scheduling using HyperSim. In Proceedings of the 2003 Winter Simulation Conference, New Orleans, USA, December 2003.
28. G. Ritchie. Static multi-processor scheduling with ant colony optimisation & local search. Master's thesis, School of Informatics, University of Edinburgh, 2003.
29. G. Ritchie and J. Levine. A fast, effective local search for scheduling independent jobs in heterogeneous computing environments. Technical report, Centre for Intelligent Systems and their Applications, School of Informatics, University of Edinburgh, 2003.
30. G. Ritchie and J. Levine. A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. In 23rd Workshop of the UK Planning and Scheduling Special Interest Group (PLANSIG 2004), 2004.
31. A. Thesen. Design and evaluation of tabu search algorithms for multiprocessor scheduling. Journal of Heuristics, 4(2):141–160, 1998.
32. D. Whitley. Modeling hybrid genetic algorithms. In Genetic Algorithms in Engineering and Computer Science, pages 191–201. John Wiley, 1995.
33. S.J. Wright. Solving optimization problems on computational grids. Optima, 65, 2001.
34. M.-Y. Wu and W. Shu. A high-performance mapping algorithm for heterogeneous computing systems. In Proceedings of the 15th International Parallel & Distributed Processing Symposium, page 74, 2001.
35. A. YarKhan and J. Dongarra. Experiments with scheduling using simulated annealing in a grid environment. In GRID 2002, pages 232–242, 2002.


36. A.Y. Zomaya and Y.H. Teh. Observations on using genetic algorithms for dynamic load-balancing. IEEE Transactions on Parallel and Distributed Systems, 12(9):899–911, 2001.