Second International Conference on Computational Intelligence, Modelling and Simulation
Load Balancing Using Enhanced Ant Algorithm in Grid Computing
Husna Jamal Abdul Nasir
Ku Ruhana Ku-Mahamud
Aniza Mohamed Din
College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia. E-mail:
[email protected]
utilization, maximize throughput, minimize response time, and avoid overload. The load balancing problem is known to be a Nondeterministic Polynomial (NP)-complete problem [11]. Load balancing algorithms can be classified as centralized or decentralized. In the centralized approach, one node in the grid system acts as a scheduler and makes load balancing decisions for all resources. All information from the other nodes is submitted to this node. In the decentralized approach, by contrast, all nodes in the grid system are involved in the load balancing decision. Obtaining and maintaining the dynamic state information of the whole system is then very costly and difficult, so each node uses only partial, local information and makes sub-optimal decisions. Stagnation in a grid computing system may occur when all jobs require, or are assigned to, the same optimal resources, which leaves those resources with a high workload. Optimal resources typically have a high CPU speed and large memory space. The algorithm should not depend on specific jobs or resources, and should ensure that jobs complete with a minimum amount of computational time. Computational time is a measure of the time taken by the resource to process the job. This paper presents a global inspired centralized job scheduling mechanism for grid computing systems. Section 2 describes the use of ant colony optimization (ACO) algorithms in grid computing, while the proposed algorithm is discussed in Section 3. Experimental results are presented in Section 4 and concluding remarks are given in Section 5.
Abstract- Load balancing is one of the critical issues that must be considered in managing a grid computing environment. It is complicated by the distributed and heterogeneous nature of the resources. An enhanced ant algorithm for load balancing in grid computing is proposed in this paper. The proposed algorithm determines the best resource to allocate to each job based on job characteristics and resource capacity, while balancing the load across all resources. The proposed algorithm focuses on the local pheromone trail update and the trail limit. A grid resource table is used in the proposed technique to store all information about jobs, resources and pheromone values. The proposed algorithm is compared with another approach, and the results show that it can balance the load of the resources.
Keywords- Load Balancing, Grid Scheduling, Ant Algorithm
I. INTRODUCTION
A distributed system consists of multiple computers that communicate through a computer network. According to [3], cluster computing and grid computing are two ways of establishing a distributed system. In cluster computing, several personal computers or workstations are combined through local networks in order to develop distributed applications. Cluster applications are inflexible because they are limited to a fixed area, and grid computing has been proposed to overcome this limitation. Grid computing is developed through a combination of various resources from different geographic locations. It is based on large-scale resource sharing in a widely connected network such as the Internet [6]. This makes grid computing different from conventional distributed computing and cluster computing. Load balancing is an essential function provided by the grid infrastructure. Scalability and adaptability are two main aspects that have to be considered in implementing any load balancing algorithm. Resources in a grid environment are geographically distributed on a large scale, and resource performance changes from time to time. In addition, jobs submitted by users require resources with different QoS requirements. An effective load balancing and scheduling technique must therefore be defined to manage the grid computing environment. Load balancing aims to distribute workload evenly across two or more computers, CPUs, network links, hard disks, or other resources, in order to get optimal resource
978-0-7695-4262-1/10 $26.00 © 2010 IEEE DOI 10.1109/CIMSiM.2010.29
II. ACO ALGORITHMS IN GRID ENVIRONMENT
ACO is inspired by the foraging behavior of a colony of ants working together. This behavior enables ants to find the shortest path between their nest and a food source. Every ant deposits a chemical substance called pheromone on the ground as it moves from the nest to food sources and back. The ants then choose a path based on the pheromone value, and the path with a high pheromone value is shorter than the path with a low pheromone value. This behavior is the basis for cooperative communication. There are various types of ACO algorithm, such as Ant Colony System (ACS), Max-Min Ant System (MMAS), Rank-Based Ant System (RAS) and Elitist Ant System (EAS) [7]. ACO has been applied to many scheduling problems such as Job Shop Problem, Open Shop
throughput with a controlled cost. The proposed scheduling algorithm improved performance in terms of low processing time and low processing cost when applied to a grid application with a large number of jobs, such as a parameter sweep application. This algorithm works effectively in minimizing the processing time and processing cost of the jobs. The simulation results of various scheduling algorithms, such as the modified ant algorithm and the cost-controlled algorithm, are also compared. The results show that this enhanced algorithm works better than the ant algorithm. Because it considers the processing cost, this enhanced ant algorithm is more suitable for wide use. However, this algorithm does not consider the size of the jobs, which is needed to assign jobs to resources appropriately. An ACO algorithm for load balancing in distributed systems through the use of multiple ant colonies is proposed in [1]. In this algorithm, information on resources is dynamically updated at each ant movement, and load balancing is based on the information from multiple ant colonies. Each node sends a colored colony throughout the network. Colored ant colonies prevent ants of the same nest from following the same route and force them to be distributed over all the nodes in the system. Each ant acts like a mobile agent that carries newly updated load balancing information to the next nodes. This algorithm has been compared with the work-stealing approach for load balancing in grid computing. Experimental results show that multiple ant colonies work better than the work-stealing algorithm in terms of efficiency. However, the multiple ant colonies approach does not consider resource capacity and job characteristics. This can make matching the jobs with the best resources a difficult task for the scheduling algorithm.
A study to improve the ant algorithm for job scheduling in grid computing, based on the basic idea of ACO, was proposed in [4]. The pheromone update function in this research adds an encouragement coefficient, a punishment coefficient and a load balancing factor. The initial pheromone value of each resource is based on its status, and a job is assigned to the resource with the maximum pheromone value. The pheromone strength of each resource is updated after completion of the job. The encouragement, punishment and load balancing factor coefficients are defined by users and are used to update the pheromone values of resources. If a resource completes a job successfully, more pheromone is added through the encouragement coefficient so that the resource is more likely to be selected for the next job execution. If a resource fails to complete a job, it is punished by adding less pheromone. The load of each resource is taken into account, and the balancing factor is also applied to change the pheromone value of each resource. Balanced job assignment based on ant algorithm for computing grids, called BACO, was proposed in [12]. The research aims to minimize the computation time of jobs executing in the Taiwan UniGrid environment which also
Problem, Permutation Flow Shop Problem, Single Machine Total Tardiness Problem, Single Machine Total Weighted Tardiness Problem, Resource Constraints Project Scheduling Problem, Group Shop Problem and Single Machine Total Tardiness Problem with Sequence Dependent Setup Times [7]. A recent direction in ACO research is its use for scheduling jobs in grid computing [13]. ACO is used in grid computing because it is easily adapted to solve both static and dynamic combinatorial optimization problems. In [14], ACO has been used as an effective algorithm for solving the load balancing problem in grid computing. The process considers the pheromone value, which depends on the time taken by each resource to process jobs. It does not consider the capacity of resources such as their bandwidth, processor speed and load. In [2], two distributed artificial life-inspired load balancing algorithms are introduced, namely ACO and Particle Swarm Optimization (PSO). Distributed load balancing algorithms are categorized as robust because they can adapt to any topology change in a network. In the proposed ACO algorithm, an ant acts as a broker to find the best node in terms of the pheromone value stored in the pheromone table. The node with the lightest load is selected as the best node. In PSO, the position of each node in the flock is determined by its load. The particle compares the load of a node with that of its neighbours and moves towards the best neighbour by sending assigned jobs to it. The proposed algorithm performed better than ACO for job scheduling when jobs are submitted from different sources and at different time intervals. PSO shows better results than ACO in terms of the makespan. However, PSO uses more bandwidth and communication than ACO. The main drawback of Ant Colony is that jobs are not scheduled efficiently, and therefore the load among the resources is not balanced.
This problem can be fixed by increasing the number of ants that explore the entire grid system to find the resources with the lightest load. A study in [10] proposed a new algorithm based on an echo intelligent system of autonomous and cooperative ants. In this algorithm, the ants can procreate and can also commit suicide depending on existing conditions. Ant-level load balancing is proposed to improve the performance of the mechanism. Ants are created on demand, adaptively during their lives, to achieve grid load balancing. The ants may bear offspring when they detect that the system is drastically unbalanced, and commit suicide when they detect equilibrium in the environment. The ants record the specifications of every node visited during their steps for future decision making. Theoretical and simulation results indicate that this new algorithm surpasses its predecessor. However, the pheromone values are not updated in this algorithm, which allows jobs to keep being assigned to the same resource. Therefore, stagnation will occur in the grid computing system. An enhanced ant algorithm for task scheduling in grid computing was proposed in [5], which gives better
focuses on load balancing factors of each resource. By considering the resource status and the size of the given job, the BACO algorithm chooses optimal resources to process the submitted jobs. Local and global pheromone update techniques are used to balance the system load. The local pheromone update function updates the status of the selected resource after a job has been assigned, and the job scheduler relies on the latest information about the selected resource for the next job submission. The global pheromone update function updates the status of each resource for all jobs after the jobs have completed. By using these two update techniques, the job scheduler gets the latest information about all resources for the next job submission. The experimental results show that BACO is capable of balancing the entire system load regardless of the size of the jobs. Based on the previous research discussed above, ACS has proven to be the most popular variant of ACO that has been successfully used in grid computing to solve scheduling problems, which eventually reduces the stagnation problem.

III. PROPOSED ENHANCED ACO FOR GRID LOAD BALANCING

The proposed enhanced ant algorithm (Eant) takes into consideration the processor speed of the resources and the characteristics of jobs in determining the best resource to process a job. This is different from the approach in [2], which proposed the AntZ algorithm and did not consider the characteristics of jobs or the current load of each resource during the scheduling process. The Eant technique selects resources based on the pheromone value of each resource, which is recorded in a matrix. The technique has been implemented in a grid system architecture that consists of four main components, namely the grid information server, the grid resource broker, jobs and resources. The technique works as follows:
1. The user sends a request to process a job. Details such as the total number of jobs, the size of each job, and the CPU time needed by the jobs are included in the request.
2. The grid resource broker starts to calculate the relevant parameters to schedule the job after receiving the message from the user. The information server also provides the resource information to the grid resource broker.
3. The largest entry in the pheromone value (PV) matrix is selected by the proposed technique as the resource to process the submitted job. A local pheromone update is performed after a job is assigned to a resource.
4. A global pheromone update is performed after a resource has completed processing a job.
5. The execution results are sent to the user.
In this proposed technique, an ant represents a job in the grid system. The grid resource broker, which is an intelligent agent, finds available resources from the grid information server. The ant moves randomly in the grid system and checks the status of each resource. The pheromone value on a resource indicates the capacity of each resource in the grid system. The pheromone value is determined by two types of pheromone update technique, which are the local pheromone update in ACS [7] and the global pheromone update in MMAS [8]. The initial pheromone value of each resource for each job is calculated based on the estimated transmission time and execution time of a given job when assigned to this resource. The estimated transmission time can be determined by $S_j / \mathrm{bandwidth}_r$, where $S_j$ is the size of a given job j and $\mathrm{bandwidth}_r$ is the bandwidth available between the grid resource broker and the resource. The initial pheromone value is defined by:

$$PV_{rj} = \left[ \frac{S_j}{\mathrm{bandwidth}_r} + \frac{C_j}{\mathrm{MIPS}_r \times (1 - \mathrm{load}_r)} \right]^{-1} \qquad (1)$$

where $PV_{rj}$ is the pheromone value for job j assigned to resource r, $C_j$ is the CPU time needed by job j, $\mathrm{MIPS}_r$ is the processor speed of resource r and $\mathrm{load}_r$ is the current load of resource r. The load, processor speed and bandwidth can be obtained from the grid information server. Assume that there are n jobs and m resources in the PV matrix as shown below:
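$$
PV =
\begin{array}{c|cccc}
       & j_1     & j_2     & \cdots & j_n     \\ \hline
r_1    & PV_{11} & PV_{12} & \cdots & PV_{1n} \\
r_2    & PV_{21} & PV_{22} & \cdots & PV_{2n} \\
\vdots & \vdots  & \vdots  & \ddots & \vdots  \\
r_m    & PV_{m1} & PV_{m2} & \cdots & PV_{mn}
\end{array}
$$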
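As an illustration only, the PV matrix can be filled in from Eq. (1) with a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names `initial_pv_matrix` and `largest_entry` are hypothetical, the input lists are assumed to use consistent units, and every load is assumed to be strictly less than 1 so that the denominator is non-zero.

```python
def initial_pv_matrix(sizes, cpu_times, bandwidths, mips, loads):
    """Build the m x n PV matrix of Eq. (1); rows are resources, columns are jobs.

    sizes[j]      -- S_j, size of job j
    cpu_times[j]  -- C_j, CPU time needed by job j
    bandwidths[r] -- bandwidth between the broker and resource r
    mips[r]       -- processor speed of resource r
    loads[r]      -- current load of resource r, assumed to satisfy 0 <= load < 1
    """
    pv = []
    for bw, speed, load in zip(bandwidths, mips, loads):
        row = []
        for size, cpu_time in zip(sizes, cpu_times):
            transmission = size / bw                      # S_j / bandwidth_r
            execution = cpu_time / (speed * (1.0 - load)) # C_j / (MIPS_r * (1 - load_r))
            row.append(1.0 / (transmission + execution))  # inverse of the total time
        pv.append(row)
    return pv


def largest_entry(pv):
    """Return (r, j), the indices of the largest entry of the PV matrix."""
    cells = ((r, j) for r in range(len(pv)) for j in range(len(pv[r])))
    return max(cells, key=lambda rj: pv[rj[0]][rj[1]])


if __name__ == "__main__":
    # Two jobs and two resources with made-up values, purely for illustration.
    pv = initial_pv_matrix(
        sizes=[100.0, 400.0],
        cpu_times=[200.0, 50.0],
        bandwidths=[100.0, 50.0],
        mips=[100.0, 400.0],
        loads=[0.5, 0.0],
    )
    print(pv)
    print(largest_entry(pv))
```

Because each entry is the inverse of the estimated transmission plus execution time, a larger PV value corresponds to a resource expected to finish the job sooner.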
The largest entry in the PV matrix is selected in each iteration. Assuming $PV_{rj}$ is selected, job j will be processed by resource r. The local pheromone update is performed after job j has been assigned to resource r. This update is applied only to unassigned jobs in the PV matrix. The local pheromone update is formulated as follows:
$$PV_{rj} = (1 - \xi)\,\tau_{jr} + \xi\,\tau_0$$
where ξ,0< ξ