Engineering with Computers DOI 10.1007/s00366-014-0364-z
ORIGINAL ARTICLE
Variable threshold-based hierarchical load balancing technique in Grid
Neeraj Rathore · Inderveer Chana
Received: 8 January 2014 / Accepted: 17 April 2014 Springer-Verlag London 2014
Abstract Load balancing is an important aspect of Grid resource scheduling. This paper addresses the issue of load balancing in a Grid while keeping resource utilization and response time at their optimum values for a dynamic and decentralized Grid environment. A hierarchical load balancing technique based on a variable threshold value is analysed. The load is divided into different categories, such as lightly loaded, under-lightly loaded, overloaded and normally loaded. A threshold value, computed from the load deviation, governs task transfer and the flow of workload information. In order to improve the response time and increase the throughput of the Grid, a random policy has been introduced to reduce the resource allocation time. The proposed model has been rigorously examined on the GridSim simulator using various parameters, such as response time and resource allocation efficiency. Experimental results demonstrate the superiority of the proposed technique over existing techniques, such as without load balancing and load balancing in enhanced GridSim.

Keywords Grid computing · GridSim · Architecture · Load balancing · Workload · Job migration · Random
N. Rathore (&) Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India e-mail:
[email protected] I. Chana Department of Computer Science and Engineering, Thapar University, Patiala, Punjab, India e-mail:
[email protected]
1 Introduction

The outburst of the Internet, along with the increase in the computational power of servers and the emergence of high-speed computer networks, has given the Grid the power to share computational resources [1]. A Grid is a vastly distributed, enormous system that builds a global network of computational resources and may span many continents. The use of each small resource is flexible, combining computer resources and information systems to create a ubiquitous source of computational power [2, 3]. Sharing resources across organizational and institutional boundaries requires an infrastructure to coordinate the resources across those boundaries, the so-called virtual organizations [3]. Grid application performance is often degraded by numerous challenges, such as fault tolerance, security, heterogeneity of resources and load balancing. Load balancing is one of the critical features that ensures the load is balanced on each node in the network. Load balancing is performed in four basic steps [4, 6], namely load monitoring, synchronization, re-balancing criteria and job migration, as shown in Fig. 1. In the load balancing problem, resources on one Grid site may be heavily loaded while those on another Grid site may be lightly loaded or idle. This may be caused by unequal computing capacities and an uneven pattern of task arrival. Therefore, it is desirable to transfer some jobs from the heavily loaded resources to the lightly loaded ones, which reduces the mean job response time and increases the performance of each host while ensuring that small jobs do not suffer from starvation [5].

1.1 Motivation

In this paper, the design and development of an optimal load balancing algorithm are proposed for the Grid.
Fig. 1 Basic load balancing steps: Start, Load Monitoring, Synchronization, Rebalancing Criteria, Job Migration, End.

Fig. 2 Basic difference between existing and proposed technique (choosing a node for load balance / number of comparisons): the existing technique works with lightly, normally and heavily loaded pools, whereas the proposed technique adds an under-lightly loaded pool.
Due to the materialization of Grid computing over the Internet, hierarchical load balancing algorithms are in demand [6]. Response time and resource allocation have been recorded as the main contributions of this research. In the existing scenario, dynamic, decentralized load balancing has to consider all the factors pertaining to the characteristics of the Grid computing environment. To address this, a dynamic threshold value is used at each level in the proposed technique; the value of the threshold changes dynamically according to the Grid size in the network. Furthermore, the proposed technique extends the existing approach by dividing the lightly loaded nodes into lightly and under-lightly loaded categories, in the context of the variable threshold and standard deviation. This approach minimizes the searching time needed to find the receiver machine to which the Gridlets are transferred. A well-designed information-exchange scheduling scheme is adopted in the proposed technique to enhance the efficiency of the load balancing model. A comparative analysis, presented in Fig. 2 and Table 1, shows the differences between the techniques and why the proposed algorithm is better than the existing algorithms. The proposed technique is a new version of the load balancing mechanism using a random policy [7]; the random load balancing policy chooses the target endpoint randomly from the specified list. The proposed load balancing technique not only increases the resource allocation efficiency of Grid resources, but also cuts down the response time of the entire Grid. This algorithm is
rigorously examined on the GridSim simulator to judge its effectiveness, especially on a Grid platform. The simulation results show significant advantages of the proposed algorithm over other existing approaches, such as centralized load balancing, no load balancing, etc.

1.2 Contribution

Due to the load balancing challenge in Grid computing, there is currently a need for a hierarchical load balancing algorithm which takes into account the Grid architecture, heterogeneity, response time and resource allocation. Considering all the necessary factors, the main contributions and novelties of the current research work are as follows:

1. This is a novel technique to tackle hierarchical load balancing that takes into account all the factors pertaining to the characteristics of the Grid computing environment. The technique provides good backup and support in case of server failure, and the proposed approach eliminates the scalability complexity of a site.
2. Most of the existing load balancing techniques use a static threshold value that is suitable only for a restricted network size but causes problems, such as scalability and response-time issues, in a large-scale Grid environment. To overcome these problems, a dynamic threshold value is used here. Every time the number of applications or nodes increases or decreases, the threshold value is re-calculated. This is easy to use and well suited to the hierarchical load balancing technique.
3. An efficient load balancing technique is adopted to enhance the efficiency of the load balancing model. An extended version of the random policy [8] has been proposed for selecting the node during load balancing, initiated by an under-loaded or overloaded node in the pool of shared nodes.
4. Qureshi and Rehman [20] divided the load balancing categories into three parts. In the current research, the lightly loaded node category is further divided into under-lightly loaded and lightly loaded nodes in order to improve the response time.
Table 1 Comparison between existing and proposed technique

- Categories. Existing approach: divided into three pools (lightly, normally and heavily loaded). Our approach (PLBA): divided into four pools (under-lightly, lightly, normally and heavily loaded).
- Technique. Existing: each heavily loaded node has to check all the nodes in the lightly loaded pool. PLBA: according to the load percentage, an overloaded node checks only one pool, either the lightly or the under-lightly loaded pool, because of sorting.
- Comparisons. Existing: 1. PLBA: 2.
- Scheduling. Existing: no. PLBA: yes.
- Sorting. Existing: not used. PLBA: yes, used in the algorithm (ascending for the under-lightly and lightly loaded pools, descending for the heavily loaded pool).
- Policy. Existing: not applicable. PLBA: random policy (the process can also run in reverse: a lightly loaded node can also search for a heavily loaded node).
- Response time. Existing: low. PLBA: high.
- Throughput. Existing: low. PLBA: high.
- Resource allocation capacity. Existing: low. PLBA: high (due to the idea of threshold and load deviation).
- Cost. Existing: low. PLBA: medium.
- Comparison time. Existing: high (each node is explored in the pool). PLBA: low (due to the sorting technique).
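To make the pool split in Table 1 concrete, the sketch below shows one way the four pools could be built from a variable threshold (mean load plus load deviation). It is only an illustration under stated assumptions: the paper does not fix the boundary between the lightly and under-lightly loaded pools, so the "mean minus deviation" cut used here is an assumption, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four-pool split driven by a variable threshold
// (mean load plus standard deviation), re-computed whenever the Grid size changes.
public class PoolClassifier {

    public enum Pool { UNDER_LIGHTLY, LIGHTLY, NORMAL, HEAVY }

    public static Map<Pool, List<Integer>> classify(double[] loads) {
        double mean = 0;
        for (double l : loads) mean += l;
        mean /= loads.length;

        double var = 0;
        for (double l : loads) var += (l - mean) * (l - mean);
        double sigma = Math.sqrt(var / loads.length);
        double threshold = mean + sigma;                 // variable threshold

        Map<Pool, List<Integer>> pools = new EnumMap<>(Pool.class);
        for (Pool p : Pool.values()) pools.put(p, new ArrayList<>());

        for (int i = 0; i < loads.length; i++) {
            Pool p;
            if (loads[i] > threshold)            p = Pool.HEAVY;
            else if (loads[i] >= mean)           p = Pool.NORMAL;
            else if (loads[i] >= mean - sigma)   p = Pool.LIGHTLY;        // assumed split point
            else                                 p = Pool.UNDER_LIGHTLY;  // assumed split point
            pools.get(p).add(i);
        }
        return pools;
    }
}
```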
The remainder of this paper is organized as follows: related work is discussed in Sect. 2. Section 3 describes the structure of the load balancing model along with the design of the system environment and its application to the network environment. Section 4 focuses on the system model and problem definition, in which the implementation of the algorithms and the analysis of the mathematical calculations are introduced. Performance evaluations and simulation results are presented in Sect. 5. Finally, the last section is dedicated to the conclusion and future work.
2 Related work

The Grid is a distributed network in which resources are continuously added and removed according to requirements [3]. For this reason, a hierarchical load balancing technique is suited to a dynamic and distributed Grid environment. Several authors have proposed various load balancing techniques for Grid environments. Yagoubi and Slimani [9] proposed a layered algorithm based on a tree model, which achieves dynamic load balancing in Grid computing. It supports heterogeneity, scalability and total independence from any physical architecture of a Grid, but falls short on response time. Erdil and Lewis [10] described an information dissemination protocol that can distribute the load without resorting to load re-balancing through job migration, but it is more complex and costly in large-scale heterogeneous Grids. Their model mainly deals with node adjustment, advertising rates and aggressiveness to influence scheduling; its major problems are scalability, resource failure and robustness. Ludwig and Moallem [8] use an ant colony optimization approach with state-broadcast and random methods. It invokes a response to submit a job to the Grid and surfs the network to find the best resource to deliver the job, but it suffers from scalability and resource-failure issues. Long et al. [11] integrate a distributed mode, approximate optimization and an agent-set scheduling approach to achieve short execution times in large-scale agent scheduling. They describe a multi-level framework, agent distribution, communication, architecture and an event-driven conservative time synchronization mechanism to avoid the potential failure problems caused by a centralized mode and to improve execution efficiency through parallel computing, but the work lacks treatment of migration cost, security and scheduling. Zheng et al. [12] overcome scalability challenges, considerably reduce the memory requirements and running time, and provide interconnect topology-aware strategies that map the communication graph onto the processor topology to minimize network contention. Nandagopal et al. [13] address the problem of load balancing using Min-Load and Min-Cost policies while scheduling jobs to resources in a multi-cluster environment. They heuristically estimate the completion time of executing
jobs in the remote clusters; they improved system performance in terms of mean response time and average slowdown, but did not handle precedence constraints among the different tasks of a job. De Grande and Boukerche [14] addressed the problems of resource monitoring, load reallocation and load migration. They also minimized the balancing overhead during load redistribution. Balancing involves normalizing the load distribution of shared resources and maximizing overall resource utilization. Their migration technique provides reliable and low-latency load transfers, but the model needs better detection and reactivity. Sharma and Saxena [15] are able to trace dead machines; their framework works well with heterogeneous web servers under high incoming load and gives low response times, but its bottleneck is the centralized main server controller. Nandagopal et al. [16] reported a highly decentralized, distributed and scalable algorithm for scheduling jobs and balancing the load across resources in a heterogeneous computational Grid environment in order to minimize response time and reduce communication overhead. Zheng et al. [17] considerably reduce the memory requirements and running time and provide interconnect topology-aware strategies that map the communication graph onto the processor topology to minimize network contention, but face challenges in the scalability of hierarchical load balancing. The comparison of different load balancing techniques in [18] clearly shows that tree (hierarchical)-based load balancing is better than the others.

Compared with traditional distributed systems, Grid architecture is characterized by properties such as heterogeneity, scalability, adaptability, a hierarchical structure and independence. These properties make the load balancing problem harder than in traditional parallel and distributed systems, which offer homogeneity and stability of their resources. Moreover, the interconnection networks of Grids have very disparate performances, and the tasks submitted to the system can be very diversified and irregular. These observations show that it is very difficult to define a load balancing system that integrates all these factors (Table 1). Relative to the existing load balancing techniques, the fundamental problems, such as response time and scalability, are improved by the proposed hierarchical load balancing technique because of its layered approach and dynamic load calculation criteria, as shown in Table 2. The load in the load balancing technique reported in [19] is categorized into lightly loaded, overloaded and normally loaded: if the load on a node exceeds, falls below or equals the threshold value, the node is treated as overloaded, lightly loaded or normally loaded at each level, respectively. The load parameters shown in Table 3 follow previously tested simulation work [13, 20], which was based on the same hierarchical technique. Existing techniques use a static threshold value at each level, which limits network scalability. Further, if the number of nodes in the network increases, the response time and the resource allocation efficiency are proportionally affected.

2.1 Existing referenced algorithms compared with the proposed algorithm

In the literature, many load balancing algorithms have been proposed; most of them can be applied to the Grid environment with suitable modifications. In Without Load Balancing (WLB), no load balancing is done, while in Load Balancing on GridSim, load balancing is performed, but only at the broker level. In Load Balancing on Enhanced GridSim (LBEGS) [20], load balancing is performed at the broker, resource and machine levels. The Min–min algorithm [21, 22] gives the jobs that can be completed earliest the highest priority; each job is always assigned to the resource that can complete it earliest. Similar to the Min–min algorithm, the Max–min algorithm [23, 24] gives the highest priority to the job with the maximum earliest completion time. The main idea of Max–min is to overlap long-running tasks with short-running tasks; it can be used in cases where there are many more short tasks than long tasks. For example, if there is only one long task, Min–min will first execute many short jobs concurrently and then execute the long task, whereas Max–min will execute the short jobs concurrently with the longest job.
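For reference, the two heuristics can be written out directly. The sketch below is a generic rendering of Min–min and Max–min (not taken from the cited implementations), where ect[i][j] is assumed to be the estimated execution time of task i on resource j and ready[j] the time at which resource j becomes free.

```java
import java.util.ArrayList;
import java.util.List;

// Generic Min-min / Max-min sketch: repeatedly pick the task whose earliest
// completion time is smallest (Min-min) or largest (Max-min), assign it to the
// resource that completes it earliest, and update that resource's ready time.
public class MinMinMaxMin {

    public static int[] schedule(double[][] ect, boolean pickMax) {
        int tasks = ect.length, resources = ect[0].length;
        double[] ready = new double[resources];
        int[] assignment = new int[tasks];
        List<Integer> unscheduled = new ArrayList<>();
        for (int t = 0; t < tasks; t++) unscheduled.add(t);

        while (!unscheduled.isEmpty()) {
            int bestTask = -1, bestRes = -1;
            double bestTime = 0;
            for (int t : unscheduled) {
                // Resource giving the earliest completion time for task t.
                int r = 0;
                for (int j = 1; j < resources; j++)
                    if (ready[j] + ect[t][j] < ready[r] + ect[t][r]) r = j;
                double time = ready[r] + ect[t][r];
                boolean better = (bestTask < 0)
                        || (pickMax ? time > bestTime : time < bestTime);
                if (better) { bestTask = t; bestRes = r; bestTime = time; }
            }
            assignment[bestTask] = bestRes;              // assign and update the resource
            ready[bestRes] = bestTime;
            unscheduled.remove(Integer.valueOf(bestTask));
        }
        return assignment;
    }
}
```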
3 Preliminaries

Qureshi and Rehman [20] documented a Grid architecture for load balancing, as illustrated in Fig. 3. A Poisson process is used for random job arrival with a random computation length [25]. The jobs are assumed to be sequenced and mutually independent, to arrive at the given rate and to be executable on any site; furthermore, a site must meet the job's demand for computing resources and the amount of data transmitted. Each processor can execute only one job at a time, and the execution of a job cannot be interrupted or moved to another processor during its execution. The model is divided into three levels: Level-0 Broker, Level-1 Resource and Level-2 Machine. These entities can have customized characteristics. When a new Gridlet arrives on a machine, it may go to under-lightly loaded, lightly loaded, overloaded or normally loaded resources according to the load calculation computed at each node. In order to analyse the mean job response time, one Grid Broker (GB) section is considered as a simplified Grid model. The Grid Broker is the top manager of a Grid environment; it is responsible for sustaining the overall Grid activities of scheduling and rescheduling.
Table 2 Comparison of load balancing techniques [8–18]

Prediction-based approach. Environment: heterogeneous (centralized load manager). Aim: effective resource utilization and reduced average job response time. Strength: better resource utilization. Criteria/issues to be addressed: tasks selected on the basis of I/O-intensive, CPU-intensive and memory-intensive behaviour. Modules: 1. Predictor, 2. Selector. Behaviour: cluster-based queuing model. Gap: needs evaluation on large-scale clusters.

Agent-based approach. Environment: distributed heterogeneous P2P. Aim: self-adaptive, avoiding large amounts of data transfer and finding under-loaded nodes more quickly. Strength: migrates load on the basis of state-information exchange and a location policy. Criteria/issues: load estimates based on job queue length and job arrival rates. Modules: 1. Job scheduler, 2. Communication layer, 3. Scheduler. Behaviour: analysed by varying the number of jobs in a nest. Gap: increasing the number of nodes in a nest increases the complexity of the program.

Supportive node approach. Environment: distributed heterogeneous. Aim: reduces communication delay using supporting nodes. Strength: minimum traffic due to a central node attached to each cluster. Criteria/issues: primary and centralized approach. Modules: resource managers that provide support while migrating load in the individual node pool. Behaviour: performance criteria are load index and queue length. Gap: complexity in implementation.

Recitation (policy and strategy) based approach. Environment: centralized and sender-initiated. Aim: reduce response time. Strength: based on a load balancing policy. Criteria/issues: information, triggering and selection policies. Modules: information policy, triggering policy, selection policy. Behaviour: based on CPU consumption and queue length. Gap: resources may be submitted and withdrawn at any moment (affects performance).

Hierarchical approach. Environment: distributed heterogeneous. Aim: nodes at different levels of the hierarchy communicate with the nodes below them to get information about network performance. Strength: load index as a decisive factor for scheduling tasks; load is balanced with the neighbouring cluster. Criteria/issues: threshold policies, information-exchange criteria, selection of nodes at different levels of the network, failure intensity, performance, migration time. Modules: Grid level, cluster level, leaf nodes. Behaviour: load index based on CPU utilization and queue length; scalable. Gap: less fault tolerance.

Table 3 Load parameters [13, 20]

PE-level threshold: ρ = 0.6. Machine-level threshold: η = 0.75. Resource-level threshold: ρ = 0.8.
It acquires the workload information from the Grid resources and sends the tasks to the resources for load optimization. The resources, which come next to the Grid Broker in the hierarchy, are connected through the Internet. The resource manager is responsible for maintaining the scheduling and load balancing of its machines, and it also sends an event to the Grid Broker when it becomes overloaded [26]. The machine is a Processing Entity (PE) manager, responsible for task scheduling and load balancing of its PEs, which are connected to the various resources via a LAN; it also sends an event to its resource when it becomes overloaded. The PEs, next to the machines, are mainly responsible for calculating the workload and threshold values for load balancing and job migration, and they pass the load information upwards to the machines via buses. A Gridlet is considered as a load and is assigned to any of the PEs according to their capability (i.e. computational speed, queue length, etc.). The proposed hierarchical load balancing technique assumes a heterogeneous Grid environment, where the heterogeneity is in terms of processing capability, and the same applies to the PEs.

3.1 Analytical view of the local Grid manager

To gain analytical insight into the model, a Local Grid Manager (LGM) section is considered as a simplified Grid model, in which external tasks arrive sequentially at the LGM according to a time-invariant Poisson process, with inter-arrival times that are independent and identically exponentially distributed. Simultaneous arrivals are excluded [27].
Fig. 3 Hierarchical Load Balancing Model [28]: Level-0 Grid Broker, Level-1 Resource Managers (1…N, connected through the Internet), Level-2 PE Managers (nodes 11…NN, connected through LANs and a bus topology), with clusters of PEs receiving Gridlets 1…n.
Considering the following system parameters: λ is the external (remote) task arrival rate from Grid clients and other LGMs to an LGM; out of this, the rate λ_a is accepted for processing by the LGM, and the rate λ_r is rejected upon arrival and forwarded through the communication means to another LGM. Hence λ = λ_a + λ_r. λ_i^a is the external task flow rate from the LGM to the ith SM managed by that LGM. λ_i^L is the local task arrival rate to the ith Site Manager (SM). λ^L is the total local task arrival rate to the SMs managed by an LGM; hence λ^L = Σ_{i=1}^{m} λ_i^L, where i = 1, 2, 3, …, m and m is the number of sites managed by that LGM. λ_i is the total arrival rate (external and local) to the ith SM, that is, λ_i = λ_i^a + λ_i^L. λ_ij is the job flow rate from the ith SM to the jth PE managed by that SM. μ is the LGM service rate, μ_i is the service rate of the ith SM, and μ_ij is the service rate of the jth PE managed by the ith SM.

Hence, the LGM traffic intensity is given by

ρ = (λ + λ^L)/μ = (λ + Σ_{i=1}^{m} λ_i^L) / (Σ_{i=1}^{m} μ_i) = (λ + Σ_{i=1}^{m} λ_i^L) / (Σ_{i=1}^{m} Σ_{j=1}^{n} μ_ij),

where i = 1, …, m indexes the SMs and j = 1, …, n the PEs in every SM. For the LGM to be stable, ρ must be < 1. The traffic intensity of the ith SM is given by

ρ_i = λ_i/μ_i = (λ_i^a + λ_i^L) / Σ_{j=1}^{n} μ_ij,   (1)

where j = 1, …, n indexes the PEs at the ith SM. For the SM to be stable, ρ_i must be < 1.
The traffic intensity of the jth PE managed by the ith SM is ρ_ij = λ_ij/μ_ij; similarly, ρ_ij must be < 1 for that PE to be stable. During the analysis, the following facts are used [27]:

- Merging k Poisson streams with mean rates λ_i results in a Poisson stream with mean rate λ = Σ_{i=1}^{k} λ_i.
- If a Poisson stream is split into k substreams such that the probability of a task going to the ith substream is p_i, each substream is also Poisson with a mean rate of p_i λ.

Since the communication links between the LGM and its SMs are heterogeneous, i.e. every communication link has an individual bandwidth and latency, G(λ_a) is used as a general function for the overall mean expected communication (including queuing) time of forwarding the accepted external task flow rate λ_a from the LGM to its SMs. The task response time is the length of time between the instant when a task arrives at the LGM and the instant when it leaves, after all processing and communication, if any, are over. Hence, the overall mean task response time T can be computed as follows:

T = G(λ_a) + T_S for external tasks,  T = T_S for local tasks,   (2)

where T_S is the overall mean task response time over all the sites managed by the LGM. As mentioned earlier, the system is studied in the steady state, that is, the traffic intensity is less than 1 (ρ < 1). To compute T_S, as in [27], the SM is modelled as a variant of the M/M/n queue in which the service rates of any two PEs are not identical, similar to the case of a heterogeneous multiprocessor system. Figure 4 shows the LGM queuing model; the queuing structure of a site is a part of this figure. Define the state of the SM by (k_1, k_2, …, k_n), where k_i is the number of tasks at the ith PE queue, i = 1, 2, …, n. Reorder the PEs in a list such that the occupation ratios satisfy k_1/μ_i1 ≤ k_2/μ_i2 ≤ … ≤ k_n/μ_in. The local scheduler (LS) at the SM services the arriving tasks (local or remote) based on their arrival time (FCFS) and distributes tasks to the fastest PE, i.e. the PE with the lowest occupation ratio. The traffic intensity of the ith SM is given by

ρ_i = λ_i/μ_i = (λ_i^a + λ_i^L) / Σ_{j=1}^{n} μ_ij,  where n is the number of PEs in that site.

Here, X = (1 + 2ρ_i) ∏_{j=1}^{n} μ_ij / (λ_i Σ_{j=1}^{n} (λ_i + μ_ij)) + 1/(1 − ρ_i). The mean number of tasks at the ith site is then given by [27]:

E[N_i^SM] = 1 / (X (1 − ρ_i)^2).   (3)

To compute the expected mean task response time, Little's formula is used. Let E[T_i^SM] denote the mean time spent by a task at the ith SM with arrival rate λ_i, and E[N_i^SM] the mean number of tasks at that site. By Little's formula, E[T_i^SM] can be computed as follows:

E[T_i^SM] = (1/λ_i) E[N_i^SM].   (4)

The overall mean task response time over all the sites at an LGM is then given by

T_S = (1/m) Σ_{i=1}^{m} E[T_i^SM],   (5)

where m is the number of sites managed by that LGM. By substituting the value of T_S into Eq. (2), the overall mean task (external and local) response time can be computed.
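Under the assumption that Eqs. (1)–(5) are read exactly as reconstructed above, the following sketch shows how the per-site and overall quantities could be evaluated in code; the method and parameter names are hypothetical.

```java
// Sketch of the site-level quantities of Sect. 3.1: lambdaA/lambdaL are the external
// and local arrival rates of a site and mu[j] the service rate of its jth PE.
public class LgmModel {

    // Eq. (1): traffic intensity of a site.
    static double rho(double lambdaA, double lambdaL, double[] mu) {
        double muSum = 0;
        for (double m : mu) muSum += m;
        return (lambdaA + lambdaL) / muSum;
    }

    // Eq. (3) with X as reconstructed in the text above.
    static double meanTasks(double lambdaA, double lambdaL, double[] mu) {
        double lambda = lambdaA + lambdaL;
        double r = rho(lambdaA, lambdaL, mu);
        double prodMu = 1, sumTerm = 0;
        for (double m : mu) { prodMu *= m; sumTerm += (lambda + m); }
        double x = (1 + 2 * r) * prodMu / (lambda * sumTerm) + 1 / (1 - r);
        return 1 / (x * (1 - r) * (1 - r));               // E[N_i^SM]
    }

    // Eqs. (4)-(5): Little's formula per site, averaged over the m sites.
    static double overallResponseTime(double[] lambdaA, double[] lambdaL, double[][] mu) {
        double ts = 0;
        for (int i = 0; i < mu.length; i++) {
            double lambda = lambdaA[i] + lambdaL[i];
            ts += meanTasks(lambdaA[i], lambdaL[i], mu[i]) / lambda;   // E[T_i^SM]
        }
        return ts / mu.length;                                          // T_S, Eq. (5)
    }
}
```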
3.2 System framework

The Grid topology and the mapping of the Grid onto a tree-based model are shown in Fig. 5, together with some applications and a sequence-diagram design. The interaction of each site with the computing nodes in the network proceeds through the following steps: (i) each site in the network may contain multiple machines, and each machine may have one or more processing elements (PEs). A Grid scheduler is a software component that runs on each site; it manages the system-related information (i.e. CPU utilization, remaining CPU capability, remaining memory, etc.) of the computing nodes and sites in order to join the Grid, provide resources, receive jobs from Grid clients and assign them to the processors in the Grid system. (ii) It backs up the jobs, discovers the candidate set of nearest sites based on the job requirements and resource characteristics, and sends the list to the load balancing decision-maker. (iii) The decision-maker decides, on the basis of the minimum-loaded Grid site, whether the job should be executed at the local site or a remote site, and transfers the job accordingly. (iv) The job dispatcher ships the jobs and checks site availability with the fault detector, which monitors the state of the sites. (v) If a site failure or system degradation occurs, the fault detector sends a failure message to the fault manager. If the fault manager receives a failure message, it reschedules the job using a primary-backup approach [27].
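A skeletal view of steps (i)–(v), with purely hypothetical interface names, makes the control flow explicit; it is a sketch of the described framework, not an implementation taken from the paper.

```java
import java.util.List;

// Hypothetical skeleton of steps (i)-(v) and Fig. 5: the scheduler proposes
// candidate sites, a decision maker picks the least-loaded one, the dispatcher
// ships the job, and the fault manager re-activates the backup on failure.
public class FrameworkSketch {

    interface GridScheduler { List<Site> candidateSites(Job job); }
    interface DecisionMaker { Site leastLoaded(List<Site> candidates); }
    interface JobDispatcher { boolean dispatch(Job job, Site site); }   // false = site failed
    interface FaultManager  { void activateBackup(Job job); }

    static class Job  { int id; }
    static class Site { String name; double load; }

    GridScheduler scheduler;
    DecisionMaker decisionMaker;
    JobDispatcher dispatcher;
    FaultManager faultManager;

    void submit(Job job) {
        List<Site> candidates = scheduler.candidateSites(job);   // step (ii)
        Site target = decisionMaker.leastLoaded(candidates);     // step (iii)
        if (!dispatcher.dispatch(job, target)) {                  // step (iv)
            faultManager.activateBackup(job);                     // step (v), primary-backup
        }
    }
}
```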
Fig. 4 Queuing Model of Grid Manager [27]: external arrivals (λ) reach the Load Grid Manager, where a rejected rate (λr) is forwarded elsewhere and the accepted rate is passed by the global scheduler to Site Managers 1…n; each Site Manager's local scheduler receives local arrivals (λL1…λLn) and dispatches task flows λij to its PEs with service rates μij.
In this system, sites play the role of assisting the execution of jobs. When any site is in an idle state and can provide its resource, a join message and the related hardware information are transmitted to the Grid scheduler of the nearest neighbouring site. When it can no longer provide its resource, it transmits an exit message to the Grid scheduler of the nearest neighbouring site. The Grid scheduler uses a derived threshold to select suitable sites effectively. This threshold value is based on the load and on the resource demand needed to execute the job, and it is used to evaluate the sites in the system. This paper proposes a hierarchical load balancing technique that exploits the strong points of the dynamic load balancing method over static load balancing. In the static load balancing state, when a job has to be executed, it is assigned to an appropriate site; the minimum requirement of the job's needs is taken as the threshold for the table of effective sites, and sites passing this threshold are called effective sites. As a Grid is composed of sites with different performances, the execution time of a job on each site is also different. In the dynamic load balancing state, the system adjusts dynamically until it reaches a balance. In a Grid environment, the effectiveness of sites varies with time; thus, the assignment of jobs must be adjusted dynamically in accordance with the variation of the site status.
4 System model and problem description

The work proposed here has been derived from the work of [20], with some modifications. It suggests the need to quantify the load in order to achieve load balancing in a computational Grid. The load is quantified, and the objective function is derived based on the load distribution of the computational nodes.

4.1 Notations used in algorithms

The notations used at the PE, machine and resource levels are summarized in Table 4.
Fig. 5 Sequence diagram of components and their interaction: Grid Client, Local Grid Scheduler, LB Decision Maker, Site Selector, Job Dispatcher, Fault Detector, Fault Manager, Remote Grid Scheduler, Grid Information Server and Remote Site Interface, with the steps: 1 submit job; 2.1 backup job; 2.2 node search at nearest site; 3 selected minimum loaded site; 4 local extension; 5 dispatch job and check site availability; 6 failure message; 7 backup activation; 8 (or 6) job result.
4.2 The fitness function

All the mathematical steps used in the network model are described below.

4.2.1 Machine-level calculation

When the scheduler receives a job submitted by a user, it transfers a request to the information service in order to obtain the necessary information, such as the idle-PE percentage of each resource, the average load of each cluster and the average load of the system. The scheduler then chooses the cluster with the fastest average computing power (ACP). First, the ACP of cluster i, ACP(PE_i), is defined as:

ACP(PE_i) = (Σ_{k=1}^{n} PE_Speed_k × (1 − PE_k)) / n,   (6)

where PE_Speed_k = processing speed of processing entity k in cluster i (MIPS), PE_k = current CPU utilization of PE k in cluster i, expressed as a percentage, PE = processing entity (a single CPU; the processing capability of each PE is the same within a single machine) and n = number of processing entities in cluster i. The machine scheduler then selects the PE with the fastest ACP. The average load of the cluster is defined from the average load of each processing entity in cluster i. The current load of a PE, (Curr_Load)_PE, is defined as:

(Curr_Load)_PE = Σ_{1}^{g} File_Size,   (7)

where g = number of Gridlets (applications/tasks) on the PE.
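A direct transcription of Eqs. (6) and (7) into code reads as follows; this is only a sketch, and the array and method names are assumptions, not part of the GridSim API.

```java
// Eq. (6): average computing power of a cluster from per-PE speed (MIPS) and
// current utilization; Eq. (7): current load of a PE as the sum of the file
// sizes of the Gridlets queued on it. Names are illustrative only.
public class MachineLevel {

    static double acp(double[] peSpeedMips, double[] peUtilization) {
        double sum = 0;
        for (int k = 0; k < peSpeedMips.length; k++)
            sum += peSpeedMips[k] * (1.0 - peUtilization[k]);   // idle capacity of PE k
        return sum / peSpeedMips.length;                        // Eq. (6)
    }

    static double currentLoad(long[] gridletFileSizes) {
        double load = 0;
        for (long size : gridletFileSizes) load += size;        // Eq. (7)
        return load;
    }
}
```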
Table 4 List of notations used in the algorithm

1. PE: processing element (single CPU); the processing capability of each PE is the same within a single machine.
2. G: number of Gridlets (applications/tasks) on any PE.
3. P: number of PEs on any machine.
4. PELoad: load of any PE = ((Curr_Load)PE − Avg_load)/Rating.
5. PEsLoad: total load of the available PEs in any machine.
6. OPEList: list of PEs with overloaded Gridlets.
7. OP: size of OPEList (O: overloaded, P: PE).
8. LPEList: list of PEs with lightly loaded Gridlets.
9. LP: size of LPEList (L: lightly loaded, P: PE).
10. ULPEList: list of PEs with under-lightly loaded Gridlets.
11. UP: size of ULPEList (U: under-lightly loaded, P: PE).
12. NPEList: list of PEs with normal tasks.
13. NP: size of NPEList (N: normally loaded, P: PE).
14. (PELoad)D: load of a processing element in the lightly loaded PE list.
15. (TaskLoad)L: load of Gridlets.
16. (PELoad)E: load of a processing element in the under-lightly loaded PE list.
17. Rating: actual capacity of a PE, i.e. CPU speed in terms of its MIPS rating.
18. (Curr_Load)PE: current load of a PE (Eq. (7)), where g is the number of Gridlets.
19. (Task Load)L: load of Gridlets.
20. File_size: the workload in bytes.
21. Gridlet: an arriving task treated as a load.
22. ALCi: average load on each cluster (collection of PEs).
23. ALMi: average machine load.
24. σ: standard deviation.
The average load (Avg_load) of each PE is defined as:

Avg_load = (Σ_{1}^{p} (Curr_Load)_PE) / p,   (8)

where p = number of PEs in the cluster (the total load of the available PEs in any machine). The load of each PE, PELoad_{k,i}, is measured as:

PELoad_{k,i} = ((Curr_Load)_PE − Avg_load) / Rating,   (9)

where PELoad_{k,i} = load of processing entity (PE) k in cluster i and Rating = actual capacity of the PE, i.e. its CPU speed in terms of the MIPS rating. The average load of each cluster i, ALC_i, is defined as:

ALC_i = (1/p) Σ_{k=1}^{p} PELoad_{k,i},   (10)

where PELoad_{k,i} = load of all PEs in cluster i and p = number of PEs in the cluster. The average load of the machine system (ALM_i) is defined as:

ALM_i = (1/m) Σ_{i=1}^{m} ALC_i,   (11)

where m = number of clusters in the machine system. The average load of each cluster, ALC_i, is set to be less than the balance threshold of the machine system; hence this threshold is called the balance threshold. The threshold value of the machine level, denoted ρ, is defined as:

ρ = ALM_i + σ,   (12)

where σ = standard deviation of the load of the system, defined as:

σ = (1/N) Σ_{i=1}^{N} (X_i − X̄)², for all i,   (13)

where X̄ = average load of the machine system, which equals ALM, X_i = average load of each cluster i, which equals ALC, and N = number of clusters in the machine system.
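Putting Eqs. (8)–(13) together, the machine-level balance threshold could be derived as in the sketch below; the helper names are illustrative, and Eq. (13) is implemented exactly as written in the text (without a square root).

```java
// Eqs. (8)-(13): per-PE load, cluster average, machine average, and the
// machine-level balance threshold rho = ALM + sigma. Helper names are illustrative.
public class MachineThreshold {

    // Eq. (9): PELoad = (Curr_Load - Avg_load) / Rating.
    static double peLoad(double currLoad, double avgLoad, double ratingMips) {
        return (currLoad - avgLoad) / ratingMips;
    }

    // Simple average, used for Eqs. (8), (10) and (11).
    static double mean(double[] values) {
        double s = 0;
        for (double v : values) s += v;
        return s / values.length;
    }

    // Eq. (13) as written in the text: sigma = (1/N) * sum (x_i - mean)^2.
    static double sigma(double[] clusterLoads) {
        double m = mean(clusterLoads), s = 0;
        for (double v : clusterLoads) s += (v - m) * (v - m);
        return s / clusterLoads.length;
    }

    // Eq. (12): machine-level balance threshold over the cluster loads (ALC values).
    static double balanceThreshold(double[] clusterLoads) {
        return mean(clusterLoads) + sigma(clusterLoads);
    }
}
```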
4.2.2 Resource-level calculation

First, the ACP of the machine is defined as:

ACM_i = (Σ_{k=1}^{n} Mac_Speed_k × (1 − Mac_k)) / n,   (14)

where Mac_Speed_k = speed of machine k in resource i (MIPS), Mac_k = current CPU utilization of machine k in resource i (as a percentage) and n = number of machines in resource i. The average load of the resource system (ALR_i) is defined as:

ALR_i = (1/r) Σ_{k=1}^{r} ALM_i,   (15)

where r is the number of machines in the resource system. The average load of each machine, ALM_i, is set to be less than the balance threshold of the resource system; hence a threshold, called the balance threshold, is set. The threshold value of the resource level, denoted η, is defined as:

η = ALR_i + σ1,   (16)

where σ1 is the standard deviation of the load of the system, defined as:

σ1 = (1/N) Σ_{i=1}^{N} (X_i − X̄)², for all i,   (17)

where X̄ = average load of the resource system, which equals ALR, X_i = average load of each machine i, which equals ALM, and N = number of machines in the resource system.

4.2.3 Broker-level calculation

First, the ACP of the resource is defined as:

ACR_i = (Σ_{k=1}^{n} Rec_Speed_k × (1 − Rec_k)) / n,   (18)

where Rec_Speed_k = speed of resource k in the broker (MIPS), Rec_k = current CPU utilization of resource k (as a percentage) and n = number of resources in the broker. The average load of the broker system (ALB_i) is defined as:

ALB_i = (1/b) Σ_{k=1}^{b} ALR_i,   (19)

where b = number of resources in the broker system. The average load of each resource, ALR_i, is set to be less than the balance threshold of the system; hence a threshold, called the balance threshold, is set. The threshold value of the broker level, denoted ρ, is defined as:

ρ = ALB_i + σ2,   (20)

where σ2 = standard deviation of the load of the system, defined as:

σ2 = (1/N) Σ_{i=1}^{N} (X_i − X̄)², for all i,   (21)

where X̄ = average load of the broker system, which equals ALB, X_i = average load of each resource i, which equals ALR, and N = number of resources in the broker system.

The complexity of the hierarchical network depends on the maximum number of nodes at any level multiplied by the number of levels of the tree. In our network, the number of PEs is generally larger than the number of machines and resources under the same broker, so the total complexity of the network is O(3P), where P = number of PEs; since constant factors are dropped from complexity calculations, this is simply O(P).

4.3 The model and algorithms

This section deals with a load balancing model organized into three levels, namely resource, machine and PE, as shown in Fig. 3. The proposed algorithm is executed individually at each level according to that level's threshold value; only the PE-level algorithm at level-2 is considered here. Initially, the load is calculated from the Gridlets (tasks) arriving on each node connected to the corresponding PEs. All the tasks are compute-intensive, which maximizes the use of system resources for processing large computations in the simulation. The load balancing operation is then performed according to the availability of the lightly loaded or under-lightly loaded node list, and at the same time the load information is passed to the associated machines. To perform load balancing, a random strategy is used to select nodes randomly, either from the overloaded or from the under-loaded pool; this reduces the time needed to manage the load and improves the execution time as well. The random strategy is implemented in the machine-level algorithm. Finally, the job migration operation is repeated until the overload queue is empty, and the same procedure is followed at the machine and resource levels. The PE-level, machine-level and resource-level load thresholds, denoted by ρ, η and ρ, respectively, in Algorithms 1–4, are determined according to the task arrival.
The above-mentioned Algorithms 1–4 implement an efficient load balancing algorithm using a random policy with a variable threshold value. Here, the machine-level algorithm has been discussed; the same procedure is followed at the resource and PE levels.
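Since the listings of Algorithms 1–4 are not reproduced here, the following sketch gives one possible reading of the machine-level pass just described: nodes are grouped into pools by the variable threshold, the receiver pools are sorted in ascending order and the overloaded pool in descending order (Table 1), a random policy picks the sending node, and migration repeats until the overloaded queue is empty. The class names, the choice of receiver and the migrated amount are assumptions, not the authors' algorithm.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// One possible reading of the machine-level balancing pass: pools are built from
// the variable threshold, sorted as described in Table 1, and load migrates from
// overloaded nodes to the best receiver until the overloaded queue is empty.
public class BalancingPass {

    static class Node {
        int id;
        double load;
        Node(int id, double load) { this.id = id; this.load = load; }
    }

    static void balance(List<Node> overloaded, List<Node> lightly,
                        List<Node> underLightly, double threshold, Random rnd) {
        // Sorting as in Table 1: receivers ascending, senders descending.
        underLightly.sort(Comparator.comparingDouble(n -> n.load));
        lightly.sort(Comparator.comparingDouble(n -> n.load));
        overloaded.sort(Comparator.comparingDouble((Node n) -> n.load).reversed());

        while (!overloaded.isEmpty()) {
            // Random policy: the initiating sender is picked at random from the pool.
            Node sender = overloaded.get(rnd.nextInt(overloaded.size()));
            List<Node> receivers = !underLightly.isEmpty() ? underLightly : lightly;
            if (receivers.isEmpty()) break;             // nothing can accept load
            Node receiver = receivers.get(0);           // least-loaded receiver first

            double excess = sender.load - threshold;    // amount to migrate (illustrative)
            sender.load -= excess;
            receiver.load += excess;

            overloaded.remove(sender);                  // sender is no longer overloaded
            if (receiver.load > threshold) receivers.remove(receiver);
        }
    }
}
```

A real implementation would migrate individual Gridlets rather than a scalar amount of load; the scalar transfer here only keeps the sketch short.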
5 Experimental results and discussion

In this part of the paper, the configuration of the simulation setup is introduced first, and then the experimental results are given. Each algorithm was run five consecutive times, and the average of the results was taken as the final result of that algorithm.

5.1 Experimental setup

Windows 7 on an Intel i5 with 3 GB of RAM and a 360 GB hard disk was used for the simulation experiments in GridSim version 5.0 [26] with JDK 1.5 update 6 and JRE 1.3. The goal of this experiment is to compare the performance of the proposed decentralized load balancing algorithm (PLBA) with the conventional WLB and the load balancing in enhanced GridSim (LBEGS) algorithm. Grid systems are randomized in various sizes: 100, 500, 1,000 and 2,000 nodes. In the experiments, some of the test parameters are varied: the size of the Grid, denoted by S in the figures, and the resource request frequency, denoted by Ri, with values of 50, 100, 200 and 400 ms. The experiment randomly submits 1,000 Grid requests at the start and schedules them to specific Grid resources based on WLB, LBEGS and PLBA. The size of the data carried by a Grid task agent is denoted by D; all tasks have the same computation size, D = 25 KB. There are 25 Grid resource agents in the system. The arrival time of each resource request is determined randomly by an exponential distribution with a mean of 200 ms, but the arrival times are changed when testing the effect of request frequency on response time and resource allocation efficiency. All parameters are summarized in Table 5. Initially, all nodes have zero load. Throughout the experiment, only one Grid user agent is used for the generation of Grid resource requests.
Table 5 Simulation parameters

- Number of nodes in a cluster: 25
- Number of nodes (S): 100, 500, 1,000, 2,000
- Reschedule interval (s): 600
- Number of jobs: 1,000
- Request intensity/frequency (Ri), ms: 50, 100, 200, 400
- Arrival time (ms): 200
Fig. 6 Comparison of response time
For the simulated scenarios, it does not matter how many Grid user agents there are in the system. These experimental configurations are chosen to expose the performance of the load balancing algorithms as fully as possible.

5.2 Simulation results

The first experiment compares the performance of WLB and the LBEGS algorithm with PLBA in terms of response time and resource allocation efficiency. Figure 6 shows the response times, which are influenced by Grid size, available connections and bandwidth, using a request intensity of Ri = 200 ms. When comparing Grids of different sizes, a lower average response time is considered better, as depicted in Fig. 6. All algorithms present their best results in the small-size Grid, but as the size of the Grid grows, the response time with PLBA remains much lower. The response time using PLBA can be 50 and 44 % shorter than that using WLB and LBEGS, respectively. As shown in Fig. 7, PLBA outperforms the conventional WLB and LBEGS for the different Grid sizes. Secondly, the resource allocation efficiency of PLBA, WLB and LBEGS was measured using Ri = 200 ms. WLB and LBEGS manage to match nearly 98 % of all requests in the small-size Grid scenario, with PLBA closely behind. However, as the Grid size increases, WLB and LBEGS show more degraded performance than PLBA. Under the large-size Grid, the results for WLB and LBEGS are lower than in the small-size Grid. The resource allocation efficiency using PLBA is 27 and 15 % larger than that using WLB and LBEGS, respectively. As the Grid size varies, the results decrease for both methods in a similar manner. The basic experiments described above were conducted to show the fundamental differences between PLBA and WLB and LBEGS.

Next, a sensitivity experiment is devised in which the request frequency is varied. The effects of request frequency on response time and resource allocation efficiency under different Grid sizes are shown in the experiments below. Figures 8, 9 and 10 show the behaviour of the allocation methods when the resource request frequency is changed from 50 to 400 ms. For the response time, a lower request frequency leads to faster access times; the X-axis shows the change in request frequency. As presented in Figs. 8, 9 and 10, the response time using PLBA can be 21 and 9 % shorter than that using WLB and LBEGS, respectively, under a small-size Grid (S = 100) and low request frequency (Ri = 200 ms). When the Grid size is changed to S = 500 and the request frequency remains Ri = 200 ms, the response time of WLB and LBEGS is 19 and 11 % longer than that using PLBA. When the Grid size is increased to S = 2000 and the request frequency to Ri = 50 ms, the performance of WLB and LBEGS deteriorates quickly; the response time is 79 and 29 % longer than that using PLBA. Under a big-size Grid (S = 2000) and high request frequency (Ri = 50, 100 ms), WLB and LBEGS take more time to allocate appropriate resources; the response time is 70 and 43 % longer than that using PLBA. This effect could be caused by an increasing synchronization of the request generation frequency with the negotiations that lead to successful resource provisioning. In other words, a large number of open requests leads to longer queues for processing, whereas the queues get shorter with decreasing request frequency and the overall processing gets faster.

Considering the resource allocation efficiency, the results in Figs. 11, 12 and 13 show that, under a small-size Grid and low request frequency (Ri = 200 ms), the resource allocation efficiency using PLBA is nearly the same as that using WLB and LBEGS. When the Grid size is increased to S = 100 and the request frequency remains Ri = 200 ms, the resource allocation efficiency of PLBA is as much as 23 and 12 % larger than that using WLB and LBEGS, respectively. When the Grid size is increased to S = 500 and the request frequency to Ri = 50 ms, the resource allocation efficiency of WLB and LBEGS is as much as 43 and 11 % less than that using PLBA.
Fig. 7 Resource allocation efficiency
Fig. 8 Effects of request frequency on response time under S = 100
Fig. 9 Effects of request frequency on response time under S = 500
Under a big-size Grid (S = 2000) and high request frequency (Ri = 50 ms), the resource allocation efficiency of WLB and LBEGS is as much as 43 % less than that using PLBA. A resource allocation efficiency of 93 % can be achieved for PLBA with a 200-ms pulse even at high request density. Complete saturation of requests in WLB and LBEGS can be attained in a small-size Grid/high-request-density scenario.
Fig. 10 Effects of request frequency on response time under S = 2000
Fig. 11 Effects of request frequency on resource allocation efficiency under S = 100
Fig. 12 Frequency effect on resource allocation efficiency under S = 500
Fig. 13 Frequency effect on resource allocation efficiency under S = 2,000
Fig. 14 Communication overhead times vs. Gridlets on PEs = 200
Fig. 15 Makespan vs. Gridlets on PEs = 200
(Plot for Fig. 15: makespan vs. number of Gridlets from 200 to 2,000 on 200 PEs, comparing WLB, Min–min, Max–min, LBEGS and PLBA.)
5.3 Overall comparison of PLBA with existing algorithms In Sect. 5, an algorithm is introduced for resource selection including min–min, max–min, WLB, LBEGS, PLBA, etc. In this section, the performance of WLB, min–min, max–min, LBEGS and PLBA in a Grid will be investigated by simulation. Considering that there are ten
Considering that there are ten resources, every resource has five machines and every machine has four PEs, this leads to 200 PEs in total. The number of Gridlets is kept constant at 500; every Gridlet is submitted at the beginning and carries no deadline request. Every PE is graded between 1 and 10, and so is every Gridlet. For ease of study, the local load factor of the resources is set to zero; this does not affect the performance of the algorithms. These settings are convenient for analysing the effectiveness and behaviour of the algorithms. The resource load threshold parameters and Gridlet parameters are listed in Table 3. The comparison of communication overhead times and makespan between the above algorithms is depicted in Figs. 14 and 15. Figure 14 shows the communication overhead times of the different algorithms as the number of Gridlets grows up to 2,000. It shows that the times of WLB and Min–min increase quickly as the Gridlets increase, while Max–min stays in the middle. It can be observed that LBEGS and PLBA always have the minimum values among the five algorithms and increase very slowly. Figure 14 thus illustrates that PLBA gives better performance by reducing communication overhead times. The makespan of the different algorithms is shown in Fig. 15, which demonstrates that the makespan of all algorithms increases as the number of Gridlets increases; PLBA has the minimum of all the algorithms and thus gives good performance with respect to makespan. In conclusion, PLBA has better performance in reducing communication overhead times and makespan.
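For clarity, makespan here is the finishing time of the last Gridlet across all resources; a trivial sketch of the metric (illustrative only, not the simulator's code):

```java
// Makespan = finishing time of the last Gridlet over all resources; communication
// overhead would be accumulated separately per migration.
public class Metrics {
    static double makespan(double[] gridletFinishTimes) {
        double max = 0;
        for (double t : gridletFinishTimes) max = Math.max(max, t);
        return max;
    }
}
```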
6 Conclusions

This paper proposed a variable threshold-based dynamic load balancing algorithm with a random policy for better response time, communication overhead and resource allocation efficiency, wherein the resources, the machines and the Grid broker participate in the load balancing operations. A dynamic threshold value is used at each level according to the Grid size in the network. Furthermore, the technique is extended by dividing the lightly loaded nodes into lightly and under-lightly loaded categories, in the context of the variable threshold and standard deviation. This approach minimizes the searching time needed to find the receiver machine to which the Gridlets are transferred. A rigorous examination of the proposed algorithm on the GridSim simulator has been carried out in order to judge its effectiveness. The simulation results establish the effectiveness of PLBA over WLB and LBEGS: PLBA enhances the resource allocation capacity and provides better response time and communication overhead. Future work can be extended towards adjusting the balance
threshold function and making it more adaptable to differing environments. Acknowledgments This work has been carried out at Distributed and Grid Computing Lab, Department of Computer Science and Engineering at Jaypee University of Engineering and Technology, Guna, M.P. The authors acknowledge the support provided. We would like to thank all anonymous reviewers for their comments and suggestions for improving the paper. We would like to thank Mr. Punit Pandey and Dr. Ashutosh Singh for helping in improving the language and expression of a preliminary version of this paper.
References 1. Rehman A, Qureshi K, Manuel P, Rashid H (2008) Resource topology aware GridSim: a step ahead. J Comput 19(2):3–22 (special issue on Grid and Cluster Computing) 2. Shah M, Qureshi K, Rasheed H (2010) Optimal job packing, a backfill scheduling optimization for a cluster of workstations. J Supercomput 54(3):381–399 (ISSN 0920-8542) 3. Qureshi K, Hussain SS (2008) A comparative study of parallelization strategies for fractal image compression on a cluster of workstations. Int J Comput Methods 5(3):463–482 4. Lin W, Shen W (2010) Tree-based task scheduling model and dynamic load-balancing algorithm for P2P computing. CIT 2903–2907 5. Rathore NK, Chana I (2013) A sender initiate based hierarchical load balancing technique for grid using variable threshold value. In: International conference IEEE-ISPC JUIT, Waknaghat, H.P., 978-1-4673-6190 6. Wang B, Shen Q (2011) ID management and allocation algorithm for P2P load balancing. Wuhan Univ J Nat Sci 16(4):293–300 7. Peng L, Xiao W (2011) A binary-tree based hierarchical load balancing algorithm in structured peer-to-peer systems. JCIT 6(4) 8. Ludwig SA, Moallem A (2011) Swarm intelligence approaches for grid load balancing. J Grid Computing 9(3):279–301 9. Yagoubi B, Slimani Y (2007) Task load balancing strategy for grid computing. J Comput Sci 3(3):186–194 (ISSN 1546-9239, Science Publications-2007) 10. Erdil D, Lewis M (2012) Dynamic Grid load sharing with adaptive dissemination protocols. J Supercomput 59(3):1139–1166 11. Long Q, Lin J, Sun Z (2011) Agent scheduling model for adaptive dynamic load balancing in agent-based distributed simulations. Tongji University, Shanghai, China simulation modelling practice and theory. Science Direct Elsevier, pp 1021–1034 12. Zheng G, Bhatele A, Meneses E, Kale LV (2011) Periodic hierarchical load balancing for large supercomputers. IJHPCA 25(4):371–385 13. Nandagopal M, Uthariaraj RV (2010) Hierarchical status information exchange scheduling and load balancing for computational grid environments. Int J Comput Sci Netw Secur 10(2):177–185 14. De Grande RE, Boukerche A (2011) Dynamic balancing of communication and computation load for HLA-based simulations on large-scale distributed systems. J Parallel Distrib Comput 71(1):40–52 15. Sharma D, Saxena AB (2011) Framework to solve load balancing problem in heterogeneous web servers. IJCSES 2(1) 16. Nandagopal M, Gokulnath K, Rhymend Uthariaraj V (2010) Sender initiated decentralized dynamic load balancing for multi cluster computational grid environment. In: Proceedings of A2CWiC ’10. Art No. 63, ISBN: 978-1-4503-0194-7
17. Zheng G, Bhatele A, Meneses E, Kale LV (2011) Periodic hierarchical load balancing for large supercomputers. IJHPCA 25(4):371–385 18. Rathore N, Chana I (2011) A cognitive analysis of load balancing and job migration technique in grid. In: WICT-IEEE Conference, University of Mumbai, pp 77–82 19. El-Zoghdy SF (2012) A hierarchical load balancing policy for grid computing environment. IJCNIS 4(5):1–12 20. Qureshi K, Rehman A (2011) Enhanced GridSim architecture with load balancing. J Supercomput 57:265–275 21. Rings T, Caryer G, Gallop J, Grabowski J, Kovacikova T, Schulz S, Stokes-Rees I (2009) Grid and cloud computing: opportunities for integration with the next generation network. J Grid Computing 7(3):375–393 22. Hao Y, Liu G, Wen N (2012) An enhanced load balancing mechanism based on deadline control on GridSim. Future Gener Comp Syst 28(4):657–665 23. Suresh P, Balasubramanie P (2013) User demand aware grid scheduling model with hierarchical load balancing. Math Prob Engg 2013, Art No. 439362. doi:10.1155/2013/439362
24. Chen H, Wang F, Helian N, Akanmu G (2013) User-priority guided min-min scheduling algorithm for load balancing in cloud computing. National Conference on Parallel Computing Technologies (PARCOMPTECH) 25. Abdi S, Mohamadi S (2010) The impact of data replication on job scheduling performance in hierarchical data grid. Int J Appl Graph Theory Wirel Ad hoc Netw Sens Netw 2(3):15–21 26. Kamarunisha M, Ranichandra S, Rajagopal TKP (2011) Recitation of load balancing algorithms in grid computing environment using policies and strategies - an approach. IJSER 2(3) 27. Balasangameshwara J, Raju N (2012) A hybrid policy for fault tolerant load balancing in Grid computing environments. J Netw Comput Appl 35:412–422 28. Mandalapu S (2005) Dynamic load balancing design and modeling in MPIAB: general architecture. PDPTA, pp 710–716