
Load Balancing of Virtual Machine Resources in Cloud Using Genetic Algorithm
Chandrasekaran K. and Usha Divakarla
National Institute of Technology Karnataka, Surathkal

Abstract. In cloud computing, load balancing is mostly performed through VM migration. When entire VMs are migrated, the migration cost becomes a problem because of the large granularity of VM resources, the large amount of data transferred during migration and the suspension of service. Hence, the goal of this project is to design and implement a genetic algorithm for scheduling virtual machine resources in a cloud computing environment, using the current system state, so that load balancing is achieved and the need for VM migration is reduced. Before actually deploying a VM, the algorithm computes the influence the deployment would have on the system, and it then selects the solution with the least load imbalance.

Keywords: cloud computing, load balancing, virtual machine resource.

K. R. Venugopal and L. M. Patnaik (Eds.) ICCN 2013, pp. 156–168. © Elsevier Publications 2013.

1. Introduction
Cloud computing is an emerging technology that is rapidly consolidating itself as the next big step in the development and deployment of an increasing number of distributed applications. It is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing supports Service Oriented Architecture (SOA) and Internet of Services (IoS) type applications and offers fault tolerance, high scalability, availability, flexibility, reduced information technology overhead for the user, reduced cost of ownership, on-demand services, etc. Central to these issues is the establishment of an effective load balancing algorithm. In cloud computing, load balancing is mostly performed through VM migration [1]. When entire VMs are migrated, the migration cost becomes a problem because of the large granularity of VM resources, the large amount of data transferred during migration and the suspension of service [2,3].

2. Literature Survey
A thorough literature survey of cloud computing [4,5], load balancing and genetic algorithms was carried out. The existing open-source cloud computing technologies were also explored in order to study how a genetic algorithm can be implemented using them, and existing load balancing algorithms were reviewed.
Cloud computing definition [6]: "Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

2.1 Load balancing
Load balancing is the even division of processing work among two or more computers, CPUs, network links or storage devices, ultimately delivering faster service with higher efficiency. It is accomplished through software, hardware or both, and it often uses multiple servers that appear to be a single computer system (also known as computer clustering). It is the process of improving the performance of a parallel and distributed system by redistributing load among the processors. Resource sharing is one of the major advantages of a distributed system, and it provides better performance and reliability than a traditional system under the same conditions. One of the research issues in parallel and distributed systems is the development of effective techniques for distributing workload over multiple processors. The main goal is to distribute jobs among processors so as to maximize throughput and resource utilization while maintaining stability and fault tolerance. Local scheduling, performed by the operating system, consists of distributing processes over the time slices of the processor. Global scheduling, on the other hand, is the process of deciding where to execute a process in a multiprocessor system. It may be carried out by a single central or master processing element, or it may be distributed among the processing elements.
2.1.1 Load balancing schemes
Load balancing algorithms can be classified into static and dynamic approaches.
Static load balancing algorithms
Static load balancing algorithms assume that a priori information about all the characteristics of the jobs, the computing resources and the communication network is known and provided. Load balancing decisions are made deterministically or probabilistically at compile time and remain constant during runtime. The static approach is attractive because it is simple and has minimal runtime overhead. However, it has two major disadvantages. First, the workload distribution of many applications cannot be predicted before program execution. Second, it assumes that the computing resources and the communication network are all known in advance and remain constant, an assumption that may not hold in a distributed environment. Because the static approach cannot respond to the dynamic runtime environment, it may lead to load imbalance on some resources and significantly increase the job response time [7–13].


Dynamic load balancing algorithms
Dynamic load balancing algorithms [18–21] use runtime state information to make more informed decisions when sharing the system load. Dynamic schemes are widely used in modern load balancing methods because of their robustness and flexibility. Most dynamic load balancing algorithms can be characterized by the following common parameters.
Centralized vs. decentralized
An algorithm is centralized if the parameters necessary for making the load balancing decision are collected at, and used by, a single resource, i.e., only one resource acts as the central controller and all the remaining resources act as slaves. The centralized approach is more beneficial when the communication cost is less significant, e.g., in a shared-memory multiprocessor environment. Its limitations are a single point of failure and poor scalability. In the decentralized approach, by contrast, all the resources are involved in making the load balancing decision. Decentralized algorithms are more scalable and have better fault tolerance.
Cooperative vs. non-cooperative
An algorithm is said to be cooperative if the distributed components that constitute the system cooperate in the decision-making process. Otherwise, it is non-cooperative.
Adaptive vs. non-adaptive
If the parameters of the algorithm can change while the algorithm is running, the algorithm is said to be adaptive (to changes in the environment in which it is running). Otherwise, it is non-adaptive.
Sender-initiated vs. receiver-initiated
In a sender-initiated (source-initiated) algorithm, an overloaded node starts negotiations with the other nodes for a potential process migration. If the negotiation is started by an underloaded node, the algorithm is said to be receiver-initiated (destination-initiated).
Preemptive vs. non-preemptive
If a process that has already started its execution can be transferred to some other node, the algorithm is called preemptive.
If, on the other hand, only processes that are in the ready queue and have not yet received CPU service can be considered for migration, the algorithm is called non-preemptive.
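To make the sender-initiated case concrete, the following Python sketch lets an overloaded node poll the other nodes and pick the least-loaded one as the receiver. The two threshold values, the node names and the function name sender_initiated_target are hypothetical illustrations, not part of the algorithms surveyed here; the threshold test corresponds to what Section 2.1.2 calls a transfer policy, and the polling to a location policy.

```python
from typing import Dict, Optional

# Hypothetical thresholds, as fractions of a node's capacity (not from the paper).
HIGH_THRESHOLD = 0.75  # above this load a node is overloaded and acts as sender
LOW_THRESHOLD = 0.50   # below this load a node is willing to act as receiver


def sender_initiated_target(sender: str, loads: Dict[str, float]) -> Optional[str]:
    """Return the node an overloaded sender should transfer a process to, or None.

    The sender starts negotiating only if its own load exceeds HIGH_THRESHOLD.
    It then polls every other node and picks the least-loaded one, provided that
    node's load is below LOW_THRESHOLD.
    """
    if loads[sender] <= HIGH_THRESHOLD:
        return None  # not overloaded: no negotiation is started
    receivers = {node: load for node, load in loads.items()
                 if node != sender and load < LOW_THRESHOLD}
    if not receivers:
        return None  # every other node is too busy to accept work
    return min(receivers, key=lambda node: receivers[node])


if __name__ == "__main__":
    current = {"node1": 0.80, "node2": 0.40, "node3": 0.10}
    print(sender_initiated_target("node1", current))  # node3
    print(sender_initiated_target("node2", current))  # None
```

A receiver-initiated variant would invert the roles: an underloaded node below the low threshold polls for a node above the high threshold and requests work from it.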


2.1.2 Load balancing policies
An algorithm for the load balancing problem can be broadly described in terms of four policies:
Location policy
This policy determines how a suitable node for migration is found. The common technique is polling, on a broadcast, random, nearest-neighbor or roster basis.
Transfer policy
This policy determines whether a node is suitable for participating in a process migration. One common technique is the threshold policy, where a node participates in a negotiation only when its load is less than (in a receiver-initiated algorithm) or greater than (in a sender-initiated algorithm) a threshold value.
Selection policy
This policy deals with selecting the process to be migrated. The common factors to be considered are the cost of migration (communication time, memory, computational requirements of the process, etc.) and the expected gain of migration (overall speedup of the system, etc.).
Information policy
This component of the algorithm decides what information about the state of the other nodes in the system is gathered, and how and when it is gathered and managed. Information policies can be grouped into demand-driven, periodic and state-change-driven policies.

2.2 Genetic algorithm
A genetic algorithm is a search and optimization technique premised on the evolutionary ideas of natural selection and genetics [14–17].
Selection
Chromosomes are selected from the population to be parents for crossover. The problem is how to select these chromosomes. There are many methods for selecting the best chromosomes, for example roulette wheel selection, Boltzmann selection, tournament selection, rank selection and steady-state selection, among others.
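Of the selection methods just listed, tournament selection is perhaps the simplest to sketch: draw k individuals at random and keep the fittest. The Python function below is an illustrative sketch; the function name and the default tournament size k = 2 are assumptions, and the design described later in this paper uses roulette wheel selection instead.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(population: Sequence[T], fitness: Sequence[float],
                      k: int = 2, rng: Optional[random.Random] = None) -> T:
    """Tournament selection: sample k distinct individuals, return the fittest.

    A larger k means stronger selection pressure; k = len(population) always
    returns the best individual.
    """
    if rng is None:
        rng = random.Random()
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda idx: fitness[idx])
    return population[winner]


if __name__ == "__main__":
    pop = ["S1", "S2", "S3", "S4"]
    fit = [2.5, 10.0, 4.0, 1.0]
    rng = random.Random(42)
    parents = [tournament_select(pop, fit, k=2, rng=rng) for _ in range(4)]
    print(parents)
```

Because the contenders are distinct, the least-fit individual can never win a tournament with k >= 2, whereas roulette wheel selection still gives it a small chance.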


Crossover
Crossover is a genetic operator that combines (mates) two chromosomes (parents) to produce a new chromosome (offspring). The idea behind crossover is that the new chromosome may be better than both of the parents if it takes the best characteristics from each of them. Crossover occurs during evolution according to a user-definable crossover probability. There are many types of crossover operator, for example one-point, two-point, multi-point, arithmetic and heuristic.
Mutation
Mutation is a genetic operator that alters one or more gene values in a chromosome from its initial state. This can result in entirely new gene values being added to the gene pool, with which the genetic algorithm may be able to arrive at a better solution than was previously possible. Mutation is an important part of the genetic search, as it helps prevent the population from stagnating at a local optimum. Mutation occurs during evolution according to a user-definable mutation probability. This probability should usually be set fairly low (0.01 is a good first choice). If it is set too high, the search turns into a primitive random search. There are many types of mutation operator, for example flip bit, boundary, uniform, non-uniform and Gaussian.

3. Genetic Algorithm Design and Implementation
The basic genetic algorithm and its implementation are explained below.
Mathematical formulation
Consider the set of physical machines $P = \{P_1, P_2, \ldots, P_n\}$, where $n$ is the number of nodes in the cloud, and, on physical machine $P_i$, the set of virtual machines $V = \{V_1, V_2, \ldots, V_{m_i}\}$, where $m_i$ is the number of virtual machines on $P_i$. The cloud has one cloud controller and several nodes, each hosting multiple virtual machines. The load of a physical machine can usually be obtained by adding the loads of the $m_i$ VMs running on it. Therefore, using $P_i$ and $V_j$ also to denote the loads of the corresponding machines, the load of physical machine $P_i$ is
$$P_i = \sum_{j=1}^{m_i} V_j .$$
Let $V$ be the virtual machine that currently needs to be deployed. After $V$ is assigned to a physical machine, the load of every physical machine is updated as
$$P_i \leftarrow \begin{cases} P_i + V, & \text{if } V \text{ is deployed on } P_i,\\ P_i, & \text{otherwise,} \end{cases}$$
and the load on the cloud after VM $V$ is assigned to physical machine $P_i$ is the average of the updated loads,
$$C = \frac{1}{n}\sum_{i=1}^{n} P_i .$$
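The formulation above can be transcribed almost directly. In the Python sketch below, each physical machine is represented by the list of its VMs' loads (an assumed representation, with loads as percentages of a machine's capacity); the function names are illustrative. It computes each $P_i$, the updated loads after tentatively deploying $V$, and the cloud average $C$.

```python
from typing import List

# Assumed representation: placement[i] lists the loads V_j of the m_i VMs on
# physical machine P_i, in percent of one machine's capacity.


def machine_loads(placement: List[List[float]]) -> List[float]:
    """P_i = sum_{j=1}^{m_i} V_j for every physical machine."""
    return [sum(vm_loads) for vm_loads in placement]


def loads_after_deploy(loads: List[float], target: int, v: float) -> List[float]:
    """P_i + V for the machine that receives V, unchanged P_i for the others."""
    return [p + v if i == target else p for i, p in enumerate(loads)]


def cloud_load(loads: List[float]) -> float:
    """C = (1/n) * sum_{i=1}^{n} P_i."""
    return sum(loads) / len(loads)


if __name__ == "__main__":
    loads = machine_loads([[20, 30], [10], [40]])  # [50, 10, 40]
    v = 15  # load of the VM that needs deploying (hypothetical)
    for target in range(len(loads)):
        after = loads_after_deploy(loads, target, v)
        print(target, after, round(cloud_load(after), 2))
```

Running the example shows that $C$ is the same whichever machine receives $V$; what differs between candidate placements is how far each $P_i$ lies from $C$, which is exactly what the fitness function is built on.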

Genetic algorithm design
The detailed description of the genetic algorithm used is given below.


Figure 1. VM scheduler.

Genome coding
The classic genetic algorithm encodes the genes of a chromosome as binary codes. However, the mapping between physical machines and VMs is one-to-many, so a tree structure or a multidimensional list is better suited to encoding the chromosome. Every solution is encoded as one tree or multidimensional list, as shown in Figure 1: the scheduling and managing node of the system is the root node on the first level, the N nodes on the second level stand for the physical machines, and the M nodes on the third level stand for the VMs on those physical machines. A multidimensional list data structure was used to implement the encoding, as it is the most appropriate data structure for this project.
Fitness function
$$f(S) = \frac{100}{\sum_{i=1}^{n} \lvert C - P_i \rvert}$$
The smaller the total deviation $\sum_{i=1}^{n} \lvert C - P_i \rvert$ of the machine loads from the cloud average, the higher the fitness.
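A minimal Python sketch of the encoding and the fitness function follows. A solution is a list of lists of VM identifiers, one inner list per physical machine; the VM loads, the identifiers and the small eps term are assumptions for illustration (the formula itself does not say what happens when every machine sits exactly at the average, where the denominator would be zero).

```python
from typing import Dict, List

# A solution (chromosome) is a multidimensional list: solution[i] holds the ids of
# the VMs placed on physical machine i, mirroring levels two and three of the tree.
Solution = List[List[str]]


def fitness(solution: Solution, vm_load: Dict[str, float], eps: float = 1e-9) -> float:
    """f(S) = 100 / sum_i |C - P_i|, where C is the mean machine load.

    eps is an added assumption that avoids dividing by zero when every machine
    carries exactly the average load; the formula itself leaves this case open.
    """
    loads = [sum(vm_load[vm] for vm in vms) for vms in solution]  # P_i
    c = sum(loads) / len(loads)                                    # C
    deviation = sum(abs(c - p) for p in loads)
    return 100.0 / (deviation + eps)


if __name__ == "__main__":
    vm_load = {"vm1": 30, "vm2": 20, "vm3": 10, "vm4": 40}  # hypothetical loads
    balanced = [["vm1", "vm2"], ["vm3", "vm4"]]    # loads 50 and 50
    unbalanced = [["vm1", "vm4"], ["vm2", "vm3"]]  # loads 70 and 30
    print(fitness(balanced, vm_load) > fitness(unbalanced, vm_load))  # True
```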

Selection
The roulette wheel method is applied to select chromosomes for reproduction. The selection probability of solution $k$ is
$$P_k(S) = \frac{f_k(S)}{\sum_{j=1}^{K} f_j(S)},$$
where $f_k(S)$ is the fitness of solution $k$ and $K$ is the population size. First, the fitness of each individual in the current population is computed with the fitness function, and the individual with the highest fitness is copied directly into the child population. Then the selection probability of each individual is computed from its fitness value. Finally, individuals are selected by spinning the roulette wheel, so that individuals with high fitness are more likely to be selected while those with low fitness still have a chance to be chosen.
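The selection step just described (keep the fittest individual, then spin the wheel for the rest) can be sketched in Python as follows. The function name and the count parameter are illustrative assumptions; the probabilities follow the formula above, so all fitness values must be positive.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def roulette_select(population: Sequence[T], fitness: Sequence[float], count: int,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Roulette wheel selection with elitism, returning `count` (>= 1) individuals.

    The fittest individual is copied into the child population first; each
    further pick chooses solution k with probability f_k / sum_j f_j, so all
    fitness values must be positive.
    """
    if rng is None:
        rng = random.Random()
    total = sum(fitness)
    best = max(range(len(population)), key=lambda k: fitness[k])
    selected = [population[best]]  # elitism: keep the best individual
    while len(selected) < count:
        spin = rng.uniform(0.0, total)  # where the wheel stops
        running = 0.0
        for individual, f in zip(population, fitness):
            running += f
            if spin <= running:
                selected.append(individual)
                break
        else:
            selected.append(population[-1])  # guard against float round-off
    return selected


if __name__ == "__main__":
    population = ["S1", "S2", "S3"]
    scores = [1.0, 8.0, 1.0]
    print(roulette_select(population, scores, count=6, rng=random.Random(7)))
```

With fitness values 1, 8 and 1, the middle solution occupies 80% of the wheel, yet the other two are still picked occasionally, which is the behavior the text asks for.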


Crossover
The idea behind crossover is that the new chromosome may be better than both of the parents if it takes the best characteristics from each of them. The crossover operator works as follows:
1. Select two parental individuals S1 and S2 according to the selection strategy.
2. Combine the two parents into a new individual solution S0 that keeps the VM placements common to both parents and discards the ones on which they differ.
3. Assign each VM whose placement differs between the two parents to the least-loaded node in the physical machine set, until all such VMs have been placed. This favors a child with good load balance, which is the objective.
Mutation
Individuals are selected for mutation according to the mutation probability. From the parent solution, two physical machines (two dimensions of the list) are selected, and one or more virtual machines are swapped between them to form a new solution.
Genetic algorithm implementation
The experiment was performed on four machines: one front end, i.e., the cloud controller, and three cluster nodes, with OpenNebula used as the IaaS software to build the cloud. All the machines have the same configuration: an Intel Core i7 processor, 8 GB RAM and a 1 TB hard disk. To interact with an OpenNebula cloud, the OpenNebula Cloud API (OCA) is available for several languages, such as Java, Python and Ruby. These bindings are designed as wrappers for the XML-RPC methods, with some basic helpers, which means that users still need to be familiar with the XML-RPC API and the XML formats returned by the OpenNebula core. The XML-RPC API was therefore used directly to implement this project, since it is the most straightforward approach. Many methods are available in the XML-RPC API to interact with the cloud controller. Each call takes a session string, which has to be formed with the contents of the ONE_AUTH file set during OpenNebula configuration; with the default 'core' auth driver it is username:password.
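The crossover and mutation operators described at the start of this part can be sketched in Python on the multidimensional-list encoding. This is an illustrative sketch, not the authors' code: the function names, the VM loads in the example and the choice to place differing VMs heaviest-first are assumptions, and the mutation swaps exactly one VM where the text allows one or more.

```python
import random
from typing import Dict, List, Optional

Solution = List[List[str]]  # solution[i] = ids of the VMs on physical machine i


def crossover(s1: Solution, s2: Solution, vm_load: Dict[str, float]) -> Solution:
    """Keep placements both parents share; put every other VM on the least-loaded machine.

    Both parents must place the same set of VMs on the same number of machines.
    Handling the differing VMs heaviest-first is an assumption of this sketch.
    """
    child: Solution = []
    for vms1, vms2 in zip(s1, s2):
        shared = set(vms2)
        child.append([vm for vm in vms1 if vm in shared])  # common placements
    placed = {vm for vms in child for vm in vms}
    differing = sorted((vm for vms in s1 for vm in vms if vm not in placed),
                       key=lambda vm: vm_load[vm], reverse=True)
    for vm in differing:
        loads = [sum(vm_load[x] for x in vms) for vms in child]
        child[loads.index(min(loads))].append(vm)  # least-loaded machine
    return child


def mutate(solution: Solution, rate: float,
           rng: Optional[random.Random] = None) -> Solution:
    """With probability `rate`, swap one VM between two randomly chosen machines."""
    if rng is None:
        rng = random.Random()
    child = [list(vms) for vms in solution]  # copy: the parent stays unchanged
    if len(child) < 2 or rng.random() >= rate:
        return child
    i, j = rng.sample(range(len(child)), 2)
    if child[i] and child[j]:
        a, b = rng.randrange(len(child[i])), rng.randrange(len(child[j]))
        child[i][a], child[j][b] = child[j][b], child[i][a]
    return child


if __name__ == "__main__":
    vm_load = {"a": 40, "b": 30, "c": 20, "d": 10}  # hypothetical VM loads
    parent1 = [["a", "b"], ["c", "d"]]
    parent2 = [["a", "c"], ["b", "d"]]
    print(crossover(parent1, parent2, vm_load))  # [['a', 'c'], ['d', 'b']]
    print(mutate(parent1, rate=0.3, rng=random.Random(3)))
```

In the example the parents agree only that "a" sits on machine 1 and "d" on machine 2; "b" then goes to the lighter machine 2, and "c" to machine 1 (both machines are at load 40 at that point, and the first one wins the tie).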
The following methods were used to implement the designed genetic algorithm:
one.vm.allocate
one.vm.action (hold)
one.vm.action (release)
one.vm.deploy
As shown in Figure 2, the one.vm.allocate method allocates a VM, which enters the pending state. The one.vm.action (hold) method is then called immediately, moving the VM to the hold state so that OpenNebula's internal scheduling algorithm does not assign it to a node. Finally, after the optimal solution has been found with the genetic algorithm, the one.vm.action (release) and one.vm.deploy methods are used to deploy the VM to the node selected by that solution.
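The allocate, hold, release and deploy sequence above can be sketched in Python on top of the standard xmlrpc.client module. The method names and their order come from the text; the parameter lists used here (allocate with session and template, action with session, action name and VM id, deploy with session, VM id and host id) and the reply layout [success, result or error, ...] reflect our understanding of the OpenNebula XML-RPC reference and should be checked against the release in use, since newer versions accept extra parameters. The names deploy_with_ga, choose_host and DryRunServer are hypothetical; DryRunServer only records calls so that the flow can be tried without a cloud.

```python
from typing import Any, Callable, List, Optional


def _check(response: List[Any], method: str) -> Any:
    """OpenNebula replies with [success, result-or-error, ...]; raise on failure."""
    ok, value = response[0], response[1]
    if not ok:
        raise RuntimeError(f"{method} failed: {value}")
    return value


def deploy_with_ga(server: Any, session: str, template: str,
                   choose_host: Callable[[int], int]) -> int:
    """Allocate a VM, hold it, let the GA pick a host, then release and deploy it.

    `server` is an xmlrpc.client.ServerProxy for the OpenNebula front end (or the
    DryRunServer below). `choose_host` stands in for the genetic algorithm: it
    receives the new VM id and returns the host id chosen by the best solution.
    """
    vm_id = _check(server.one.vm.allocate(session, template), "one.vm.allocate")
    # Hold the pending VM so that OpenNebula's own scheduler does not place it.
    _check(server.one.vm.action(session, "hold", vm_id), "one.vm.action(hold)")
    host_id = choose_host(vm_id)
    _check(server.one.vm.action(session, "release", vm_id), "one.vm.action(release)")
    _check(server.one.vm.deploy(session, vm_id, host_id), "one.vm.deploy")
    return vm_id


class DryRunServer:
    """Offline stand-in for ServerProxy: records each call and reports success."""

    def __init__(self, path: str = "", calls: Optional[list] = None) -> None:
        self._path = path
        self.calls = calls if calls is not None else []

    def __getattr__(self, name: str) -> "DryRunServer":
        # Build dotted method names such as "one.vm.allocate", like ServerProxy.
        return DryRunServer(f"{self._path}.{name}" if self._path else name, self.calls)

    def __call__(self, *args: Any) -> list:
        self.calls.append((self._path, args))
        return [True, 7]  # pretend success; 7 plays the role of the new VM id


if __name__ == "__main__":
    dry = DryRunServer()
    deploy_with_ga(dry, "oneadmin:password", 'NAME = "vm"', choose_host=lambda vm: 2)
    for method, args in dry.calls:
        print(method, args)
```

Against a real front end, server would instead be xmlrpc.client.ServerProxy("http://localhost:2633/RPC2") (2633 being, as far as we know, OpenNebula's default XML-RPC port) and session the username:password string from ONE_AUTH.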


Figure 2. Virtual machine life cycle.

Virtual machine resources
VM resources are CPU, memory, storage and network bandwidth. The designed genetic algorithm was implemented considering CPU load and memory load. Each host is characterized by a d-dimensional vector called the host's vector of capacities, $H = (h_1, h_2, \ldots, h_d)$. Each dimension represents the host's capacity for a different resource, such as CPU utilization, memory utilization or disk bandwidth. Similarly, each VM is represented by its vector of demands, $V = (v_1, v_2, \ldots, v_d)$. The load of a node is calculated as follows:
$$\text{Load of node} = \text{Volume} = \sum_{i} w_i \, v_i ,$$
where $w_i$ is the weight assigned to resource $i$ and
$$w_i = \frac{\sum_{vm} v_i}{h_i} ,$$

which is the ratio between the total demand for resource $i$ and the capacity of the host. In simplified form:
System load (load of a physical machine in the cloud) $= l \cdot (\text{cpu load}) + m \cdot (\text{memory load})$, where $l, m \in [0, 1]$; $l$ is the CPU weightage and $m$ is the memory weightage. These weightages must be set according to the type of application, i.e., CPU-bound or memory-bound.

4. Results and Analysis
This section presents the results of the designed and implemented GA, compares them with those of other algorithms and analyzes them.
Results
48 VMs with different CPU and memory configurations, running Linux as the operating system, were created, and the algorithms were run under stable and variant load conditions. For the variant load condition, artificial load was generated on the three physical machines (80% on the first, 40% on the second and 10% on the third) to test all three algorithms. The results are as follows.
Load of physical machines under stable load condition with equal CPU and memory weightage
As seen in Figure 3, the genetic algorithm performs better than the round-robin and greedy algorithms under stable load conditions when the CPU and memory weightages are both 0.5.
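The node-load model of Section 3, whose weightages l and m are varied in these experiments, can be sketched in Python as three small functions. The function and parameter names are assumptions (cpu_weight and mem_weight stand for l and m), and the example utilizations are hypothetical. volume() simply applies the weights to whichever demand vector the caller passes, since the text does not pin down whether a single VM's vector or the node's aggregate is meant.

```python
from typing import List, Sequence


def resource_weights(capacity: Sequence[float],
                     demands: Sequence[Sequence[float]]) -> List[float]:
    """w_i = (sum over the host's VMs of v_i) / h_i for each resource i."""
    return [sum(v[i] for v in demands) / h_i for i, h_i in enumerate(capacity)]


def volume(weights: Sequence[float], demand: Sequence[float]) -> float:
    """Load of node = Volume = sum_i w_i * v_i."""
    return sum(w * v for w, v in zip(weights, demand))


def system_load(cpu_load: float, mem_load: float,
                cpu_weight: float, mem_weight: float) -> float:
    """Simplified load l * (cpu load) + m * (memory load), with l, m in [0, 1]."""
    if not (0.0 <= cpu_weight <= 1.0 and 0.0 <= mem_weight <= 1.0):
        raise ValueError("weightages must lie in [0, 1]")
    return cpu_weight * cpu_load + mem_weight * mem_load


if __name__ == "__main__":
    # Weightage pairs (l, m) used in the experiments: equal, CPU-bound, memory-bound.
    for l, m in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
        print(l, m, system_load(cpu_load=80.0, mem_load=40.0, cpu_weight=l, mem_weight=m))
```

For a machine at 80% CPU and 40% memory utilization, the three settings give loads of about 60, 76 and 44, which is why the same physical state can rank differently under CPU-oriented and memory-oriented weightings.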

Figure 3. Load of physical machines under stable load condition with l = 0.5 and m = 0.5.


Table 1. Virtual machines allocation (number of VMs per physical machine).

| Physical machine | Greedy | Round-Robin | Genetic Algorithm |
|---|---|---|---|
| 1 | 48 | 16 | 0 |
| 2 | 0 | 16 | 22 |
| 3 | 0 | 16 | 26 |

Load of physical machines under variant load condition with equal CPU and memory weightage
Table 1 shows the virtual machine allocation produced by the greedy, round-robin and genetic algorithms under the variant load condition with equal CPU and memory weightage. Figure 4 shows that, under this condition, GA performs best, followed by the round-robin algorithm and then the greedy algorithm.
Load of physical machines under variant load condition with CPU weightage (l) = 0.9 and memory weightage (m) = 0.1, i.e., for a CPU-oriented application
The virtual machine allocation for the greedy, round-robin and genetic algorithms under the variant load condition with CPU weightage 0.9 and memory weightage 0.1 is shown in Table 2. As shown in Figure 5, GA achieves better load balancing than the round-robin and greedy algorithms in this case, i.e., for a CPU-bound application.
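For comparison, the two baselines can be sketched in a few lines of Python. We assume that "greedy" means first-fit placement (each VM goes to the first host with enough free capacity, which is, to our understanding, how Eucalyptus's GREEDY scheduling policy behaves) and that round-robin cycles through the hosts regardless of their load, making it a static scheme in the terms of Section 2.1.1. The function names and the capacity figures in the example are hypothetical.

```python
from typing import List


def round_robin(num_vms: int, num_hosts: int) -> List[int]:
    """Assign VM k to host k mod num_hosts, ignoring current load."""
    return [k % num_hosts for k in range(num_vms)]


def greedy_first_fit(vm_demands: List[float], free: List[float]) -> List[int]:
    """Assign each VM to the first host whose free capacity can hold it."""
    free = list(free)  # work on a copy of the free capacities
    placement: List[int] = []
    for demand in vm_demands:
        for host, capacity in enumerate(free):
            if capacity >= demand:
                free[host] -= demand
                placement.append(host)
                break
        else:
            raise ValueError(f"no host can fit a VM of size {demand}")
    return placement


def vms_per_host(placement: List[int], num_hosts: int) -> List[int]:
    """Count how many VMs each host received."""
    counts = [0] * num_hosts
    for host in placement:
        counts[host] += 1
    return counts


if __name__ == "__main__":
    hosts = 3
    print(vms_per_host(round_robin(48, hosts), hosts))  # [16, 16, 16]
    # Hypothetical sizes: 48 equal VMs, and each host has room for all of them.
    print(vms_per_host(greedy_first_fit([1.0] * 48, [100.0] * hosts), hosts))  # [48, 0, 0]
```

With 48 VMs, three hosts and a first host large enough for all of them, these sketches produce the 16/16/16 and 48/0/0 splits seen for round-robin and greedy in the tables; neither looks at the current load, which is the information the GA adds through its fitness function.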

Figure 4. Load of physical machines under variant load condition with l = 0.5 and m = 0.5.

Table 2. Virtual machines allocation (number of VMs per physical machine).

| Physical machine | Greedy | Round-Robin | Genetic Algorithm |
|---|---|---|---|
| 1 | 48 | 16 | 0 |
| 2 | 0 | 16 | 8 |
| 3 | 0 | 16 | 40 |



Figure 5. Load of physical machines under variant load condition with l = 0.9 and m = 0.1.

Load of physical machines under variant load condition with CPU weightage (l) = 0.1 and memory weightage (m) = 0.9, i.e., for a memory-oriented application
The virtual machine allocation for the greedy, round-robin and genetic algorithms under the variant load condition with CPU weightage 0.1 and memory weightage 0.9 is shown in Table 3. As shown in Figure 6, GA achieves better load balancing than the round-robin and greedy algorithms in this case, i.e., for a memory-bound application.

Table 3. Virtual machines allocation (number of VMs per physical machine).

| Physical machine | Greedy | Round-Robin | Genetic Algorithm |
|---|---|---|---|
| 1 | 48 | 16 | 0 |
| 2 | 0 | 16 | 8 |
| 3 | 0 | 16 | 40 |

Figure 6. Load of physical machines under variant load condition with l = 0.1 and m = 0.9.

Analysis
The designed and implemented genetic algorithm outperforms the greedy and round-robin algorithms irrespective of the initial load condition and of the kind of application, i.e., CPU-bound or memory-bound. In short, GA allocates VMs so as to achieve better load balancing. Because it takes the current system load into account when allocating new virtual machines in the cloud, its load balancing reduces the need for VM migrations. The parameters of the GA are the crossover rate, the mutation rate and the population size. The experiment gave better results for approximately a population size of 50, a crossover rate of 0.6 and a mutation rate of 0.3. These three parameters depend on one another: if one is varied, the others also need to be adjusted in order to obtain optimal solutions within reasonable time. The experiment gave better results for population sizes in the range (15, 60). Beyond this range, the GA either generated suboptimal solutions or took more time to find the optimal one, which shows that, depending on the size of the problem instance, the population size should lie within a certain range to obtain an optimal solution in reasonably good computing time.

5. Conclusion and Future Work
The conclusions derived from the proposed method are as follows.
Conclusion
The study shows that various algorithms exist for load balancing in distributed environments. In a cloud computing environment, most of the load balancing work is done through VM migration. When VMs are migrated, a huge amount of data is transferred, which, unlike process migration, consumes a lot of bandwidth. The service also slows down during VM migration, which costs companies a lot. So there is a need for load balancing in the cloud that reduces VM migrations. The proposed GA schedules VMs so as to achieve load balancing with less need for VM migrations, because it allocates VMs to physical machines intelligently using the fitness function. Before actually deploying a VM, it calculates the load each node would have after the VM is deployed on it, and it finds the solution that gives the best load balancing. It was compared with the greedy and round-robin algorithms, which are available in Eucalyptus.
Future work
The proposed solution performs load balancing considering CPU and memory load. It can be extended to include network I/O load and storage I/O load in the load calculation of a node. Apart from resource monitoring and load balancing, the proposed solution can be enhanced to include VM consolidation, to save power and hence electricity costs, and a thermal component, since the cooling costs of data centers are also huge. Thus, complete VM management software could be developed to cover all these requirements, which conflict with one another and can be prioritized according to the current requirements of the cloud provider.

References
[1] Clark, C., Fraser, K. and Hand, S.: Live Migration of Virtual Machines. Proceedings of the 2nd Int'l Conference on Networked Systems Design and Implementation, Berkeley, CA, USA (2005).
[2] Borja Sotomayor, Kate Keahey, Ian Foster and Tim Freeman: Enabling Cost-Effective Resource Leases with Virtual Machines. In Hot Topics session in ACM/IEEE International Symposium on High Performance Distributed Computing 2007 (HPDC 2007) (2007).
[3] Cherkasova, L., Gupta, D. and Vahdat, A.: When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine Based Environments. Technical Report HPL-2007-25, February (2007).
[4] David Chappell: A Short Introduction to Cloud Platforms – An Enterprise Oriented View. http://www.davidchappell.com/CloudPlatforms–Chappell.pdf (2011).
[5] Charkravati, A. K.: Cloud computing – Challenges and Opportunities. http://www.cdap.in/html/pdf/articles/AKCcloud.pdf (2011).
[6] Mell, P. and Grance, T.: The NIST Definition of Cloud Computing, 2009. http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-15.doc (2011).
[7] Albert Y. Zomaya and Yee-Hwei Teh: Observations on Using Genetic Algorithms for Dynamic Load-Balancing. IEEE Transactions on Parallel and Distributed Systems, 12(9), September (2001).
[8] Sandeep Tayal: Tasks Scheduling Optimization for the Cloud Computing Systems. International Journal of Advanced Engineering Sciences and Technologies, 5(2), 111–115, February (2011).
[9] Martin Randles, David Lamb and Taleb-Bendiab, A.: A Comparative Study into Distributed Load Balancing Algorithms for Cloud Computing. In 24th International Conference on Advanced Information Networking and Applications Workshops (2010).
[10] Rich Lee and Bingchiang Jeng: Load-Balancing Tactics in Cloud. International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (2011).
[11] Iman Barazandeh and Seyed Mortazavi: Two Hierarchical Dynamic Load Balancing Algorithms in Distributed Systems. Second International Conference on Computer and Electrical Engineering (2009).
[12] Milan Soklic: Simulation of Load Balancing Algorithms: A Comparative Study. ACM SIGCSE Bulletin, December (2002).
[13] Martin Randles, David Lamb and Taleb-Bendiab, A.: A Comparative Study into Distributed Load Balancing Algorithms for Cloud Computing (2010).
[14] Marek Obitko: Genetic Algorithm Tutorials, 1998. http://www.obitko.com/tutorials/genetic-algorithms/index.php (2011).
[15] Matthew Wall: Introduction to Genetic Algorithms. http://lancet.mit.edu/~mbwall/presentations/IntroToGAs/P001.html (2011).
[16] John H. Holland: Genetic Algorithms. http://www2.econ.iastate.edu/tesfatsi/holland.gaintro.htm (2011).
[17] Goldberg, D. E.: The Existential Pleasures of Genetic Algorithms. In Genetic Algorithms in Engineering and Computer Science, Winter, G. (ed.), New York, Wiley, 23–31 (1995).
[18] Nakrani, S. and Tovey, C.: On Honey Bees and Dynamic Server Allocation in Internet Hosting Centers. Adaptive Behavior, 12, 223–240 (2004).
[19] Rahmeh, O. A., Johnson, P. and Taleb-Bendiab, A.: A Dynamic Biased Random Sampling Scheme for Scalable and Reliable Grid Networks (2008).
[20] Tateson, R., Halloy, J., Shackleton, M. and Deneubourg, J. L.: Aggregation Dynamics in Overlay Networks and their Implications for Self-Organized Distributed Applications. Comput. J., 52, 397–412, July (2009).
[21] Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J. R. and Greenberg, A.: Join-Idle-Queue: A Novel Load Balancing Algorithm for Dynamically Scalable Web Services. Perform. Eval., 68, 1056–1071, November (2011).
