International Journal of Cloud Applications and Computing, 3(2), 47-60, April-June 2013 47

Virtual Machine Allocation in Cloud Computing Environment Absalom E. Ezugwu, Department of Computer Science, Faculty of Science, Federal University Lafia, Lafia, Nasarawa State, Nigeria Seyed M. Buhari, Department of Computer Science, Faculty of Science, Universiti Brunei Darussalam, Gadong, Brunei Sahalu B. Junaidu, Department of Mathematics, Faculty of Science, Ahmadu Bello University, Zaria, Kaduna State, Nigeria

ABSTRACT: The virtual machine allocation problem is one of the challenges in cloud computing environments, especially for private cloud design. In this environment, each virtual machine is mapped onto a physical host in accordance with the resources available on the host machine. In particular, quantifying the performance of scheduling and allocation policies on a cloud infrastructure, for different application and service models under varying performance metrics and system requirements, is an extremely challenging problem. In this paper, the authors present a Virtual Computing Laboratory framework model built on the concept of a private cloud by extending the open-source IaaS solution Eucalyptus. A rule-based mapping algorithm for virtual machines (VMs), formulated on set-theoretic principles, is also presented. The algorithmic design aims to automatically adapt the mapping between VMs and physical hosts' resources. The paper similarly presents a theoretical study and derivations of performance evaluation metrics for the chosen mapping policies; these include the context switching, waiting time, turnaround time, and response time for the proposed mapping algorithm.

Keywords:

Manager-Worker Process, Private Cloud, Rule-Based Mapping, Virtual Machine (VM) Allocation, Virtualization

1. INTRODUCTION

Cloud computing offers an end user the means to outsource on-site services, computational facilities, or data storage to an off-site, location-transparent centralized facility or "Cloud" (Ioannis & Karatza, 2010). A "Cloud" implies a set of machines and web services that implement cloud computing. These machines ideally comprise a pool of distributed physical compute resources — processors, memory, network bandwidth, and storage — potentially spread across networks of servers that cut across geographical boundaries. Resources associated with cloud computing

DOI: 10.4018/ijcac.2013040105 Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


are often organized into dynamic logical entities that are outsourced and leased on demand. One of the major characteristics of cloud computing is elasticity, which means that cloud resources can grow or shrink in real time (Sarathy et al., 2010). This transformation is made possible today by virtualization technology.

Over the past few years, virtualization has become a common phrase among IT professionals. The main concept behind this technology is the abstraction, or decoupling, of the application payload from the underlying distributed physical host resources (Buyya et al., 2009; Popek & Goldberg, 1974). This simply means that physical resources can be presented as either logical or virtual resources, depending on individual choices. Furthermore, virtualization helps cloud resource providers reduce costs through improved machine utilization, reduced administration time, and lower infrastructure costs. By introducing a suitable management mechanism on top of this virtualization functionality (as proposed in this paper), the provisioning of logical resources can be made dynamic; that is, a logical resource can be made bigger or smaller in accordance with cloud user demand (the elastic property of the cloud). To enable a truly cloud computing system, each computing resource element should be capable of being dynamically provisioned and managed in real time. This abstraction forms the basis of the conceptual framework presented in Section 3.

To implement virtualization, cloud developers often adopt an open-source software framework that implements what is commonly referred to as Infrastructure as a Service (IaaS). This software framework is known as a hypervisor (Nurmi et al., 2009; Chisnall, 2009). A hypervisor, also called a virtual machine manager (VMM), is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host machine. However, different infrastructures are available for implementing virtualization, each with its own virtual infrastructure management software (Sotomayor et al., 2009).

This paper proposes a novel simulation framework for cloud developers. There are two high-level components in the proposed architecture: the Manager process and the Worker process. The Manager is assigned the role of cluster controller, while the Worker is assigned the role of node controller in the new system. The focus of this work is to develop a framework model that allows the dynamic mapping of VMs onto physical hosts, depending on the resource requirements of the VMs and their availability on the physical hosts. The cloud resources considered include machines, network, storage, operating systems, application development environments, and application programs.

The remainder of the paper is organized as follows. In Section 2, a survey of related work on cloud and Eucalyptus cloud environments is presented. The proposed system architecture and model are described in Section 3, and in Section 4 an in-depth description of the VM allocation and rule-based mapping algorithm, based on set-theoretic concepts, is presented. Performance evaluation metrics for the proposed system are derived theoretically in Section 5. Finally, Section 6 offers concluding remarks.

2. RELATED WORK

2.1. Cloud Platform Environment

Cloud computing, as defined in Buyya et al. (2008), is a type of parallel and distributed computing system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources, based on service-level agreements established through negotiation between the service provider and


consumers. Several mature cloud environments currently provide effective, efficient, and reliable services to cloud users, among them Amazon, Google App Engine, Apple MobileMe, and the Microsoft clouds (Amazon, 2009; Chu et al., 2007).

2.2. Eucalyptus Cloud Environment

Eucalyptus is a Java-based cloud management tool that consists of five high-level components: the cloud controller, cluster controller, node controller, storage controller, and Walrus. Each high-level system component has its own Web interface and is implemented as a standalone Web service (Nurmi et al., 2009; Yoshihisa & Garth, 2010). The cloud controller is responsible for exposing and managing the underlying virtualized resources (servers, network, and storage) via user-facing APIs. Currently, this component exports a well-defined, industry-standard API (Amazon EC2) along with a Web-based user interface (AEC, 2011). The storage controller provides block-level network storage that can be dynamically attached by VMs. The current implementation of the storage controller supports the Amazon Elastic Block Storage (EBS) semantics (Kleineweber et al., 2011). Walrus is a storage service that supports third-party interfaces, providing a mechanism for storing and accessing VM images and user data.

The cluster controller gathers the required information and schedules VM execution on specific node controllers, and it also manages the virtual instance networks that run inside the cloud environment. The node controller manages the execution, inspection, and termination of VM instances on the particular host where it runs. When a user places a request for a VM image through the cloud controller, the node controller starts that particular VM image and makes an instance of it available on the network, where it is then accessed by the cloud end user. Figure 1 depicts the cloud model of Eucalyptus.

The cloud controller is the entry point into the cloud computing platform for users and administrators. It queries node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to the cluster controller. The cluster controller, in turn, issues queries to the node controllers to implement the cloud controller's requests. Users interact with the cloud controller via the user interface by means of Simple Object Access Protocol (SOAP) or Representational State Transfer (REST) messages. The cloud controller forwards user requests to the cluster controller. A cloud controller can have multiple cluster controllers, and one cluster controller can have multiple node controllers

Figure 1. Eucalyptus architecture


at the same time. In Upatissa and Atukorale (2012), an abstract architecture of a modified private cloud model is presented, with a focus on enhancing the management of virtual machine images in a Eucalyptus-based cloud environment.

In the context of cloud computing, and as assumed in this paper, any hardware or software entity — such as a high-performance system, storage system, laboratory device server, or application — that is shared between users of a cloud is called a resource. However, for the rest of this paper, and unless otherwise stated, the term resource means hardware such as computational nodes in the network or storage systems. Resources are also laboratory devices or laboratory device servers, and hence these terms are used interchangeably. The network-enabled capabilities of the resources that can be invoked by users, applications, or other resources are called services.

3. SYSTEM ARCHITECTURE: VIRTUAL COMPUTING LABORATORY MODEL

As mentioned earlier, virtualization is the key technology underlying most private cloud implementations, enabling multiple virtual machines to run on a single physical node. A private cloud, however, is more than just virtualization. It is a programmatic interface, driven by an API and managed by a cloud controller, that enables automated provisioning of virtual machines, networking resources, storage, and other infrastructure services. Our main concern in this article is to find a suitable alternative means by which the provisioning of virtual machines, together with their mapping onto physical hosts, can be managed and made more effective and efficient in a typical private cloud computing environment. To support the run-time allocation of virtual machines to physical hosts, we construct a manager/worker-style parallel paradigm akin to the work presented in Malgaonkar et al. (2011) for the proposed virtual computing laboratory. Similarly, we intend to extend the Eucalyptus private cloud architecture, in a manner similar to the work discussed in Upatissa and Atukorale (2012) and Nurmi et al. (2009), by introducing two high-level management components: the manager process and the worker processes. In the proposed model, one virtual machine, called the manager, is responsible for keeping track of assigned and unassigned query information about resources. The worker queries and controls the system software on its node in response to queries and control requests from its manager. The advantage of allocating a single task at a time to each worker is that it balances workload. Keeping workload balanced is essential for high efficiency, and we therefore choose the manager/worker paradigm as the basis for our cloud design. Details of this paradigm are explained in Sections 3.1 and 3.2.

3.1. Manager Process

The manager process is the gateway into the cloud management platform. It runs on any machine that has network connectivity both to the nodes running worker processes and to the machine running the cloud controller, and it queries those nodes for information about resources. The manager process is assigned the tasks of (i) making scheduling decisions (such as scheduling incoming instances to run on specific nodes), (ii) controlling the instances of the virtual network overlay, and (iii) gathering information about a set of nodes from the worker processes. Many of the manager's request operations to the worker processes take the following form: describeInstances, describeResources, and so on. When the manager process receives a set of operation instances to perform, it first requests information about available and suitable resources from the worker processes through its describeResources function. Each worker process then searches for this information, comprising lists of resource characteristics (processor, memory, storage, and bandwidth), and sends a report back to the manager process. With this information the manager process computes the number of simultaneous instances of the specific type that can be executed on the list of available nodes, and sends this value to the cloud controller for delegation and allocation to the various booked nodes. This step also applies to the remaining query functions.

3.2. Worker Process

The worker process manages all information regarding the VM instances on its host. A worker process executes on every node that is designated for hosting VM instances. A worker queries and controls the system software on its node in response to queries and control requests from the manager process. The worker processes execute queries from the manager process to discover the physical node's resource profile. This profile covers the number of processors, the size of memory, and the available disk space, as well as the state of the VM instances on the node. The information thus collected is propagated back to the manager for further processing and delegation to the cloud controller (see Figure 2). The cluster, or resource pool, additionally consists of some local storage, which can be either true local storage physically attached to the node, or storage accessed via a shared pool over a storage area network, fibre channel, or similar mechanisms.
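The manager/worker query cycle described above can be sketched as follows. This is an illustrative sketch only: the class and method names (`Worker`, `Manager`, `describe_resources`, `collect_profiles`) are assumptions for exposition, not part of Eucalyptus or the authors' implementation.

```python
# Illustrative sketch of the manager/worker describeResources cycle.
# All names and the profile layout are hypothetical.

class Worker:
    """Runs on a node that hosts VM instances; reports its resource profile."""
    def __init__(self, node_id, processors, memory_mb, disk_gb, bandwidth_mbps):
        self.node_id = node_id
        self.profile = {
            "processors": processors,
            "memory_mb": memory_mb,
            "disk_gb": disk_gb,
            "bandwidth_mbps": bandwidth_mbps,
        }

    def describe_resources(self):
        # Answer the manager's describeResources query with this node's profile.
        return {"node_id": self.node_id, **self.profile}


class Manager:
    """Gateway into the cloud platform; queries workers and aggregates profiles."""
    def __init__(self, workers):
        self.workers = workers

    def collect_profiles(self):
        # Gather the resource characteristics reported by every worker,
        # ready for forwarding to the cloud controller.
        return [w.describe_resources() for w in self.workers]


manager = Manager([Worker("n1", 8, 16384, 500, 1000),
                   Worker("n2", 4, 8192, 250, 1000)])
profiles = manager.collect_profiles()
```

In a real deployment the query would travel over the network (e.g. as SOAP or REST messages, as in Eucalyptus); here the call is local purely to show the information flow.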

The architecture shown in Figure 3 can be expanded to include multiple clusters comprising managers and workers, which adds capacity to the solution as well as redundancy that can be used to increase the overall availability of the infrastructure.

4. PRELIMINARIES: SET THEORY AND MODELLING OF VM MAPPING

The model considered in this paper is based on the formal model described in Kleineweber et al. (2011), which discusses rule-based techniques for the mapping of virtual machines and virtual network links. However, we concentrate on the aspect of virtual machine mapping that associates the resource requirements of the virtual machines with their availability on the physical hosts. Before venturing into the modelling of VM mapping, we recall the following preliminary ideas from set theory.

Definition 1: A set is a collection of distinct elements or objects of some kind, with the common property that, given an object and a set, it is possible to decide whether the object belongs to the set.

Figure 2. In a manager/worker style, the manager process sends requests to the worker processes


Figure 3. The proposed private cloud architecture

1. A set can be described by listing its elements, e.g. A = {a, b, c, d}; the elements of a set are often denoted by lower-case letters (e.g. a, b, c, …).
2. x ∈ A if an element x belongs to a set A (x ∉ A if an element x does not belong to a set A). Similarly, x ∈ X or y ∈ Y means "x belongs to set X" or "y belongs to set Y."

Example 1: Let X = {x1, x2, x3} = {memory, OS, ram}, where x1 = memory, and so on. Then 'ram' ∈ X and 'Book' ∉ X.

Example 2: Let X = {x1, ..., x10}. Then x2 ∈ X and x100 ∉ X.

The notions of the union of two sets and the intersection of two sets are well known and therefore not defined here.

Definition 2 (Function): A function from a set X to a set Y is a rule which assigns to each element x ∈ X a unique element y ∈ Y, denoted by f : X → Y, or y = f(x), where x ∈ X and y = f(x) ∈ Y.

We develop a model of virtual machine mapping by defining the set of virtual machines as V = {v0, v1, ..., vm}, where m denotes the number of virtual machines to be mapped to the real hosts. Similarly, we represent the set of real hosts as R = {r0, r1, ..., rn}, where R represents the computational nodes provided at the data centre and n is the number of physical hosts available. The subscripts indicate an instance of a family of virtual machines or physical hosts. The use of this expression is further justified by the definition of the Cartesian product given below:


Definition 3 (Cartesian product): An n-tuple (a1, ..., an) can be defined by explicitly listing its elements a1, ..., an. The generalised Cartesian product of n sets A1, ..., An is then defined as the set of all n-tuples:

∏_{i=1}^{n} Ai = {(a1, ..., an) : a1 ∈ A1, ..., an ∈ An} (1)

Consider a virtual machine X. For the n-ary Cartesian product ∏_{i=1}^{n} Xi, there is a family of n projections {π1, ..., πn} such that for all (x1, ..., xn) ∈ ∏_{i=1}^{n} Xi and for all k ∈ {1, ..., n}, the equation πk((x1, ..., xn)) = xk holds.

Each physical host has particular attributes, or a profile, attached to it. These profiles are significant, since resources are often requested along with requirement criteria, as is the case in a distributed resource environment. A host machine ri can have the following profile: processors, memory, network bandwidth, and storage. Attributes may therefore be represented as sets of component values of the physical host ri. We denote the elements of this set as: p = processors, m = memory, n = network bandwidth, and s = storage available on the host machine ri. Therefore, ri{attributes} := ri{p, m, n, s}.

By the same rule, we may apply this notion to the set of virtual machines. Let p' = processors, m' = memory, n' = network bandwidth, and s' = storage required by the virtual machine vi. Therefore, vi{attributes} := vi{p', m', n', s'}.

Define a function which maps a physical host to a virtual machine, using the round-robin technique, based on the number of available resources on the hosts, as follows:

f : V → R ∪ {e}, (2)

where e denotes a VM without corresponding attributes, and

f(v) = r, if r is the attribute/resource name for virtual machine v;
f(v) = e, if v does not have a corresponding attribute/resource name (3)

for v ∈ V and r ∈ R.

Definition 4: A virtual machine set V is said to be compatible with a physical host set R if there exists a mapping f : V × R → {0, 1} such that for some Vk ⊂ V there exists at least one element r ∈ R with f((Vk, r)) = 1, where Vk contains some elements vi, i = 1, 2, ..., n, of V. Otherwise V is said to be incompatible, i.e. f((Vk, r)) = 0. Compatibility means that there exists r ∈ R such that (Vk, r) holds.

If we denote an instance of a virtual machine v1 with resource requirement configuration index 1 by v1.1, and a physical host r1 with available resource configuration index 1, then we can have a compatibility mapping of virtual machines onto physical hosts as shown in Figure 4. In this case V1 = {v1.1}, where i = 1, 2, 3, …, n.
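A minimal sketch of the mapping f : V → R ∪ {e} of Equations (2)-(3) and the compatibility test of Definition 4 follows. The attribute keys (p, m, n, s) mirror the paper's symbols; the function names and dictionary layout are assumptions for illustration, not the authors' implementation.

```python
# Sketch of f : V -> R ∪ {e}: a VM maps to a host whose attributes cover its
# requirements, otherwise to the sentinel e. Names are hypothetical.

E = None  # sentinel standing for e: "no corresponding attribute/resource"

def compatible(vm, host):
    # f((v, r)) = 1 iff every attribute the VM requires is available on the host.
    return all(vm[k] <= host[k] for k in ("p", "m", "n", "s"))

def map_vm(vm, hosts):
    # Return the first compatible host (hosts visited in round-robin order),
    # else the sentinel e.
    for host in hosts:
        if compatible(vm, host):
            return host
    return E

hosts = [{"id": "r1", "p": 2, "m": 4096,  "n": 100,  "s": 40},
         {"id": "r2", "p": 8, "m": 16384, "n": 1000, "s": 200}]
vm = {"p": 4, "m": 8192, "n": 100, "s": 50}
target = map_vm(vm, hosts)   # r1 lacks capacity, so r2 is chosen
```

The sentinel return value plays the role of e: a VM whose requirements no host can satisfy is left unmapped rather than force-fitted.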

4.1. Setting Initial Conditions

It is easy to observe that at time t = 0 there exists no initial mapping of the VMs, although V = {v1, v2, ..., vn} exists, and therefore the


Figure 4. Compatibility of VMs to physical host mapping functions

function f = 0, or f((Vk, r)) → Z = ∅, where Z is the set of already mapped VMs. The following conditions hold for the proposed mapping of VMs to physical hosts. The function f : V → R maps a virtual machine vi to a physical host ri only if

f((Vk, r)) = 1, for some Vk ⊂ V and r ∈ R (4)

pi ≥ ∑_{i=1}^{n} p'i, ∀ vi ∈ V (5)

mi ≥ ∑_{i=1}^{n} m'i, ∀ vi ∈ V (6)

ni ≥ ∑_{i=1}^{n} n'i, ∀ vi ∈ V (7)

si ≥ ∑_{i=1}^{n} s'i, ∀ vi ∈ V (8)

These equations enforce that the necessary resources (processor, memory, network bandwidth, and storage) are available on the physical host to which the guest VMs are to be assigned: the amount of resources required by all the guest VMs mapped to a host must not exceed the resources available on that host. In this paper, we assume that communication between the guest virtual machines running on the same host and the host machine itself is contention-free in terms of resource allocation, i.e. each guest VM runs in a specified and designated address space. Similarly, each VM is mapped exactly once onto a physical host machine. The pseudo-code for this purpose is presented in Algorithm 1.
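The capacity conditions of Equations (5)-(8) amount to a simple feasibility check, which can be sketched as follows. The dictionary keys mirror the paper's symbols (p, m, n, s); the function name and data layout are assumptions for illustration.

```python
# Feasibility check for Equations (5)-(8): the summed requirements of all guest
# VMs assigned to a host must not exceed that host's available resources.

def host_can_accommodate(host, assigned_vms):
    # Check p_i >= sum p'_i, m_i >= sum m'_i, n_i >= sum n'_i, s_i >= sum s'_i.
    return all(
        host[k] >= sum(vm[k] for vm in assigned_vms)
        for k in ("p", "m", "n", "s")
    )

host = {"p": 8, "m": 16384, "n": 1000, "s": 200}
vms = [{"p": 2, "m": 4096, "n": 100, "s": 40},
       {"p": 4, "m": 8192, "n": 200, "s": 80}]
ok = host_can_accommodate(host, vms)   # 6<=8, 12288<=16384, 300<=1000, 120<=200
overloaded = host_can_accommodate(
    host, vms + [{"p": 4, "m": 8192, "n": 100, "s": 40}])  # 10 processors > 8
```

A scheduler would run this check before committing a mapping, rejecting (or deferring) any VM whose addition would violate one of the four inequalities.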

Round-Robin: A round-robin algorithm distributes the load equally to each server, regardless of the current number of connections or the response time (Mohanty et al., 2011). Round-robin is suitable when the servers in the cluster have equal processing capabilities; otherwise, some servers may receive more requests than they can process while others use only part of their resources. In our algorithm, a dynamic time-quantum concept is used to improve the average waiting time and average turnaround time and to decrease the number of context switches. An abstracted view of the round-robin algorithm is provided in Algorithm 2.
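Round-robin dispatch with a dynamic quantum can be sketched as below. The specific quantum rule used here (the mean of the remaining service times, recomputed each cycle) is an assumption for illustration, since Algorithm 2 is not reproduced in this text; the paper's rule may differ.

```python
# Round-robin scheduling with a dynamic time quantum (illustrative sketch).
from collections import deque

def round_robin(burst_times):
    """Return (completion order, number of context switches)."""
    queue = deque(enumerate(burst_times))
    remaining = list(burst_times)
    order, switches = [], 0
    while queue:
        # Dynamic quantum: recomputed each dispatch from the tasks still queued.
        quantum = max(1, sum(remaining[i] for i, _ in queue) // len(queue))
        i, _ = queue.popleft()
        run = min(quantum, remaining[i])
        remaining[i] -= run
        if remaining[i] == 0:
            order.append(i)          # task finished within its quantum
        else:
            queue.append((i, remaining[i]))
            switches += 1            # task pre-empted: one context switch

    return order, switches

order, switches = round_robin([10, 4, 6])
```

Recomputing the quantum from the queue's remaining work lets short tasks finish in one pass while long tasks are pre-empted, which is the trade-off between responsiveness and context-switch overhead discussed in Section 5.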

5. PERFORMANCE EVALUATION

As previously discussed, the sharing of cloud resources is demand-driven, and these resources are utilized dynamically under different conditions. This section therefore presents an analytical study of the key evaluation


Algorithm 1. Mapping of guest VMs to physical host

metrics for the cloud model presented in Section 3. We use these parameters to project the efficiency of the proposed model. This study therefore evaluates the proposed model from the performance-efficiency perspective, not from a cost perspective: assuming the cloud is affordable, the performance of the system becomes the issue to contend with. The performance metrics summarized in Table 1 have been used to evaluate system performance. Since the proposed system deploys the round-robin scheduling algorithm to map virtual machines to physical hosts, the performance metrics concentrate on determining the context switching, waiting time, turnaround time, and response time.

The waiting time wt of a mapping task ti refers to the time that elapses between the dispatching of the VMs and the moment their mapping to the physical host begins. Simply put, it is the amount of time a VM to be mapped has been waiting in the ready queue. The average waiting time (AVGWT) of the mapping tasks ti is defined as follows:

AVGWT = (∑_{i=1}^{n} wt[ti]) / n (9)

However, the mapping of the different virtual machines by the manager onto the physical hosts depends on the resource requirements of the virtual machines and their availability on the host system. We include these two additional


Algorithm 2. Round Robin

factors by assigning a weight and delay factor to the system. The number of times each mapping task exploits the allotted time quantum is weighted by its resource requirement rr[ti] and availability factor af[ti]. The average weighted waiting time (AVGWWT) is therefore defined as follows:

AVGWWT = (∑_{i=1}^{n} (rr[ti] + af[ti]) · wt[ti]) / (∑_{i=1}^{n} (rr[ti] + af[ti])) (10)

The turnaround time of a mapping task ti is the time difference between the arrival of a mapping request and the successful completion of the mapping task. The average turnaround time (AVGTT) and the average weighted turnaround time (AVGWTT) are given as:

AVGTT = (∑_{i=1}^{n} tt[ti]) / n (11)

AVGWTT = (∑_{i=1}^{n} (rr[ti] + af[ti]) · tt[ti]) / (∑_{i=1}^{n} (rr[ti] + af[ti])) (12)

The response time of a mapping task ti is the time frame from when a request is submitted until the virtual machine is mapped to the physical host and the first response

is produced. The average response time (AVGRT) and the average weighted response time (AVGWRT) are defined accordingly as follows:

AVGRT = (∑_{i=1}^{n} rt[ti]) / n (13)

AVGWRT = (∑_{i=1}^{n} (rr[ti] + af[ti]) · rt[ti]) / (∑_{i=1}^{n} (rr[ti] + af[ti])) (14)
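The plain and weighted averages of Equations (9)-(14) all share one shape, which the following sketch makes concrete. The field names (wt, tt, rt, rr, af) follow the paper's symbols; the record layout itself is an assumption for illustration.

```python
# Computing the plain average (Eqs. 9, 11, 13) and the weighted average
# (Eqs. 10, 12, 14), with weight rr[t_i] + af[t_i] for each mapping task.

def averages(tasks, field):
    n = len(tasks)
    plain = sum(t[field] for t in tasks) / n
    weights = [t["rr"] + t["af"] for t in tasks]
    weighted = sum(w * t[field] for w, t in zip(weights, tasks)) / sum(weights)
    return plain, weighted

tasks = [{"wt": 2.0, "tt": 6.0,  "rt": 1.0, "rr": 1, "af": 1},
         {"wt": 4.0, "tt": 10.0, "rt": 3.0, "rr": 2, "af": 2}]
avgwt, avgwwt = averages(tasks, "wt")   # plain: 3.0; weighted: (2*2 + 4*4)/6
```

Passing "tt" or "rt" for the field gives the turnaround and response-time variants, so one routine covers all six metrics.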

The performance of the system depends on the length of the time quantum assigned. Choosing a short quantum is considered a good choice because it allows many mapping processes (in our case, mappings of virtual machines to physical hosts) to circulate through the waiting queue quickly, giving each process a brief chance to run. In this way, highly interactive tasks, which usually do not use up their quantum, do not have to wait long before they are processed again; this improves the interactive performance of the entire system. However, a short quantum also has a drawback: the system must perform a context switch whenever a process gets pre-empted, and any time the system spends doing something other than executing mapping tasks is essentially overhead. A short quantum


Table 1. Description of symbols used in the Round Robin performance metrics

implies many such context switches per unit time, which takes the system away from performing useful work. The overall performance efficiency PE of the system is therefore determined by the value of the assigned time quantum Q, the useful time T taken to execute a mapping task ti before the occurrence of a context switch, the context-switch time Tcs required by task ti, and the total time taken to complete the whole execution (i.e. T + Tcs). This is computed as follows:

PE = T / (T + Tcs), if Q = ∞ or Q > T
PE = Q / (Q + Tcs), if Tcs < Q < T
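The trade-off above can be made concrete with a small sketch of PE as a function of Q, T, and Tcs. Since the source formula is truncated, the case Tcs < Q < T is filled in here with the standard round-robin efficiency Q/(Q + Tcs); treat that branch as an assumption.

```python
# Sketch of the performance-efficiency measure PE(Q, T, Tcs).
import math

def performance_efficiency(T, Tcs, Q):
    if math.isinf(Q) or Q > T:
        # Task finishes within one quantum: at most one switch per task.
        return T / (T + Tcs)
    # Task is pre-empted repeatedly: one switch per quantum of useful work
    # (assumed branch; the source formula is truncated here).
    return Q / (Q + Tcs)

pe_long = performance_efficiency(T=100.0, Tcs=1.0, Q=math.inf)   # 100/101
pe_short = performance_efficiency(T=100.0, Tcs=1.0, Q=4.0)       # 4/5
```

The two sample calls show the trade-off numerically: a generous quantum keeps efficiency near 1, while a short quantum spends a fixed Tcs for every Q units of useful work.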