Virtual Machine Allocation in Cloud Computing for Minimizing Total ...

Virtual Machine Allocation in Cloud Computing for Minimizing Total Execution Time on Each Machine

Quyet Thang NGUYEN, Nguyen QUANG-HUNG, Nguyen HUYNH TUONG, Van Hoai TRAN, Nam THOAI
Faculty of Computer Science & Engineering, Ho Chi Minh City University of Technology, Vietnam
268 Ly Thuong Kiet, Ho Chi Minh, Vietnam
Email: {nqthang;hungnq2;htnguyen;hoai;nam}@cse.hcmut.edu.vn

Abstract: This paper considers a virtual machine allocation problem. Each physical machine in the cloud can host a number of virtual machines. Each job needs a number of virtual machines during a given, fixed period. The objective is to minimize the cost induced by the total execution time on each physical machine. This allocation problem is proved to be NP-hard. In addition, three mixed integer linear programming (MILP) models are constructed to represent and solve the problem. The performance of the three proposed models is compared through empirical results.

Keywords: virtual machine allocation; cloud computing; MILP.

I. INTRODUCTION

Cloud computing is driven by economies of scale. A cloud system uses virtualization technology to provide cloud resources (e.g. CPU, memory) to users in the form of virtual machines. A virtual machine, which acts as a sandbox for a user application, fits well in the education environment, where it provides computational resources for teaching and research. From the resource owner's point of view, the goal is to reduce operating costs, such as the electricity bill of a large-scale data center.

The virtual machine (VM) allocation problem in virtualized data centers is a challenging topic. It can be viewed as either a static or a dynamic mapping problem. A VM allocation maps each VM to a physical machine (PM) so as to optimize a given objective function, for example maximizing performance, minimizing power or energy consumption, or maximizing the provider's profit. Static VM allocation can be seen as a d-dimensional Vector Bin Packing problem ($VBP_d$) [5], [6], in which VMs are items and physical machines are bins. The $VBP_d$ problem ($d \ge 1$) is NP-hard. Dynamic VM allocation differs from static VM allocation in that a system event (e.g. low CPU utilization, system temperature, hardware or software failure) can trigger a re-mapping of the set of VMs onto the set of PMs. Virtualization software (e.g. XEN, KVM, VMware ESXi server) currently supports running more than one operating system (OS) on the same physical machine, i.e. several VMs can share the same physical hardware.

In this study, we consider the VM allocation problem in a cloud composed of $m$ physical machines. Each of these machines can supply a number of operations demanded at any instant. These operations can be executed independently and in parallel through the virtual machine generation mechanism. Depending on its hardware, each machine has an upper bound on the number of virtual machines allocated at the same time. Each user/task in the system can demand a service that needs a number of virtual machines during a particular period. The objective is to find a feasible assignment that satisfies the demand of all users while reducing the total energy consumption of the whole system. Under the assumption that the power consumption of a physical machine does not increase much when the cloud system uses more of its virtual machines, the objective can be restated as minimizing the total execution time on each physical machine.

Our main contributions are the identification of some polynomially solvable cases, an NP-hardness proof, and several Mixed Integer Linear Programming (MILP) models. Experimental results show the performance of the proposed models on COIN-OR CBC, a well-known free solver.

The paper is organized as follows. Section II discusses related work. Section III introduces the notation used in this paper and states the considered problem in detail. Section IV presents the NP-completeness of the problem. Section V proposes several linear mathematical models describing the problem. Section VI presents the computational experiments and their analysis. Section VII concludes this study and discusses future work.

978-1-4673-2088-7/13/$31.00 ©2013 IEEE

II. RELATED WORK

A. Background

A virtual machine is a software entity that is executed and managed by a Virtual Machine Monitor (VMM) or hypervisor such as XEN, KVM, or VMware ESXi server. There are three generic multi-dimensional packing problems: multiprocessor scheduling, bin packing, and the knapsack problem [5]. The authors of [5] considered the vector scheduling problem, also named the vector bin packing problem. The vector scheduling problem asks how to schedule n multi-dimensional tasks on m parallel machines so as to minimize the maximum load over all dimensions of all the parallel machines. The authors of [6] studied the First Fit Decreasing and Best Fit Decreasing (FFD and BFD) heuristics for the d-dimensional vector bin packing problem and applied these heuristics to the VM allocation problem.
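To make the First Fit Decreasing idea concrete, the following is a minimal generic sketch for d-dimensional vector bin packing with identical bins. It is not the implementation studied in [6]: the function name `first_fit_decreasing` and the choice of the sum of demands as the size measure used for sorting are assumptions made for illustration only.

```python
from typing import List, Sequence


def first_fit_decreasing(items: Sequence[Sequence[int]],
                         capacity: Sequence[int]) -> List[List[int]]:
    """Pack d-dimensional items into identical bins.

    Returns one list of item indices per opened bin.
    Assumption: items are ordered by the sum of their demands (largest first);
    other size measures are possible.
    """
    d = len(capacity)
    for item in items:
        if any(item[k] > capacity[k] for k in range(d)):
            raise ValueError("an item exceeds the bin capacity")

    order = sorted(range(len(items)), key=lambda i: sum(items[i]), reverse=True)
    bins: List[List[int]] = []   # item indices stored in each bin
    loads: List[List[int]] = []  # current load vector of each bin

    for i in order:
        for b, load in enumerate(loads):
            # First fit: take the first open bin with room in every dimension.
            if all(load[k] + items[i][k] <= capacity[k] for k in range(d)):
                bins[b].append(i)
                for k in range(d):
                    load[k] += items[i][k]
                break
        else:
            # No open bin fits, so open a new one.
            bins.append([i])
            loads.append(list(items[i]))
    return bins


if __name__ == "__main__":
    vms = [(2, 1), (3, 3), (1, 2), (2, 2)]  # hypothetical (CPU, memory) demands
    print(first_fit_decreasing(vms, (4, 4)))
```

In a VM allocation reading, each item is a VM demand vector and each bin is a physical machine. Like any such heuristic, this gives a feasible packing but no optimality guarantee.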

B. Virtual machine allocation

Sotomayor et al. [7] proposed leases and virtual machines to provide computational resources for short-term needs in teaching and research. They presented First-Come-First-Serve (FCFS) and backfilling algorithms for scheduling user leases, and used a greedy algorithm to map all identical VMs that belong to the same user lease onto the same physical machine. A disadvantage of this greedy VM allocation algorithm is that two different leases cannot be mapped to the same physical machine. In [9], the authors proposed two power-aware VM allocation algorithms that combine First Fit Decreasing (FFD) with shortest-duration-time heuristics. Although these algorithms can reduce the total energy consumption of the physical machines, they do not guarantee an optimal solution. Mathematical programming has been applied for many years to traditional scheduling problems on non-virtualized systems (e.g. high performance computing clusters) in order to find a schedule that is optimal for performance [10], [11] or that minimizes the energy consumption of heterogeneous computer clusters [12]. These works did not use virtualization and were not suitable for virtual machine


scheduling. Recently, there have been some interesting works that use mathematical programming for scheduling problems on virtualized systems (we focus on virtualized data centers), such as [4]. Speitkamp and Bichler [4] proposed a mathematical programming model for server consolidation, in which each service is implemented in a VM, to reduce the number of physical machines used. They proposed Integer Linear Programming (ILP) models for the static server allocation problem (SSAP) and for an extended version of the SSAP, and claimed that the SSAP is strongly NP-hard. Their ILP models were tested on about 600 virtual machines. They also applied first-fit and best-fit heuristics to the allocation problem. However, the ILP models did not address the arrival time of user requests or advance reservation leases.

VM allocation, i.e. the mapping of a set of VMs onto a set of heterogeneous physical machines with the objective of minimizing the total energy consumption of the physical machines in a single virtualized data center, was studied in [1], [3]. Beloglazov et al. [1] considered dynamic VM allocation (with migration), in which each VM is characterized in two dimensions: CPU usage and power consumption (in watts). They claimed that the VM allocation problem is NP-hard and proposed the Modified Best Fit Decreasing (MBFD) algorithm, which maps each VM to one of the heterogeneous physical machines (machines that differ in power consumption) so as to minimize the increase in power consumption caused by each placement, together with some other algorithms for the migration of VMs. MBFD is, however, a best-fit heuristic, so its mapping is not guaranteed to be optimal. Í. Goiri et al. [3] considered an energy-aware scheduling problem with a similar VM allocation. In [3], the score of each assignment of a VM to a physical machine is computed from hardware and software requirements, power consumption, and system temperature; their scheduling algorithm is called the score-based scheduler. The score-based algorithm is a hill-climbing search on an (N+1) × M matrix in which each cell holds the score of assigning a VM to a physical machine, with time complexity O(k × N × M), where k is the number of iterations. The score-based algorithm can reach an optimal cost-based VM allocation.

Note that the VM allocation in [1] and in [3] did not consider the starting times of requests. This is the essential difference from our study, which allocates VMs to jobs with starting-time constraints so as to minimize the total execution time on each machine.

III. PROBLEM STATEMENT

We first define the terms used in this paper.

A. Terminology, notation

The notation used in this paper is given below.
• $n$: number of jobs ($J_1, J_2, \ldots, J_n$)
• $s_i$: starting time of job $J_i$
• $p_i$: processing time of job $J_i$
• $R_i$: number of virtual machines needed for job $J_i$
• $J(t)$: set of jobs executed at time $t$ ($J(t) = \{J_i : s_i \le t < s_i + p_i\}$)
• $T$: the last execution time ($T = \max_{i \in [1,n]} (s_i + p_i)$)
• $m$: number of physical machines ($M_1, M_2, \ldots, M_m$)
• $v_j$: maximum number of virtual machines that can be allocated from physical machine $M_j$

In this paper, we deal with the VM allocation problem on $m$ physical machines. Physical machine $M_j$ can allocate at most $v_j$ virtual machines at any time. Each user/task $J_i$ in the system can demand a service that needs $R_i$ virtual machines executed during the period $[s_i, s_i + p_i)$. The objective is to minimize the total execution time on each physical machine (provided that a feasible assignment satisfying the requirements of all users exists). To determine the NP-hardness status of this optimization problem, we first define the corresponding decision version.

B. Decision problem

The decision problem VIRMACALLOC is described as follows.

Data input: $n$ jobs $J_1, \ldots, J_n$ have to be scheduled without preemption on $m$ parallel machines ($m \ge 1$). Each job is described by a starting time $s_i$, a processing time $p_i$, and the number $R_i$ of virtual machines needed to execute it. All machines are available at time zero, and machine $M_j$ can support $v_j$ virtual machines.

Question: Does there exist a resource assignment such that the total execution time, summed over the machines used, is not greater than a given value $y$?

This decision problem is proved to be NP-complete in the next section.

IV. COMPLEXITY

A. Some special cases

Proposition 1: If all jobs are executed in different (disjoint) time windows, the decision problem can be answered in polynomial time.

Proof: A feasible decision for each job can be determined directly and independently of the other jobs. Hence, this special case can be solved in polynomial time.

Proposition 2: If $n_w = 1$ and $R_i = R$ ($\forall i = 1, \ldots, n$), the decision problem can be answered in polynomial time.

Proof: Since the requirement is identical for all jobs, we can assign the jobs (in index order) to the physical machines taken in order of non-increasing $v_j$: on each physical machine, unused virtual machines are assigned to jobs until all of its virtual machines are assigned, and then the next physical machine is considered. This algorithm has complexity $O(n)$.

Proposition 3: If $s_i = s$ ($\forall i = 1, \ldots, n$) and $v_j = v$ ($\forall j = 1, \ldots, m$), the decision problem can be answered in polynomial time.

Proof: It follows from the previous proposition (Proposition 2).

In the following, the NP-completeness of the considered scheduling problem is proved using the special case where $v_j = v$ ($\forall j = 1, \ldots, m$).

B. NP-completeness

Theorem 1: Problem VIRMACALLOC is NP-complete.

Proof: We prove that PARTITION ∝ VIRMACALLOC, i.e. that the decision problem VIRMACALLOC belongs to the NP-complete class, by a reduction from PARTITION, which is known to be NP-complete [2]. Recall that problem PARTITION is described as follows.

Data: A finite set $A$ of $r$ elements $a_1, a_2, \ldots, a_r$ with integer sizes $s(a_k)$, $\forall k$, $1 \le k \le r$, such that $\sum_{k=1}^{r} s(a_k) = 2B$.

Question: Is there a subset $A_1$ of indices such that $\sum_{k \in A_1} s(a_k) = \sum_{k \in \{1,2,\ldots,r\} \setminus A_1} s(a_k) = B$?

Given an arbitrary instance of PARTITION, we construct an instance of VIRMACALLOC as follows:
• $n = m = r$,
• for each job $J_i$ with $i \in \{1, 2, \ldots, n\}$: $s_i = 0$, $p_i = 1$, $R_i = s(a_i)$,
• for each physical machine $M_j$ with $j \in \{1, 2, \ldots, m\}$: $v_j = B$,
• $y = 2$.

(⇒) Given a feasible solution to PARTITION, we can define a solution to VIRMACALLOC by serving the jobs corresponding to $A_1$ with virtual machines of $M_1$ and the remaining jobs with virtual machines of $M_2$. This assignment satisfies the conditions, and the answer to problem VIRMACALLOC is 'Yes', since the execution time on $M_1$ (and on $M_2$, respectively) is 1.

(⇐) If there exists a feasible solution to problem VIRMACALLOC, then there exists a virtual-machine allocation in which all virtual machines are allocated from at most two machines (because $y = 2$ and all jobs start at the same time). Since all machines have the same capacity, we may assume that all jobs are served by the first machine $M_1$ or by the first two machines $M_1$ and $M_2$. Because $\sum_{i=1}^{n} R_i = \sum_{i=1}^{r} s(a_i) = 2B$, the first machine $M_1$, with capacity $B$, cannot serve all the jobs, so the first two machines are both needed. Moreover, the maximum number of virtual machines available from these two machines is $v_1 + v_2 = B + B = 2B$, which is exactly the number of virtual machines needed, so no resource on these two machines is left unused. Hence, the total number of virtual machines serving the jobs executed on $M_1$ is equal to $v_1 = B$, and this subset of jobs defines $A_1$. Consequently, the answer to PARTITION is 'Yes'.
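The construction used in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_instance` and `check_forward` and the dictionary layout are hypothetical, and `check_forward` covers only the (⇒) direction, in which every job is served entirely by one of the two machines.

```python
from typing import Any, Dict, List, Set


def build_instance(sizes: List[int]) -> Dict[str, Any]:
    """Build the VIRMACALLOC instance of the reduction from PARTITION sizes s(a_k)."""
    total = sum(sizes)
    if total % 2:
        raise ValueError("PARTITION assumes the sizes sum to 2B")
    r, B = len(sizes), total // 2
    return {
        "s": [0] * r,      # every job starts at time 0
        "p": [1] * r,      # every job runs for one time unit
        "R": list(sizes),  # job J_i needs s(a_i) virtual machines
        "v": [B] * r,      # each of the m = r machines offers B virtual machines
        "y": 2,            # bound on the total execution time
    }


def check_forward(instance: Dict[str, Any], A1: Set[int]) -> bool:
    """Serve the jobs indexed by A1 from M1 and all other jobs from M2."""
    R, y = instance["R"], instance["y"]
    cap = instance["v"][0]  # all machines share the same capacity B
    load1 = sum(R[i] for i in A1)
    load2 = sum(R[i] for i in range(len(R)) if i not in A1)
    # Each machine that serves at least one job is busy during [0, 1).
    busy_time = (1 if load1 > 0 else 0) + (1 if load2 > 0 else 0)
    return load1 <= cap and load2 <= cap and busy_time <= y


if __name__ == "__main__":
    inst = build_instance([3, 1, 1, 2, 2, 1])  # sizes sum to 10, so B = 5
    print(check_forward(inst, {0, 3}))         # A1 has total size 3 + 2 = 5
```

A subset that does not split the sizes evenly overloads one of the two machines, so the check fails for it.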



V. MIXED INTEGER LINEAR MATHEMATICAL MODEL

Many objectives have been defined in the literature, such as minimizing the system execution cost, minimizing the number of physical machines needed, or minimizing the number of physical machines executed per time unit (see [1], [4], [7], [9]). The mathematical models proposed in this paper use the following objective: minimizing the total execution time on each machine.

A. Constraints

Before building mathematical models for the problem, we list its constraints:
1) The number of virtual machines assigned to job $J_i$ must be exactly its requirement ($R_i$).
2) At any time, the total number of virtual machines used on physical machine $M_j$ does not exceed its capacity ($v_j$).
3) No virtual machine can execute two jobs at the same time.
4) Once a virtual machine is assigned to a job, that job must be executed continuously on that virtual machine until it completes.

B. Objective function

The objective used for the problem is to minimize the total execution time on each physical machine. To model the objective function, intermediate variables are defined as below:

\[
y_{jt} = \begin{cases} 1 & \text{if machine } M_j \text{ is used at time } t \\ 0 & \text{otherwise} \end{cases}, \quad \forall j \in [1, m],\ \forall t \in [0, T]
\]

Note that the values of the $y_{jt}$ variables are calculated from the decision variables of each of the following mathematical models. The objective function can then be formulated as:

\[
\min \sum_{j=1}^{m} \sum_{t=0}^{T} y_{jt}
\]

C. Mathematical model 1

The decision variables in this model are defined as below:

\[
x_{ijkt} = \begin{cases} 1 & \text{if job } J_i \text{ is executed using the } k\text{th VM on } M_j \text{ at time } t \\ 0 & \text{otherwise} \end{cases}, \quad \forall i \in [1, n],\ \forall j \in [1, m],\ \forall k \in [1, v_j],\ \forall t \in [0, T]
\]

In this case, the $y_{jt}$ variables are calculated by:

\[
y_{jt} = \max_{i \in [1,n],\, k \in [1, v_j]} x_{ijkt}, \quad \forall j \in [1, m],\ \forall t \in [0, T]
\]

or $y_{jt} \ge x_{ijkt}$, $\forall i \in [1, n]$ and $\forall k \in [1, v_j]$, since the variables are binary and the objective amounts to minimizing each value of $y_{jt}$ for all $j \in [1, m]$ and all $t \in [0, T]$. The constraints are formulated as below:

• Constraint 1:
\[
\sum_{j=1}^{m} \sum_{k=1}^{v_j} x_{ijkt} = R_i, \quad \forall i \in [1, n],\ \forall t \in [s_i, s_i + p_i)
\]

• Constraint 2:
\[
\sum_{i=1}^{n} \sum_{k=1}^{v_j} x_{ijkt} \le v_j, \quad \forall j \in [1, m],\ \forall t \in [0, T]
\]

• Constraint 3:
\[
\sum_{i=1}^{n} x_{ijkt} \le 1, \quad \forall j \in [1, m],\ \forall k \in [1, v_{\max}],\ \forall t \in [0, T]
\]

• Constraint 4:
\[
x_{ijkt} = x_{ijks_i}, \quad \forall i \in [1, n],\ \forall j \in [1, m],\ \forall k \in [1, v_j],\ \forall t \in (s_i, s_i + p_i)
\]
and
\[
x_{ijkt} = 0, \quad \forall i \in [1, n],\ \forall j \in [1, m],\ \forall k \in [1, v_j],\ \forall t \notin [s_i, s_i + p_i)
\]

D. Mathematical model 2

In this model, the $t$ index is removed from the decision variable. The decision variable is defined as below:

\[
x_{ijk} = \begin{cases} 1 & \text{if job } J_i \text{ is executed using the } k\text{th VM on } M_j \\ 0 & \text{otherwise} \end{cases}, \quad \forall i \in [1, n],\ \forall j \in [1, m],\ \forall k \in [1, v_{\max}]
\]

Then, with $J(t) = \{J_i : s_i \le t < s_i + p_i\}$ denoting the set of jobs executed at time $t$, the $y_{jt}$ variables can be calculated by:

\[
y_{jt} = \max_{i \in J(t),\, k \in [1, v_j]} x_{ijk}
\]

or $y_{jt} \ge x_{ijk}$, $\forall i \in J(t)$, $\forall k \in [1, v_j]$, as all of them are binary values. The constraints are then represented as below:

• Constraint 1:
\[
\sum_{j=1}^{m} \sum_{k=1}^{v_j} x_{ijk} = R_i, \quad \forall i \in [1, n]
\]

• Constraint 2:
\[
\sum_{i \in J(t)} \sum_{k=1}^{v_j} x_{ijk} \le v_j, \quad \forall j \in [1, m],\ \forall t \in [0, T]
\]

• Constraint 3:
\[
\sum_{i \in J(t)} x_{ijk} \le 1, \quad \forall j \in [1, m],\ \forall k \in [1, v_j],\ \forall t \in [0, T]
\]

• Constraint 4: predetermined implicitly through the definition of the decision variables.

E. Mathematical model 3

The decision variable in this model is defined as:

\[
x_{ij} = \begin{cases} k & \text{if job } J_i \text{ is executed on machine } M_j \text{ using } k \text{ VMs} \\ 0 & \text{otherwise} \end{cases}, \quad \forall i \in [1, n],\ \forall j \in [1, m]
\]

Then, the $y_{jt}$ variable can be calculated by:

\[
x_{ij} > 0 \Rightarrow y_{jt} = 1, \quad \forall i \in [1, n],\ \forall j \in [1, m],\ \forall t \in [0, T]
\]

or, in linear form, $v_j\, y_{jt} - x_{ij} \ge 0$, where $x_{ij}$ is an integer and $y_{jt}$ is binary (the factor $v_j$ is needed because $x_{ij}$ can take any value up to $v_j$). The constraints are formulated as:

• Constraint 1:
\[
\sum_{j} x_{ij} = R_i, \quad \forall i
\]

• Constraint 2:
\[
\sum_{i \in J(t)} x_{ij} \le v_j, \quad \forall j, t
\]

• Constraint 3: predetermined implicitly.
• Constraint 4: predetermined implicitly.
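As an illustration of how the third model scores a candidate solution, the sketch below checks Constraints 1 and 2 for a given integer matrix $x_{ij}$ and returns the objective value $\sum_j \sum_t y_{jt}$. It is a plain evaluator, not a MILP solver and not the authors' implementation. The function names are hypothetical, times are assumed to be integers, and a machine is assumed to count as used at time $t$ only when a job executed at $t$ holds virtual machines on it (the reading used in Model 2).

```python
from typing import List, Optional


def active_jobs(s: List[int], p: List[int], t: int) -> List[int]:
    """Indices of the jobs executed at time t, i.e. s_i <= t < s_i + p_i."""
    return [i for i in range(len(s)) if s[i] <= t < s[i] + p[i]]


def evaluate_model3(x: List[List[int]], s: List[int], p: List[int],
                    R: List[int], v: List[int]) -> Optional[int]:
    """Return the total execution time of assignment x, or None if x is infeasible.

    x[i][j] is the number of VMs that job i takes from machine j.
    """
    n, m = len(R), len(v)
    # Constraint 1: each job receives exactly R_i virtual machines.
    for i in range(n):
        if any(q < 0 for q in x[i]) or sum(x[i]) != R[i]:
            return None
    T = max(s[i] + p[i] for i in range(n))
    total = 0
    for t in range(T + 1):
        act = active_jobs(s, p, t)
        for j in range(m):
            load = sum(x[i][j] for i in act)
            # Constraint 2: the load on machine j never exceeds v_j.
            if load > v[j]:
                return None
            # y_jt = 1 when some active job uses machine j at time t.
            if load > 0:
                total += 1
    return total


if __name__ == "__main__":
    s, p, R = [0, 1], [2, 1], [2, 1]
    print(evaluate_model3([[2, 0], [0, 1]], s, p, R, v=[2, 2]))  # jobs on separate machines
    print(evaluate_model3([[2, 0], [1, 0]], s, p, R, v=[3, 2]))  # both jobs on the first machine
```

With the hypothetical data above, placing both jobs on the first machine (when its capacity allows) gives a smaller total execution time than spreading them over two machines, which is the effect the objective function rewards.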

VI. EMPIRICAL RESULTS

To test the effectiveness of the models, we used COIN-OR CBC (free version), a well-known solver, to implement the proposed MILP models and compared their computational times. Performance was measured on a personal computer with the following configuration: Intel Core i5 2.50 GHz, 4 GB memory, running the Ubuntu 12.04 operating system. In this experiment, for each pair (n, m) = (number of jobs, number of machines), the test data set is generated as follows:
• The processing time $p_i$ is generated randomly from an integer uniform distribution on [1, 3];
• The number of virtual machines required by a job, $R_i$, is generated randomly from an integer uniform distribution on [1, 5];
• The starting time of a job, $s_i$, is chosen with equal probability among the values in the range [0, 9];
• The number of virtual machines of a physical machine, $v_j$, is generated randomly from an integer uniform distribution on [1, 5].

For each (n, m) pair, 10 instances are generated and solved with the COIN-OR CBC solver, and the following information is extracted from the results:
• the computational time that the solver needs to solve the model,
• the number of iterations the solver goes through to reach the optimal solution,
• the total number of cuts the solver applies using its cutting algorithms.

The tables below show the average and maximum values accumulated from the solver output.

n    m   Avg. iterations   Avg. t (s)   Avg. # cuts   Max. iterations   Max. t (s)   Max. # cuts
10   2              25.5        0.342          55.8                79         0.65           141
10   3            1263.3        0.723         547.1              7798         1.62          2395
20   2             160.1        0.688         212.1               520         1.07           611
20   3           10085.3        4.395        1296.8             30653        12.00          2685
30   3            5625.8        4.918        1612.1             13309         8.77          3074
30   5           34564.9       17.274         955.7            291780        84.23          1792
40   5          185068.2       66.622        1353.1           1209247       311.25          4731

Fig. 1. CBC experiment result for model 1

n    m   Avg. iterations   Avg. t (s)   Avg. # cuts   Max. iterations   Max. t (s)   Max. # cuts
10   2              30.6        0.288          48.2                12         0.69           122
10   3            1653.5        0.703         648.5              7499          1.7          2669
20   2               100        0.567         151.6               296         1.17           321
20   3            4421.3        2.396        1024.3             19254         7.17          2385
30   3            2182.3        2.507         868.8             11599         4.17          2929
30   5            7675.3        7.963         753.4             40446        18.36          1867
40   5          230861.4       78.821        2439.6           1570957       498.47         16635

Fig. 2. CBC experiment result for model 2

n    m   Avg. iterations   Avg. t (s)   Avg. # cuts   Max. iterations   Max. t (s)   Max. # cuts
10   2                 0        0.028             0                 0         0.05             0
10   3             126.8        0.107         171.8              1083         0.25          1057
20   2                 0        0.042             0                 0         0.07             0
20   3            3575.1        0.721        1607.3              8066         1.29          4359
30   3            2967.4        1.024        2041.3              6020         1.69          5030
30   5            5738.3        1.995        1677.4             24696          9.2          4169
40   5          886519.5      180.136      104367.8           6846602      1436.92        798533

Fig. 3. CBC experiment result for model 3

We can conclude the following from the experiment results:
• As the number of jobs increases, the third model is generally the most efficient to solve: its average time is the lowest for every (n, m) pair up to (30, 5), although not for (40, 5). This is because, when parameters are eliminated from the model, several constraints are removed and no new constraint is added.
• The complexity does not depend only on the number of jobs; it also depends on other parameters. This is visible in the large difference between the average time and the maximum time the solver needs to solve the models.

VII. CONCLUSION

In this paper, we have considered the resource allocation problem in cloud computing in which each physical machine in the cloud hosts a number of virtual machines and each job needs a number of virtual machines during a given, fixed period. The objective is to minimize the cost induced by the total execution time on each physical machine. We prove that this allocation problem is NP-hard. Three mixed integer linear mathematical models are then proposed to represent the considered optimization problem. The performance of the three proposed models is compared through empirical results. Further research can be undertaken to determine whether VIRMACALLOC is weakly or strongly NP-complete. Another research direction is to improve solver performance with optimization techniques. Applying relaxation techniques and model-based heuristics to solve the problem is also an interesting research topic.

REFERENCES

[1] A. Beloglazov, J. Abawajy, and R. Buyya, “Energy-aware resource allocation heuristics for efficient management of data centers for Cloud computing,” Future Generation Computer Systems, vol. 28, no. 5, pp. 755–768, 2012. DOI: 10.1016/j.future.2011.04.017.
[2] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman & Company, 1979.
[3] Í. Goiri, F. Julià, R. Nou, J. Berral, J. Guitart, and J. Torres, “Energy-aware Scheduling in Virtualized Datacenters,” IEEE International Conference on Cluster Computing (CLUSTER 2010), pp. 58–67, 2010.
[4] B. Speitkamp and M. Bichler, “A Mathematical Programming Approach for Server Consolidation Problems in Virtualized Data Centers,” IEEE Transactions on Services Computing, vol. 3, no. 4, pp. 266–278, 2010.
[5] C. Chekuri and S. Khanna, “On Multi-Dimensional Packing Problems,” SODA 1999, pp. 185–194, 1999.
[6] R. Panigrahy, K. Talwar, and L. Uyeda, “Heuristics for Vector Bin Packing,” research.microsoft.com, 2011.
[7] B. Sotomayor, K. Keahey, and I. Foster, “Combining batch execution and leasing using virtual machines,” HPDC ’08 Proceedings of the 17th International Symposium on High Performance Distributed Computing, pp. 87–96, 2008. DOI: 10.1145/1383422.1383434.
[8] B. Sotomayor, “Provisioning Computational Resources Using Virtual Machines and Leases,” PhD thesis submitted to The University of Chicago, US, 2010.
[9] N. Q. Hung, N. Thoai, and N. T. Son, “Performance constraint and power-aware allocation for user requests in virtual computing lab,” Journal of Science and Technology (Vietnam), vol. 49, 4A, Special issue on Int. Conf. on Advanced Computing and Applications (ACOMP2011), pp. 383–392, 2011.
[10] J. R. Correa and M. R. Wagner, “LP-based online scheduling: from single to parallel machines,” Math. Program., Ser. A, vol. 119, pp. 109–136, 2009.
[11] A. Schulz, “Scheduling unrelated machines by randomized rounding,” SIAM Journal on Discrete Mathematics, vol. 15, no. 4, p. 450, 2002.
[12] I. Al Azzoni, “Power-Aware Linear Programming Based Scheduling for Heterogeneous Computer Clusters,” Future Generation Computer Systems, vol. 28, no. 5, pp. 745–754, May 2011.
