A Strength Pareto Evolutionary Algorithm for Live Migration of Multiple Interdependent Virtual Machines in Data Centers

Tusher Kumer Sarker and Maolin Tang

School of Electrical Engineering and Computer Science, Queensland University of Technology, 2 George Street, Brisbane, QLD 4001, Australia
{t.sarker, m.tang}@qut.edu.au

Abstract. Although live VM migration has been intensively studied, the problem of live migration of multiple interdependent VMs has hardly been investigated. The most important problem in the live migration of multiple interdependent VMs is how to schedule VM migrations as the schedule will directly affect the total migration time and the total downtime of those VMs. Aiming at minimizing both the total migration time and the total downtime simultaneously, this paper presents a Strength Pareto Evolutionary Algorithm 2 (SPEA2) for the multi-VM migration scheduling problem. The SPEA2 has been evaluated by experiments, and the experimental results show that the SPEA2 can generate a set of VM migration schedules with a shorter total migration time and a shorter total downtime than an existing genetic algorithm, namely Random Key Genetic Algorithm (RKGA). This paper also studies the scalability of the SPEA2.

Keywords: Live VM migration · scheduling · migration time · downtime · strength pareto evolutionary algorithm

1 Introduction

The emergence of virtualization technology and live Virtual Machine (VM) migration technology has brought enormous benefits to data centers. Through virtualization, a number of VMs can be created on top of a Physical Machine (PM). Since the VMs are logically independent of the PMs on which they are created, a VM can be migrated from one PM to another while it is running, which is called live VM migration. Live VM migration technology enables many new online management and maintenance activities, such as server consolidation [1], load balancing [2], and proactive fault tolerance [3].

Although the problem of live migration of a single VM has been studied intensively, the problem of live migration of multiple interdependent VMs has hardly been investigated. The performance of live migration of multiple VMs is measured by the total migration time and the total downtime of the VMs, both of which depend on the scheduling of the multi-VM migrations. The total migration time is the difference between the time when the last VM migration is completed and the time when the first migration is initiated, and the total downtime is the sum of the times during which the individual VMs remain in a down state.

Pioneering research on the live multi-VM migration scheduling problem was done by Ghorbani and Caesar, who proposed a heuristic algorithm to solve the VM sequence planning problem [4]. In their work, they found an order in which to migrate a given set of interdependent VMs to their designated target PMs in a bandwidth-constrained network so that the maximum number of VMs is migrated without violating the link bandwidth capacity. However, they did not provide detailed experimental results, such as the total migration time and the total downtime, in their paper. Nus and Raz [5] developed and investigated several VM migration scheduling algorithms for minimizing the total migration time when migrating multiple identical VMs. They did not consider the network topology and assumed that each migration took a predefined amount of time. However, VM migration time varies with the bandwidth, and the VMs deployed in a data center are heterogeneous.

The problem of scheduling multiple heterogeneous interdependent VMs was first tackled in [6, 7]. In [6], a heuristic algorithm was proposed to schedule multi-VM migrations. The strategy used by the heuristic algorithm was to give the highest priority to the VM whose migration would result in the least increase in the network congestion incurred by the data flow between the VM and its dependent VMs. A Random Key Genetic Algorithm (RKGA) was proposed in [7] to provide a migration schedule that minimizes a combination of the total migration time and the total downtime.
Since the multi-VM migration scheduling problem is by nature a multi-objective optimization problem, this paper attempts to solve it using an evolutionary Pareto-based multi-objective optimization algorithm, namely the Strength Pareto Evolutionary Algorithm 2, or SPEA2 [8]. The remainder of this paper is organized as follows. Section 2 formulates the problem. Section 3 details the design of a SPEA2 for the multi-VM migration scheduling problem. Section 4 evaluates the SPEA2. Finally, Section 5 concludes the paper.

2 Problem Statement

The multi-VM migration scheduling problem refers to finding a sequence of VM migrations for a given set of interdependent VMs in a data center such that both the total migration time and the total downtime are minimized.

A data center, $G_D = (N, L)$, consists of a set of nodes, $N$, including PMs, $P$, and switches, $W$, and a set of communication links, $L$, which interconnect these PMs and switches. The weight on a link $l_{i,j} \in L$ represents the bandwidth capacity of the link. A PM is characterized by its resources. Two types of resources are considered in this research: CPU and memory.

The interdependency among the VMs is represented by a weighted graph, $G_V = (V, E)$. An undirected edge $e_{i,j} \in E$ between two VMs, $vm_i \in V$ and $vm_j \in V$, represents the inter-VM dependency, and the weight on the edge, $w_{i,j}$, is the traffic flow rate between them. The migration of $vm_i$ from $pm_j \in P$ to $pm_k \in P$, represented by a 3-tuple $\langle vm_i, pm_j, pm_k \rangle$, requires that the available bandwidth between $pm_j$ and $pm_k$, $b_{j,k}$, be enough to carry the inter-VM traffic flow between $pm_j$ and $pm_k$. Moreover, for the migration tuple, the CPU and memory requirements of $vm_i$, denoted by $C_{vm_i}$ and $M_{vm_i}$, respectively, must be met by the respective available resources at $pm_k$, $C_{pm_k}$ and $M_{pm_k}$. Therefore, the migration of $vm_i$ to $pm_k$ is subject to the following resource constraints:

$$C_{vm_i} \le C_{pm_k}, \qquad M_{vm_i} \le M_{pm_k}, \qquad w_{i,j} \le b_{j,k} \tag{1}$$
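As an illustration of how the constraints in (1) might be checked for one migration tuple, the following minimal sketch tests the three inequalities. The class names `VM` and `PM`, the field names, and the function `can_migrate` are hypothetical and chosen only for this example; the traffic rate and available bandwidth are assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class VM:
    cpu: float  # C_vm: CPU requirement of the VM
    mem: float  # M_vm: memory requirement of the VM (GB)


@dataclass
class PM:
    cpu_free: float  # C_pm: CPU currently available at the target PM
    mem_free: float  # M_pm: memory currently available at the target PM (GB)


def can_migrate(vm: VM, target: PM, traffic: float, bandwidth: float) -> bool:
    """Return True when all three inequalities of (1) hold.

    traffic   -- w, the inter-VM traffic flow rate to be carried
    bandwidth -- b, the available bandwidth between source and target PM
    """
    return (vm.cpu <= target.cpu_free
            and vm.mem <= target.mem_free
            and traffic <= bandwidth)


vm = VM(cpu=4, mem=16)
pm = PM(cpu_free=10, mem_free=32)
print(can_migrate(vm, pm, traffic=5, bandwidth=1000))  # True
print(can_migrate(vm, pm, traffic=5, bandwidth=3))     # False
```

In a full scheduler the available resources and bandwidth would change as migrations are scheduled, so the check would be repeated against the current state.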

As this research problem deals with the migration of multiple VMs and multiple VM migrations can be performed in parallel, the time for migrating a set of parallel VMs, $h_i$, denoted as $T_m^{h_i}$, is the time from when the first VM in $h_i$ starts migrating to when the last VM in $h_i$ finishes migrating. The total migration time, $T_m$, is the time required to complete all the VM migrations, which is the summation of the migration times of the $q$ sets of parallel VMs, where $q$ is the total number of sets of parallel VMs:

$$T_m = \sum_{i=1}^{q} T_m^{h_i}. \tag{2}$$

The total downtime, $T_d$, is the summation of the downtimes experienced in migrating each VM $vm_i$, $t_d^{vm_i}$. Thus, the total downtime for migrating $n$ VMs is given by (3). The migration time and downtime of a VM are calculated according to the procedure given in [7].

$$T_d = \sum_{i=1}^{n} t_d^{vm_i}. \tag{3}$$

Given a set of VM migrations and the inter-VM dependency between the VMs, $G_V$, in a data center, $G_D$, the multi-VM migration scheduling problem is to find a sequence of parallel VM migration sets, $\langle h_1, h_2, \ldots, h_q \rangle$, such that both $T_m$ and $T_d$ are minimized.
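To make the two objectives concrete, the sketch below evaluates (2) and (3) for a small schedule. It assumes that the start time, finish time and downtime of every migration are already known (in the paper they follow from the procedure in [7], which is not reproduced here); the tuple layout and the function names are illustrative only.

```python
from typing import List, Tuple

# One migration: (start time in s, finish time in s, downtime in ms).
Migration = Tuple[float, float, float]


def set_migration_time(parallel_set: List[Migration]) -> float:
    """Time from the first start to the last finish within one parallel set."""
    starts = [m[0] for m in parallel_set]
    finishes = [m[1] for m in parallel_set]
    return max(finishes) - min(starts)


def objectives(schedule: List[List[Migration]]) -> Tuple[float, float]:
    """Return (T_m, T_d) for a sequence of parallel migration sets."""
    t_m = sum(set_migration_time(h) for h in schedule)  # Eq. (2)
    t_d = sum(m[2] for h in schedule for m in h)        # Eq. (3)
    return t_m, t_d


schedule = [
    [(0.0, 12.0, 30.0), (0.0, 20.0, 45.0)],  # h_1: two VMs migrate in parallel
    [(20.0, 35.0, 25.0)],                    # h_2: one VM
]
print(objectives(schedule))  # (35.0, 100.0)
```

Here the first set takes 20 s and the second 15 s, so the total migration time is 35 s, while the total downtime is 30 + 45 + 25 = 100 ms.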

3 The SPEA2

The SPEA2 generates a set of nondominated solutions. A solution is defined as a schedule of VM migrations. The quality of a solution, $X_i$, is determined by an objective vector, $\langle g_1^{X_i}, \ldots, g_l^{X_i}, \ldots, g_r^{X_i} \rangle$, where $g_l^{X_i}$ is an objective value. In a minimization problem, solution $X_i$ dominates solution $X_j$, i.e. $X_i \succ X_j$, if $\forall l,\ g_l^{X_i} \le g_l^{X_j}$ and $\exists l,\ g_l^{X_i} < g_l^{X_j}$. In this research, the objective vector is $\langle T_m, T_d \rangle$.

The SPEA2 works in four steps: (1) fitness calculation; (2) environment selection; (3) mating selection; and (4) genetic operations. These four steps iterate until a certain termination condition is met.

In fitness calculation, the solutions in the current archive and current population are evaluated with respect to their objective vectors, $\langle T_m, T_d \rangle$. The environment selection updates a fixed-size archive with the solutions from the current archive and current population. The archive is filled with the nondominated solutions first. If the number of nondominated solutions is less than the pre-defined archive size, the rest of the archive is filled with the dominated solutions that have the best fitness values. If, on the contrary, there are more nondominated solutions than the archive size, archive truncation is required. In the archive truncation procedure, the nondominated solutions in dense areas are considered for removal. The solution that has the minimum Euclidean distance to another solution is chosen for removal; if more than one solution has the same minimum distance, the tie is broken by the distance to the second nearest neighbour, and so forth. The removal process iterates until the archive has been truncated to the pre-defined size. The mating selection fills the mating pool with the better solutions in the current archive and current population. Genetic operators are applied to the solutions in the mating pool to obtain a new generation. Algorithm 1 gives the pseudocode of the SPEA2.

Algorithm 1 The SPEA2 for scheduling VM migrations
  randomly generate an initial population
  initialize the archive to an empty set
  while termination criteria are false do
    for each individual in the population do
      find the schedule of migrations as described in Section 3.1
      calculate the total migration time and total downtime
    end for
    calculate the fitness values of all individuals in the population and archive
    do environment selection
    do mating selection to fill the mating pool
    apply crossover and mutation operators
  end while
  output the set of nondominated solutions

3.1 Chromosome Representation

A chromosome represents a schedule of migrations, and the random-key representation [7, ?] is used to obtain such a schedule. A random number in the range [0, 1] is generated for each gene, where each gene corresponds to a VM. The VMs are prioritized for migration from the lowest to the highest random number, and VMs with the same random number get consecutive migration priorities. The VMs are then scanned from the highest to the lowest migration priority to check the feasibility of migrating them to their target PMs. The VMs that satisfy the resource constraints specified in (1) during a scan are scheduled to migrate in parallel. The scan is repeated until all the migrations have been scheduled.
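The decoding of a random-key chromosome into parallel migration sets can be sketched as follows. This is only an illustration under two assumptions that the text does not fix: the lowest key is taken to mean the highest priority, and the feasibility test of (1) is abstracted into a caller-supplied function `feasible(vm, batch)`, a hypothetical name. The toy rule used in the example (at most two migrations per scan) merely stands in for the real resource check.

```python
from typing import Callable, List, Sequence


def decode(keys: Sequence[float],
           feasible: Callable[[int, List[int]], bool]) -> List[List[int]]:
    """Turn one random key per VM into a sequence of parallel migration sets."""
    # Assumption: lowest key = highest priority. sorted() is stable, so VMs
    # with equal keys receive consecutive priorities in index order.
    pending = sorted(range(len(keys)), key=lambda i: keys[i])
    schedule: List[List[int]] = []
    while pending:
        batch: List[int] = []
        for vm in pending:            # one scan in priority order
            if feasible(vm, batch):   # stands in for the constraints in (1)
                batch.append(vm)
        if not batch:
            raise RuntimeError("no pending VM can be migrated")
        schedule.append(batch)
        pending = [vm for vm in pending if vm not in batch]
    return schedule


def at_most_two(vm: int, batch: List[int]) -> bool:
    """Toy feasibility rule (assumption): at most two migrations per scan."""
    return len(batch) < 2


keys = [0.71, 0.12, 0.55, 0.12, 0.90]
print(decode(keys, at_most_two))  # [[1, 3], [2, 0], [4]]
```

Because every chromosome decodes to some feasible ordering, crossover and mutation can operate on the keys without producing invalid schedules, which is the usual motivation for random keys.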

3.2 Genetic Operators

The crossover and mutation operators used in [7] are adopted in the SPEA2. In the crossover, an offspring is produced from two parents randomly selected from the mating pool. A random number in the range [0, 1] is generated for each gene; if this random number is below a predefined value, the gene of the second parent is chosen, otherwise the gene of the first parent is chosen. In the mutation, a gene of a randomly chosen chromosome is replaced, with 20% probability, by a new random value in the range [0, 1].
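A minimal sketch of these two operators on random-key chromosomes is given below. The crossover threshold is not stated in the text, so the value 0.5 in the example is an assumption, as is the reading that every gene is resampled independently with probability 0.2. Python's `random.Random.random()` draws from [0, 1), which is used here as a stand-in for the range [0, 1].

```python
import random
from typing import List


def crossover(parent1: List[float], parent2: List[float],
              threshold: float, rng: random.Random) -> List[float]:
    """Per gene: take parent2's gene if a fresh random number is below
    the threshold, otherwise take parent1's gene."""
    return [g2 if rng.random() < threshold else g1
            for g1, g2 in zip(parent1, parent2)]


def mutate(chromosome: List[float], rate: float,
           rng: random.Random) -> List[float]:
    """Replace each gene by a new random key with probability `rate`."""
    return [rng.random() if rng.random() < rate else g for g in chromosome]


rng = random.Random(1)  # fixed seed only to make the demonstration repeatable
p1 = [0.1, 0.2, 0.3, 0.4]
p2 = [0.9, 0.8, 0.7, 0.6]
child = crossover(p1, p2, threshold=0.5, rng=rng)  # threshold is an assumption
mutant = mutate(child, rate=0.2, rng=rng)
print(child)
print(mutant)
```

Each gene of `child` comes from the same position in one of the two parents, so the offspring is always a valid key vector and can be decoded like any other chromosome.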

3.3 Fitness Calculation

The fitness value of a solution, $X_i$, is determined by the strengths of its dominators. The strength of $X_i$, $S(X_i)$, is the number of solutions it dominates in the archive, $A_t$, and the population, $P_t$, i.e.

$$S(X_i) = |\{X_j \mid X_j \in A_t \cup P_t,\ X_i \succ X_j\}|. \tag{4}$$

The symbol $\succ$ denotes the dominance relation. The raw fitness value of $X_i$, $R(X_i)$, is the sum of the strengths of the solutions that dominate $X_i$:

$$R(X_i) = \sum_{X_j \in A_t \cup P_t,\ X_j \succ X_i} S(X_j). \tag{5}$$

Equation (5) indicates that the raw fitness value of every nondominated solution is zero, so the raw fitness alone cannot distinguish between nondominated solutions. Hence, density information is taken into account to calculate the actual fitness value. In the density estimation technique, the Euclidean distance from $X_i$ to its $k$-th nearest neighbour is calculated and denoted as $\sigma_{X_i}^k$. In our experiments we set $k = 1$. The density corresponding to $X_i$ is then normalized as $D(X_i) = \frac{1}{\sigma_{X_i}^k + 2}$ and added to $R(X_i)$ to obtain the actual fitness value of $X_i$ as follows:

$$F(X_i) = R(X_i) + D(X_i), \quad X_i \in A_t \cup P_t. \tag{6}$$

Equation (6) indicates that a better solution gets a lower fitness value.
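Equations (4) to (6), together with the environment selection described at the beginning of this section, can be sketched as follows. This is an illustrative sketch rather than the implementation used in the experiments: solutions are reduced to their objective vectors, the function names are hypothetical, at least two points are assumed so that a nearest neighbour exists, and the truncation compares sorted neighbour distances lexicographically to realise the tie-breaking rule.

```python
import math
from typing import List, Tuple

Obj = Tuple[float, float]  # objective vector (T_m, T_d); both are minimised


def dominates(a: Obj, b: Obj) -> bool:
    """a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def fitness(points: List[Obj], k: int = 1) -> List[float]:
    """F = R + D for every point in the union of archive and population."""
    n = len(points)
    # Eq. (4): strength = number of points that each point dominates.
    strength = [sum(dominates(p, q) for q in points) for p in points]
    values = []
    for i, p in enumerate(points):
        # Eq. (5): raw fitness = summed strengths of the dominators of p.
        raw = sum(strength[j] for j in range(n) if dominates(points[j], p))
        # Density from the distance to the k-th nearest neighbour.
        dists = sorted(math.dist(p, points[j]) for j in range(n) if j != i)
        density = 1.0 / (dists[k - 1] + 2.0)
        values.append(raw + density)  # Eq. (6)
    return values


def environment_selection(points: List[Obj], size: int, k: int = 1) -> List[Obj]:
    """Build the next archive of at most `size` points."""
    f = fitness(points, k)
    order = sorted(range(len(points)), key=lambda i: f[i])
    # Nondominated points have R = 0 and D <= 1/2, hence F < 1.
    archive = [i for i in order if f[i] < 1.0]
    if len(archive) < size:
        archive = order[:size]  # top up with the best dominated points

    def neighbour_distances(i: int) -> List[float]:
        return sorted(math.dist(points[i], points[j]) for j in archive if j != i)

    # Truncation: drop the point with the smallest nearest-neighbour distance;
    # ties fall through to the second nearest neighbour, and so forth.
    while len(archive) > size:
        archive.remove(min(archive, key=neighbour_distances))
    return [points[i] for i in archive]


points = [(10, 50), (12, 40), (20, 30), (15, 45), (25, 60)]
print([round(v, 3) for v in fitness(points)])
print(environment_selection(points, size=2))  # [(20, 30), (10, 50)]
```

In this example the first three points are nondominated. With an archive size of 2, the middle point (12, 40) is removed because it is the most crowded, which leaves the two extremes of the front.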

4 Evaluation

We show the effectiveness of the SPEA2 on test problems of different sizes. The size of a test problem is defined by the number of VMs designated for migration. We conducted experiments on test problems that migrate between 20 and 200 VMs, in increments of 20 VMs. As it is quite difficult to perform such large-scale experiments in real data centers, we performed simulation experiments. We evaluate the SPEA2 by comparing it with the RKGA [7]. In the following subsections, we describe the simulation setup and the experimental results.

4.1 Simulation Setup

In our experiments, we simulated a data center comprising 686 PMs, connected through the CLOS topology, and 1000 VMs. The capacity of each link was 1 Gbps. The virtual network consisted of several VM clusters with a maximum cluster size of 5 VMs, and the traffic flow rate between two dependent VMs was randomly picked from the set {1, 2, 3, 4, 5} Mbps. For each test problem, the VMs were arbitrarily chosen from a pool of 200 VMs. Each PM was configured with 50 CPU and 64 GB memory. The attributes of a VM were arbitrarily chosen from the Amazon EC2 instance types [9]. The memory modification rate of a VM was in the range [1, 100] Mbps. In all experiments, the simulation setup, i.e. the number of PMs and VMs, the CPU and memory capacities of each PM and VM, the link capacities and the initial placement of VMs, remained unchanged.

The parameters of the SPEA2, i.e. the archive size, population size, mating pool size and maximum number of generations, were 40, 100, 100 and 150, respectively; 90% and 10% of the offspring were produced by the crossover and mutation operations, respectively. This set of parameters was chosen through trials. The SPEA2 terminated when either 150 generations had been explored or there had been no improvement in 50 consecutive generations. Both the SPEA2 and the RKGA were implemented in Java, and all the experiments were conducted on a desktop computer with a 2.80 GHz Intel Core i7-2640M CPU and 8.00 GB of RAM.

4.2 Experimental Results

We compared the solutions generated by the SPEA2 and the RKGA for each of the test problems through the relative dominance of their solutions. Unlike the RKGA, which provides only one solution, the SPEA2 provides a set of nondominated solutions. We therefore define the relative dominance as the number of solutions of the SPEA2 that dominate the solution of the RKGA, and vice versa, and use this criterion to show the superiority of one algorithm over the other. Due to the stochastic nature of the SPEA2 and the RKGA, 10 experiments were conducted for each test problem and the average of these 10 runs was taken for comparison.

Comparison of Relative Dominance of the SPEA2 and the RKGA: Table 1 shows the relative dominance between the SPEA2 and the RKGA. For each test problem, the results presented in Table 1 are the averages of 10 runs. The experimental results show that some of the nondominated solutions generated by the SPEA2 dominate the solution generated by the RKGA for all test problems starting from 40 VMs. For the smallest test problem (20 VMs), the SPEA2 generates only a small number of solutions, so there is less chance that one of them dominates the solution of the RKGA. However, for this test problem some solutions of the SPEA2 overlap the solution of the RKGA, i.e. they are identical to it. The experimental results also illustrate that the solution of the RKGA does not dominate any solution of the SPEA2 for any test problem. We therefore conclude that no solution of the SPEA2 is dominated by that of the RKGA, and that for larger test problems the SPEA2 outperforms the RKGA in terms of relative dominance.

Table 1: Relative dominance between SPEA2 and RKGA

| #VM | No. of solutions (SPEA2) | No. of solutions (RKGA) | No. of solutions of SPEA2 dominating solutions of RKGA | No. of solutions of RKGA dominating solutions of SPEA2 |
|-----|--------------------------|-------------------------|--------------------------------------------------------|--------------------------------------------------------|
| 20  | 6    | 1 | 0   | 0 |
| 40  | 8.8  | 1 | 0.2 | 0 |
| 60  | 10.5 | 1 | 0.4 | 0 |
| 80  | 8.9  | 1 | 0.7 | 0 |
| 100 | 9.9  | 1 | 0.2 | 0 |
| 120 | 8.5  | 1 | 2.2 | 0 |
| 140 | 6.8  | 1 | 1.9 | 0 |
| 160 | 8    | 1 | 3.2 | 0 |
| 180 | 10.2 | 1 | 2.9 | 0 |
| 200 | 9.4  | 1 | 4.6 | 0 |

Comparison of Solution Qualities of the SPEA2 and the RKGA: The quality of a solution is measured by the total migration time and total downtime. The comparison graphs for total migration time and total downtime are shown in Fig. 1 and Fig. 2, respectively. Although some solutions generated by the SPEA2 dominate the solution of the RKGA, the average total migration time calculated for each test problem is lower for the RKGA than for the SPEA2. The reason is that, for each test problem, the 10 experiments produce a significant number of solutions for the SPEA2 (for example, 94 solutions were generated in 10 runs for the test problem of 200 VMs), whereas the RKGA generates only one solution in each run. Therefore, the average total migration time calculated for the SPEA2 is higher than that of the RKGA. Moreover, the solution set of the SPEA2 contains a wide range of nondominated solutions, and some of these are extreme, i.e. their total migration time is very large compared to their total downtime. This gives the chance of reducing the average total downtime at the cost of an increased total migration time; Fig. 2 depicts this scenario, where the SPEA2 shows a slight improvement over the RKGA in terms of total downtime. Both graphs show a near-linear trend as the number of migrating VMs increases, indicating good scalability of the SPEA2. The fluctuations are due to the heterogeneity of the VMs, as different VM configurations result in different migration times and downtimes.
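The following sketch illustrates, with made-up numbers, how relative dominance is counted and why a nondominated set can contain a solution that dominates the single RKGA solution while still having a higher average total migration time. The objective values are hypothetical and are not taken from the experiments; the function names are likewise illustrative.

```python
from statistics import mean
from typing import List, Tuple

Obj = Tuple[float, float]  # (total migration time in s, total downtime in ms)


def dominates(a: Obj, b: Obj) -> bool:
    """a dominates b: no worse in both objectives, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def relative_dominance(front: List[Obj], single: Obj) -> Tuple[int, int]:
    """Count set members dominating the single solution, and vice versa."""
    set_wins = sum(dominates(s, single) for s in front)
    single_wins = sum(dominates(single, s) for s in front)
    return set_wins, single_wins


# Hypothetical values for illustration only.
spea2_front = [(900.0, 2100.0), (1000.0, 1900.0),
               (1400.0, 1700.0), (2000.0, 1600.0)]
rkga_solution = (1050.0, 1950.0)

print(relative_dominance(spea2_front, rkga_solution))  # (1, 0)
print(mean(s[0] for s in spea2_front))                 # 1325.0
print(mean(s[1] for s in spea2_front))                 # 1825.0
```

One member of the set dominates the single solution and none is dominated by it, yet the set's mean migration time (1325 s) exceeds the single solution's 1050 s because the set also contains trade-off solutions with long migration times and short downtimes; the mean downtime (1825 ms) is correspondingly lower than 1950 ms.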

5 Conclusion

In this paper, we have developed a SPEA2 for the multi-VM migration scheduling problem. We have also evaluated the SPEA2 through experiments. The experimental results have revealed that the SPEA2 produces better solutions than the RKGA, an existing genetic algorithm for the multi-VM migration scheduling problem. In addition, the experimental results have demonstrated the good scalability of the SPEA2.

Fig. 1: Total migration time graphs (x-axis: #VM; y-axis: total migration time (s); series: SPEA2, RKGA)

Fig. 2: Total downtime graphs (x-axis: #VM; y-axis: total downtime (ms); series: SPEA2, RKGA)

References

1. Wood, T., Shenoy, P., Venkataramani, A., and Yousif, M.: Sandpiper: Black-box and Gray-box Resource Management for Virtual Machines. Computer Networks, Elsevier, 53, 2923–2938 (2009)
2. Bobroff, N., Kochut, A., and Beaty, K.: Dynamic Placement of Virtual Machines for Managing SLA Violations. Proceedings of the 10th IFIP/IEEE International Symposium on Integrated Network Management, Munich, pp. 119–128 (2007)
3. Engelmann, C., Vallee, G. R., Naughton, T., and Scott, S. L.: Proactive Fault Tolerance Using Preemptive Migration. Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, Weimar, pp. 252–257 (2009)
4. Ghorbani, S. and Caesar, M.: Walk the Line: Consistent Network Updates with Bandwidth Guarantees. Proceedings of the 1st Workshop on Hot Topics in Software Defined Networks, Helsinki, pp. 67–72 (2012)
5. Nus, A. and Raz, D.: Migration Plans with Minimum Overall Migration Time. Proceedings of the 2014 IEEE/IFIP Network Operations and Management Symposium, Krakow, pp. 1–9 (2014)
6. Sarker, T. K. and Tang, M.: Performance-driven Live Migration of Multiple Virtual Machines in Datacenters. Proceedings of the 2013 IEEE International Conference on Granular Computing, Beijing, pp. 253–258 (2013)
7. Sarker, T. K. and Tang, M.: A Random Key Genetic Algorithm for Live Migration of Multiple Virtual Machines in Data Centers. Proceedings of the 21st International Conference on Neural Information Processing, Kuching, pp. 212–220 (2014)
8. Zitzler, E., Laumanns, M. and Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm for Multiobjective Optimization. Proceedings of the EUROGEN2001 Conference, Athens, pp. 95–102 (2001)
9. http://aws.amazon.com/de/ec2/instance-types/