A Two-level Virtual Machine Self-reconfiguration Mechanism for the Cloud Computing Platforms

Wei Chen, Xiaoqiang Qiao, Jun Wei, Tao Huang
Institute of Software, Chinese Academy of Sciences, Beijing, China
{wchen, qiaoxiaoqiang, wj, tao}@otcaix.iscas.ac.cn

Abstract—Cloud computing is a new model and technology that leverages the efficient pooling of on-demand, self-managed virtual infrastructure. Virtualization packages applications in the form of Virtual Machines (VMs) and provides significant benefits by reconfiguring the VMs dynamically. VM reconfiguration is hard and complicated, and existing work has addressed the problem with diverse objectives by answering the questions of when to reconfigure, which VMs should be reconfigured and where to host the VMs. However, we find that how the runtime reconfiguration is performed also affects the total costs significantly. We therefore propose a two-level runtime reconfiguration mechanism that automates the operations with the objective of minimizing the costs. The mechanism includes local adjustment and parallel migration. Employing local adjustment, VMs on the same server can be reconfigured in a time-division multiplexed way based on load trend prediction, which avoids unnecessary VM migrations. Nevertheless, VM migration is inevitable when the server is overloaded. Considering the conflict between reducing the migration cost and minimizing the performance interference, we propose a VM parallel migration strategy and map it to the maximum matching problem of a bipartite graph. We implement a framework based on Xen and evaluate the mechanism with a preliminary experiment. The results show that this two-level self-reconfiguration mechanism is effective in reducing the VM runtime reconfiguration costs.

Keywords—cloud computing; self-reconfiguration; virtual machine; migration
I. INTRODUCTION
Cloud computing [1] is popular as a rising application paradigm, where software, platforms and infrastructures are provided and shared as services. One important impetus of cloud computing is virtualization technology [2], which virtualizes resources (e.g. CPU, memory, etc.) and enables multiple applications to run on Virtual Machines (VMs) instead of Physical Machines (PMs). In a cloud computing environment, such as a virtualized data center, the VM is the basic deployment and management unit. Cloud platform providers are responsible for satisfying the various resource demands by determining where to place VMs and how to allocate resources. Virtualization-based cloud computing can improve resource utilization, scalability, flexibility and availability of applications, and it also provides good application isolation. Due to these advantages, large-scale distributed applications are increasingly hosted on cloud platforms. VM reconfiguration enables flexible resource allocation and efficient resource utilization with
different operations, including VM adjustment [3][4], live migration [5] and consolidation [6]. VM adjustment changes the resource allocation of a VM dynamically. Live migration handles overloaded VMs by transferring them from their current PMs to PMs with sufficient resources in an extremely short time. Consolidation, aiming to reduce power consumption and to maximize resource utilization, reduces the number of PMs by moving several VMs from different PMs to the same one. All these operations can be performed at runtime without impacting application performance significantly.

However, VM reconfiguration is difficult and complicated due to many factors. First of all, there are hundreds or thousands of applications residing on a cloud platform, and their varying resource demands and fluctuating workloads make the work harder. Secondly, many decisive factors, including multiple resource capacities, SLA violation penalties, power consumption, hotspot dissipation and other management costs, make the work complicated. As a result, it is infeasible to reconfigure VMs manually, and an effective and automatic approach is required.

As a promising technique, VM reconfiguration has attracted considerable interest in recent years, and some work has been devoted to addressing the problem with various objectives, including maximizing resource utilization, reducing power consumption, and minimizing VM migration costs. Initially, some work only considered resource utilization and workload, using ordering algorithms [7], constraint programming [6] and genetic algorithms [8] to obtain solutions. Recently, factors relating to providers' costs (e.g. power consumption, thermal dissipation and operation costs) have been considered. Temperature-aware workload placements were presented in [9] and [10]. The work in [11] simultaneously considered workload, power consumption and thermal management when performing runtime reconfiguration. All this work reconfigures VMs at runtime by addressing the key issues of when to reconfigure, which VMs should be reconfigured and where the VMs should migrate if necessary. However, besides these issues, we find that how to perform the reconfiguration is also important. When multiple VMs have to be moved, different migration solutions take very different amounts of time and have very different impacts on the other applications. Compared with VM adjustment, live migration brings additional cost in terms of application performance degradation. In order to reduce the total cost of VM reconfiguration, we focus on how to reduce the number of VM migrations and minimize the total migration time.

In this paper, a large-scale virtualization-based data center is assumed, and we concentrate on how to reduce the reconfiguration cost by employing a two-level VM self-reconfiguration mechanism. The first level is the VM local adjustment, which reconfigures the VM resource capacities based on time division multiplexing to maximize resource utilization and to avoid unnecessary VM migrations. The second level is a parallel migration strategy, which reduces the total migration time and makes the trade-off between reducing the migration time and minimizing the performance interference.

The rest of this paper is organized as follows. Section II presents the motivation based on some examples. Section III analyzes the problem. Sections IV and V present the details of the two-level runtime self-reconfiguration mechanism. Section VI gives an architectural overview. Experiments and evaluations are shown in Section VII. We introduce the related work and draw conclusions in Sections VIII and IX.

II. MOTIVATION
We set up a simulated environment and perform a preliminary experiment to illustrate the effect of VM runtime reconfiguration. The VM-based computing platform consists of several server nodes. Each node is configured with quad-core 2.4GHz processors and 8GB RAM, and the nodes are connected with gigabit Ethernet. We employ three types of applications, CPU-intensive (CI), Memory-intensive (MI) and Network I/O-intensive (NI), each with multiple instances. Several VM templates are provided and allocated to these applications. TABLE I, II and III depict the use of the servers, the VM template configurations and the applications, respectively.

TABLE I. USE OF THE SERVERS
  ID      Use
  S1      File system
  S2      Client workload generator for the applications
  S3      Request router, distributing client requests
  S4-S7   Hosting the applications packaged into VMs

TABLE II. THE VM TEMPLATES
  ID   Name       Configuration
  V1   Common     0.5*2.4GHz, 1GB RAM, 10M I/O
  V2   High-CPU   2*2.4GHz, 1.5GB RAM, 20M I/O
  V3   High-Mem   1*2.4GHz, 3GB RAM, 20M I/O
  V4   High-I/O   1*2.4GHz, 1GB RAM, 30M I/O

TABLE III. THE APPLICATIONS AND THEIR ALLOCATIONS
  App ID   App. Type       VM Template   Instance number   Server ID
  CI-1     CPU-Intensive   V2            2                 S4, S5
  CI-2     CPU-Intensive   V2            2                 S4, S5
  MI-1     Mem-Intensive   V3            2                 S4, S5
  NI-1     I/O-Intensive   V4            2                 S4, S5
Among the applications, CI-1 and CI-2 need much more CPU to support their tasks, including Pi calculation, k-means and other complex computations.
MI-1 needs a large amount of memory to cache data, and NI-1 employs heavy network I/O to perform intensive message transmission. All these applications have two instances, and the instances are distributed on different servers. There are also some other applications running on S6 and S7, so these servers are not idle. We conduct the experiment with the following steps: 1) make the applications run for a period of time and generate the peak workload for each application at different times with different durations; 2) increase the workload of CI-1 and CI-2 to make them overloaded; 3) use the local adjustment and the VM migration to reconfigure the platform respectively, and compare their effects on application performance.

We reconfigure the applications with the different operations and provide more CPU resources to the overloaded ones. The experimental results are shown in Fig. 1. We use the application instances hosted on S4 as representatives. Figure 1(a) shows the result of the local adjustment, and Fig. 1(b) shows the result of migrating CI-1 to S6 and S7. We observe that the application performance degradation during the local adjustment is smaller than that during VM migration, and the period of performance degradation is also shorter.
Figure 1. Runtime reconfiguration when some VMs are overloaded: (a) local adjustment; (b) reconfiguration using VM migration. Each plot shows response time (seconds) over time (seconds) for CI-1, CI-2, MI-1 and NI-1.
The local adjustment avoids unnecessary VM migration by reallocating spare resources from the lightly loaded VMs to the overloaded ones in a very short time. In contrast, although VM migration can also solve the problem, it spends much more time on VM transmission, which generates significant interference with the other applications. Meanwhile, compared with the memory-intensive application
MI-1, the network-intensive application NI-1 receives more interference during the VM migration, because much of the network I/O is preempted by the migration. However, local adjustment cannot deal with all cases, particularly resource contention within a server. In such situations, the server is overloaded and VM migrations are inevitable. If there are multiple VM instances to migrate, parallel migration has been shown to be an effective way to shorten the migration time [12]. The experiment in [12] shows that "parallelizing the migration of two VMs from the different source nodes to the different target nodes would shorten the total migration time significantly". On the other hand, if we simultaneously move multiple VMs from the same server to the same target, much of the network I/O of these two nodes will be consumed, which may generate significant performance interference for the other VMs [13]. The experiment in [12] also shows that, in this case, migrating multiple VMs simultaneously takes almost the same total time as moving them one by one. Therefore, the ideal migration method is: 1) across the cloud platform, migrate multiple VMs from different source nodes to different targets in parallel; 2) within a server, move the VMs in sequence.

Motivated by our experiment and the one reported in [12], we optimize runtime VM reconfiguration from the following aspects. According to the resource demands and the workload trends, we adjust the VM capacities locally based on time division multiplexing and avoid unnecessary VM migrations. If migration is necessary, we reduce the total operation time and the performance interference by moving multiple VMs in a parallel way.

III. PROBLEM ANALYSIS
In a typical cloud computing scenario, such as Amazon EC2 and the NewServers (NS) cloud platform, the Platform Providers (PPs) offer computing resources in the form of VM templates with various capacities and charge rents. The Service Providers (SPs) provide their own application services and rent the platform resources in a pay-per-use way. PPs get revenue from the tenants (SPs) and bear the costs of maintaining the platform.

Pp = Revp − Cosp    (1)

As shown in Eq. (1), a PP's profit (Pp) is the difference between the revenue (Revp) and the total costs (Cosp). However, once the tenants declare their resource demands, the PP's revenue is fixed. Therefore, reducing the total costs becomes the only effective way to increase the profit. Equation (2) shows that a PP's total costs come from several aspects, including the physical resources (Cosres), the VM reconfiguration (Cosm), the power consumption (Cospow) and the thermal dissipation (Costh).

Cosp = Cosres + Cosm + Cospow + Costh    (2)

Based on the experimental results in Section II, the VM reconfiguration cost (Cosm) mainly comes from VM migration. VM migration impacts the applications residing on the source and the target nodes in
terms of performance degradation, because much more CPU and network I/O are preempted by such operations and the other applications cannot get sufficient resources. Furthermore, if the VM migration lasts longer than the duration of the peak load, the operation is useless and an SLA penalty will be incurred. From this viewpoint, the VM migration must be performed as quickly as possible. On the other hand, the applications in a cloud computing environment have dynamic and various resource demands and workloads. However, the projected resource demands are specified based on off-line estimations with their peak workloads. Most of the time, the applications are not in the heavily loaded state and do not fully use the resources allocated to them. Therefore, runtime VM reconfiguration is applicable to deal with the overloaded cases, where the key issue is how to reconfigure VMs without affecting the other applications' performance. Based on the above analysis, we formulate the runtime VM reconfiguration cost as:

Cosm = memvm / RNB + Cosadj    (3)

The total migration time of the VMs is usually an important metric of the migration cost [6]. In Eq. (3), memvm is the VM RAM size, and RNB is the residual network bandwidth. memvm divided by RNB represents the migration cost in terms of the time of transferring a VM from the current PM to another one. Cosadj is the cost of VM local adjustment. In practice, since the local VM capacity adjustment can be done in an extremely short time and its impact on application performance is very small, the local adjustment cost (Cosadj) can be ignored. Therefore, we conclude that the reconfiguration cost mainly comes from VM migration, as shown in Eq. (4).

Cosm = memvm / RNB    (4)
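To make the cost model in Eqs. (3) and (4) concrete, the following minimal Python sketch computes the estimated migration time; the unit choices (RAM in MB, residual bandwidth in MB/s) and the function names are our own illustrative assumptions, not part of the original implementation.

```python
# Minimal sketch of the reconfiguration-cost model in Eqs. (3) and (4).
# Assumed units (illustrative only): mem_vm in MB, rnb (residual network
# bandwidth) in MB/s, so the returned value is an estimated time in seconds.

def migration_cost(mem_vm: float, rnb: float) -> float:
    """Eq. (4): live-migration cost approximated by the transfer time mem_vm / RNB."""
    if rnb <= 0:
        raise ValueError("residual bandwidth must be positive")
    return mem_vm / rnb

def reconfiguration_cost(mem_vm: float, rnb: float, cos_adj: float = 0.0) -> float:
    """Eq. (3): migration cost plus the local-adjustment cost, which the paper
    argues is negligible and can be set to zero (yielding Eq. (4))."""
    return migration_cost(mem_vm, rnb) + cos_adj

# Example: a 1.5 GB (1536 MB) VM over 100 MB/s of residual bandwidth -> ~15.4 s.
print(reconfiguration_cost(1536, 100))
```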
According to Eq. (3), in order to minimize the total reconfiguration cost, we try to deal with the overloaded applications by employing local adjustments as much as possible; if migration is inevitable, we try to reduce the total operation time. We assume that there are K applications partitioned and packaged into M (K ≤ M) VMs. These VMs reside on a platform constituted of N PMs. After running for some time, L VMs are overloaded, and these VMs must be reconfigured to maintain their performance. The objective of our work is to minimize the total reconfiguration cost when L overloaded VMs exist, as shown in Eq. (5):

Min ∑_{i∈L} Cosm,i    (5)

In addition, some constraints must not be violated when optimizing the VM reconfiguration. Rmem, Rcpu and Rio are the resource demands of a VM; Tmem, Tcpu and Tio are the total capacities provided by a server. Di,j is an element of the VM placement matrix, denoting whether the ith VM resides on the jth PM: if the ith VM is on the jth PM, Di,j is 1, otherwise 0.

∀j ∈ N: ∑_{i∈M} Rmem,i · Di,j ≤ Tmem,j
∀j ∈ N: ∑_{i∈M} Rcpu,i · Di,j ≤ Tcpu,j
∀j ∈ N: ∑_{i∈M} Rio,i · Di,j ≤ Tio,j
∑_{j∈N} Di,j = 1,  i ∈ [1, …, M]

The first three constraints bound the total resource demands by the server capacities, and the last one ensures that each VM is allocated to one and only one server.
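As a sanity check on these constraints, the sketch below verifies that a candidate placement respects the per-server capacities and assigns every VM to exactly one server; the demand and capacity numbers are made-up values for illustration only.

```python
# Sketch of the placement constraints above: every VM on exactly one server,
# and per-server memory, CPU and I/O demands within the server capacities.
# All names and numbers are illustrative assumptions.

VMS = {  # per-VM demands: (memory GB, CPU cores, I/O MB/s)
    "V1": (1.5, 2.0, 20.0), "V2": (3.0, 1.0, 20.0), "V3": (1.0, 1.0, 30.0),
}
SERVERS = {"S4": (8.0, 4.0, 100.0), "S5": (8.0, 4.0, 100.0)}

def feasible(placement: dict) -> bool:
    """placement maps each VM name to a server name (the D matrix)."""
    if set(placement) != set(VMS):          # each VM placed exactly once
        return False
    for server, capacity in SERVERS.items():
        hosted = [vm for vm, s in placement.items() if s == server]
        for k in range(3):                  # memory, CPU, I/O in turn
            if sum(VMS[vm][k] for vm in hosted) > capacity[k]:
                return False
    return True

print(feasible({"V1": "S4", "V2": "S4", "V3": "S5"}))   # True
```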
The key issue in migrating VMs is how to make the trade-off between shortening the total migration time and minimizing the performance interference. Different VM migration solutions take very different amounts of time, and the performance interference varies due to CPU and network I/O contention. Therefore, the key point is to find an appropriate way to migrate multiple VMs in parallel without significantly interfering with the performance of the other applications. Local adjustment and live migration constitute the two-level VM self-reconfiguration mechanism; we discuss the details of each level in the next two sections.
IV. LOCAL ADJUSTMENT
Local adjustment is the preferred operation due to its negligible cost. We employ time-division-multiplexing-based resource allocation and workload distribution together to satisfy the resource demands of the overloaded VMs. The method is based on the following intuitions. 1) The projected resource demands of the applications are specified based on off-line estimations with their peak workloads; in practice, the workload is time varying and the applications are not always heavily loaded. 2) To improve availability and other service qualities, some applications have multiple instances on several distinct servers; to maintain application performance, it is feasible to distribute part of the workload to each instance in proportion to the additional resources reallocated to it.

The key issues in employing the local adjustment are: 1) which VMs can be the 'creditors' lending out their residual resources, and 2) how many resources can be lent. The lightly loaded VMs, whose peak loads do not overlap with those of the currently overloaded ones, can be the 'creditor' candidates. Thus, the overloaded VMs, as the 'debtors', can get resources from them. The more spare resources a VM has, the more it can lend. As a result, the key is to predict the resource utilization trends of all the VMs and the resource demands of the overloaded ones. According to the predictions, we adjust the VM capacities based on time division multiplexing.

A. Monitoring the resource utilization and the performance

The prediction is based on system monitoring. We use Xen [21] to implement our monitoring mechanism, and a lot of runtime information can be obtained from the Xen hypervisor or the privileged VM, domain 0. If some VMs run out of their resources (CPU, network or memory, etc.) with significant and persistent performance degradation, VM reconfiguration should be performed.
Many monitoring strategies have been proposed in existing work. T. Wood proposed the black-box and gray-box monitoring methods [5], where a resource utilization profile and a time-series profile are created based on interval sampling and measurement. The peak resource demands and the resource utilization trends can then be estimated based on these profiles and data. We use this monitoring technique, and the details can be found in [5].

B. The resource utilization trend prediction

When an overload is detected, we must identify the creditors and the debtors, and then estimate the resource demands and the total resources that can be reallocated. We use two conditions for selecting a suitable creditor: 1) the time period of its peak load does not overlap with that of the currently overloaded VM, or 2) it has residual resources even under its peak workload. We explain these conditions based on Fig. 2. Four VMs reside on the same server, and V1 is now overloaded. From Fig. 2, we can see that V3 cannot be a creditor, because its peak workload will arrive and overlap with that of V1. If we reallocated the residual resources from V3 to V1, another reallocation would have to be made soon afterwards because V3 would lack resources under its peak workload; in the end, the frequent adjustments would incur much more performance interference. On the other hand, Fig. 2 shows that V2 and V4 are the creditor candidates of V1. Although V2's resource utilization will exceed the threshold (Uh), its peak does not overlap with that of V1, and the resources can be returned after V1's peak load finishes. V4 underutilizes its resources even under its peak workload, so many of its resources can be reallocated.
Figure 2. Resource utilization trend
Based on the historical monitoring data, we predict the probability that the peak load of the currently overloaded VM overlaps with that of each of the other VMs using Bayesian theory. If the overlap probability does not exceed a threshold P, we choose the VM as a creditor candidate. Furthermore, we estimate the maximum resource demand of each VM from the historical monitoring data by leveraging the prediction method proposed in [14].
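The sketch below is a simplified stand-in for this creditor test: it estimates the peak-overlap probability from historical per-interval peak indicators and keeps the VMs whose estimate stays below the threshold P. The paper relies on a Bayesian predictor and the demand-estimation method of [14]; this frequency-based estimate and all the names are only illustrative assumptions.

```python
# Simplified creditor-candidate selection: estimate how often another VM's
# peak load coincided with the overloaded VM's peak in the monitoring history,
# and keep VMs whose overlap probability is at most the threshold P.

def overlap_probability(peaks_a, peaks_b):
    """Fraction of 'peak' intervals in which both VMs were at peak together."""
    both = sum(1 for a, b in zip(peaks_a, peaks_b) if a and b)
    either = sum(1 for a, b in zip(peaks_a, peaks_b) if a or b)
    return both / either if either else 0.0

def creditor_candidates(overloaded, peak_history, p_threshold=0.2):
    """VMs whose estimated peak overlap with the overloaded VM stays below P."""
    target = peak_history[overloaded]
    return [vm for vm, hist in peak_history.items()
            if vm != overloaded
            and overlap_probability(target, hist) <= p_threshold]

# Toy history mirroring Fig. 2: V3's peak coincides with V1's, V2 and V4 do not.
history = {"V1": [True, True, False, False],
           "V2": [False, False, True, False],
           "V3": [True, False, False, False],
           "V4": [False, False, False, False]}
print(creditor_candidates("V1", history))   # ['V2', 'V4']
```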
C. The local adjustment algorithm

The algorithm is as follows. First, predict 1) the total resource demand and the peak workload duration of the overloaded VM, 2) the resource utilization trends of the other VMs on the same server, and 3) the next peak loads of the other VMs. Second, estimate the total resources that can be reallocated from the lightly loaded VMs. Third, adjust the resource capacities of the VMs on the server: we reallocate the residual resources from the lightly loaded VMs to the overloaded ones using a weight-based reallocation heuristic. As a result, the previously overloaded VM instances can afford much more workload, in proportion to the additional resources they obtain. Finally, when the workload decreases, all the borrowed resources are returned to their original owners.

In this process, the weight-based sharing heuristic is: according to the residual resources and the predicted start time of the next peak load, the lightly loaded VMs on the same server are ranked with different weights. A smaller weight denotes that the VM has more spare resources and a longer time until its peak load, and such a VM is selected as a creditor with high priority.
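A possible reading of this weight-based heuristic is sketched below: creditors with more spare resource and a longer time until their next peak receive a smaller weight and are drained first, and the borrowed amounts are recorded so they can be returned after the peak. The concrete weight formula is our assumption; the paper only specifies the ranking criterion.

```python
# Hedged sketch of the weight-based sharing heuristic.  Each creditor record
# holds its spare amount of the contended resource and the predicted time (s)
# until its own next peak; a smaller weight means higher lending priority.

def rank_creditors(creditors):
    """Sort so that VMs with more spare resource and a later peak come first."""
    return sorted(creditors,
                  key=lambda c: 1.0 / (c["spare"] * c["time_to_peak"] + 1e-9))

def reallocate(extra_demand, creditors):
    """Borrow from ranked creditors until the overloaded VM's extra demand is
    covered; returns a plan {creditor VM: amount lent} to be reversed later."""
    plan, remaining = {}, extra_demand
    for c in rank_creditors(creditors):
        if remaining <= 0:
            break
        lend = min(c["spare"], remaining)
        if lend > 0:
            plan[c["vm"]] = lend
            remaining -= lend
    return plan

creditors = [{"vm": "V2", "spare": 0.5, "time_to_peak": 300},
             {"vm": "V4", "spare": 1.0, "time_to_peak": 1200}]
print(reallocate(1.5, creditors))   # {'V4': 1.0, 'V2': 0.5}
```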
V. PARALLEL VM MIGRATION
Once the overloaded servers have been detected, i.e., servers on which one or more types of resources are insufficient for the VMs residing there, VM migrations have to be performed to reduce the workload. On a large-scale cloud platform, there may be multiple overloaded servers at the same time, and we propose a parallel migration method that considers two key issues: 1) which VMs and targets are selected for the migration, and 2) how to migrate the VMs to reduce the total time.

1) Determining the VMs and the targets

The following factors should be considered when deciding which VMs and target nodes are selected for migration.

a) The type of the insufficient resource
When several VMs on the same server run with heavy workloads, one or more types of resources will run out, resulting in resource contention. By monitoring the system state, we can identify which resource is insufficient, represented as Ri.

b) The size of the VMs to migrate
Live migration works by copying the VM memory image to the destination, which significantly impacts the performance of the application inside the VM [13]. Reducing the amount of data transferred over the network can minimize the total migration time and, thus, the performance interference on the applications. We represent the VM RAM size as memvm.

c) The metric for selecting VMs to migrate
According to Eq. (4) in Section III, the VM RAM size is a decisive factor of the migration cost, and the total RAM size should be minimal as long as the resource demands can be satisfied. We use the primary metric M, the ratio of Ri to memvm, to select the VMs. A bigger M means more of resource Ri can be released per unit of RAM. Within an overloaded server, we select the VMs in decreasing order of M and attempt to migrate the maximum volume (i.e. load) per unit byte, which has been shown to minimize the migration overhead. However, when the insufficient resource is memory, this method is not applicable, and we deal with this case by selecting the VMs in descending order of their memvm.
d) The VMs to migrate
We minimize the number of VMs to migrate based on the VM selection metric (M) and the estimate of the resources that should be released (Smin). If there are multiple solutions, we select the one with the maximum sum of M (if memory is the insufficient resource, we select the one with the minimum sum of memvm) as the set MIG. This means the maximum amount of resources can be released per migrated VM. For example, within a server s, four VMs (a, b, c and d) contend for the network I/O, and at least two VMs should be migrated. The candidate sets are {a, b} and {b, c}, and we select the first one because its sum of M is bigger.

e) The target server
Given a VM, we take the servers that can host it as candidates, based on the constraints presented in Section III. In order to reduce the performance interference, we only select targets among the servers with light workloads. It is also assumed that at least one candidate node can always be found.

Considering the above factors, we design a runtime VM migration algorithm that reduces the workload in a parallel way, shortening the total migration time and minimizing the performance interference.
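A greedy sketch of this per-server selection step is given below. Each candidate VM is described by its RAM size memvm and the amount r_i of the contended resource it would free if moved, and VMs are taken in decreasing order of M = r_i / memvm until the estimated amount Smin has been released (falling back to descending memvm when memory itself is the contended resource). The greedy order is an illustrative approximation of the set-selection rule described above, and all numbers are made up.

```python
# Sketch of selecting the set MIG on one overloaded server: pick VMs with the
# best "released resource per byte of RAM" first until s_min is covered.

def select_vms(candidates, s_min, contended):
    """candidates: list of dicts with 'vm', 'mem_vm' (MB) and 'r_i' (amount of
    the contended resource the VM would release).  Returns chosen VM names."""
    if contended == "memory":
        # M = r_i / mem_vm is not meaningful here; fall back to biggest RAM first.
        order = sorted(candidates, key=lambda v: v["mem_vm"], reverse=True)
        released_by = lambda v: v["mem_vm"]
    else:
        order = sorted(candidates, key=lambda v: v["r_i"] / v["mem_vm"], reverse=True)
        released_by = lambda v: v["r_i"]
    chosen, released = [], 0.0
    for v in order:
        if released >= s_min:
            break
        chosen.append(v["vm"])
        released += released_by(v)
    return chosen

candidates = [{"vm": "a", "mem_vm": 1024, "r_i": 15.0},
              {"vm": "b", "mem_vm": 1536, "r_i": 20.0},
              {"vm": "c", "mem_vm": 2048, "r_i": 18.0}]
print(select_vms(candidates, s_min=30.0, contended="network_io"))   # ['a', 'b']
```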
2) Migrating the VMs in a parallel way

If there is only one VM to migrate, the most suitable destination can be selected according to metrics such as the available resources and the network bandwidth. However, in a large-scale cloud computing environment, there may be multiple overloaded servers at the same time, and the VMs may have different targets due to their various resource demands and the constraints. The difficulty of reducing the total migration time lies in how to migrate multiple VMs simultaneously. To make the trade-off between shortening the migration time and reducing the performance interference, we prefer to transfer the VMs from different source nodes to different target nodes simultaneously; moreover, if at least two VMs have to be moved off the same server, the VMs on this server are moved one by one. The algorithm is as follows.

1. Suppose there are k overloaded servers, S = {S1, S2, ..., Sk}. For each server Si ∈ S, its MIGi has l elements, MIGi = {VM1, VM2, ..., VMl}, meaning there are l VMs to migrate from server i.
2. We select one VM from each server in S as the set of VMs to migrate in the first round; without loss of generality, we denote the set as V = {VM11, VM21, ..., VMk1}.
3. For each VMi1 in V, we find its candidate target set Ti, whose elements are not in S.
4. We take the union T of the Ti of all the VMs in V.
5. With S and T, we build the edges between the nodes in these two sets according to the VMs in V and their candidate targets in T. The weight W of each edge denotes the relative transfer rate, determined by RNB and memvm.
6. Since each VM has to be moved to a different target node, the parallel VM migration is mapped to the maximum matching problem of the bipartite graph [15], where the objective is to minimize the total migration time.
7. When the most suitable targets of all the VMs in V have been identified, the individual migration times ti, the longest migration time tl and the total migration time tm can be estimated. Based on the migration plan of the first round, S and T are updated, and their elements may change.
8. If there are still VMs to migrate, steps 2 to 7 are iterated until all the overloaded servers are relieved. The difference between the first round and the following rounds is that the ti, tl and tm of the previous round must be considered when making the plan of the current round.

We illustrate this algorithm with the example shown in Fig. 3, where the rectangles represent the servers and the ellipses are the VMs in MIG. The gray ellipses are the ones to be moved in the current round, and the number on each edge is the relative transfer rate of the VM from the source server to the target one. Figure 3(a) is the initial state, where V = {VM11, VM21, VM31}, T11 = {S4, S6}, T21 = {S4, S5} and T31 = {S5, S6}. According to the relative transfer rates on the edges, we get the parallel migration plan of the first round: VM11 migrates to S6, VM21 to S4, and VM31 to S5. After the first round, S and T change, as shown in Fig. 3(b). Note that after this round S5 is no longer capable of hosting additional VMs and S2 becomes a candidate target of VM32; that is, S5 ∈ S and S2 ∈ T. We can also see that the total migration time tm of the first round is 15.38. Since there are still VMs (VM12 and VM32) to migrate, the second round is planned in the same way as the first. In addition, since the previous migrations to S4 and S5 finish several seconds ahead of time, the second round on these two nodes can start in advance. The final placement after the two rounds of migration is shown in Fig. 3(c). A minimal sketch of one matching round is given after Fig. 3.
(a) Initial state
(b) Intermediate state
(c) Final state

Figure 3. VM parallel migration example
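Following the algorithm and the Fig. 3 example above, the sketch below plans one migration round. It assumes one VM per overloaded source server and an estimated migration time for every feasible VM/target pair (memvm divided by the residual bandwidth on that path), and it brute-forces the assignment of distinct targets that minimizes the summed migration time; for the handful of VMs moved per round this stands in for the weighted bipartite matching solver the paper refers to ([15]). The numbers are toy values, not those of Fig. 3.

```python
# One round of parallel migration planning: give every VM a distinct target
# so that the total estimated migration time is minimal.

from itertools import permutations

def plan_round(times):
    """times[vm][target] is the estimated migration time for a feasible pair;
    returns (assignment dict, total time) or (None, inf) if no plan exists."""
    vms = list(times)
    targets = sorted({t for row in times.values() for t in row})
    best, best_cost = None, float("inf")
    for perm in permutations(targets, len(vms)):
        if any(perm[i] not in times[vm] for i, vm in enumerate(vms)):
            continue                      # skip plans using an infeasible pair
        cost = sum(times[vm][perm[i]] for i, vm in enumerate(vms))
        if cost < best_cost:
            best, best_cost = dict(zip(vms, perm)), cost
    return best, best_cost

# Toy round with three VMs (one per overloaded server) and three candidate targets.
times = {"VM11": {"S4": 20.0, "S6": 12.5},
         "VM21": {"S4": 14.0, "S5": 16.0},
         "VM31": {"S5": 15.5, "S6": 18.0}}
print(plan_round(times))   # ({'VM11': 'S6', 'VM21': 'S4', 'VM31': 'S5'}, 42.0)
```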
VI. THE ARCHITECTURE OVERVIEW
Based on the self-reconfiguration mechanism proposed in this paper, we implemented a framework, which is shown in Fig. 4. It has four primary components:

- Monitor: a daemon program that runs on each PM and collects system information, including resource usage, network load and request arrival rate. All the information is sampled periodically without affecting application performance significantly.
- Predictor: uses the data from the Monitor to predict the trends of the workload and the resource demands based on probability statistics.
- Local Configurator: takes the results from the Monitor and the Predictor and decides how many resources can be reallocated to the locally overloaded VMs.
- Runtime VM Migration Controller: decides when, where and how to move the VMs contending for resources. To reduce the runtime reconfiguration costs, it is responsible for reducing the number and the total time of the VM migrations.
Figure 4. Overview of the two-level self-reconfiguration framework
VII. EXPERIMENTS AND EVALUATIONS

The effectiveness of the local adjustment has already been shown by the experiment in Section II, so here we focus on evaluating the VM parallel migration strategy. We conduct the evaluation in the experimental environment introduced in Section II. First, we place several VMs configured with 2.4GHz CPU resources and various amounts of RAM on five nodes, S1 to S5. Second, we make these nodes contend for different resources (two for network I/O, two for CPU and one for memory) by increasing their workloads. Then another five nodes, T1 to T5, are used as targets to host the migrated VMs. According to the residual resources of these nodes, some VM instances on servers S1 to S5 are selected and moved to different target nodes. The experiment information is shown in TABLE IV.

We employ different methods to select and move the VMs: 1) Parallel VM Migration (PVM): the method introduced in Section V, which considers the multiple factors and uses the bipartite-graph-matching-based algorithm to migrate the VMs in parallel. 2) Random VM selection and migration (RSM): select the VMs to migrate without considering the factors we proposed, and move these instances in parallel. 3) Simultaneous VM migration (SVM): migrate all selected VMs simultaneously, even when several of them are on the same source node. The migration operations and their durations are shown in TABLE V.

We find that both PVM and RSM perform the parallel migrations in two rounds. Without considering the factors for selecting VMs, the VMs moved by RSM are not the most appropriate ones and need much more network bandwidth to finish the migrations; as a result, RSM takes the longest time among these experiments. Although SVM performs better than RSM, all its instances are migrated simultaneously and much of the
network I/O is taken up. We find that in the experiment with SVM, the other VM instances suffer more performance interference due to the lack of network I/O.

TABLE IV. EXPERIMENT INFORMATION
  Source Node   Contention    VM instances          Target Node
  S1            Network I/O   S11, S12, S13, S14    T1, T2, T3
  S2            Network I/O   S21, S22, S23, S24    T1, T4
  S3            CPU           S31, S32, S33, S34    T4, T5
  S4            CPU           S41, S42, S43, S44    T2, T5
  S5            Memory        S51, S52, S53         T4, T5

TABLE V. EXPERIMENT RESULTS
  Method   VM migration operations                                          Time (s)
  PVM      Round 1: S14->T3, S24->T1, S32->T5, S43->T2, S53->T4;            57.74
           Round 2: S12->T2, S21->T1, S34->T4, S41->T5
  RSM      Round 1: S11->T3, S22->T1, S33->T5, S42->T2, S51->T4;            72.96
           Round 2: S13->T2, S23->T1, S31->T4, S44->T5
  SVM      S12,S13->T3, S24,S23->T1, S34,S31->T5, S43,S41->T2, S52->T4      64.81
Figure 5 presents some of the VM migrations conducted in this experiment. There are many VMs and PMs, so we select the source node S3 and the target nodes T4 and T5 as representatives. In Fig. 5, the y-axis is the CPU utilization and the x-axis represents the time series.
Figure 5. A series of VM migrations
At first, VMs S32 and S34 carry a large load, causing the CPU utilization on S3 to exceed the threshold (0.8). The overloaded servers, including S3, are then detected, and the migration controller makes the parallel migration decision based on the method proposed in Section V. In the first round, S32 is selected and moved to T5, which takes 17.02 seconds. During the migration, the network and other resource scheduling increase the CPU utilization on T5. Since S3 is still overloaded after this migration, S34 is selected and moved to T4 in the second round, taking 20.45 seconds. We also find that S41 and S53 migrate to T5 and T4 from the other overloaded servers in these two rounds. After the migrations, all the overloaded servers are relieved, and the total time (57.74 s) is the shortest compared with the other two methods.
VIII. RELATED WORK

Some work has been devoted to addressing the problems of VM deployment, placement and runtime reconfiguration with different optimization objectives. Initially, some work only considered resource utilization and workload, using ordering algorithms [16], constraint programming [6] and genetic algorithms [17] to obtain solutions. Recently, factors relating to providers' costs (e.g. power consumption, thermal dissipation and operation costs) have been considered. Temperature-aware workload placements were presented in [9] and [10]. From the service providers' viewpoint, the work in [18] guided VM template selection and runtime reconfiguration according to resource prices, reconfiguration costs and the fluctuating workload; the authors took both resource price and operation cost into account and adapted the VMs in advance based on performance predictions.

Research from the PPs' viewpoint is much more common. J. Xu [11] simultaneously considered workload, power consumption and thermal management when performing runtime reconfiguration. D. Jayasinghe [19] found that the infrastructure structure affects performance, availability and communication, and proposed a hierarchical approach to guide VM placement. Y. Kang [20] focused on user physical distribution and network transmission, and mapped VM placement to the k-median and max k-cover problems to improve end users' experience. Considering the questions of how to select the overloaded servers and the VMs, and which servers should be the migration targets, M. Andreolini [16] proposed a VM migration method that selects the source servers and the VMs based on a CUSUM model and the workload trends. The heuristic in migration was to move as few VMs as possible while reducing the load as much as possible. The authors also considered the performance interference on the target node and let each target receive only one VM at a time. Compared with this work, our advantage is that we consider the network bandwidth and the VM size during migration and propose a detailed parallel migration method trading off between the migrations and the performance interference.

The work in [5] proposed the black-box and gray-box strategies for VM migration. The authors implemented a system, named Sandpiper, to resolve overloaded servers in a short time. Sandpiper monitors the system based on the black-box or gray-box strategies and detects the overloaded servers. According to this information, Sandpiper determines a new mapping between the VMs and the servers and initiates the necessary migrations. They defined volume, the hotspot evaluation metric, and selected the VMs in decreasing order of their volume-to-size ratio (VSR). When there was no suitable target node, the authors also proposed a VM swap, exchanging VMs on different servers to resolve the overload. On one hand, the hotspot metric is defined by transforming a multi-dimensional resource vector (CPU, memory, I/O) into a scalar, and the value cannot reflect the real hotspot exactly. On the other hand, the swap may move more VMs and lead to significant performance interference.

F. Hermenier proposed a consolidation manager, Entropy, to perform dynamic VM migration and consolidation based on constraint programming [6]. The authors find the best solutions for mapping multiple VMs to the nodes and then select the solution that can be realized with the fewest migrations. During the actual migration, they consider two types of constraints, sequential constraints and cyclic constraints. Although the authors also pointed out that VM migration should be performed in parallel as much as possible, they did not propose a concrete solution. Moreover, constraint programming can obtain the exact optimal solution, but its performance limits it when dealing with large-scale problems.
IX. CONCLUSION AND FUTURE WORK
We address the problem of automating VM runtime reconfiguration. We analyze the different costs and their related factors and concentrate on minimizing the runtime reconfiguration costs by proposing a two-level VM self-reconfiguration mechanism, which combines VM local adjustment and runtime migration to reduce the number of VM migrations and the total migration time. Based on the objective formulated in Section III, we make the following contributions. First, based on the current states and the predictions of the resource demands and the workload, a time-division-multiplexing-based VM local adjustment strategy is proposed to reduce the number of VM migrations. Second, we map the runtime VM migration to the maximum matching problem of a bipartite graph and realize a parallel migration method that makes the trade-off between reducing the VM migration time and limiting the application performance interference.

However, there are still some limitations in our work: 1) the prediction techniques may deviate in some cases, which delays the runtime reconfiguration decisions; 2) the interference between VMs on the same server is ignored. Therefore, in future work, to improve the effectiveness of our VM deployment and reconfiguration framework, we will address these limitations by employing more accurate prediction techniques and taking the interference between VMs into account.
ACKNOWLEDGMENT

This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2009CB320704, the National High Technology Research and Development Program of China under Grant No. 2012AA011204, and the National Natural Science Foundation of China under Grant Nos. 61173003 and 61100065.

REFERENCES

[1] M. Armbrust, et al., "Above the Clouds: A Berkeley View of Cloud Computing," Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, 2009.
[2] J. E. Smith and R. Nair, "The Architecture of Virtual Machines," Computer, 38(5):32-38, 2005.
[3] J. Almeida, V. Almeida, D. Ardagna, C. Francalanci, and M. Trubian, "Resource Management in the Autonomic Service-Oriented Architecture," in Proceedings of the International Conference on Autonomic Computing, 2006.
[4] P. Padala, K. G. Shin, et al., "Adaptive Control of Virtualized Resources in Utility Computing Environments," in EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, Lisbon, Portugal, 2007.
[5] T. Wood, et al., "Black-Box and Gray-Box Strategies for Virtual Machine Migration," in Proceedings of the 4th USENIX Conference on Networked Systems Design and Implementation, 2007.
[6] F. Hermenier, X. Lorca, et al., "Entropy: A Consolidation Manager for Clusters," in Proceedings of the ACM/Usenix International Conference on Virtual Execution Environments (VEE), 2009.
[7] G. Khanna, et al., "Application Performance Management in Virtualized Server Environments," in Proceedings of the 10th IEEE/IFIP Network Operations and Management Symposium (NOMS), 2006.
[8] J. Xu and J. Fortes, "Multi-objective Virtual Machine Placement in Virtualized Data Center Environments," in Proceedings of the IEEE/ACM International Conference on Green Computing and Communications (GreenCom) & International Conference on Cyber, Physical and Social Computing (CPSCom), 2010.
[9] L. Ramos and R. Bianchini, "C-Oracle: Predictive Thermal Management for Data Centers," in Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA 14), 2008.
[10] Q. Tang, S. Gupta and G. Varsamopoulos, "Energy-Efficient, Thermal-Aware Task Scheduling for Homogeneous, High Performance Computing Data Centers: A Cyber-Physical Approach," IEEE Transactions on Parallel and Distributed Systems, Special Issue on Power-Aware Parallel and Distributed Systems, 2008.
[11] J. Xu and J. Fortes, "A Multi-objective Approach to Virtual Machine Management in Datacenters," in Proceedings of the 8th International Conference on Autonomic Computing, 2011.
[12] H. Jin, L. Deng, et al., "Dynamic Processor Resource Configuration in Virtualized Environments," in Proceedings of the 8th IEEE International Conference on Services Computing, 2011.
[13] X. Pu, L. Liu, et al., "Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments," in Proceedings of the 3rd International Conference on Cloud Computing, 2010.
[14] T. Wood, L. Cherkasova, et al., "Predicting Application Resource Requirements in Virtual Environments," in Proceedings of the ACM/IFIP/USENIX 9th International Middleware Conference, 2008.
[15] D. West, Introduction to Graph Theory. Prentice Hall, 2007.
[16] M. Andreolini, S. Casolari, et al., "Dynamic Load Management of Virtual Machines in Cloud Architectures," in Proceedings of the First International Conference on Cloud Computing (ICST CLOUDCOMP 2009), 2009.
[17] P. Campegiani, "A Genetic Algorithm to Solve the Virtual Machines Resources Allocation Problem in Multi-tier Distributed Systems," in Proceedings of the Second International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT '09), 2009.
[18] H. Wu, W. B. Zhang, J. Wei, et al., "A Benefit-Aware On-Demand Provisioning Approach for Cloud Computing," in Proceedings of the 3rd Asia-Pacific Symposium on Internetware, 2011.
[19] D. Jayasinghe, et al., "Improving Performance and Availability of Services Hosted on IaaS Clouds with Structural Constraint-aware Virtual Machine Placement," in Proceedings of the 8th International Conference on Services Computing, 2011.
[20] Y. Kang, et al., "A User Experience-based Cloud Service Redeployment Mechanism," in Proceedings of the 4th International Conference on Cloud Computing, 2011.
[21] P. Barham, B. Dragovic, et al., "Xen and the Art of Virtualization," in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), 2003.