
Computer Networks 53 (2009) 2905–2922


Resource pool management: Reactive versus proactive or let's be friends

Daniel Gmach a,*, Jerry Rolia a, Ludmila Cherkasova a, Alfons Kemper b

a Hewlett-Packard Laboratories, Palo Alto, CA, United States
b Technische Universität München, 85748 Garching, München, Germany

Article info

Article history: Available online 20 August 2009

Keywords: Virtualized data centres; Resource pool management; Enterprise workload analysis; Simulation

Abstract

The consolidation of multiple workloads and servers enables the efficient use of server and power resources in shared resource pools. We employ a trace-based workload placement controller that uses historical information to periodically and proactively reassign workloads to servers subject to their quality of service objectives. A reactive migration controller is introduced that detects server overload and underload conditions. It initiates the migration of workloads when the demand for resources exceeds supply. Furthermore, it dynamically adds and removes servers to maintain a balance of supply and demand for capacity while minimizing power usage. A host load simulation environment is used to evaluate several different management policies for the controllers in a time effective manner. A case study involving three months of data for 138 SAP applications compares three integrated controller approaches with the use of each controller separately. The study considers trade-offs between: (i) required capacity and power usage, (ii) resource access quality of service for CPU and memory resources, and (iii) the number of migrations. Our study sheds light on the question of whether a reactive controller or proactive workload placement controller alone is adequate for resource pool management. The results show that the most tightly integrated controller approach offers the best results in terms of capacity and quality but requires more migrations per hour than the other strategies.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Virtualization is gaining popularity in enterprise environments as a software-based solution for building shared hardware infrastructures. Forrester Research estimates that businesses generally only end up using between 8% and 20% of the server capacity they have purchased. Virtualization technology helps to achieve greater system utilization while lowering total cost of ownership and responding more effectively to changing business conditions. For large enterprises, virtualization offers a solution for server and application consolidation in shared resource pools. The consolidation of multiple servers and their

* Corresponding author. Tel.: +1 650 857 4955. E-mail addresses: [email protected] (D. Gmach), [email protected] (J. Rolia), [email protected] (L. Cherkasova), [email protected] (A. Kemper).
© 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2009.08.011

workloads has an objective of minimizing the number of resources, e.g., computer servers, needed to support the workloads. In addition to reducing costs, this can also lead to lower peak and average power requirements. Lowering peak power usage may be important in some data centres if peak power cannot easily be increased. Applications participating in consolidation scenarios can make complex demands on servers. For example, many enterprise applications operate continuously, have unique time varying demands, and have performance-oriented Quality of Service (QoS) objectives. To evaluate which workloads can be consolidated to which servers, some preliminary performance and workload analysis should be done. In the simple naive case, a data centre operator may estimate the peak resource requirements of each workload and then evaluate the combined resource requirements of a group of workloads by using the sum of their peak demands. However, such an approach can


lead to significant resource over-provisioning since it does not take into account the benefits of resource sharing for complementary workload patterns. To evaluate which workloads can be consolidated to which servers, in this work we employ a trace-based approach [1] that assesses permutations and combinations of workloads in order to determine a near optimal workload placement that provides specific qualities of service. The general idea behind trace-based methods is that historic traces offer a model of application demands that are representative of future application behaviour. Traces are used to decide how to consolidate workloads to servers. In our past work, we assumed that the placement of workloads would be adjusted infrequently, e.g., weekly or monthly [1]. However, by repeatedly applying the method at shorter timescales we can achieve further reductions in required capacity. In this scenario, we treat the trace-based approach as a workload placement controller that periodically causes workloads to migrate among servers to consolidate them while satisfying quality requirements. Such migrations [2] are possible without interrupting the execution of the corresponding applications. We enhance our optimization algorithm to better support this scenario by minimizing migrations during successive control intervals. Though enterprise application workloads often have time varying loads that behave according to patterns [3,4], actual demands are statistical in nature and are likely to differ from predictions. Therefore, to further improve the efficiency and application quality of service of our approach, we manage workloads by integrating the workload placement controller with a reactive workload migration controller that observes current behaviour to: (i) migrate workloads off of overloaded servers, and (ii) free and shut down lightly-loaded servers. There are many management policies that can be used to guide workload management. Each has its own parameters. However, predicting and comparing the long term impact of different management policies for realistic workloads is a challenging task. Typically, this process is very time consuming as it is mainly done following a risky ‘‘trial and error” process either with live workloads and real hardware or with synthetic workloads and real hardware. Furthermore, the effectiveness of policies may interact with the architecture for the resource pool of servers so it must be repeated for different alternatives. To better assess the long term impact of management policies we exploit a host load simulation environment. The environment: models the placement of workloads on servers; simulates the competition for resources on servers; causes the controllers to execute according to a management policy; and dynamically adjusts the placement of workloads on servers. During this simulation process, the simulator collects metrics that are used to compare the effectiveness of the policies. A case study involving three months of data for 138 SAP applications is used to evaluate the effectiveness of several management policies. These include the use of the workload placement and workload migration controllers separately and in an integrated manner. The study considers trade-offs between: (i) required capacity and power usage, (ii) resource access quality of service for CPU and memory

resources, and (iii) the number of migrations. This paper significantly enhances our recent work [5] by considering more advanced policies, by introducing new quality of service metrics, and by gaining new insights using an analysis of variance (ANOVA) on data from more than 10 times the previous number of simulation runs. The results of the ANOVA show that, for our policies and case study data, thresholds that define underload conditions have a greater impact on capacity and quality than those that define overload conditions. Finally, we found that the tight integration of controllers outperforms the use of the controllers in parallel or separately but in general causes more migrations per hour. The remainder of this paper is organized as follows. Section 2 describes the workload placement and migration controllers, management policies, and metrics. The host load simulation environment is introduced in Section 3. Section 4 presents case study results. Section 5 describes related work. Finally, conclusions are offered in Section 6.

2. Management services, policies, and quality metrics

Our management policies rely on two controllers. This section describes the workload placement controller and the reactive workload migration controller. The management policies exploit these controllers in several different ways. Quality metrics are used to assess the effectiveness of management policies.

2.1. Workload placement controller

The workload placement controller has two components.

• A simulator component simulates the assignment of several application workloads on a single server. It traverses the per-workload time varying traces of historical demand to determine the peak of the aggregate demand for the combined workloads. If, for each capacity attribute, e.g., CPU and memory, the peak demand is less than the capacity of the attribute for the server, then the workloads fit on the server.
• An optimizing search component examines many alternative placements of workloads on servers and reports the best solution found. The optimizing search is based on a genetic algorithm [6].

The workload placement controller is based on the Capman tool that is described further in [1,5]. It supports both consolidation and load levelling exercises. Load levelling balances workloads across a set of resources to reduce the likelihood of service level violations. Capman supports the controlled overbooking of capacity that computes a required capacity for workloads on a server that may be less than the peak of aggregate demand. It is capable of supporting a different quality of service for each workload [7]. Without loss of generality, this paper considers the highest quality of service, which corresponds to a required capacity for workloads on a server that is the peak of their aggregate demand.
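To make the fit test concrete, the following is a minimal sketch of the check performed by the simulator component, assuming a simple list-of-dict trace layout; the data layout and function name are illustrative and not Capman's implementation.

```python
# Sketch of the fit test described above: a set of workloads fits on a server
# if, for every capacity attribute (CPU, memory), the peak of their aggregate
# time-varying demand stays within the server's capacity.

def fits_on_server(traces, capacity):
    """traces: list of per-workload demand traces, each a list of
    {'cpu': ..., 'mem': ...} samples over the same time slots.
    capacity: {'cpu': ..., 'mem': ...} of the candidate server."""
    slots = len(traces[0])
    for t in range(slots):
        agg_cpu = sum(trace[t]['cpu'] for trace in traces)
        agg_mem = sum(trace[t]['mem'] for trace in traces)
        if agg_cpu > capacity['cpu'] or agg_mem > capacity['mem']:
            return False        # peak aggregate demand exceeds capacity
    return True

# The optimizing search component then explores alternative placements (the
# paper uses a genetic algorithm [6]) and keeps the placement with the fewest
# servers for which every server passes this test.
```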


For this paper, we exploit Capman's multi-objective optimization functionality. Instead of simply finding the smallest number of servers needed to support a set of workloads, Capman evaluates solutions according to a second simultaneous objective. The second objective aims to minimize the number of changes to workload placement. When invoking Capman, an additional parameter specifies a target t as a bound for the number of workloads that it is desirable to migrate. Limiting the number of migrations limits the migration overhead and reduces the risk of incurring a migration failure. If it is possible to find a solution with fewer than t migrations, then Capman reports the workload placement that needs the smallest number of servers and has t or fewer migrations. If more changes are needed to find a solution, then Capman reports a solution that has the smallest number of changes to find a feasible solution. A data centre operator could choose a value t based on experience regarding the overhead that migrations place upon network infrastructure and servers. The case study explores the sensitivity of capacity and quality metrics to the parameter t.

2.2. Workload migration controller

The migration controller is a fuzzy-logic based feedback control loop. An advisor module of the controller continuously monitors the servers' resource utilization and triggers a fuzzy-logic based controller whenever resource utilization values are too low or too high. When the advisor detects a lightly utilized (i.e., underload) or overload situation, the fuzzy controller module identifies appropriate actions to remedy the situation. For this purpose, it is initialized with information on the current load situation of all affected servers and workloads and determines an appropriate action. For example, as a first step, if a server is overloaded it determines a workload on the server that should be migrated, and as a second step it searches for a new server to receive the workload. Furthermore, these rules initiate the shutdown and startup of nodes. The architecture of the workload migration controller is illustrated in Fig. 1. The implementation of the workload migration controller uses the following rules:

• A server is defined as overloaded if its CPU or memory consumption exceeds a given threshold. In an overload situation, first, a fuzzy controller determines a workload to migrate away and then it chooses an appropriate target server. The target server is the least loaded server that has sufficient resources to host the workload. If such a server does not exist, we start up a new server and migrate the workload to the new one.
• An underload situation occurs whenever the CPU and memory usage averaged over all servers in the server pool drops below a specified threshold. While an overload condition is naturally defined with respect to a particular server, the underload situation is different. It is defined with respect to the average utilization of the overall system involving all the nodes. In this way, we try to avoid system thrashing: e.g., a new server generally starts with a small load and should not be considered immediately for consolidation. In an underload situation, first, the fuzzy controller chooses the least loaded server and tries to shut it down. For every workload on this server, the fuzzy controller determines a target server. If a target cannot be found for a workload then the shutdown process is stopped. In contrast to overload situations, this controller does not start up additional servers.

Section 4.4 evaluates the impact of various combinations of threshold values for overload and underload management. A more complete description of the fuzzy controller and its rules is presented in [8,3].

Fig. 1. Architecture of the workload migration controller.

2.3. Policies

The policies we consider evaluate whether a reactive controller or workload placement controller alone is adequate for resource pool management and whether the integration of controllers provides compelling benefits. Our study considers the following management policies:

• MC: migration controller alone;
• WP: workload placement controller operating periodically alone;
• MC + WP: workload placement controller operating periodically with the migration controller operating in parallel;
• MC + WP on Demand: migration controller is enhanced to invoke the workload placement controller on demand to consolidate workloads whenever the servers being used are lightly utilized; and,
• MC + WP + WP on Demand: workload placement controller operating periodically and the migration controller is enhanced to invoke the workload placement controller on demand to consolidate workloads whenever the servers being used are lightly utilized.

The MC policy corresponds to using the workload migration controller alone for ongoing management. The workload placement controller causes an initial workload placement that consolidates workloads to a small number of servers. The workload migration controller is then used to migrate workloads to alleviate overload and underload situations as they occur. The workload migration controller operates at the time scale at which measurement data is made available. In this paper, the migration controller is invoked every 5 min.

With the WP policy, the workload placement controller uses historical workload demand trace information from the previous week that corresponds to the next control interval, e.g., the next 4 h. In this way it periodically recomputes a globally efficient workload placement. The historical mode is most likely appropriate for enterprise workloads that have repetitive patterns for workload demands.

The MC + WP policy implements the MC and WP policies in parallel, i.e., the workload placement controller is executed for each control interval, e.g., every 4 h, to compute a more effective workload placement for the next control interval. Within such an interval, the migration controller, independently, migrates workloads to alleviate overload and underload situations as they occur.

The MC + WP on Demand policy integrates the placement and migration controllers in a special way. Instead of running the workload placement controller after each workload placement control interval, the migration controller uses the workload placement algorithm to consolidate the workloads whenever servers being used are lightly utilized.

Finally, the MC + WP + WP on Demand policy is the same as the MC + WP on Demand policy but also invokes the workload placement controller after every control interval, e.g., every 4 h, to periodically provide a globally efficient workload placement.
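The MC policy, for example, amounts to repeatedly applying the reactive rules of Section 2.2. The sketch below illustrates one such control step under simplified assumptions: the actual controller is fuzzy-logic based, whereas this version uses hard thresholds, and the dict-based data layout, threshold values, and greedy target selection are illustrative only.

```python
# Minimal sketch of one reactive control step (Section 2.2 rules).
# Not the fuzzy controller itself; thresholds and data layout are illustrative.

CPU_OVER, MEM_OVER = 0.90, 0.95      # per-server overload thresholds
CPU_UNDER, MEM_UNDER = 0.40, 0.60    # pool-wide underload thresholds
CAP_CPU, CAP_MEM = 1.0, 1.0          # normalized server capacity

def util(server):
    cpu = sum(w['cpu'] for w in server['workloads'])
    mem = sum(w['mem'] for w in server['workloads'])
    return cpu, mem

def fits(server, wl):
    cpu, mem = util(server)
    return cpu + wl['cpu'] <= CAP_CPU and mem + wl['mem'] <= CAP_MEM

def least_loaded_target(servers, wl, exclude):
    candidates = [s for s in servers if s is not exclude and fits(s, wl)]
    return min(candidates, key=lambda s: util(s)[0], default=None)

def control_step(servers):
    """One 5-min control step: resolve one overload, else try consolidation."""
    # Overload is defined per server: move one workload away, starting a new
    # server only if no existing server has room.
    for s in servers:
        cpu, mem = util(s)
        if cpu > CPU_OVER or mem > MEM_OVER:
            wl = max(s['workloads'], key=lambda w: w['cpu'])
            target = least_loaded_target(servers, wl, exclude=s)
            if target is None:
                target = {'workloads': []}
                servers.append(target)
            s['workloads'].remove(wl)
            target['workloads'].append(wl)
            return
    # Underload is defined on the pool-wide averages: empty the least loaded
    # server and shut it down, aborting if any workload has no target.
    avg_cpu = sum(util(s)[0] for s in servers) / len(servers)
    avg_mem = sum(util(s)[1] for s in servers) / len(servers)
    if avg_cpu < CPU_UNDER and avg_mem < MEM_UNDER and len(servers) > 1:
        victim = min(servers, key=lambda s: util(s)[0])
        # Simplification: targets are chosen independently per workload.
        plan = [(w, least_loaded_target(servers, w, exclude=victim))
                for w in victim['workloads']]
        if all(t is not None for _, t in plan):
            for w, t in plan:
                t['workloads'].append(w)
            servers.remove(victim)
```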

2.4. Efficiency and quality metrics

To compare the long term impact of management policies we consider several metrics. These include:

• total server CPU hours used and total server CPU hours idle;
• normalized server CPU hours used and normalized server idle CPU hours;
• minimum and maximum number of servers;
• the distribution of power usage in Watts;
• CPU and memory resource access quality per hour; and
• the number of migrations per hour.

The total server CPU hours used corresponds to the sum of the per workload demands. Total server CPU hours idle is the sum of idle CPU hours for servers that have workloads assigned to them. The server CPU hours idle shows how much CPU capacity is not used on the active servers. Normalized values are defined with respect to the total demand of the workloads as specified in the workload demand traces. Note that if normalized server CPU hours used is equal to 1 and normalized server CPU hours idle are equal to 1.5 then this corresponds to an average CPU utilization of 40%. The minimum and maximum numbers of servers for a policy are used to compare the overall impact of a management policy on capacity needed for server infrastructure. This determines the cost of the infrastructure. Each server has a minimum power usage $p_{idle}$, in Watts, that corresponds to the server having idle CPUs, and a maximum power usage $p_{busy}$ that corresponds to 100% CPU utilization. The power used by a server is estimated as

$p_{idle} + u \cdot (p_{busy} - p_{idle})$, where $u$ is the CPU utilization of the server [9].

We introduce a new quality metric named violation penalty that is based on the number of successive intervals where a workload's demands are not fully satisfied and the expected impact on the customer. Longer epochs of unsatisfied demand incur greater penalty values, as they are more likely to be perceived by those using applications. For example, if service performance is degraded for up to 5 min customers would start to notice. If the service is degraded for more than 5 min then customers may start to call the service provider and complain. Furthermore, larger degradations in service must cause greater penalties. The quality of the delivered service depends on how much the service is degraded. If demands greatly exceed allocated resources then the utility of the service suffers more than if demands are almost satisfied. Thus, for each violation a penalty weight $w_{pen}$ is defined that is based on the expected impact of the degraded quality on the customer. The violation penalty value $pen$ for a violation with $I$ successive overloaded measurement intervals is defined as $pen = I^2 \max_{i=1}^{I}(w_{pen,i})$, where $w_{pen,i}$ is the penalty in the $i$th interval. Thus longer violations tend to have greater penalties than shorter violations. (We note that such penalties may be translated to monetary penalties in financially driven systems and that monetary penalties are likely to be bounded in such systems.) The weight functions used for CPU and memory are given below. The sum of penalty values over all workloads over all violations defines the violation penalty for a metric.

Regarding CPU allocations, we estimate the impact of degraded service on a customer using a heuristic that compares the actual and desired utilization of allocation for the customer. An estimate is needed because we do not have measurements that reflect the actual impact on a customer. Let $u_a$ and $u_d < 1$ be the actual and desired CPU utilization of allocation for an interval. If $u_a \le u_d$ then we define the weight for the CPU penalty as $w^{CPU}_{pen} = 0$

Author's personal copy D. Gmach et al. / Computer Networks 53 (2009) 2905–2922

since there is no violation. If $u_a > u_d$ then response times will be higher than planned so we must estimate the impact of the degradation on the customer. We define:

$w^{CPU}_{pen} = 1 - \dfrac{1 - u_a^k}{1 - u_d^k}$.

The penalty has a value between 0 and 1 and is larger for bigger differences and higher utilizations. The superscript $k$ denotes the number of CPUs on the server. This formula is motivated by a formula that estimates the mean response time for the M/M/k queue [10], namely $r = 1/(1 - u^k)$ estimates the mean response time for a queue with $k$ processors and unit service demand [11]. The exponent $k$ reflects the fact that a server with more processors can sustain higher utilizations without impacting customer response times. Similarly, a customer that has a higher than desired utilization of allocation will be less impacted on a system with more processors than one with fewer processors.

Regarding memory allocations, we estimate the impact of degraded service on a customer using a heuristic that compares the actual allocation of memory $l_a$ and desired allocation of memory $l_d$ for a customer. If $l_a \ge l_d$ then we define a memory penalty weight $w^{Mem}_{pen} = 0$ since there is no violation. If $l_a < l_d$, we define $w^{Mem}_{pen} = 1 - hr$, where $hr$ is the memory hit ratio. In our simulation, the hit ratio is measured as the percentage of satisfied memory demands in bytes.

To summarize, the CPU and memory violation penalties reflect two factors: the severity and the length of the violation. The severity of the violation is captured by a weight function. Two weight functions were introduced, but others could be employed as well.

Finally, the number of migrations is the sum of migrations caused by the workload placement and workload migration controllers. A smaller number of migrations is preferable as it offers lower migration overheads and a lower risk of migration failures. We divide the total number of migrations for an experiment by the number of simulated hours to facilitate interpretation.
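To make these metrics concrete, the following sketch implements the power estimate and the penalty weights exactly as defined above. It is an illustration rather than the simulator's code; the default power values are those of the case-study servers in Section 4, and the example utilizations are invented.

```python
# Sketch of the efficiency and quality metrics defined in Section 2.4.

def power_watts(u, p_idle=695.0, p_busy=1013.0):
    """Estimated power draw: p_idle + u * (p_busy - p_idle), where u is the
    CPU utilization. Defaults are the case-study server of Section 4."""
    return p_idle + u * (p_busy - p_idle)

def cpu_penalty_weight(u_actual, u_desired, k):
    """w_pen^CPU = 1 - (1 - u_a^k) / (1 - u_d^k); zero when the allocation
    suffices (u_a <= u_d). k is the number of CPUs on the server."""
    if u_actual <= u_desired:
        return 0.0
    return 1.0 - (1.0 - u_actual ** k) / (1.0 - u_desired ** k)

def mem_penalty_weight(alloc_actual, alloc_desired, hit_ratio):
    """w_pen^Mem = 1 - hit ratio; zero when the desired memory is allocated."""
    return 0.0 if alloc_actual >= alloc_desired else 1.0 - hit_ratio

def violation_penalty(weights):
    """Penalty for one violation spanning I successive overloaded intervals:
    pen = I^2 * max_i w_pen,i, so longer violations are penalized more."""
    return len(weights) ** 2 * max(weights)

# Example: a server at 60% CPU utilization draws roughly 886 W, and a
# violation lasting three 5-min intervals on a 4-CPU server with desired
# utilization 0.66 accumulates a penalty of I^2 times the worst weight.
print(power_watts(0.60))
print(violation_penalty([cpu_penalty_weight(u, 0.66, k=4)
                         for u in (0.75, 0.90, 0.80)]))
```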

3. Host load simulator

Predicting the long term impact of integrated management policies for realistic workloads is a challenging task. We employ a flexible host load simulation environment to evaluate many management policies for resource pools in a time effective manner. The architecture of the host load simulation environment is illustrated in Fig. 2. The simulator takes as input historical workload demand traces, an initial workload placement, server resource capacity descriptions, and a management policy. The server descriptions include numbers of processors, processor speeds, real memory size, and network bandwidth. A routing table directs each workload's historical time varying resource requirement data to the appropriate simulated server. Each simulated server uses a fair-share scheduling strategy to determine how much of the workload demand is and is not satisfied. The central pool sensor makes time varying information about satisfied demands available to management controllers via an open interface. The interface is also used to integrate different controllers with the simulator without recompiling its code. The controllers periodically gather accumulated metrics and make decisions about whether to cause workloads to migrate from one server to another. Migration is initiated by a call from a controller to the central pool actuator. In our simulation environment this causes a change to the routing table that reflects the impact of the migration in the next simulated time interval. During the simulation process the metrics defined in Section 2.4 are gathered. Different controller policies cause different behaviours that we observe through these metrics.

4. Case study

This section evaluates the effectiveness of the proposed management policies using three months of real-world workload demand traces for 138 SAP enterprise applications. The traces are obtained from a data centre that specializes in hosting enterprise applications such as customer relationship management applications for small


Fig. 2. Architecture of the host load simulator.
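The sketch below illustrates one simulated 5-min interval following the data flow of Fig. 2: the routing table maps workloads to servers, and each simulated server applies a simple proportional fair-share rule to its aggregate CPU demand. The data layout and the proportional rule are illustrative simplifications, not the simulator's actual scheduler.

```python
# Minimal sketch of one simulated interval in the host load simulator.

def simulate_interval(demands, routing, cpu_capacity):
    """demands: {workload: cpu_demand}, routing: {workload: server},
    cpu_capacity: {server: capacity}. Returns satisfied demand per workload."""
    per_server = {}
    for wl, server in routing.items():
        per_server.setdefault(server, []).append(wl)

    satisfied = {}
    for server, wls in per_server.items():
        total = sum(demands[w] for w in wls)
        cap = cpu_capacity[server]
        # If demand exceeds capacity, scale every workload back proportionally
        # (a simple fair-share rule); otherwise everything is satisfied.
        share = 1.0 if total <= cap else cap / total
        for w in wls:
            satisfied[w] = demands[w] * share
    return satisfied

# The central pool sensor would expose `satisfied` to the controllers, and a
# controller's migration decision simply rewrites entries of `routing` for the
# next interval, playing the role of the central pool actuator in Fig. 2.
```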


Fig. 3. Top percentile of CPU demand for applications under study.

and medium sized businesses. Each workload was hosted on its own server so we use resource demand measurements for a server to characterize the workload's demand trace. The measurements were originally recorded using vmstat [12]. Traces capture average CPU and memory usage as recorded every 5 min. As many of the workloads are interactive enterprise workloads, a maximum utilization of 0.66 is desired to ensure interactive responsiveness. Hence, CPU demands in the historical workload traces are scaled with a factor of 1.5 to achieve a target utilization of 0.66. The resource pool simulator operates on this data walking forward in successive 5 min intervals. In addition to the three months of the real-world demand traces we used data from the previous month to initialize the demand buffers of the central pool sensor. This enables the integrated management services to access prior demand values at the start of a simulation run. We consider the following resource pool configuration (service providers can use the proposed approach for evaluating different hardware platforms; for example, in [5] we made recommendations regarding server and blade based resource pool configurations): each server consists of 8 × 2.93 GHz processor cores, 128 GB of memory, and two dual 10 Gb/s Ethernet network interface cards for network traffic and virtualization management traffic, respectively. Each server consumes 695 W when idle and 1013 W when it is fully utilized. Section 4.1 gives a workload characterization for the SAP workloads considered in the study. Section 4.2 begins our study of workload placement by considering the impact of migration overhead. Section 4.3 considers the question: how much capacity and power can be saved by periodically consolidating workloads? The section assumes perfect knowledge about future workload demands and its results give a baseline for capacity savings that is used for the rest of the case study. Sections 4.4 and 4.5 do not assume perfect knowledge of future demands. They consider tuning of migration controller parameters and compare the capacity savings offered by the migration controller with the workload placement controller and three scenarios where the controllers are integrated. Finally, Section 4.6 presents results that demonstrate that the number of migrations caused by the workload placement controller

can be significantly reduced without a major impact on CPU capacity or violation penalty.
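As a brief check of the demand scaling described above (an illustration, not additional data from the study): multiplying each measured CPU demand by 1.5 means that, when the scaled demand fully occupies its allocation, the underlying measured demand corresponds to the target utilization,

$\dfrac{\text{measured demand}}{1.5 \times \text{measured demand}} = \dfrac{1}{1.5} \approx 0.66.$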

4.1. Workload characteristics

Use of virtualization technology enables the creation of shared server pools where multiple application workloads share each server in the pool. Understanding the nature of enterprise workloads is crucial to properly design and provision current and future services in such pools. Existing studies of internet and media workloads [13,14] indicate that client demands are highly variable ("peak-to-mean" ratios may be an order of magnitude or more), and that it is not economical to overprovision the system using "peak" demands. Do enterprise workloads exhibit similar properties? We present results that illustrate the peak-to-mean behaviour for 138 enterprise application workloads. Understanding the burstiness of enterprise workloads can help in choosing the right trade-off between application quality of service and resource pool capacity requirements.

This section analyses burstiness and access patterns of the enterprise application workloads under study. It shows percentiles of demands, the maximum durations for contiguous demands beyond the 99th percentile, and a representative demand trace for an interactive enterprise application. Fig. 3 gives the percentiles of CPU demand for the 138 applications over the period of four months. The illustrated demands are normalized as a percentage with respect to their peak values. Several curves are shown that illustrate the 99th, 97th, and 95th percentile of demand as well as the mean demand. The workloads are ordered by the 99th percentile for clarity. The figure shows that more than half of all studied workloads have a small percentage of points that are very large with respect to their remaining demands. The left-most 60 workloads have their top 3% of demand values between 2 and 10 times higher than the remaining demands in the trace. Furthermore, more than half of the workloads observe a mean demand less than 30% of the peak demand. These curves show the bursty nature of demands for most of the enterprise applications under study. Consolidating such bursty workloads onto a smaller number of more powerful servers is likely


to reduce the CPU capacity needed to support the workloads. The corresponding percentiles for the memory demands of the 138 applications are shown in Fig. 4. Again, the illustrated demands are normalized as a percentage with respect to the peak memory demand. The curves show that the average memory demand of an application is closer to its peak demand than is the case for CPU. Forty-five percent of the workloads exhibit a mean demand above 80% of their peak demands. Thus, in a memory bound infrastructure the potential resource savings from resource sharing is expected to be smaller than in CPU bound systems. An additional and complementary property for a workload is the maximum duration of its contiguous application demands. While short bursts in demand may not significantly impact a workload's users, a system must be provisioned to handle sustained bursts of high demand. However, if an application's contiguous demands above the 99th percentile of demand are never longer than 10 min then it may be economical to support the application's 99th percentile of demand and allow the remaining bursts to be served with degraded performance [7]. We have analysed the maximum duration of bursts of CPU and memory demands for the workloads. Fig. 5 shows the duration of each workload's longest burst in CPU

demand that is greater than its corresponding 99th percentile of demand. From the figure, we see that:

• 83.3% of workloads have sustained bursts in CPU demand that last more than 15 min; and,
• 60% of workloads have sustained bursts in CPU demand that last more than 30 min.

These are significant bursts that could impact an end user's perception of performance. A similar analysis for memory demands shows that:

• 97.8% of workloads have sustained bursts in memory demand that last more than 15 min; and,
• 93.5% of workloads have sustained bursts in memory demand that last more than 30 min.

The numbers show that the length of the bursts matters. This justifies our use of the quality metric that takes the number of successive intervals where a workload's demands are not satisfied into account. The analysis also shows that the CPU demands are much more variable than memory demands. CPU demands fluctuate with user load. Memory demands tend to increase then periodically decrease due to some form of


Fig. 4. Top percentile of memory demand for applications under study.


Fig. 5. Longest busy periods above 99th percentile of demand for studied applications.


memory garbage collection. For the applications in our case study, garbage collection appeared to occur each weekend. Fig. 6 illustrates the behaviour of a typical workload. Finally, a workload pattern analysis (following the methodology introduced in [4]) is conducted. Fig. 7 gives a summary of the pattern lengths for the 138 workloads. The pattern analysis discovered patterns with lengths between 3 h and seven weeks:

• 67.6% of the workloads exhibit a weekly behaviour; and,
• 15.8% of the workloads exhibit a daily behaviour.

To summarize, this section has shown that there are significant bursts in demand for the workloads and there is a greater opportunity for CPU sharing than for memory sharing. A workload pattern analysis shows that most of the enterprise workloads exhibit strong weekly or daily patterns for CPU usage. Memory usage tends to increase over a week then decrease suddenly.

4.2. Impact of migration overhead with a workload placement controller

This section considers the impact of CPU overhead caused by migrations on required CPU capacity and on CPU violation penalty per hour. We do not focus on memory violation penalties, as these values were typically small.

Many virtualization platforms incur virtualization overhead. Virtualization overhead depends on the type of the virtualization and its implementation specifics. A migration requires the memory of a virtual machine to be copied from the source server to a target server. Typically, the "amount" of CPU overhead is directly proportional to the "amount" of I/O processing [15,16]. Supporting a migration causes CPU load on both the source and target servers. The simulator reflects this migration overhead in the following way. For each workload that migrates, a CPU overhead is added to the source and destination servers. The overhead is proportional to the estimated transfer time based on the memory size of the virtual machine and the network interface card bandwidth. It is added to the source and destination servers over a number of intervals that corresponds to the transfer time. We assume that we use no more than half of the bandwidth available for management purposes, i.e., one of the two management network interface cards. For example, if a workload has a 12 GB memory size and the networking interface is 1 Gb/s then the additional CPU time used for migrating the workload is $(C_{migr} \cdot 12\ \text{GB}) / 1\ \text{Gb/s}$, where $C_{migr}$ is the coefficient of migration overhead.


Fig. 6. CPU and memory demands for a user interactive workload.


Fig. 7. Lengths of workload demand patterns.
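The pattern analysis above follows the methodology of [4]. As a rough, illustrative stand-in (not the method of [4]), a dominant daily or weekly pattern can be spotted by comparing the autocorrelation of a 5-min CPU demand trace at a few candidate lags:

```python
# Rough illustrative periodicity check (not the pattern analysis of [4]):
# pick the candidate pattern length whose lag maximizes the autocorrelation
# of the CPU demand trace. Assumes one sample every 5 minutes.
import numpy as np

SAMPLES_PER_DAY = 24 * 12   # 5-min samples per day

def autocorr_at(trace, lag):
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return float(np.dot(x[:-lag], x[lag:]) / denom) if denom > 0 else 0.0

def dominant_pattern_days(trace, candidates=(1, 5, 7, 14, 21, 28)):
    """Return the candidate pattern length (in days) with the highest
    autocorrelation, provided the trace is long enough to test it."""
    scores = {d: autocorr_at(trace, d * SAMPLES_PER_DAY)
              for d in candidates if len(trace) > 2 * d * SAMPLES_PER_DAY}
    return max(scores, key=scores.get) if scores else None
```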


To evaluate the impact of the additional CPU overhead caused by I/O processing during workload migrations, we employ the workload placement controller with a 4 h control interval. All workloads migrate at the end of each control interval. Fig. 8 shows the results with the migration overhead coefficient $C_{migr}$ varied from 0 to 2. The figure shows several different metrics. These include the normalized CPU hours used, the normalized idle CPU hours, and the CPU violation penalty per hour. A higher migration overhead requires more CPU resources. The impact on CPU hours used is only noticeable in Fig. 8 when $C_{migr} \ge 1$. The CPU violation penalty clearly increases for $C_{migr} \ge 1$. In general, we find our results to be insensitive to values of $C_{migr}$ in the range between 0 and 1.0. We choose $C_{migr} = 0.5$ for the remainder of the study. This value is not unreasonable because the network interface cards we consider support TCP/IP protocol offloading capabilities. There are many reports suggesting that such cards can be driven to 10 Gbps bidirectional bandwidth while using 50% or less of a CPU, e.g., [17].
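The following is a minimal sketch of the migration-overhead accounting described above: the extra CPU time charged to the source and destination servers is proportional to the estimated transfer time, i.e., memory size divided by the available bandwidth, scaled by $C_{migr}$. The GB-to-Gbit conversion, the 5-min interval length, and the rounding rule are assumptions of this sketch rather than details given in the text.

```python
# Sketch of the migration overhead model of Section 4.2.

INTERVAL_SECONDS = 5 * 60          # simulator step: 5-min measurement intervals

def migration_overhead(mem_gb, bandwidth_gbits, c_migr=0.5):
    """Return (cpu_seconds, n_intervals): total extra CPU time added to both
    the source and destination servers, and the number of simulated intervals
    over which it is spread (the estimated transfer time)."""
    transfer_seconds = (mem_gb * 8.0) / bandwidth_gbits   # GB -> Gbit (assumed)
    cpu_seconds = c_migr * transfer_seconds
    n_intervals = max(1, round(transfer_seconds / INTERVAL_SECONDS))
    return cpu_seconds, n_intervals

# Example from the text: a 12 GB virtual machine over a 1 Gb/s interface with
# C_migr = 0.5, the coefficient chosen for the remainder of the study.
print(migration_overhead(12, 1.0))
```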

4.3. Performance, quality, and power assuming perfect knowledge

In this section, we consider an ideal workload placement strategy. This approach assumes that we have perfect knowledge of future resource demands. It gives an upper bound for the potential capacity savings from consolidating workloads at different time scales. We use this bound later in the paper to determine how well our policies, which do not have perfect knowledge, perform compared to the ideal case. Fig. 9 shows the results of a simulation where we use the workload placement controller to periodically consolidate the 138 workloads to a small number of servers in the resource pool. For this scenario, for a given time period, the workload placement controller chooses a placement such that each server is able to satisfy the peak of its workload CPU and memory demands. The figure shows the impact on capacity requirements of using the workload placement controller once at the start of the three months, and for cases with a control interval of 4 weeks, 1 week, 1 day,


Fig. 8. Migration overhead.


Fig. 9. Simulation results assuming perfect knowledge.


4 h, 1 h, and 15 min. The figure shows that re-allocating workloads every 4 h captures most of the capacity savings that can be achieved, i.e., with respect to reallocation every 15 min. The 4 h and 15 min scenarios required a peak of 19 servers. All the other scenarios had peaks between 19 and 21 servers. For the 4 h scenario, we note that the normalized server idle CPU hours are approximately one-half of the CPU hours used, giving an average utilization close to 69% over the three month period with a negligible hourly CPU violation penalty value of 0.4. In subsequent subsections, we treat the results from the 4 h ideal case as the baseline for capacity and quality. Fig. 9 shows that, as expected, as the control interval drops to the hourly, fifteen minute, and five minute levels, the number of migrations per hour increases proportionally, as most workloads are likely to be reassigned. The resulting migration overheads increase the CPU quality violations. Table 1 gives a more detailed breakdown of the violations for the 4 h control interval case. The distribution of the Watts used is shown in Fig. 10. We note that the power consumption of the 15 min, 1 h, and 4 h scenarios is very similar. For workload placement control intervals longer than 1 day, more servers are used, resulting in higher power consumption. In later subsections, we consider how much of these ideal advantages we are able to achieve in practice without assuming perfect knowledge. The workload placement controller control interval is chosen as 4 h.
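For illustration, the 69% figure is consistent with the normalized metrics of Section 2.4: with normalized used hours of 1.0, it implies idle hours of roughly 0.45 (a value inferred here for the example, not reported in the text),

$\dfrac{\text{used}}{\text{used} + \text{idle}} = \dfrac{1}{1 + 0.45} \approx 0.69.$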

4.4. Workload migration controller thresholds

This section evaluates the effectiveness of the migration controller. The experiments start with an ideal workload placement for the first 4 h and use the fuzzy-logic based migration controller to maintain the resource access quality of the workloads. The advisor module of the controller

Table 1
CPU quality violations assuming perfect knowledge for the 4 h control interval.

Interval duration    Total number    Average number
5 min                1090            13 per day
10 min               161             1.9 per day
15 min               14              1.2 per week
20 min               3               1 per month

is configured as follows: it triggers the fuzzy controller if either a server is overloaded or the system is lightly utilized. A server is considered overloaded if the CPU or memory utilization exceeds a given threshold. In that case, it triggers the fuzzy controller that tries to migrate one workload from the concerned server to a less loaded one. Furthermore, the advisor deems a server pool lightly utilized if the average CPU and memory utilization over all servers fall below their given thresholds. Then, the fuzzy controller chooses the least loaded server, migrates all of its workloads to other servers, and shuts down the server. To evaluate the impact of the feedback controller, the following levels for the thresholds are considered:

• a: The CPU threshold defining overloaded servers is varied over 80%, 85%, 90%, 95%, and 99% CPU utilization.
• b: The memory threshold defining overloaded servers is varied over 80%, 85%, 90%, 95%, and 99% memory utilization.
• d: The CPU threshold defining a lightly utilized resource pool is varied over 30%, 40%, 50%, and 60% average CPU utilization of the server pool.
• c: The memory threshold defining a lightly utilized resource pool is varied over 30%, 40%, 50%, 60%, 70%, and 80% average memory utilization of the server pool.

A three-month simulation is conducted for each of the factor level combinations, resulting in a total of 600 experiments. An ANOVA model [18] captures the effects of factor levels, such as different values for thresholds, on a metric, e.g., on the CPU violation penalty or CPU capacity metric. Each factor level has a numerical effect on the metric. The sum of each factor's effects adds to zero. The effect is defined as the difference between the overall mean value for the metric over all combinations of factor levels and the numerical impact of the factor level on the metric with respect to the overall mean value. Similarly, interactions between factor levels also have effects. An analysis of variance considers the sum of squares of effects. The sums of squares are variations for the metric. The analysis quantifies the impact of factors and interactions between factors on the total variation over all combinations of factor levels. When the assumptions of the ANOVA modelling approach hold, a statistical F-test can be used to determine which factors and interactions between factors have


Fig. 10. Power consumption assuming perfect knowledge.


a statistically significant impact on the metric and to quantify the impact. The assumptions of an ANOVA are:

• the effects of factors are additive;
• uncontrolled or unexplained experimental variations, which are grouped as experimental errors, are independent of other sources of variation;
• variance of experimental errors is homogeneous; and,
• experimental errors follow a Normal distribution.

The model we employ for the CPU capacity and CPU violation penalty metrics is illustrated with the following equation:

$\text{Metric} = \mu + a_i + b_j + c_k + d_l + (ab)_{ij} + (ac)_{ik} + (ad)_{il} + (bc)_{jk} + (bd)_{jl} + (cd)_{kl} + e$,

with $i \in \{80, 85, 90, 95, 99\}$, $j \in \{80, 85, 90, 95, 99\}$, $k \in \{30, 40, 50, 60\}$, and $l \in \{30, 40, 50, 60, 70, 80\}$.

In the model a, b, c, and d correspond to the factors CPU and memory overload threshold and CPU and memory underload threshold, respectively. The model states that a metric, e.g., CPU violation penalty, is equal to a mean value over all experiments $\mu$ plus an effect that is due to each factor's level plus an effect that is due to pair-wise interactions for factors, e.g., the $i$th threshold for CPU overload and the $k$th threshold for CPU underload. The error term $e$ includes the effects of higher level interactions, i.e., three and four factor interactions. For the ANOVA models we consider, the factors have an additive impact on the metrics, not a multiplicative impact. The experiments are fully controlled; experimental errors are defined as higher order interactions, which from a detailed analysis have small effects. Visual tests suggest that the errors are homogeneous. The Normality assumption for errors is discussed next. For the CPU violation penalty, the results of a Kolmogorov–Smirnov test (K–S test) [18] for the 600 errors resulted in a D-value of 0.0627, which indicates that the error values are Normally distributed with α = 0.01. However, after removing the 10 largest of the 600 errors, the K–S test indicates that the remaining 590 errors are Normally distributed with significance α = 0.2. This suggests that 20% of randomly generated Normally distributed data sets will have a greater difference from the Normal distribution than our experiment's error data. Hence, we conclude that the error values are Normally distributed and the ANOVA model can be applied. For the CPU capacity, the K–S test yields a D-value of D = 0.049122197, suggesting that the 600 errors are Normally distributed with a significance level α = 0.1. As with the CPU violation penalty model, removing a few extreme values dramatically increases the significance level.

ANOVA results are presented in a table, e.g., Table 2. The columns identify the factor and factor interactions (Source), each source's sum of squares of effects (SS) as an absolute value and as a percentage (SS in %) of the total SS over all sources, the number of degrees of statistical freedom that contribute to the SS, the mean square value (MS) which is the SS for a source divided by its number of degrees of freedom, the computed F-value for the mean square value, the critical value for the F-value that determines statistical significance, and finally the conclusion regarding whether a source has a statistically significant impact on the metric.

Table 2 gives the results of the ANOVA for the factors regarding CPU violation penalty. The table shows that factors d and c, the CPU and memory thresholds for defining underloaded servers, and their interaction explain 99% of the variation in CPU violation penalty over the 600 experiments. Interestingly, factors a and b, thresholds for overloaded CPU and memory, had little impact on quality. The bursts in demand for the traces under study were often larger than the headroom remaining on a server regardless of the chosen threshold level. The underload factors had a much bigger impact on quality. They guide consolidation. Lower threshold values limit consolidation so that the same applications use more servers and violations are less likely.

Table 3 gives the results of an ANOVA for the factors regarding the CPU capacity. Factor c, the memory threshold for defining underloaded servers, has an even larger impact on capacity than on CPU violation penalty. Again, factors d and c and their interaction explain nearly all, 97%, of the variation. Recognizing underload conditions is

Table 2
ANOVA table for CPU violation penalty.

Source   SS             SS in %   df    MS             F-Value    Crit.    Conclusion
a        32209682       0.17      4     8052420        50.17      2.39     Significant
b        442121         0         4     110530.3       1          2.39     Not significant
d        3952996127     20.96     3     1317665376     8209.6     2.623    Significant
c        9077000224     48.13     5     1815400045     11310.6    2.232    Significant
ab       2997479        0.02      16    187342         1.167      1.664    Not significant
ad       20512291       0.11      12    1709358        10.65      1.772    Significant
ac       37595875       0.2       20    1879794        11.712     1.592    Significant
bd       701391         0         12    58449.2        0.3        1.772    Not significant
bc       2969998        0.02      20    148500         0.925      1.592    Not significant
dc       5653318689     29.98     15    376887913      2348.2     1.687    Significant
Error    78325016       0.41      488   160502.1
Total    18859068891    100       599
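As an illustration of how a main-effect sum of squares in Tables 2 and 3 is obtained, the sketch below applies the effect definition from the text (level mean minus grand mean) to a balanced full-factorial design. It is a generic computation, not the authors' tooling, and the dict-based run format is hypothetical.

```python
# Sketch of a main-effect sum of squares for the ANOVA described above.
from collections import defaultdict

def main_effect_ss(runs, factor):
    """runs: list of dicts like {'a': 99, 'b': 95, 'd': 40, 'c': 80,
    'metric': ...}; factor: one of 'a', 'b', 'c', 'd'."""
    overall = sum(r['metric'] for r in runs) / len(runs)
    by_level = defaultdict(list)
    for r in runs:
        by_level[r[factor]].append(r['metric'])
    ss = 0.0
    for level, values in by_level.items():
        effect = sum(values) / len(values) - overall   # level mean - grand mean
        ss += len(values) * effect ** 2                # weighted by runs at level
    return ss
```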


Table 3
Analysis of variance table for CPU capacity.

Source   SS              SS in %   df    MS             F-Value    Crit.    Conclusion
a        18451760914     2.95      4     4612940228     1689.8     2.39     Significant
b        203448728       0.03      4     50862182       18.632     2.39     Significant
d        93715380384     14.97     3     31238460128    11443.3    2.623    Significant
c        4.34742E+11     69.44     5     86948493051    31851.1    2.232    Significant
ab       101095635       0.02      16    6318477        2.315      1.664    Significant
ad       299339472       0.05      12    24944956       9.138      1.772    Significant
ac       3502757297      0.56      20    175137865      64.157     1.592    Significant
bd       66662589        0.01      12    5555216        2.035      1.772    Significant
bc       462381183       0.07      20    23119059       8.469      1.592    Significant
dc       73166116866     11.69     15    4877741124     1786.8     1.687    Significant
Error    1332165080      0.21      488   2729846
Total    6.26044E+11     100       599


Fig. 11. Chosen combinations of thresholds for further experiments.

clearly an important aspect of policy for managing resource pools. The results of the 600 simulations are shown in Fig. 11 as small black dots. The figure illustrates CPU violation penalties versus normalized CPU capacity required under different policy configurations. Normalized capacity is defined as the sum of the total server CPU hours used and idle CPU hours divided by the corresponding sum for the ideal case with a 4 h workload placement interval. Each of the 600 simulations represents a combination of factor levels. As expected, the figure shows that as workloads are consolidated more tightly and capacity is reduced there is an increase in the CPU violation penalties. The specific shape of this curve is workload and resource pool specific and reflects the variability in workload demands. A Pareto-optimal set of simulation runs is illustrated in Fig. 11 using a red line. These combinations of factor levels provided the lowest CPU violation penalties and/or the lowest normalized CPU capacity. Ten of the Pareto-optimal combinations are chosen, representing the best behaviours of the migration controller, and serve as a baseline for the remainder of the paper. A data centre operator could choose any one of these as a best behaviour depending on the quality versus capacity trade-off desirable for the data centre's

workloads. Migration controller thresholds for the 10 cases are given in Table 4.
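A minimal sketch of the Pareto-optimal filtering used to select these configurations: a configuration is kept if no other configuration achieves both lower (or equal) normalized capacity and a lower (or equal) CPU violation penalty. The tuple-based input format is illustrative.

```python
# Sketch of the Pareto-optimal selection over (capacity, penalty) results.

def pareto_optimal(points):
    """points: list of (normalized_capacity, cpu_violation_penalty) tuples.
    Returns the non-dominated points, sorted by capacity."""
    frontier = []
    for i, (cap_i, pen_i) in enumerate(points):
        dominated = any(
            cap_j <= cap_i and pen_j <= pen_i and (cap_j, pen_j) != (cap_i, pen_i)
            for j, (cap_j, pen_j) in enumerate(points) if j != i)
        if not dominated:
            frontier.append((cap_i, pen_i))
    return sorted(frontier)
```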

4.5. Performance, quality, and power achieved by management policies

We now consider the impact of integrated workload placement and workload migration controller policies for managing the resource pool. The management policies are described in Section 2.3. The management policies

Table 4
Migration controller thresholds for 10 Pareto-optimal cases.

a (%)   b (%)   c (%)   d (%)
99      99      60      80
90      95      60      80
99      99      50      80
90      95      50      80
90      90      50      70
99      95      40      80
85      99      40      80
99      95      40      60
99      90      30      80
99      99      40      40


are each simulated for the 10 Pareto-optimal sets of migration controller threshold values as illustrated in Fig. 11. Fig. 12 through Fig. 14 show simulation results for our baseline cases and the workload management policies we consider. The CPU metrics are discussed first, followed by the memory and migration metrics.

The use of the workload migration controller alone (policy MC) is most typical of the literature [19,20]. The MC policy does very well as a starting point. Fig. 12 shows that when using approximately 8% more CPU capacity than the ideal case there is a CPU violation penalty per hour of 12.6. As the migration controller becomes less aggressive at consolidating workloads, i.e., using 50% more CPU capacity than the ideal case, the penalty drops to nearly zero.

The workload placement controller policy WP does not use the migration controller. It operates with a control interval of 4 h and consolidates to a given CPU and memory utilization, which is varied between 75% and 100%. In Fig. 12 the 100% case is omitted for readability; it incurred hourly CPU violation penalties of 84. The WP policy does well when the systems are overprovisioned because there is little likelihood of a CPU violation penalty. As the workloads become more consolidated, the CPU violation penalty per hour increases dramatically.

The MC + WP policy is able to achieve much better CPU quality than either MC or WP alone while using much less CPU capacity. The periodic application of the workload placement controller globally optimizes the CPU usage for the resource pool. The migration controller alone does not attempt to do this. This policy and subsequent policies permit the workload placement controller to consolidate workloads onto servers using up to 100% CPU and memory utilization. The MC + WP on Demand policy invokes the workload placement controller to consolidate the workloads whenever the resource pool is lightly loaded. It behaves better than MC alone but not as well as MC + WP because it does not periodically provide for a global optimization of CPU usage for the resource pool. Finally, MC + WP + WP on Demand provides very good results from both a capacity and violation penalty point of view. It achieves nearly ideal CPU violation penalties while requiring only 10–20% more CPU capacity than the ideal case. We also note that the violation penalties for this approach are less sensitive to migration controller threshold values. Fig. 13 shows the capacity versus quality trade-off for the memory metric. All of the cases provide for very low


Fig. 12. Comparison of different management policies regarding CPU.


Fig. 13. Comparison of different management policies regarding memory.


Fig. 14. Number of migrations for all 10 chosen MC threshold cases.

[Figure: normalized capacity, CPU violation penalties per hour, and migrations per hour for the MC policy and the MC + WP + WP on Demand configurations; caption truncated in this excerpt.]