Application-Driven Dynamic Vertical Scaling of Virtual Machines in Resource Pools

Lei Lu, Xiaoyun Zhu, Rean Griffith, Pradeep Padala, Aashish Parikh, Parth Shah, and Evgenia Smirni

College of William and Mary, Williamsburg, VA, USA ([email protected])
VMware, Palo Alto, CA, USA ({llei, xzhu, rean, padalap, aashishp, pashah}@vmware.com)

Abstract—Most modern hypervisors offer powerful resource control primitives such as reservations, limits, and shares for individual virtual machines (VMs). These primitives provide a means for dynamically scaling VMs vertically so that virtual applications can meet their respective service level objectives (SLOs). VMware DRS offers an additional resource abstraction, the resource pool (RP), a logical container representing an aggregate resource allocation for a collection of VMs. In spite of the abundant research on translating application performance goals to resource requirements, the implementation of VM vertical scaling techniques in commercial products remains limited. In addition, no prior research has studied automatic adjustment of resource control settings at the resource pool level. In this paper, we present AppRM, a tool that automatically sets resource controls for both virtual machines and resource pools to meet application SLOs. AppRM contains a hierarchy of virtual application managers and resource pool managers. At the application level, AppRM translates performance objectives into the appropriate resource control settings for the individual VMs running that application. At the resource pool level, AppRM ensures that all important applications within the resource pool can meet their performance targets by adjusting controls at the resource pool level. Experimental results under a variety of dynamically changing workloads composed of multi-tier applications demonstrate the effectiveness of AppRM. In all cases, AppRM delivers application performance satisfaction without manual intervention.

I. INTRODUCTION

Translating an application's service level objective (SLO) to system-level resource requirements is a well-known, difficult problem, due to the distributed nature of most modern applications, the dependency on multiple resource types, and the time-varying demands faced by the applications. There has been a great deal of research tackling this problem in order to develop auto-scaling techniques that dynamically adjust resource allocations to applications at runtime. Among these, horizontal scaling refers to changing the number of servers (physical or virtual) hosting an Internet service or a multi-tier application [1], [2], [3], and vertical scaling refers to adjusting the effective "sizes" of individual servers [4], [5], [6]. Dynamic vertical scaling was made feasible by virtualization technologies. Most modern hypervisors such as VMware ESX [7], Xen [8], and Microsoft Hyper-V [9] offer a rich set of resource control primitives at the virtual machine (VM) level, which can be used to scale the amount of resources allocated to individual VMs. For example, ESX offers reservation, limit, and shares for both CPU and memory, and Hyper-V provides reserve, limit, and relative weight for CPU, and Startup RAM, Maximum RAM, buffer, and weight for memory.

Fig. 1: Two RPs hosting two multi-tier vApps and two VMs.

In spite of the proliferation of virtualized systems and the abundant research on VM vertical scaling techniques, the implementation of such techniques in commercial products remains limited. Most virtual machines deployed in the real world have their resource configurations determined by offline capacity planning [10]. At runtime, their resource settings are kept either at default values (zero reservation and no limit), or at static values set by administrators based on load testing results or best practices [11]. The latter approach is laborious and error-prone and cannot adapt to variations in workloads or system conditions.

In this paper, we focus on the additional management components that are required to support dynamic vertical scaling of VMs in real-world virtualization software, which, we believe, were missing from earlier research. For this purpose, we need to look beyond resource management within a single hypervisor. For example, the VMware Distributed Resource Scheduler (DRS) [12] offers an abstraction of a resource pool (RP), a logical container that represents an aggregate resource allocation for a collection of virtual machines. Multiple resource pools can be organized in a tree hierarchy, where a root resource pool represents an aggregation of resources from a DRS-managed host cluster, and each child resource pool may encapsulate the resources delegated to a specific organization/department, or a certain set of applications. Resource control settings can be specified at both a VM and an RP level. DRS periodically divides the total capacity of a parent resource pool and distributes it to child VMs or RPs, according to the resource control settings and estimated resource demands of individual VMs and RPs. This allows resources to flow across VMs or RPs in order to achieve better resource utilization. (Please refer to [12] for a more detailed description of how resource pools work.)

Fig. 1 shows an example of a root resource pool containing two child resource pools, RP1 and RP2. Note that even though DRS allows up to eight levels in an RP tree, customer feedback shows that most users deploy only one level of resource pools under the root. We refer to a virtual application running on one or more VMs as a vApp. RP1 contains two multi-tier vApps, each containing web, application, and database tier VMs, and running business-critical production workloads with SLOs. RP2 has two VMs running batch jobs that have less stringent performance requirements. To protect the high-priority applications from their neighbors, an administrator can designate RP1 as the premium class and RP2 as the best-effort class, and use the resource controls at the RP or VM levels to achieve the following goals:

• Resource guarantees: Provide a guaranteed amount of a certain resource to a specific service class or a vApp, even when this resource is over-committed. For example, a resource reservation (R) can be set on RP1 or its child VMs running the two important vApps.

• Performance isolation: Prevent demand spikes within a lower service class from affecting others. For example, a resource limit (L) can be set on RP2 or its child VMs.

• Proportional sharing: Allow multiple applications within the same resource pool to share capacities in proportion to their relative priorities. For example, the administrator can set higher shares (S) on VMs running vApp1 than on those running vApp2 to provide performance differentiation between the two under resource contention.

Even though resource pools offer additional powerful knobs to control resource allocation, no prior work has studied automatic setting of resource controls at the RP level. In this paper, we present AppRM, a holistic solution that aims at providing service level assurance to vApps by dynamic vertical scaling of virtual machines within a resource pool hierarchy. AppRM is deployed on the VMware vSphere platform and automatically translates vApp-level SLOs into individual VM- and RP-level resource control settings. To this end, AppRM employs a hierarchical architecture consisting of a set of vApp Managers and RP Managers. A vApp Manager determines the resource controls for a specific vApp, and an RP Manager determines the resource controls for a specific resource pool. Each vApp Manager contains a model builder, an application controller, and a resource controller. Building upon earlier work in translating application-level SLOs to VM-level resource requirements using feedback control and online optimization [5], we make the following new contributions in the overall design of AppRM:

1) We design a resource controller in each vApp Manager that computes the desired resource control settings for the individual VMs. Although most prior work utilizes only limits or shares for VM vertical scaling, AppRM specifically leverages dynamic adjustment of resource reservations as an effective knob to ensure guaranteed access to specified amounts of resources.

2) We design an RP Manager that takes desired VM-level settings as inputs and computes the actual knob settings at both the VM and the RP levels, taking into account whether there is resource contention within the resource pool and specific RP-level resource configuration options. Furthermore, the RP Manager interacts with its associated vApp Managers asynchronously to relax timing constraints on the vApp Managers.

To the best of our knowledge, this is the first development of a holistic methodology that manages resource settings at both VM and RP levels. Our experimental results indicate that AppRM can achieve different targets for an application SLO in both under-provisioned and over-provisioned scenarios, in spite of dynamically changing workloads.

II. RELATED WORK

Early studies of auto-scaling for distributed applications have largely focused on coarse-grained horizontal scaling of physical servers. In [1], cluster power management is done by dynamically matching server resources to aggregate resource demands to maximize global utility while minimizing power usage. Similarly, a reinforcement learning approach is employed in [2] to achieve globally optimal server allocations across multiple applications. The methodology in [3] extends such scaling to virtual machines by automatically determining the number of EC2 instances of a specific type in order to meet a target SLO for a multi-tier web application.

Virtualization offers a new form of resource containers realized by modern hypervisors that provide a rich set of resource control primitives that can be used to vertically scale the amount of resources allocated to a virtual machine. These capabilities inspired another line of research on dynamic vertical scaling of VMs. In [6], VM-level CPU shares (a.k.a. weights) were used to ensure the most critical applications achieve their performance goals in spite of resource contention from neighboring VMs. In [4], a two-level resource control architecture was presented, where a local controller estimates the amount of resource needed by each VM using a fuzzy-logic-based modeling and prediction approach, and a global controller runs at each host (a.k.a. node) to mediate the resource requests from different local controllers. Another two-level resource control system in [5] applies online statistical learning and adaptive control to translate the SLO of a multi-tier application to the capacity requirements for multiple resource types (CPU and disk I/O) in multiple VMs, an approach similar to the one we adopt in our model builder and application controller. In [13], an online scaling controller was designed to achieve better SLO violation rates by dynamic CPU capping and CPU hot-plugging. Nguyen et al. [14] proposed AGILE, a system that improves application performance by dynamically adjusting resource allocations or adding VMs when the application is overloaded. Most previous work [4], [5], [13], [14] only uses limits to enforce resource allocation at the VM level.

We stress that our work differs from prior work in the following aspects: (1) No prior work used dynamic reservation for any resource type for achieving application SLOs, which is utilized in AppRM. We discuss in this paper why reservation is a powerful resource control knob that should be leveraged. (2) No prior work dealt with resource settings at the resource pool level, which is a unique resource allocation model offered by VMware DRS [12]. (3) Both the global controller in [4] and the node controller in [5] handle the resource requests from individual VMs at the same time and with the same frequency. The RP Manager in AppRM interacts with multiple vApp Managers asynchronously so that each vApp Manager can work at its own pace based on application need.

TABLE I: Notation

A              set of applications
M_a            set of VMs in application a ∈ A, e.g., M_a = {vm1, vm2}
R              set of resource types controlled, e.g., R = {cpu, mem}
t              index for control interval
u*_{a,m,r}     desired allocation of resource type r in VM m of application a, 0 ≤ u*_{a,m,r} ≤ 1
u_{a,m,r}      measured allocation of resource type r in VM m of application a, 0 ≤ u_{a,m,r} ≤ 1
p̂_a            normalized performance of application a, where p̂_a = measured performance / target performance
p̂_a^pred       predicted normalized performance of application a
α, β           model coefficient and coefficient vector, β = [β_1 · · · β_n]^T

Fig. 2: AppRM at work across VMs in a single vApp.


Fig. 3: AppRM at work across vApps in a resource pool.

III. ARCHITECTURE

In Fig. 2, we show the architecture of AppRM as it operates in the context of a single virtual application (vApp) to ensure that the vApp achieves its user-defined, application-level SLO. The App Sensor module collects application-level performance metrics such as throughput and response times for each vApp. Note that an application may require more than one VM, e.g., a multi-tier application. We use the System Sensor module to measure and keep track of current resource allocations for all the VMs associated with the target vApp. These two sets of statistics are input to the Model Builder module, which first constructs and then iteratively refines a model of the observed application performance as a function of the VM-level resource allocations. The Application Controller module inverts this function to compute a new set of "desired" resource allocations in order to meet the user-defined application SLO. The Resource Controller module then determines a set of individual VM-level resource settings that would cause the VMs in the RP to acquire the desired resource allocations in the next control interval. Together, the Model Builder, Application Controller and Resource Controller modules constitute an instance of the vApp Manager for a single vApp.

In Fig. 3, we show how AppRM manages multiple vApps sharing the same resource pool. The vApp Manager for each vApp submits the desired VM-level resource settings to the RP Manager, where an Arbiter module addresses potential resource conflicts within the RP and computes the actual values of VM-level and RP-level resource settings. These values are then set by the Actuator module, using the vSphere Web Services API [15] to communicate with the vCenter Server.
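To make the data flow concrete, the following sketch wires the modules above into a single per-vApp control loop. It is purely illustrative: the class and method names are hypothetical and do not correspond to AppRM's actual implementation.

# Illustrative per-vApp control loop; all names are hypothetical, not AppRM's API.
class VAppManager:
    def __init__(self, app_sensor, system_sensor, model_builder,
                 app_controller, resource_controller, rp_manager):
        self.app_sensor = app_sensor              # application-level metrics (e.g., response time)
        self.system_sensor = system_sensor        # per-VM resource allocations
        self.model_builder = model_builder        # online performance model (Section IV-B)
        self.app_controller = app_controller      # inverts the model (Section IV-C)
        self.resource_controller = resource_controller  # maps allocations to <R, L> (Section IV-D)
        self.rp_manager = rp_manager              # arbiter + actuator (Section IV-E)

    def control_interval(self):
        perf = self.app_sensor.read()                     # normalized performance
        alloc = self.system_sensor.read()                 # measured allocations u_a(t)
        model = self.model_builder.update(perf, alloc)    # re-estimate model parameters
        desired = self.app_controller.compute(model, perf, alloc)    # desired u_a(t+1)
        request = self.resource_controller.translate(desired, perf)  # per-VM (reservation, limit)
        self.rp_manager.submit(request)                   # asynchronous allocation request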

IV. DESIGN

A. Sensors

The sensor modules periodically collect two types of statistics: real-time resource utilizations of individual VMs and application performance. Resource utilization statistics over a time interval are collected by the system sensor through the vSphere Web Services API [15]. We collect the average per-VM CPU utilization using the usage performance counter, and the per-VM memory utilization using the consumed counter.


For application performance of each vApp, we measure the following metrics: throughput, average response time, and percentile response time. Although the current application sensor is implemented in the client workload generator, there are a number of commercial tools such as Hyperic [16] that can collect such metrics for a variety of applications.

B. Model Builder

The model builder is responsible for learning a model of the relationship between the application performance and its resource allocations based on real-time measurements. Although this relationship is often nonlinear and workload-dependent in most real-world systems, we adopt the online adaptive modeling approach in [5], where a linear model is estimated and periodically updated to approximate the underlying nonlinear relationship. We first define, in Table I, the key variables used in the model and the application controller. For application a ∈ A, we define the resource allocation variable u_a(t) to be a vector that contains all measured resource allocations for application a during control interval t. For example, for an application running in two VMs (M_a = {vm1, vm2}), if two resources are considered (R = {cpu, mem}), then u_a(t) is the vector u_a(t) = (u_{a,vm1,cpu}, u_{a,vm1,mem}, u_{a,vm2,cpu}, u_{a,vm2,mem})^T. In every control interval, the model builder recomputes the following auto-regressive-moving-average (ARMA) model that approximates an application's normalized performance:

    p̂_a(t) = α(t) p̂_a(t−1) + β^T(t) u_a(t).    (1)

The model is self-adaptive, as its parameters α(t) and β(t) are re-estimated in each control interval.
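The paper adopts the estimation procedure of [5] without restating it. As one plausible concrete choice, the sketch below fits the parameters of Eq. (1) with recursive least squares and a forgetting factor; this estimator is an assumption for illustration, not necessarily AppRM's exact method.

import numpy as np

class OnlineArmaModel:
    # Recursively estimates p_hat(t) = alpha * p_hat(t-1) + beta^T u(t).
    # Recursive least squares with forgetting is one common online estimator;
    # the forgetting factor lets old workload behavior fade out over time.
    def __init__(self, n_alloc_vars, forgetting=0.95):
        self.n = n_alloc_vars + 1              # parameters: [alpha, beta_1 .. beta_n]
        self.theta = np.zeros(self.n)          # current parameter estimates
        self.P = np.eye(self.n) * 1e3          # large initial covariance = weak prior
        self.lam = forgetting

    def update(self, prev_perf, alloc, perf):
        # Regressor x = [p_hat(t-1), u_1(t), ..., u_n(t)]; target is the measured p_hat(t).
        x = np.concatenate(([prev_perf], np.asarray(alloc, dtype=float)))
        err = perf - x @ self.theta
        gain = self.P @ x / (self.lam + x @ self.P @ x)
        self.theta = self.theta + gain * err
        self.P = (self.P - np.outer(gain, x @ self.P)) / self.lam
        alpha, beta = self.theta[0], self.theta[1:]
        return alpha, beta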

C. Application Controller

The application controller essentially inverts the estimated model to compute the desired resource allocations for all the member VMs of the vApp in order for the vApp to meet its performance SLO. To this end, we apply the online optimization approach in [5] to designing the application controller. In this section, we briefly review the design of the optimal controller, and offer further insights into the intuition behind some key parameters. Specifically, the controller seeks the desired VM-level resource allocation vector, u_a(t+1), for the next control interval t+1, that minimizes the following cost function:

    J(u_a(t+1)) = (p̂_a^pred(t+1) − 1)^2 + λ ||u_a(t+1) − u_a(t)||^2.    (2)

Here, p̂_a^pred(t+1) is the predicted value of the normalized application performance in the next interval, using the model estimated in interval t (as in Eq. (1)), for a certain resource allocation vector u_a(t+1). More specifically,

    p̂_a^pred(t+1) = α(t) p̂_a(t) + β^T(t) u_a(t+1).    (3)

The scaling factor, λ, captures the trade-off between the performance penalty for the application to deviate from its SLO target (i.e., normalized value equal to 1), and the stability cost that penalizes large oscillations in resource allocations. This leads to the following optimal resource allocations:

    u*_{a,i}(t+1) = u_{a,i}(t) + β_i (1 − p̂_a^pred(t+1)) / (λ + Σ_{i=1}^n β_i^2),    (4)

where u_{a,i} is the ith variable in the allocation vector u_a. We make the following key observations: (1) When β_i = 0, indicating no impact from the ith resource allocation variable on the application performance, the ith resource allocation variable will see no change in the next control interval. (2) When β_i > 0, indicating a positive correlation between the ith allocation variable and the performance value, if the model-predicted performance is below the target, i.e., p̂_a^pred(t+1) < 1, then the ith resource allocation variable will be increased such that the performance value can be increased in the next interval; the opposite is true if β_i < 0 or if p̂_a^pred(t+1) > 1. (3) The scaling factor λ affects the amount of resource allocation change. As λ increases, the oscillation in each resource allocation variable is reduced.
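The closed-form update of Eq. (4) is only a few lines of code. In the sketch below, p̂_a^pred(t+1) is evaluated with the allocations held at their current values, which is how the closed form of the quadratic cost in Eq. (2) works out; the function name and the clamping to [0, 1] are illustrative choices, not part of the published controller.

import numpy as np

def next_allocations(u_curr, perf_norm, alpha, beta, lam=1.0):
    # Eq. (4): shift each allocation along beta, scaled by the predicted
    # performance error and damped by the stability factor lambda.
    u_curr = np.asarray(u_curr, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Predicted normalized performance with allocations unchanged (Eq. (3) with u(t+1) = u(t)).
    p_pred = alpha * perf_norm + beta @ u_curr
    step = (1.0 - p_pred) / (lam + np.sum(beta ** 2))
    u_next = u_curr + step * beta
    return np.clip(u_next, 0.0, 1.0)   # allocations are fractions of capacity

A vApp Manager would call this once per control interval with the latest model estimates and hand the result to the resource controller; the sign of each β_i determines the direction of the adjustment, matching observation (2) above.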

D. Resource Controller

The goal of the resource controller is to translate the desired resource allocations computed by the application controller into desired VM-level resource control settings. Its design needs to answer the following two technical questions: 1) For each VM, which of the <R, L, S> settings should be used to ensure that the resource allocation in the next time interval can satisfy the requested value? 2) How to compute the values for the chosen resource control settings?

As discussed in the Related Work, most prior work has used either limits/caps [4], [5] or shares/weights [6] for this goal. We argue that the use of reservation is the most effective way to achieve the same objective, especially for VMs running in a DRS cluster. Let us first explain how DRS determines CPU and memory allocation to individual VMs. For a VM i with an ESX-estimated demand D_i and resource control settings <R_i, L_i, S_i> for a specific resource, its final allocation is determined by a quantity named resource entitlement, E_i. DRS computes each VM's entitlement as a complex function of the capacity of the root resource pool, and the demands and resource settings of all the VMs and RPs in the resource pool hierarchy. (Please refer to [12] for details of this computation.) While doing this capacity division, DRS ensures that the following constraint is satisfied:

    R_i ≤ E_i ≤ max(D_i, R_i).    (5)

Note that for a VM with R_i = 0, if the ESX-estimated VM demand D_i is lower than the VM's real demand, then DRS cannot ensure that the VM's actual demand is allocated.
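As a hypothetical illustration of constraint (5): a VM with R_i = 1000 MHz is entitled to at least 1000 MHz even if ESX estimates its demand D_i at only 600 MHz, since max(D_i, R_i) = 1000 MHz. With R_i = 0, the same VM's entitlement is capped by the 600 MHz estimate, so if its real demand is 900 MHz, the extra 300 MHz is not guaranteed; this is exactly the gap that motivates driving reservations from the application model.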

Given this understanding, let us evaluate the three resource control settings individually:

• Reservation: By setting a VM's reservation based on the desired resource allocation, AppRM can ensure that each VM receives at least the amount of resource required to meet the SLO of the respective vApp, even if the ESX-estimated VM demand is insufficient.

• Limit: A VM's limit setting can be useful in two scenarios: i) when there is resource contention in the parent resource pool; ii) to prevent one VM's sudden demand spike from affecting the performance of its neighboring VMs.

• Shares: Even though a VM's entitlement can also be shaped by its shares value, the absolute shares of a single VM are meaningless. In order to translate a desired level of resource allocation for a VM to its shares value, one needs to know the shares values of all the sibling VMs in the same resource pool. Because these VMs do not necessarily belong to the same vApp, this violates the "separation-of-concerns" design principle in AppRM, where each vApp Manager should only be concerned with the resource settings of its managed VMs.

Based on this comparison, we decide to use only reservation and limit in the resource controller of the vApp Manager to enforce the desired resource allocation for each VM, and leave the shares at the user-set values. Such a translation needs to consider the following two aspects:

• The output of the application controller is in percentage units, whereas the reservation and limit values for both CPU and memory are in absolute units, i.e., megahertz (MHz) or megabytes (MB).

• We explicitly allocate more resources than the computed values as a "safety buffer", to deal with inaccuracies in the computed optimal allocations.

The pseudo-code in Algorithm 1 summarizes the procedure applied to every VM within the same vApp, for both CPU and memory resources. The algorithm calculates the resource capacity based on the specific resource type (lines 2-5). The desired resource reservation is computed by multiplying the optimal value and the capacity value (line 6). The "safety buffer" size is determined by the reservation, the normalized performance, and a precomputed constant value delta (line 8). We set delta to a low or high value depending on whether the measured application performance is below or above the target (line 7). When the performance is better than the SLO (perf < 1), a relatively small buffer size can reduce the performance fluctuation around its target. When the performance is worse than the SLO (perf > 1), a relatively large buffer size is needed to improve the performance convergence rate. We set low = 0.1 and high = 0.3 empirically. The resource limit is set to the sum of the reservation and the buffer size (line 8). The nonzero, adaptive buffer between limit and reservation allows the resource scheduler to adjust runtime allocations if needed. The limit is then compared against the available capacity and the minimum capacity a VM needs to ensure that the final value is feasible (lines 9-13).

Algorithm 1: Calculate desired Reservation and Limit

input : optimal allocation u*, resource type type, and normalized performance perf (i.e., p̂_a)
output: Reservation and Limit value pair

1   if u* < 0 then u* ← 0; if u* > 1 then u* ← 1;
2   capacity ← 0;
3   if type = CPU then
4       capacity ← getNumVirtualCPUs() * getCPUMHz();
5   else if type = MEM then capacity ← getMemoryMB();
6   resv ← u* * capacity;
7   delta ← 0; if perf < 1 then delta ← low else delta ← high;
8   buffer ← delta * perf * resv; limit ← resv + buffer;
9   if limit > capacity then limit ← capacity;
10  if type = CPU then
11      limit ← max(MINCPU, limit);
12  else if type = MEM then
13      limit ← max(MINMEM, limit);
14  return <resv, limit>;
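As a concrete walk-through with hypothetical numbers: for a 2-vCPU VM on 2100 MHz cores, capacity = 2 × 2100 = 4200 MHz (lines 2-5). If the application controller requests u* = 0.3 and the measured normalized performance is perf = 1.2 (20% above the response time target), then resv = 0.3 × 4200 = 1260 MHz (line 6), delta = high = 0.3 (line 7), buffer = 0.3 × 1.2 × 1260 ≈ 454 MHz, and limit ≈ 1714 MHz (lines 8-9), leaving the scheduler a sizable band above the guaranteed reservation while the application is still missing its target.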

E. Resource Pool (RP) Manager

After a vApp Manager computes the desired reservation and limit values for its member VMs, it submits these values in an allocation request to the RP Manager that manages the resource pool containing these VMs. The design of the RP Manager needs to solve two technical problems: 1) Given the requested VM-level settings and the current RP-level settings, how to compute the actual VM-level and RP-level settings for the next interval? 2) How does the RP Manager interact with individual vApp Managers that work independently of one another?

The answer to the first question is straightforward when the total reservation requested by the child VMs is less than the RP-level reservation, in which case the RP Manager honors each vApp Manager's requests. In the following discussion, we only address the resource contention scenario where the sum of VM-level reservations for a resource type is greater than the RP-level reservation. Under such conditions, we design the behavior of the RP Manager to be controllable using the following two RP-level configuration options:

• expandableReservation: Option provided by DRS (expandable in short). When it is set to true, DRS automatically increases the RP's CPU or memory reservation when it is exceeded by the sum of the children's reservations.

• modifiableReservation: Additional option provided by AppRM (modifiable in short). When it is set to true, the RP Manager proactively increases the RP's CPU or memory reservation on demand to satisfy the total reservation requested by the child VMs.

The RP Manager employs the following three strategies for handling resource contention under different configuration scenarios (as summarized in Table II):

• SetVmOnly: This is used when the resource pool being managed is expandable. The RP Manager only sets the VM-level reservations and limits as requested by the vApp Manager, without changing the RP-level settings. DRS then expands the RP-level reservation accordingly to accommodate the total demand for the resource pool, while leaving the RP-level limit unchanged.

• SetRpOnly: For a resource pool with expandable=false, the user can set modifiable=true for this RP in AppRM, to allow resources to flow from the sibling RPs to this RP in order to satisfy the demand of its child vApps. In this case, the RP Manager modifies the RP-level reservation to be the sum of the requested VM-level reservations, without setting reservations or limits at the VM level. The RP-level limit is also increased if exceeded.

• ProportionalThrottling: For a resource pool whose capacity cannot be changed, the user can set modifiable=false; the RP Manager then throttles the VM-level reservations in proportion to either the requested reservation or the shares of each VM (due to space limits, only results from the reservation-based throttling are shown in Section V).

TABLE II: Resource contention resolution methods

                       modifiable=true    modifiable=false
expandable=true        SetVmOnly          SetVmOnly
expandable=false       SetRpOnly          ProportionalThrottling
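A hypothetical numeric example of the contention cases: suppose the premium RP's CPU reservation is fixed at 2000 MHz, no other reservations are currently in use, and one vApp's three VMs request reservations of 1200, 1200, and 600 MHz (a total of 3000 MHz > 2000 MHz). With expandable=true, SetVmOnly applies the requests as-is and DRS grows the pool. With expandable=false and modifiable=true, SetRpOnly raises the RP reservation to 3000 MHz. With both options false, ProportionalThrottling grants each VM rpResv × reqVmResv / rpTotReqResv, i.e., 800, 800, and 400 MHz, which together exactly fill the 2000 MHz pool.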

Given that a single resource pool can contain multiple performance-critical applications, each represented by a vApp Manager that acts on its behalf, we specifically design the RP Manager to process the allocation requests from different vApp Managers asynchronously. Instead of waiting to collect inputs from all the vApps and processing them together, the RP Manager processes the request from each vApp as it arrives. Such a scheme also allows the vApp Managers under the same RP Manager to have different control intervals based on the needs of different applications.

Next, we describe in detail how the RP Manager handles the allocation request from one vApp Manager, which arrives periodically and contains a collection of 4-tuples of VM name, resource type, requested reservation, and requested limit: (vmName, resType, reqVmResv, reqVmLimit). Once it receives the request tuples for all the VMs in the member set M of the application, the RP Manager runs Algorithm 2 to calculate the actual VM-level and RP-level settings for a given resType (omitted in the algorithm).

Algorithm 2: Calculate actual VM and RP settings

Input : vApp Manager request tuples for all member VMs: (vmName, resType, reqVmResv, reqVmLimit)
Output: Actual VM- and RP-level settings for resType for the next interval: (vmName, vmResv, vmLimit), (rpResv, rpLimit)

1   rpResvAvail ← getRpAvailReservation(resType);
2   rpResvUsed ← getRpReservationUsed(resType);
3   rpResv ← rpResvUsed + rpResvAvail;
4   rpLimit ← getRpLimit(resType);
5   for vmName ∈ M do
6       vmResv ← getVmReservation(vmName, resType);
7   curAppResv ← Σ_M vmResv;
8   reqAppResv ← Σ_M reqVmResv;
9   rpTotReqResv ← rpResvUsed − curAppResv + reqAppResv;
10  if rpTotReqResv ≤ rpResv then for vmName ∈ M do vmResv ← reqVmResv; goto 20;   /* no contention */
11  if expandable then                                  /* SetVmOnly */
12      for vmName ∈ M do vmResv ← min(reqVmResv, rpResv);
13  else
14      if modifiable then                              /* SetRpOnly */
15          rpResv ← rpTotReqResv;
16          rpLimit ← max(rpLimit, rpResv);
17      else                                            /* ProportionalThrottling */
18          for vmName ∈ M do
19              vmResv ← rpResv * reqVmResv / rpTotReqResv;
20  for vmName ∈ M do vmLimit ← reqVmLimit;
21  return (vmName, vmResv, vmLimit), (rpResv, rpLimit);

It first queries the vCenter Server for the currently used and available RP reservation values and computes the current RP reservation (lines 1-3). It also queries for the current RP limit (line 4) and the current VM reservation values (lines 5-6). It then computes the current and the total requested reservation for the vApp (lines 7-8), as well as the total requested reservation for the resource pool (line 9). If the current RP reservation can satisfy the total requested reservation, the per-VM requests are granted (line 10); otherwise, it moves on to resource contention handling. If the RP is expandable, the SetVmOnly policy is used, subject to the constraint that individual VM reservations cannot exceed the RP reservation (lines 11-12). If the RP is not expandable but modifiable, the RP Manager adopts the SetRpOnly strategy by increasing the RP reservation just enough to satisfy the total requested reservation for the vApp (lines 14-16). If, however, the RP is neither expandable nor modifiable, the ProportionalThrottling approach is used, where the RP reservation is allocated proportionally to the requesting VMs (lines 17-19). Finally, the requested VM limit is granted (line 20), and the computed VM-level and RP-level settings are returned.

The actuator module sets the resource (CPU or memory) reservation and limit values of virtual machines and resource pools through the vSphere Web Services API [15]. The resource reconfiguration of a virtual machine is done by one explicit call of the reconfigVM_task method.
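The paper identifies the vSphere Web Services API and the reconfigVM_task call but does not show client code; the sketch below uses pyVmomi, a Python binding for that API, purely as an illustration. The vCenter address, credentials, and VM name are placeholders, and AppRM's own actuator may be structured differently.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_vm_cpu_controls(vm, reservation_mhz, limit_mhz):
    # One ReconfigVM_Task call applies the new CPU reservation and limit to the VM.
    alloc = vim.ResourceAllocationInfo(reservation=reservation_mhz, limit=limit_mhz)
    spec = vim.vm.ConfigSpec(cpuAllocation=alloc)
    return vm.ReconfigVM_Task(spec=spec)          # returns a vSphere task object to wait on

if __name__ == "__main__":
    ctx = ssl._create_unverified_context()        # lab setup only; use proper certificates in production
    si = SmartConnect(host="vcenter.example.com", # placeholder address and credentials
                      user="administrator@vsphere.local", pwd="...", sslContext=ctx)
    try:
        # FindByDnsName locates a VM by its guest DNS name; "mongodb-shard1" is a hypothetical name.
        vm = si.content.searchIndex.FindByDnsName(None, "mongodb-shard1", True)
        if vm is not None:
            set_vm_cpu_controls(vm, reservation_mhz=1260, limit_mhz=1714)
    finally:
        Disconnect(si)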

V. PERFORMANCE EVALUATION

In this section, we present experimental results that demonstrate the effectiveness of AppRM. The experiments are specifically designed to evaluate the following goals:
1) Meeting an application's SLO defined in different performance metrics, including mean response time, throughput, and 95th-percentile response time;
2) Automatically detecting and mitigating dynamically-changing workload demands;
3) Enforcing performance targets under competing workloads.

We chose MongoDB [17] as a benchmark application as it is representative of a modern distributed data processing application. As shown in Fig. 4(a), we set up a MongoDB (ver. 1.8.1, Linux 64-bit) cluster consisting of 3 VMs. VM1 and VM2 are MongoDB shards running mongod instances. VM3 runs a mongos instance, which balances load and routes queries to the shards. The MongoDB application defines two types of transactions, each of which can be generally classified as a "read" or "write" operation. For generating time-varying workloads, we use Rain [18], which provides the ability to generate variable amounts of load in multiple patterns with different mixes of operations. Here we assume that the workload is defined by two characteristics: the number of clients and the percentage of read/write operations. Similar to the MongoDB VMs, both Rain and AppRM are also run in separate VMs to ensure performance isolation. Each of these VMs has been configured with two vCPUs and a memory size of 4 GB. All VMs run Linux Ubuntu 2.6.35 as their guest operating system. The testbed consists of two separate ESXi 5.0 virtualized hosts: ESX1 and ESX2. The Rain VM(s) and the AppRM VM are hosted on ESX1, and all the MongoDB VMs are hosted on ESX2, as shown in Fig. 4. The full host configuration is shown in Table III.

TABLE III: Configuration of hosts

            ESX1                                      ESX2
Model       HP ProLiant BL460c G7                     HP ProLiant BL465 G7
CPU         Intel Xeon X5650, 24 cores @ 2.10 GHz     AMD Opteron 6172, 12 cores @ 2.67 GHz
Memory      128 GB                                    96 GB
Storage     DGC Fibre Channel Disk                    DGC Fibre Channel Disk

A. Scenario 1: Achieving different performance targets

We use the setup shown in Fig. 4(a), where the physical node ESX2 hosts one instance of the MongoDB application of three VMs (Mongos, Shard1, Shard2). This set of experiments allows us to gain insight into the system behavior and to validate the internal workings of the vApp Manager.

Fig. 4: Experimental setup with a MongoDB cluster and the Rain workload generator. (a) A single MongoDB vApp (Mongos, Shard 1, Shard 2) on ESX2, driven by one Rain workload generator on ESX1; (b) two MongoDB vApps inside a premium-class RP, managed by AppRM 1 and AppRM 2 and driven by two Rain workload generators, alongside ten competing best-effort VMs (Competing VM 1-10).

We run Rain with 300 threads emulating 300 concurrent clients connecting to the MongoDB server. The workload is composed of 50% read and 50% write requests. For each emulated client, there is no think time between receiving the last reply and sending the next request. To identify reasonable performance targets for the application, we run profiling experiments for 300 concurrent clients with different levels of CPU and memory allocations, ranging from unlimited to very limited amounts of resources. The results show that the mean response time of the end user ranges from 176 ms to tens of seconds (with a large number of network connection exceptions). In all the following experiments, we choose our performance targets within the low response time range.

In the first experiment, we set the target mean response time to 300 milliseconds, and evaluate the system under two initial resource settings: under-provisioned (all VMs set to R_cpu = R_mem = 0, L_cpu = L_mem = 512 MHz/MB) and over-provisioned (all VMs set to R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited). In the under-provisioned scenario, the initial resource control settings of the MongoDB VMs are inadequate to meet the application demand, thus resulting in high response times. In the over-provisioned case, the initial VM resource settings exceed the application demand, leading to inefficient resource utilization. Fig. 5 shows how the normalized mean response time in both cases changes as a function of time. Note that the "white" area represents the period of model learning, during which only the sensor module is active. All AppRM modules are activated in the "gray" area, during which the system model is periodically updated and the resource control settings are dynamically adjusted. We can see that AppRM allows the application to meet its performance target in both under-provisioned and over-provisioned scenarios.

Fig. 5: Measured mean response time (tgt.=300 ms)

Fig. 6 shows the resource utilization changes for all the MongoDB VMs with under-provisioned initial settings. It is apparent that for both CPU and memory, the initial allocations are insufficient, and after AppRM is activated, the resource allocations are increased to meet the performance target.

Fig. 6: Resource utilization in the under-provisioned case


For the over-provisioned case, in Fig. 7, AppRM reduces the CPU allocations of the MongoDB VMs but hardly changes the memory allocations. This is due to the design of the application controller, which aims to meet the performance target with minimal changes in the resource allocations.

Fig. 7: Resource utilization in the over-provisioned case

Can AppRM support other performance targets, especially percentile response times, which are more bursty and hence harder to control? Fig. 8(a) and 8(b) show the results for similar experiments in the over-provisioned case but for different performance targets, i.e., the 95th-percentile response time and throughput, respectively (the 95th percentile is used as an example; other percentiles can be used if they are available and exposed). AppRM successfully adjusts the resource settings to meet these targets.

Fig. 8: Different performance targets. (a) Measured 95th-percentile response time (tgt.=2000 ms); (b) measured throughput (tgt.=50,000 reqs/s)

B. Scenario 2: Mitigating dynamically-changing workloads

In this scenario, we evaluate the effectiveness of AppRM in meeting the target SLO under dynamically-changing workloads. Table IV defines the two experiments with changing workloads; both workloads start with 300 clients and a 50%-50% read/write ratio. Here, we intentionally show different target values across experiments to demonstrate AppRM's robustness within a dynamic environment. Additionally, in the second experiment and in Scenario 3, we use a 5-minute control interval to demonstrate that AppRM also works correctly for different intervals.

TABLE IV: Definition of two changing workloads

Target mean RT   Interval   Period 1 (clients, read/write mix)   Period 2 (clients, read/write mix)
600 ms           1 min.     300, 50r/50w                         500, 80r/20w
800 ms           5 min.     300, 50r/50w                         500, 20r/80w

Fig. 9: Dynamic workload: changing intensity and mix. (a) Measured mean response time (tgt.=600 ms); (b) measured mean response time (tgt.=800 ms)

In the first experiment, Fig. 9(a) shows that the application is over-provisioned initially (R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited), resulting in very low response times. AppRM gradually reduces its resource allocations (see Fig. 10) while maintaining the mean response time near its target. At interval 61, the workload increases to 500 clients and becomes read-intensive (with 80% read operations). This change does not cause a significant increase in response times, and thus all VMs' resource utilizations remain relatively stable.

Fig. 10: Per-VM resource utilizations (matching Fig. 9(a))

In the second experiment, Fig. 9(b) shows that in Period 1 (intervals 0-90), the application performance is initially up to 10 times worse than its target, due to the initially under-provisioned resource settings (R_cpu = R_mem = 0, L_cpu = L_mem = 512 MHz/MB). At interval 23, AppRM is activated and it increases both CPU and memory allocations (see Fig. 9(b)), bringing the measured performance down to its target. In Period 2 (intervals 91-150), 200 more threads are added to the workload and the workload becomes dominated by write operations (80% of operations are writes). This sudden workload increase initially degrades the application performance to up to 50% worse than the target (see Fig. 9(b) and notice that the y-axis is in log scale). AppRM rapidly responds to this change and correctly re-adjusts the allocations. Note that both CPU and memory utilizations of the MongoDB VMs are increased (see Fig. 11) to mitigate the intensified workload while meeting the target.


Fig. 11: Per-VM resource utilizations (matching Fig. 9(b))

C. Scenario 3: Enforcing targets under competing workloads

In this scenario, we focus on evaluating AppRM under resource contention, using the testbed setup shown in Fig. 4(b). We use ESX2 to emulate the capacity of a root resource pool, including a premium-class resource pool of priority applications and a best-effort class of 10 individual VMs running competing workloads (similar to Fig. 1). The premium-class RP contains two instances of the MongoDB application (a total of six VMs). Each VM in the best-effort class runs a CPU-intensive workload capable of consuming 100% of its allocated capacity, creating resource contention in the root resource pool. We run several experiments with different resource pool settings, using a mean response time target of 600 ms for both MongoDB applications. We demonstrate the RP Manager's behavior and its results under the three strategies summarized in Table II. The results show that, despite the ongoing contention from the best-effort VMs, AppRM can help both MongoDB applications achieve their performance targets under different resource pool configuration options.

1) SetVmOnly: This strategy is applied to an expandable resource pool, whose reservation is expanded automatically by DRS to meet the aggregate reservation of all child VMs. We use the following initial settings in this experiment: for each VM in the premium RP, R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited; and for the premium RP itself, R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited. Fig. 12(a) shows that the initial CPU allocations, based on shares only, are inadequate to meet the SLOs. By setting the VM-level reservations properly and allowing the RP-level reservation to expand by itself, AppRM can meet the SLOs of both MongoDB applications. Fig. 12(b) shows the increase in both the vApp-level and the RP-level reservation values.

Fig. 12: SetVmOnly strategy. (a) Measured mean response time (tgt.=600 ms); (b) vApp- and RP-level aggregate reservation values

2) SetRpOnly: In this strategy, the RP-level attributes are allowed to change at runtime (modifiable). The initial settings are: for each VM in the premium RP, R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited; and for the premium RP, R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited. Fig. 13(a) demonstrates that the response times of both applications are reduced to their respective targets after AppRM is activated. The reservation and limit of the premium-class resource pool increase to proper values that guarantee the required CPU resources for both applications (see Fig. 13(b)). Note that individual VM settings are unchanged and not shown.

Fig. 13: SetRpOnly strategy. (a) Measured performance (tgt.=600 ms); (b) RP-level reservation and limit

3) Proportional Throttling: In this strategy, AppRM has to throttle the VM-level reservations in proportion to the requested values, because the premium-class RP cannot satisfy the total reservation of all the VMs, and the RP is neither expandable nor modifiable. The initial settings are: for each VM in the premium RP, R_cpu = R_mem = 0, L_cpu = L_mem = Unlimited; and for the premium RP, R_cpu = R_mem = 2000 MHz/MB, L_cpu = L_mem = 2000 MHz/MB. Fig. 14 shows that AppRM cannot improve performance for the MongoDB applications, because the premium-class resource pool itself does not have sufficient resources and is not allowed to increase its size. AppRM scales back all VM-requested reservations proportionally and balances the performance degradation across both applications.

Fig. 14: Proportional Throttling: Measured perf. (tgt.=600 ms)


VI. CONCLUSIONS

In this paper, we have presented AppRM, a holistic performance management tool that scales virtual machines vertically by adjusting resource settings at the individual VM level or at the resource pool level. Our modeling and control techniques can be applied to other resources as well, such as storage and network, but we need similar resource pool abstractions and control knobs for those resources. For storage, there are research prototypes for IOPS reservations and resource pools [19], [20]. For network, though reservation for outgoing aggregate bandwidth exists, there is no known implementation of network resource pools. As these mechanisms become available, we will extend our work to these resources.


VII. ACKNOWLEDGMENT

We thank the anonymous reviewers for their valuable feedback on this paper. This work was completed during Lei Lu’s internship at VMware. Evgenia Smirni is partially supported by the NSF grants CCF-0937925 and CCF-1218758.

REFERENCES

[1] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing energy and server resources in hosting centers," in Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, ser. SOSP '01. ACM, 2001, pp. 103–116.
[2] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani, "A hybrid reinforcement learning approach to autonomic resource allocation," in Proceedings of the 2006 IEEE International Conference on Autonomic Computing, ser. ICAC '06. IEEE Computer Society, 2006, pp. 65–73.
[3] P. Bodík, R. Griffith, C. Sutton, A. Fox, M. Jordan, and D. Patterson, "Statistical machine learning makes automatic control practical for internet datacenters," in Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, 2009, pp. 12–12.
[4] J. Xu, M. Zhao, J. Fortes, R. Carpenter, and M. S. Yousif, "Autonomic resource management in virtualized data centers using fuzzy-logic-based approaches," Cluster Computing Journal, vol. 11, 2008.
[5] P. Padala, K.-Y. Hou, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, and A. Merchant, "Automated control of multiple virtualized resources," in Proceedings of the 4th ACM European Conference on Computer Systems, ser. EuroSys '09. ACM, 2009, pp. 13–26.
[6] S. Blagodurov, D. Gmach, M. Arlitt, Y. Chen, C. Hyser, and A. Fedorova, "Maximizing server utilization while meeting critical SLAs via weight-based collocation management," in Proc. of the International Symposium on Integrated Network Management, 2013.
[7] "VMware ESX and ESXi," http://www.vmware.com/products/vsphere/esxi-and-esx/.
[8] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the 19th Symposium on Operating Systems Principles (SOSP), ser. Operating Systems Review, vol. 37, no. 5. ACM, Oct. 2003, pp. 164–177.
[9] "Windows Hyper-V Server," http://www.microsoft.com/hyper-v-server/.
[10] D. A. Menascé and V. A. F. Almeida, Capacity Planning for Web Services: Metrics, Models, and Methods. Prentice Hall, 2001.
[11] VMware, Inc., vSphere Resource Management Guide: ESXi 5.1, vCenter Server 5.1, 2012.
[12] A. Gulati, A. Holler, M. Ji, G. Shanmuganathan, C. Waldspurger, and X. Zhu, "VMware distributed resource management: Design, implementation and lessons learned," VMware Technical Journal, vol. 1, 2012.
[13] L. Yazdanov and C. Fetzer, "Vertical scaling for prioritized VMs provisioning," in Proceedings of the 2012 Second International Conference on Cloud and Green Computing, ser. CGC '12. IEEE Computer Society, 2012, pp. 118–125.
[14] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes, "AGILE: Elastic distributed resource scaling for infrastructure-as-a-service," in Proceedings of the 10th International Conference on Autonomic Computing. USENIX, 2013, pp. 69–82.
[15] "VMware vSphere Web Services SDK," https://www.vmware.com/support/developer/vc-sdk/.
[16] "Hyperic," http://www.hyperic.com/.
[17] "MongoDB," http://www.mongodb.org.
[18] A. Beitch, B. Liu, T. Yung, R. Griffith, A. Fox, and D. Patterson, "Rain: A workload generation toolkit for cloud computing applications," U.C. Berkeley Technical Publications.
[19] A. Gulati, A. Merchant, and P. J. Varman, "mClock: Handling throughput variability for hypervisor IO scheduling," in Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI '10. USENIX Association, 2010, pp. 1–7.
[20] A. Gulati, G. Shanmuganathan, X. Zhang, and P. Varman, "Demand based hierarchical QoS using storage resource pools," in Proceedings of the 2012 USENIX Annual Technical Conference, ser. USENIX ATC '12. USENIX Association, 2012, pp. 1–1.
