Thesis Proposal: Dynamic optimization of power and performance for virtualized server clusters

Vinicius Tavares Petrucci
September 2010

Supervisor: Orlando Loques (IC-UFF)

Institute of Computing, Fluminense Federal University, Niterói, Rio de Janeiro, Brazil

Thesis proposal submitted to the Graduate School of Computing of Fluminense Federal University as a partial requirement for the degree of Doctor in Science. Topic area: Parallel and Distributed Computing.
Approved by:
Prof. Orlando Gomes Loques Filho, Ph.D. (IC/UFF) (Chair)
Prof. Julius Cesar Barreto Leite, Ph.D. (IC/UFF)
Prof. Claudio Luis de Amorim, Ph.D. (COPPE/UFRJ)
Prof. Rodrigo Martin Santos, Ph.D. (UNS, Argentina)
Niterói, September 2010.
Abstract

This proposal presents an approach for power and performance management in a platform running multiple independent network applications, such as web-based applications. The approach assumes a virtualized server cluster environment and includes an associated optimization model and strategy developed to dynamically configure the applications across the server cluster. The optimization aims to reduce the cluster power consumption while meeting the performance requirements of the applications. The underlying mathematical formulation for power and performance optimization is given by a mixed integer programming model, solved periodically at run-time in a control loop fashion. Dynamic configuration techniques, such as CPU dynamic frequency/voltage scaling, server on/off switching, virtual machine on-demand deployment, and live migration, are investigated in order to implement the overall optimization approach.
Contents

1 Introduction
  1.1 Thesis proposal
  1.2 Related work
  1.3 Document organization

2 Proposed approach
  2.1 Optimization model
    2.1.1 Application workload balancing
    2.1.2 Modeling switching and migration costs
  2.2 Optimization control strategy
    2.2.1 Execution example
  2.3 Target cluster architecture
  2.4 Practical issues

3 Preliminary work
  3.1 Simulation environment
  3.2 Dynamic optimization execution
  3.3 Energy savings
  3.4 Scalability considerations
  3.5 Switching cost effects
  3.6 Evaluation on VM allocation and migration

4 Thesis status and future work
  4.1 Limitations and potential improvements
  4.2 Research schedule

Bibliography
Chapter 1

Introduction

An increasing number of large server clusters are being deployed in data centers, supporting many different Internet applications and services in a seamless, transparent fashion. A data center is a large-scale distributed system that consists of hundreds or thousands of machines linked by a fast network. These architectures are becoming common in utility/cloud computing platforms [13, 22], such as Amazon EC2 and Google AppEngine. In these platforms, the applications are mostly hosted on several dedicated physical servers and can have different workloads which vary with time. These platforms may have great processing and performance demands, incurring high energy costs and indirectly contributing to increased CO2 emissions and thus to environmental deterioration [31].

The energy consumed to keep today's server systems running has become a major concern, which in turn requires substantial investigation of techniques to improve the energy efficiency of their computing infrastructure [9, 17, 42]. According to a study by the Uptime Institute and McKinsey and Company [31], server clusters in data centers contribute 30% of the world's carbon-dioxide emissions and will surpass the emissions of the airline industry by 2020. Other recent data center studies, such as [5], indicate that costs associated with power and cooling could overtake hardware acquisition costs. Thus, achieving power efficiency in today's Internet server systems is a fundamental concern.

In general, to allow hosting multiple independent network applications, today's server cluster platforms rely on virtualization techniques that enable the use of different virtual machines (VMs), each bundling an operating system plus software applications, on a single physical server. Server virtualization has been widely adopted in data centers around the world for improving resource usage efficiency, particularly helping to make these computing environments more energy-efficient. Several virtual machine monitors, or hypervisors, which act as a layer between the virtual machine and the actual hardware, have been developed to support server virtualization (e.g., Xen [4] and VMware [19]). The adoption of virtualization technologies for power-aware optimization in server clusters turns out to be a challenging research topic.

Server virtualization provides a means for server consolidation and allows for on-demand allocation and migration of the virtual machines, which run the applications, to physical servers [29, 52]. It is recognized that the dynamic consolidation of application workloads, through live migration of virtual machines, helps to increase server utilization, allowing a reduction in the use of computing resources and the associated power demands [29, 48]. Specifically, the ability to dynamically
move application workloads around in a virtualized server environment enables some physical machines to be turned off in periods of low activity; when demand increases, they can be brought back up and the application workloads redistributed across them. This is an efficient way of running a data center from a power management point of view.

Moreover, server on/off mechanisms combined with the dynamic voltage and frequency scaling (DVFS) capabilities offered by current processors can provide even better power savings. Intel's "Enhanced SpeedStep Technology" and AMD's "PowerNow!" are examples of DVFS implementations. With DVFS, the processor's frequency, which is associated with the operating voltage, can be adjusted at run time, decreasing the power consumption and reducing the amount of heat generated on the chip. Since DVFS changes the processor's operating frequency, it results in a corresponding change in performance, due to a reduction in the number of instructions the processor can issue in a given amount of time. This poses relevant power and performance trade-offs in server systems, depending on the particular server hardware, the incoming application workload, and the power and performance management goals.

The optimization decisions for power and performance management are usually made based on characteristics of the application workload, for example, in terms of average service arrival rate and resource utilization. In most real server clusters, the workloads of the applications vary over time and little is known a priori about future incoming requests. One approach to tackle this issue is to adopt predictive techniques, which play an important role in improving the optimization decisions and guaranteeing a high quality of service for the applications in Internet server systems [46].
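To make the DVFS mechanism concrete, the sketch below reads and sets CPU frequencies through the standard Linux cpufreq sysfs interface. This is an illustration rather than part of the proposal's implementation: it assumes the userspace governor is available (so that scaling_setspeed is writable), and whether scaling_available_frequencies is exposed depends on the cpufreq driver in use.

# Sketch of per-core DVFS control via the Linux cpufreq sysfs interface.
# Assumes root privileges and the "userspace" governor; the paths are the
# standard kernel ones, but availability depends on the cpufreq driver.
CPU0 = "/sys/devices/system/cpu/cpu0/cpufreq"

def available_frequencies():
    # discrete frequencies (in kHz) supported by the processor
    with open(f"{CPU0}/scaling_available_frequencies") as f:
        return sorted(int(khz) for khz in f.read().split())

def set_frequency(khz):
    # pin the core to a fixed frequency (userspace governor only)
    with open(f"{CPU0}/scaling_governor", "w") as f:
        f.write("userspace")
    with open(f"{CPU0}/scaling_setspeed", "w") as f:
        f.write(str(khz))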
1.1 Thesis proposal

In this thesis proposal, we present an optimization solution for power and performance management in virtualized server clusters. We deal with the problem of selecting at runtime a power-efficient configuration and a corresponding mapping of the applications, running on top of virtual machines, to physical servers. The optimization decision also includes selecting the best voltage/frequency combination for each physical server. To scale the solution over time, considering that the applications have individual time-varying workloads, the optimization strategy enables the virtualized server system to react to load variations and adapt its configuration accordingly. The optimization decisions for dynamic configuration can be improved by leveraging predictions about future resource utilization and availability in the server cluster. The idea is to take advantage of the correlation between the past history and the near future of the workload to make useful predictions about the incoming workload [15].

The optimization approach aims to provide an effective solution integrating power and performance management in virtualized server clusters. To achieve that vision, we describe a mathematical formulation for minimizing the power consumed while meeting performance requirements, in terms of a mixed integer programming model. In addition, we propose an optimization control strategy for dynamically configuring the cluster of virtualized servers, leveraging the optimization model, which is solved periodically in a control loop fashion. In order to fully implement the overall optimization approach, we need to investigate the use of dynamic configuration techniques, such as CPU dynamic frequency/voltage scaling, server on/off switching, virtual machine on-demand deployment, and live migration. As for evaluation, we need to collect evidence that the overall approach is feasible and effective through simulations and experiments.

In contrast to traditional power-aware optimization solutions that rely on heuristics, our approach adopts an exact optimization methodology, based on integer programming, for dynamic configuration of the server cluster. Solving the power optimization problem to optimality can be economically significant, since a 10% difference in power cost for a data center can represent a large annual sum for a business. In addition, as the optimization problem is intended to be solved periodically at run-time, the optimization algorithm must meet a time-constrained processing requirement, given the usual cluster configuration control period (e.g., seconds or a few minutes). This control period represents a soft real-time constraint for the optimization algorithm.
1.2 Related work

In this section, we survey some of the relevant literature in the active research area of designing energy-efficient server clusters. Several techniques to reduce energy consumption in server environments have been developed over recent years. A good survey on power management in server systems is presented in [9]. There are differences between the proposed optimization solution and the major works that precede it.

Several optimization approaches based on the bin packing problem for configuring virtualized servers are described in the literature, such as [10, 27]. However, their models are not designed for power-aware optimization. In [54], the authors present a two-layer control architecture aimed at providing power-efficient real-time guarantees for virtualized computing environments. The work relies on a sound control-theoretic framework, but does not address dynamic virtual machine allocation or migration and machine on/off mechanisms in a multiple-server context. A heuristic-based solution for the power-aware consolidation problem of virtualized clusters is presented in [48], but it does not guarantee solutions that are even near-optimal. The approach described in [16] determines predefined thresholds for switching servers on/off (given by simulation), based on CPU frequency values that the active servers must match in order to change the cluster configuration to meet the performance requirements. However, their proposal does not provide an optimal solution as a combination of which servers should be active and their respective CPU frequencies.

The problem of optimally allocating a power budget among servers in a cluster in order to minimize mean response time is described in [18]. In contrast to our approach, which is designed to minimize the cluster power consumption while meeting performance requirements, their problem poses a different optimization objective. A dynamic resource provisioning framework is developed in [29] based on lookahead control. Their approach does not consider DVFS, but the proposed optimization controller addresses attractive issues, such as machine switching costs (i.e., the overhead of turning servers on and off) and a predictive configuration model. Our approach also copes with switching costs, and it includes the ability to incorporate prediction techniques in the optimization strategy to improve the configuration decisions. A power-aware migration framework for virtualized HPC (high-performance computing) applications, which accounts for migration costs during virtual machine reconfigurations, is presented in [51, 52]. Similarly to our approach, it relies on virtualization techniques used for dynamic consolidation, although the application domains are different.

Contrasting with [29, 48, 52], our approach takes advantage of dynamic voltage/frequency scaling (DVFS) mechanisms to optimize the servers' operating frequencies in order to reduce the overall energy consumption. An approach based on DVFS is presented in [23] for power optimization and end-to-end delay control in multi-tier web servers. Recent approaches, such as those presented in [8, 12, 26, 28, 44, 47], also rely on DVFS techniques and include server on/off mechanisms for power optimization. However, these approaches are not designed (and not applicable) for virtualized server clusters; that is, they do not consider multiple application workloads in a shared cluster infrastructure.

Predictive or proactive optimization policies have been shown to avoid unnecessary and disruptive configuration changes due to workload fluctuations, and thus may provide further energy reduction and better quality of service for the applications in a server cluster [29, 46]. Although our approach is not meant to address in detail the specific aspects of workload prediction, it allows for including predictive capabilities in our optimization control loop during monitoring activities.

The idea of energy proportionality, presented by Luiz Barroso and Urs Hölzle in [6], is that computing systems should consume power in proportion to their utilization level. The energy-proportionality concept would enable large energy savings in servers, considering their study at Google showing that servers in data centers are loaded between 10 and 50 percent of peak, with a CPU utilization that rarely surpasses 40 percent. Although power proportionality is very important, it does not reduce the importance of ensuring that data center resources are near fully utilized, as pointed out in [20]. Thus, our solution to this problem is to efficiently manage the cluster utilization by leveraging server virtualization and CPU DVFS techniques. Recent proposals like FAWN [3] explore the idea of building clusters of low-power embedded devices coupled with flash storage, which operate efficiently for specific I/O-bound workloads.

Recently, there has also been interest in distributing the workload across server clusters in different locations with respect to their energy consumption [30, 39]. An issue here is that energy cost models for distinct time zones and variable electricity prices need to be specified accordingly. There is also a trend of improving energy efficiency in multi-core server architectures by focusing on fine-grained power management [41] and dynamic voltage and frequency scaling (and on/off mechanisms) at the core level [11]. However, optimization for power and performance in a multi-core architecture context requires further study of the impact of power-aware optimization decisions, such as shared cache memory issues.
1.3 Document organization

This thesis proposal is organized as follows. The mathematical formulation and dynamic strategy for power and performance management, together with the proposed system model and architecture, are presented in Chapter 2. We evaluate the optimization approach in Chapter 3 through simulations driven by actual workload traces and experiments investigating the impact on server performance during dynamic management of virtual machines. We conclude the proposal and outline future plans in Chapter 4.
Chapter 2

Proposed approach

The cluster configuration problem that we consider in this proposal is to determine the most power-efficient cluster configuration that can handle a given set of application workloads [35, 36, 37]. A cluster configuration is given by (1) which servers must be active and their respective CPU frequencies, and (2) a corresponding mapping of the applications (running on top of VMs) to physical servers. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The underlying mathematical formulation for minimizing the power consumed in the virtualized cluster is given by a mixed integer programming (MIP) model. The optimization problem is solved periodically, and the solution is used to configure the cluster.
Figure 2.1: Optimization approach

We now outline the elements of the optimization approach. As shown in Figure 2.1, a Monitor module is employed to collect run-time properties of the server cluster system, such as service arrival rate or resource utilization. Next, the monitored values are evaluated by the Predictor module, which filters and estimates future values from past observations of the workload time series. Several prediction techniques can be adopted here, such as exponential smoothing [46] or autoregressive moving average (ARMA) linear models [15], typically applied to autocorrelated time series data. The MIP model is then updated using the observed and predicted values for the workload of the applications, and a new instance of the optimization problem is constructed. The Optimizer module implements an optimization algorithm and solves the new optimization problem instance, yielding a new optimal configuration. The Control module is responsible for applying the changes in the server cluster, transitioning it to a new state given by the new optimized configuration.

In a virtualized environment, the applications are implemented and executed on top of virtual machines (VMs), which are assigned to physical servers. In our model, multiple different VMs can be mapped (consolidated) to a single server. We assume that applications in the cluster can be mapped to VMs, and these to servers, in two different ways, as shown in Figure 2.2.
Figure 2.2: Application workload mapping: (a) every application runs in only one VM instance on a given server; (b) one application may run in more than one VM instance, with the VMs balanced among multiple servers

We describe in Section 2.1 an optimization model that assumes that an application runs on top of only one VM, hosted on one physical server at a time. Alternatively, we present in Section 2.1.1 an extension to the optimization model that allows an application to be implemented using multiple VMs, which are mapped and distributed to different servers. In this way, a particular application workload may be divided and balanced among different servers. Note that this assumption may not hold for every application workload, such as those involving session state management (user session data). In addition, we propose in our optimization model a switching and migration penalty to avoid frequent, undesirable server on/off switching and disruptive VM migrations, described in Section 2.1.2. In Section 2.2 we outline an optimization control strategy to periodically select and enforce the lowest power consumption configuration that maintains the cluster within the desired performance level, given the time-varying incoming workload of multiple applications. We describe in Section 2.3 our target architecture, which includes our optimization strategy designed to monitor and configure a cluster of virtualized servers.
2.1 Optimization model

Before describing the optimization model, we introduce the following notation. Let N be the set of physical servers in the cluster and F_i the set of frequencies for each server i ∈ N. Let M be the set of applications intended to run on the virtualized cluster. The parameter cap_ij represents the maximum performance or capacity (e.g., requests/sec) of server i running at CPU frequency j ∈ F_i. The parameters pb_ij and pi_ij denote the busy and active-idle power cost, respectively, of running server i at frequency j, and the variable α_ij denotes the utilization of server i running at frequency j. For each application k ∈ M, we define the parameter d_k to represent the workload demand of that application. In practice, because workload variations are small in short time intervals, the application demand vector can be generated by monitoring the incoming workload (in terms of requests per second or CPU utilization) for each application in a front-end machine of the server cluster (see Section 2.3).

The following decision variables are defined: x_ijk is a binary variable that denotes whether server i uses frequency j to run application k (x_ijk = 1) or not (x_ijk = 0); y_ij is a binary variable that denotes whether server i is active at frequency j (y_ij = 1) or not (y_ij = 0). The problem formulation is thus given by the following mixed integer program (MIP):
Minimize
    Σ_{i∈N} Σ_{j∈F_i} (pb_ij − pi_ij)·α_ij + pi_ij·y_ij                      (2.1)

Subject to
    Σ_{k∈M} d_k·x_ijk ≤ cap_ij·α_ij        ∀i ∈ N, ∀j ∈ F_i                  (2.2)
    Σ_{i∈N} Σ_{j∈F_i} x_ijk = 1            ∀k ∈ M                            (2.3)
    Σ_{j∈F_i} y_ij ≤ 1                     ∀i ∈ N                            (2.4)
    α_ij ≤ y_ij                            ∀i ∈ N, ∀j ∈ F_i                  (2.5)
    x_ijk ∈ {0, 1},  y_ij ∈ {0, 1},  α_ij ∈ [0, 1]                           (2.6)
The objective function given by Equation (2.1) is to find a cluster configuration that minimizes the overall server cluster cost in terms of power consumption. The power consumption of a given server i is a linear combination of the busy power pb_ij and the idle (but active) power pi_ij for the selected CPU frequency j. As observed experimentally (cf. Chapter 3), the power consumed by a server grows (almost) linearly between the minimum idle power and the power at full utilization for a given CPU frequency. The constraints (2.2) prevent a solution in which the demand of the applications k ∈ M running on a server i at frequency j exceeds the capacity of that server. The constraints (2.3) guarantee that each application k is assigned to exactly one server i and frequency j. The constraints (2.4) ensure that only one frequency j can be chosen for a given server i. The constraints (2.5) bind the decision variable y_ij to the α_ij variable in the objective function. The solution is thus given by the decision variable x_ijk, where i is a server reference, j is the server's status (i.e., its operating frequency, or inactive status when j = 0), and k represents the respective allocated application. For example, a solution that returns x_123 = 1 means that application 3 is hosted on server 1 running at CPU frequency 2.

The virtualized cluster configuration problem is a variant of the one-dimensional variable sized bin packing problem [21], which is defined as follows. Suppose we are given a set B of different bin types (servers), where each bin type i ∈ B has capacity W_i and a fixed cost C_i. The problem involves packing a set J of items (applications), where each item j ∈ J has a weight (demand) w_j, into a minimum-cost set of bins, such that the total weight of the items allocated to a bin cannot exceed its capacity. As a generalization of the classic bin-packing problem, this problem is known to be NP-hard [21]. A difference between our problem and the original variable-sized bin packing problem is that, in our case, we have for each bin (server) the possibility to choose among different options (CPU speeds). However, as shown in constraints (2.4) of the MIP model, only one CPU speed on a given server can be chosen.
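To make the formulation concrete, the following minimal sketch encodes model (2.1)-(2.6) using the open-source PuLP modeler. This is an illustrative rendering, not the thesis implementation (which uses CPLEX), and the two-server, three-application data below are placeholder values rather than measurements.

import pulp

N = [1, 2]                                   # servers
F = {1: [1, 2], 2: [1, 2]}                   # frequency indices per server
M = [1, 2, 3]                                # applications
cap = {(1, 1): 95, (1, 2): 188, (2, 1): 93, (2, 2): 235}   # capacity (req/s), placeholders
pb  = {(1, 1): 82, (1, 2): 110, (2, 1): 83, (2, 2): 140}   # busy power (W)
pi  = {(1, 1): 66, (1, 2): 73,  (2, 1): 66, (2, 2): 77}    # active-idle power (W)
d   = {1: 45, 2: 120, 3: 17}                               # demands (req/s)

prob = pulp.LpProblem("cluster_config", pulp.LpMinimize)
IJ = [(i, j) for i in N for j in F[i]]
x = pulp.LpVariable.dicts("x", [(i, j, k) for (i, j) in IJ for k in M], cat="Binary")
y = pulp.LpVariable.dicts("y", IJ, cat="Binary")
alpha = pulp.LpVariable.dicts("alpha", IJ, lowBound=0, upBound=1)

# objective (2.1): utilization-proportional power plus idle power of active servers
prob += pulp.lpSum((pb[i, j] - pi[i, j]) * alpha[i, j] + pi[i, j] * y[i, j]
                   for (i, j) in IJ)
for (i, j) in IJ:
    # (2.2): allocated demand fits the capacity at the chosen utilization
    prob += pulp.lpSum(d[k] * x[i, j, k] for k in M) <= cap[i, j] * alpha[i, j]
    # (2.5): utilization only on an active (server, frequency) pair
    prob += alpha[i, j] <= y[i, j]
for i in N:
    # (2.4): at most one frequency per server
    prob += pulp.lpSum(y[i, j] for j in F[i]) <= 1
for k in M:
    # (2.3): each application on exactly one (server, frequency)
    prob += pulp.lpSum(x[i, j, k] for (i, j) in IJ) == 1

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))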
2.1.1 Application workload balancing

The model described previously cannot be applied to application workloads that demand more capacity than that supported by a single server. To address this issue, we propose an extension to the model where splitting application workloads is allowed but comes at a cost, since each part of a workload split incurs additional load-balancing overhead; we therefore attempt to avoid fragmentation as much as possible. In practice, the proposed extension means that each application may be associated with multiple virtual machines running on different physical servers. To penalize unnecessary application workload fragmentation, we include in the model a new input parameter FRAG to denote the cost of fragmentation. This is a constant additive penalty factor that adds a cost in the objective function for each extra server used to allocate a given application that is already allocated on another physical server. We may define the fragmentation cost in terms of the average power cost of the server machines in the cluster, that is, the average power cost of turning a new server on.

Starting from the original optimization model, we change the decision variable x_ijk to a continuous variable x_ijk ∈ [0, 1] that represents the fraction of the workload of application k allocated to server i running at frequency j. For example, a solution that returns x_123 = 0.19 means that 19% of the workload of application 3 is allocated on server 1 running at CPU frequency 2. To facilitate modeling the fragmentation penalty in the objective function, we define a new binary decision variable z_ik ∈ {0, 1} to denote whether application k is allocated on server i, along with a new set of constraints: x_ijk ≤ z_ik, ∀i ∈ N, ∀j ∈ F_i, ∀k ∈ M. In addition, we modify the objective function of the original optimization model, Equation (2.1), as follows:
Minimize
    Σ_{i∈N} Σ_{j∈F_i} (pb_ij − pi_ij)·α_ij + pi_ij·y_ij + FRAG · Σ_{k∈M} ((Σ_{i∈N} z_ik) − 1)        (2.7)
The modified objective function given by Equation (2.7) has a new term to account for the fragmentation cost. Specifically, this term adds a cost whenever, for an application k, the sum of z_ik over the selected servers i ∈ N is greater than one, meaning that the application is allocated on more than one server.

From another point of view, the ability to fragment items at low (or no) cost would allow increasing the number of copies of certain applications, which would favor fault tolerance: if a physical server fails, the copies (fragments) of the affected applications on other physical servers would continue in operation. We intend to investigate this concern in future work, as it might be useful for handling server failures depending on the application requirements.
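The following fragment sketches how the balancing extension changes the PuLP rendering of Section 2.1: x becomes continuous (a workload fraction), a binary z_ik marks which servers host a piece of application k, and the objective gains the FRAG term of Equation (2.7). The FRAG value is an assumed average server power cost; in a fresh model these declarations replace the binary x of the earlier sketch, with constraints (2.2)-(2.5) carried over unchanged.

FRAG = 100.0   # assumed fragmentation penalty (W), e.g. an average server power cost

# x is now the fraction of application k served by (server i, frequency j)
x = pulp.LpVariable.dicts("x", [(i, j, k) for (i, j) in IJ for k in M],
                          lowBound=0, upBound=1)
z = pulp.LpVariable.dicts("z", [(i, k) for i in N for k in M], cat="Binary")

# linking constraints: any nonzero fraction on server i forces z[i, k] = 1
for (i, j) in IJ:
    for k in M:
        prob += x[i, j, k] <= z[i, k]

# modified objective (2.7): pay FRAG for every extra server an application spans
prob.setObjective(
    pulp.lpSum((pb[i, j] - pi[i, j]) * alpha[i, j] + pi[i, j] * y[i, j]
               for (i, j) in IJ)
    + FRAG * pulp.lpSum(pulp.lpSum(z[i, k] for i in N) - 1 for k in M))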
2.1.2 Modeling switching and migration costs

In a real server cluster environment, dynamic configurations may be subject to switching costs, for example, in terms of the power consumed while a machine is being turned on or off. To handle this issue, we incorporate a penalty value in the optimization model to represent the overhead of turning machines on/off. Specifically, we modify the objective function of the original optimization model, Equation (2.1), as follows:
Minimize
    Σ_{i∈N} Σ_{j∈F_i} (pb_ij − pi_ij)·α_ij + pi_ij·y_ij + swt_cost(U_ij, y_ij) + mig_cost(A_ik, z_ik)        (2.8)
The modified objective function given by Equation (2.8) has new terms to account for switching and migration costs. To calculate a transition cost, we include in the model a new input parameter U_ij to denote the current cluster usage in terms of which machines are turned on and off; that is, U_ij = 1 if machine i is running at speed j. Similarly, the new input parameter A_ik denotes which application is currently associated with which server. More precisely, we may define the switching cost function as swt_cost(U_ij, y_ij) = SWTP · (y_ij·(1 − U_ij) + U_ij·(1 − y_ij)). The constant SWTP represents a penalty for turning a machine off (if it was on) and for turning a machine on (if it was off), which can represent the additional power consumed to boot a server machine. Currently, we do not consider the penalty of changing frequencies: for a given server i ∈ N, if U_ij = 1 for at least one j ∈ F_i, we set U_ij = 1 for all j ∈ F_i to avoid taking frequency switching costs into account. The migration cost function may be defined similarly, based on the previous allocation input A_ik and the new allocation decision z_ik, as mig_cost(A_ik, z_ik) = MIGP · (z_ik·(1 − A_ik) + A_ik·(1 − z_ik)). We assume that both the server switching on/off and migration penalties can be estimated in a real server cluster. For example, the cost of VM migration could be measured a priori and stored in a table, using the approach proposed in [52].
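As an illustration, the switching and migration terms of Equation (2.8) can also be evaluated outside the solver to score a candidate configuration against the current one. The sketch below assumes the 0/1 inputs defined above, given as dictionaries keyed by index pairs; the SWTP and MIGP defaults are placeholder penalty values that would be measured in a real cluster.

def swt_cost(U, y, IJ, SWTP=50.0):
    # penalty whenever the on/off state of a (server, frequency) pair flips,
    # i.e. an XOR between previous usage U and new usage y
    return SWTP * sum(y[i, j] * (1 - U[i, j]) + U[i, j] * (1 - y[i, j])
                      for (i, j) in IJ)

def mig_cost(A, z, IK, MIGP=30.0):
    # analogous XOR penalty between previous allocation A and new allocation z
    return MIGP * sum(z[i, k] * (1 - A[i, k]) + A[i, k] * (1 - z[i, k])
                      for (i, k) in IK)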
2.2 Optimization control strategy

Dynamic optimization behavior is attained by periodically monitoring the relevant inputs of the optimization model and computing a new optimal configuration given the updated input values. In other words, as some of the input variables may change over time, such as the application workload vector, a new instance of the optimization problem is constructed and solved at run-time. We assume that the workload does not change radically and is mostly stable within the specified control period.

In particular, we devise a control loop of the following form: (a) monitor and store the most recent values of the optimization input variables, (b) construct and solve a new optimization problem instance, yielding a new optimal configuration, and (c) apply the changes in the system, transitioning it to the new state given by the optimized configuration. The details are given in the following algorithm.

do periodically:
    // 1. Input variables
    curDemand = getDemandVector()
    curUsage = getCurrentUsage()
    curAlloc = getCurrentAlloc()

    // 2. Workload filter / prediction
    d = predict(curDemand)

    // 3. Run optimization
    newUsage, newAlloc = bestConfig(d)

    // 4. Generate usage and alloc sets for changes
    chgUsage = sort(diff(newUsage, curUsage))
    chgAlloc = sort(diff(newAlloc, curAlloc))

    // 5. Power management operations
    for (i, j) in chgUsage:
        if j == 0:
            turnOff(i)
        else:
            if curUsage[i] == 0:
                turnOn(i)
            setFreq(i, j)

    // 6. Virtualization management operations
    for (k, i) in chgAlloc:
        if i == 0:
            stopVm(k, curAlloc[k])
        else:
            if curAlloc[k] == 0:
                startVm(k, i)
            else:
                migrateVm(k, curAlloc[k], i)
The control loop outlined above relies on the mathematical formulation described in Section 2.1 to solve the cluster configuration problem. The key idea of the optimization control policy is to periodically select and enforce the lowest power consumption configuration that maintains the cluster within the desired performance level, given the time-varying incoming workload of multiple applications.

The input variables for the control loop algorithm, gathered in step 1, are the monitored and updated application demand (load) vector, the current server configuration, and the current application allocation. At step 2, we may apply a predictive filter to estimate the workload demand vector for lookahead horizons, as proposed in [15]. The bestConfig operator, at step 3, returns a cluster usage and allocation solution, where newUsage represents a usage configuration of the servers and their respective status (i.e., operating frequency or inactive), and newAlloc represents which application is to be associated with each server. In fact, the configuration to be imposed is the difference between two sets: the new configuration and the current one. For example, suppose the current cluster usage is curUsage = {(1,0),(2,2),(3,0)} and the new usage is newUsage = {(1,0),(2,0),(3,4)}. We then need to perform a change in the system given by chgUsage = newUsage − curUsage = {(2,0),(3,4)}; that is, we need to turn off server 2 and turn on server 3 at frequency 4. To handle this, we apply a diff operator to the usage and allocation solutions provided by the optimization operator (see step 4).

The order in which the operations are executed may lead to problematic behavior. In the example above, if the new usage configuration shuts down the currently active server before the new server is ready to respond to requests (server boot time is not instantaneous), the cluster will be left in an unavailable state. To solve this issue, we simply sort the new cluster usage representation so that operations that shut down servers are always performed last, while operations that increase frequency and turn on servers are performed first. The same scheme is adopted for the new allocation representation, in which the operations that start and migrate virtual machines are performed first. To achieve this, we make use of a sort operator in the configuration algorithm, as shown in step 4; a sketch of these operators is given below.

At step 5, we employ dynamic configurations for power optimization, which consist of (a) turning servers off in periods of low activity and turning them back on when demand increases, and (b) exploiting the dynamic voltage and frequency scaling capability of current processor architectures to further reduce power consumption. Finally, to manage the application services (which are associated with virtual machines), we rely on configuration operators to start, stop, and migrate the virtual machines in the server cluster, as described in step 6.
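A minimal sketch of the diff and sort operators of step 4, assuming usage maps are dictionaries from server to frequency (with 0 meaning off); the ordering rule places shutdowns last, matching the availability argument above.

def diff(new, cur):
    # keep only the (server, freq) entries that actually change
    return {i: j for i, j in new.items() if cur.get(i, 0) != j}

def order(changes):
    # turn-ons and frequency changes first, shutdowns (freq == 0) last,
    # so capacity is never dropped before replacement servers are up
    return sorted(changes.items(), key=lambda item: item[1] == 0)

curUsage = {1: 0, 2: 2, 3: 0}
newUsage = {1: 0, 2: 0, 3: 4}
print(order(diff(newUsage, curUsage)))   # [(3, 4), (2, 0)]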
2.2.1 Execution example

As an example of an optimization execution, assume that the demand vector (in req/s) for three different applications is d = [45, 120, 17]. After solving the optimization problem, we have an abstract configuration solution given by a vector of tuples (i, j, k), where i is a server, j is the respective CPU speed, and k is the allocated application: conf = [(1,3,1),(1,3,2),(1,3,3)]. This means that applications 1, 2, and 3 are hosted by server 1 at frequency 3, which is its maximum frequency, while the other servers are turned off. At another execution snapshot, say d = [45, 170, 17], the new configuration solution is conf = [(2,1,1),(1,3,2),(1,3,3)]. This means that, as the demand for application 2 has increased from 120 to 170, we need to turn on a new server and migrate application 1 to this new server 2, which will run at frequency 1 (i.e., the minimum frequency). If the demand for application 2 later decreases, we may turn off server 2 to save power and migrate application 1 back to server 1. This particular example does not consider the overhead of switching servers on/off.
2.3 Target cluster architecture

In the server cluster, we need to maintain two correlated objectives. First, we aim to guarantee quality-of-service requirements for the cluster (e.g., by controlling the average cluster load or the request response time). Second, to reduce costs, the set of currently active servers and their respective processor speeds should be configured to minimize the power consumption. The architecture (shown in Figure 2.3) consists of a cluster of replicated web servers. The server cluster presents a single view to the clients through a special entity termed the dispatcher server, which distributes incoming requests among the actual servers (also known as workers) that process the requests. The dispatcher server is also responsible for monitoring and maintaining the current status of the workers. The optimization strategy includes the code for monitoring run-time properties and for performing dynamic configurations on the server cluster, which can be implemented and deployed in the dispatcher server or in a dedicated server that connects to it.
Figure 2.3: Server cluster architecture
2.4 Practical issues

The design and implementation of the overall optimization approach presents some practical issues to consider in this thesis. For example, the optimization model needs information about power and performance characteristics and application workload demands to solve the cluster configuration problem. Building a comprehensive model to characterize this information, covering different applications and many heterogeneous machines with distinct capacity and power-consumption measurements in large server clusters, is not a simple task and becomes an engineering issue. Also, as the optimization problem is solved periodically at run-time, the optimization algorithm must have a processing time that is constrained by the cluster configuration control period.

There may also be practical concerns in applying the control loop algorithm (described in Section 2.2) in a real cluster environment. For example, sequential versus concurrent execution of the configuration operations should be investigated. This would help avoid inconsistency among the execution of multiple operations, for example, guaranteeing that servers are turned on before migration operations are requested. Additionally, the time delays and costs associated with performing a dynamic configuration, such as allocating a new virtual machine or turning a server on/off, must be taken into account. Accordingly, potential disruptions in server application behavior when performing dynamic changes must be carefully evaluated. To help minimize these disruptive impacts, the modeling of server switching and migration costs (cf. Section 2.1.2) and workload prediction techniques (cf. [38]) to improve the quality of the optimization decisions have to be investigated. In summary, this thesis must address relevant engineering issues to fully realize the optimization approach. Next, we describe the preliminary work that gives evidence of the feasibility of the approach.
Chapter 3

Preliminary work

We have carried out a set of simulation experiments, based on the cluster environment described in Section 2.3, to evaluate the optimization proposal. This chapter presents the preliminary work that supports the feasibility of the overall approach. The optimization problem formulation was implemented using the solver CPLEX 11 [24], which employs very efficient algorithms based on the branch-and-cut exact method [40] to search for configuration solutions. Although our optimization problem is NP-hard, we are able to obtain optimal or near-optimal solutions using the CPLEX MIP solver.
3.1 Simulation environment

The simulations were performed on an Intel Core 2 Quad 2.83 GHz with 8 GB of RAM running Ubuntu Linux (kernel version 2.6.27). In the simulations, we adopted a control loop with a one-second interval, in which the optimization worst-case execution time was about 70 ms (as shown in Section 3.4). We consider a case study of 3 applications and 5 physical servers. The system load is measured by the number of incoming requests per second in the cluster. Because we are interested in the macro behavior of the dynamic optimization, our cluster model is simplified in that we assume there is no state information to be maintained across multiple requests. Specifically, we measured the capacity of the servers, for each frequency, in terms of the maximum number of requests per second (req/s) that they can handle at 100% CPU utilization. To generate the benchmark workload, we used the httperf tool, where each HTTP request is a PHP script with a fixed average execution time.

Using the LabVIEW software environment, coupled with a USB-based data acquisition device, we measured different idle and busy power values for each frequency to build a power model. The power consumption of a server, for each available discrete frequency, varies linearly with its CPU utilization. This power estimation works well because servers have a linear power response between idle (0% utilization) and busy (100% utilization) for a given CPU frequency (see Figure 3.1). That is, we only need to measure the idle and busy states to obtain very good estimates of the power used at any performance level for a given workload. Figure 3.2 shows the power and performance model for the machines measured in our cluster.

Figure 3.1: Power measurements at different performance levels for a fixed CPU frequency

Depending on the system requirements, a relationship between cluster load and response time might be established, for example, using queueing models or control theory [8, 54]. Leveraging our optimization strategy, we could tailor our controller, for example, to monitor the request response time and adapt the cluster capacity accordingly. This aims to provide soft real-time guarantees in server clusters, in which request processing times have specified deadlines. We intend to incorporate response time control in our approach as future work.

We generated three distinct workload traces using the 1998 World Cup Web logs to characterize the multiple applications in the cluster. The applications within the cluster can have a wide range of request demands, as shown in Figure 3.3. The workload samples are spaced 1 second apart, spanning approximately 30 minutes. We adapted the original workload data to fit our cluster capacity, which is measured in requests per second. The maximum capacity of our cluster setup is 745 req/s, when all servers are turned on at full CPU speed. The combined workload curves of the three applications reach a peak of 734 req/s at around time 470 s (see Figure 3.3), which represents 98% of the maximum capacity of our cluster.
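The linear power model described above reduces to a one-line interpolation between the measured active-idle and busy values. As a sketch, using the Server 5 (ohm) entries at 2600 MHz from Figure 3.2 as example inputs:

def power(p_idle, p_busy, utilization):
    # linear interpolation between active-idle (u = 0) and busy (u = 1)
    return p_idle + (p_busy - p_idle) * utilization

# Server 5 (ohm) at 2600 MHz, 40% utilized: values taken from Figure 3.2
print(power(76.9, 140.1, 0.40))   # ~102.2 W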
3.2 Dynamic optimization execution

In the simulation, we assume that the App1, App2, and App3 services can be distributed and balanced among all servers in the cluster. Thus, we adopt the modified optimization formulation described in Section 2.1.1. For instance, the App1 service at around time 700 s demands a performance of more than 500 req/s, while the maximum performance achievable with a single machine in the cluster is 235 req/s, which corresponds to the 5th server running at maximum CPU frequency.

The simulation execution results are given in Figure 3.4. The results were plotted at 30-second intervals to help visualize the switching activities. The upper plot shows the configuration changes (frequency switching) for all servers in the cluster; if the operating frequency for a server is zero, it is turned off. In the bottom plot, we can observe the optimized allocation of the applications in the cluster along the simulation. For example, from time 0 s to 57 s, the allocation set {1, 2, 3} is associated with the 5th server, meaning that all three applications are consolidated on that server while the other servers can be turned off. At around time 470 s, all servers are turned on to handle the highest peak of the combined workloads.
Server 1: ampere (CPU: AMD Athlon(tm) 64 X2 3800+)
Freq. (MHz)   Pbusy (W)   Pidle (W)   Perf. (req/s)
1000          81.5        66.3        94.7
1800          101.8       70.5        168.4
2000          109.8       72.7        187.6

Server 2: coulomb (CPU: AMD Athlon(tm) 64 3800+)
Freq. (MHz)   Pbusy (W)   Pidle (W)   Perf. (req/s)
1000          75.2        67.4        47.5
1800          89.0        70.9        84.6
2000          94.5        72.4        94.0
2200          100.9       73.8        102.8
2400          107.7       75.2        111.5

Server 3: hertz (CPU: AMD Athlon(tm) 64 3800+)
Freq. (MHz)   Pbusy (W)   Pidle (W)   Perf. (req/s)
1000          71.6        63.9        47.0
1800          85.5        67.2        83.9
2000          90.7        68.7        93.0
2200          96.5        69.9        102.3
2400          103.2       71.6        110.5

Server 4: joule (CPU: AMD Athlon(tm) 64 3500+)
Freq. (MHz)   Pbusy (W)   Pidle (W)   Perf. (req/s)
1000          74.7        66.6        47.1
1800          95.7        73.8        84.0
2000          103.1       76.9        93.5
2200          110.6       80.0        102.0

Server 5: ohm (CPU: AMD Athlon(tm) 64 X2 5000+)
Freq. (MHz)   Pbusy (W)   Pidle (W)   Perf. (req/s)
1000          82.5        65.8        92.9
1800          99.2        68.5        165.9
2000          107.3       70.6        184.4
2200          116.6       72.3        201.0
2400          127.2       74.3        218.1
2600          140.1       76.9        235.3

Figure 3.2: Power and performance model for the servers in the cluster
3.3 Energy savings

We evaluated the effectiveness of our approach mainly in terms of the percentage reduction in cluster energy consumption compared to the Linux ondemand and performance CPU governors. The performance governor keeps all servers turned on at full speed to handle peak load, and no dynamic optimization is conducted. The ondemand governor manages the CPU frequency depending on system utilization, but does not include server on/off mechanisms. We implemented the ondemand policy in the simulation based on the algorithm described in [50]. The basic idea is as follows: if the current utilization exceeds an up threshold (80%), the policy increases the frequency to the maximum; whenever low utilization (less than 20%) is observed, the policy jumps directly to the lowest frequency that can keep the system utilization at 80%.

The allocation sets for the performance and ondemand governors are statically configured as follows: Server 1 hosts {App2}; Server 2 hosts {App3}; Server 3 hosts {App1}; Server 4 hosts {App1}; Server 5 hosts {App1, App2, App3}. The respective allocation shares are: App1 uses 100% of Server 3, 100% of Server 4, and 13.1% of Server 5; App2 uses 100% of Server 1 and 14.6% of Server 5; App3 uses 100% of Server 2 and 70.3% of Server 5. These values were obtained from the optimized configuration solution at the highest peak of the combined workloads. We used a simple round-robin method for application workload balancing.
Figure 3.3: Workload traces for three different applications using HTTP logs from WC98

The energy consumption was calculated by means of an approximation given by the sum of the utilizations of the active servers multiplied by the related busy and active-idle power consumption, accumulated over the simulation duration. That is, using the notation introduced in Section 2.1, the cluster energy consumption can be expressed as

E = Σ_t Σ_{i,j} α_ij^t·pb_ij + (1 − α_ij^t)·pi_ij

where i is an active server, j is its operating frequency with the respective utilization α_ij^t at time t imposed by the associated workloads, and t ∈ {1, ..., T} is expressed in seconds, with T the duration of the simulation. Thus, E is measured in joules (watt × second).

By using our approach, the energy consumption in the cluster is substantially reduced. In the performance execution, the total energy consumed was 847,778.82 J (235.49 Wh), whereas in the ondemand execution the energy consumed was 735,630.05 J (204.34 Wh). In the execution using our approach, the energy usage was 452,050.15 J (125.57 Wh). This means an energy consumption reduction of about 47% compared to the performance policy and 38% compared to ondemand. The main reason for these large energy savings is the fact that the active-idle power consumption of current server machines (as shown in Figure 3.2) is substantially high, which in turn makes server on/off mechanisms (used by our optimization) very power-efficient.
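A sketch of this energy accounting, assuming one sample per second listing the active (server, frequency, utilization) triples, with pb and pi being the measured power tables of Figure 3.2:

def cluster_energy(samples, pb, pi):
    # samples: one entry per second, each a list of (i, j, alpha) triples
    # for the servers active at that second
    E = 0.0
    for active in samples:
        for (i, j, alpha) in active:
            E += alpha * pb[i, j] + (1 - alpha) * pi[i, j]   # watts over 1 s
    return E   # joules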
3.4 Scalability considerations

To evaluate the scalability of our approach, we generated different server-application pairs. For each pair, we ran CPLEX to build and solve 180 instances, where each instance uses as its application demand vector the workload data at 10-second intervals from the traces shown in Figure 3.3.
Figure 3.4: Dynamic optimization execution

The CPLEX solver was executed for every instance with a user-defined solution time limit of 180 seconds, which is related to the maximum allowed control period used in our dynamic optimization policy for managing the cluster environment. Table 3.1 shows the results of simulations with different numbers of servers and applications. From 5 to 30 servers, the optimal configuration solutions were found in all 180 runs within the solution time limit. From 50 to 100 servers, there was at least one instance where CPLEX could not find the optimal solution within 180 seconds.

Server-App   Avg. (s)   Stdev. (s)   Max. (s)
(5,3)        0.022      0.018        0.070
(10,6)       0.054      0.035        0.250
(15,9)       0.062      0.038        0.240
(30,18)      0.392      0.913        8.610
(50,30)      13.630     29.959       180.010
(80,48)      58.941     53.570       180.020
(100,60)     80.135     52.394       180.030

Table 3.1: Scalability simulation

In order to speed up the process of obtaining a high-quality solution, we adopted a simple heuristic by setting a gap tolerance of 5% with respect to the optimal solution. This is a user-defined value and is intended to allow the solver to provide acceptable solutions in a short amount of time. Table 3.2 summarizes the simulation results when the minimum gap tolerance criterion was adopted. From 5 to 350 servers, the configuration solutions were found in all 180 runs within the gap tolerance (5%) and the solution time limit (180 seconds), with a maximum processing time of about 75 seconds. For 500 servers, there were three instances where CPLEX could not find a solution within the time limit.
Figure 3.5: Cluster power consumption

Server-App   Avg. (s)   Stdev. (s)   Max. (s)
(5,3)        0.006      0.008        0.040
(10,6)       0.023      0.022        0.100
(15,9)       0.031      0.030        0.130
(30,18)      0.062      0.067        0.540
(50,30)      0.139      0.281        2.390
(80,48)      0.267      0.235        3.000
(100,60)     0.481      0.409        3.080
(200,120)    2.893      1.993        11.550
(350,210)    16.488     12.979       75.440
(500,300)    48.409     41.472       181.030

Table 3.2: Scalability simulation using the optimality gap criteria

This strategy considers the solution gap between the best integer feasible solution found so far and the lower bound (LB) provided by the solver, which is usually calculated by solving a pure linear relaxation of the original problem. In minimization problems, the LB is a reference value which ensures that the optimal solution is greater than or equal to this quantity. Considering the small gap value used, CPLEX was capable of finding highly acceptable solutions, i.e., close to the optimal lower bound. Even though we generated a number of scenarios involving different server-application pairs, it is not possible to assume that CPLEX will behave similarly on all instances. The main difficulty is that the branch-and-cut method has worst-case exponential time complexity, and depending on the combination of application workloads, this approach may yield poor solutions within the acceptable runtime (the solution time limit). Nevertheless, based on the simulations presented here, we have observed that CPLEX performs well in the average case.
Given a typical optimization control period of a few minutes, such as used in [29], the proposed optimization approach is suitable and scales well for clusters with up to 350 machines. This seems to be a reasonable size for a server cluster, since, for instance, servers can be divided into smaller clusters or racks in a hierarchical fashion to address scalability issues. We observe that CPLEX is a good candidate for solving these problems in practice. In addition, CPLEX proves that its answers are optimal or within a given percentage of optimal, unlike pure meta-heuristics or local search techniques.
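The time limit and optimality gap of Tables 3.1 and 3.2 correspond to standard solver settings. As a sketch, continuing the PuLP model of Section 2.1 and assuming a recent PuLP release whose solver classes accept the generic timeLimit and gapRel keyword arguments (the bundled CBC solver is shown; the CPLEX bindings take the same keywords):

# stop at the first solution proven within 5% of optimal, or after 180 s
solver = pulp.PULP_CBC_CMD(timeLimit=180, gapRel=0.05)
prob.solve(solver)
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))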
3.5 Switching cost effects

We describe two execution scenarios to analyze the switching cost effects in our optimization approach. To account for the switching cost, we define the penalty SWTP = 50 W in terms of the power consumed when turning machines on/off, as proposed in Section 2.1.2. That is, we assume that a server being switched on consumes 50 W more than when active-idle. We have specified a sufficiently high penalty to avoid switching overhead, but more accurate values for these penalties have to be investigated in further study. Table 3.3 shows a comparison between the two scenarios: scenario A does not account for switching costs, whereas scenario B does.

Time   Demand          Config. A                      Config. B
1      [32, 100, 15]   [(5,2,1), (5,2,2), (5,2,3)]    [(5,2,1), (5,2,2), (5,2,3)]
2      [32, 120, 15]   [(1,2,1), (1,2,2), (1,2,3)]    [(5,3,1), (5,3,2), (5,3,3)]
3      [32, 140, 15]   [(5,4,1), (5,4,2), (5,4,3)]    [(5,4,1), (5,4,2), (5,4,3)]
Table 3.3: Switching cost scenarios

Recall that a configuration solution is defined as a vector of tuples (i, j, k), where i is a server at frequency j and k is an application allocated to that server. At time 1, the configuration solutions are identical in both scenarios: all applications are hosted by server 5 at frequency 2. However, at time 2, the new configuration in scenario A requires that server 1 be turned on (at frequency 2) and server 5 be turned off, which involves overhead costs. On the other hand, including switching costs as in scenario B requires only that server 5 (which is already turned on) increase its frequency to 3, which adds essentially no disruption to the system. At time 3, both scenarios require that server 5 increase its frequency to 4, transitioning to the same configuration. Note that scenario A again employed disruptive on/off actions, whereas scenario B relied only on frequency manipulation. To simplify the calculation, we assumed that a server can be turned on in one time step.

We also evaluated the effectiveness of the switching cost modeling in terms of the reduction in the number of switching activities compared to the optimization model that does not account for switching costs (used as a baseline). Our simulation execution (using the workload description from Section 3.1) indicated that the switching cost model required 90 server on/off activities, against 156 activities with the baseline model, a switching reduction of 42%. Moreover, in a real scenario, application services in the cluster may retain important persistent state, such as web sessions, which adds further cost to switching from one server to another. Some overhead related to hardware reliability can also be accounted for. We believe these issues provide useful directions for future investigation.
3.6 Evaluation on VM allocation and migration

We have exercised our optimization approach through simulations driven by real-world workload traces. In practice, however, a problem that arises in this context is that migration and replication activities in a virtualized environment may disrupt the quality of service provided by the applications. For example, live migration mechanisms allow workload movement with a short service downtime, but the quality of service of the running applications is likely to be negatively affected during migration activities [53].

The work described in [32] is intended to help us fully implement our overall optimization approach for power and performance management in real virtualized clusters. Specifically, we carry out a set of experiments with different test scenarios to evaluate application behavior during cold and live migrations and replication actions using virtual machines. Although disruptions when performing dynamic changes in the cluster environment are sudden and sometimes unavoidable, we investigate an alternative scheme using replication to help minimize these disruptive impacts on the QoS of the applications. We measure and analyze the disruptive impact on the QoS (quality of service) provided by the applications, by means of server-side response time and throughput, during migration and dynamic allocation of virtual machines in a server cluster [32].

During live migration, the hardware caches are not migrated [52], which can lead to cache misses on the target machine and impact the application's performance. As pointed out by [53], migration activities also need extra CPU cycles for the pre-copying process, which are consumed on both the source and destination servers. Moreover, an additional amount of network bandwidth is consumed as well, which may affect the quality of service in the cluster. A third available option is VM replication, which means creating a new VM (application instance) from a stored image instead of migrating an already running one (see [32] for details). It is worth mentioning that our optimization approach includes the possibility of running replicated virtual servers to fulfill the resource demands required by an application at a given operational stage. Finally, in the replication process, the VM on the source server may simply be turned off to save energy. When applying the optimization solution, we would like to investigate whether it would be valuable to also maintain the VM application on the source server for load balancing purposes and, if so, to determine what part of the application workload should be allocated to the source and target physical machines.
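For reference, the startVm and migrateVm operators of Section 2.2 map naturally onto hypervisor tooling. The sketch below drives the libvirt virsh CLI on KVM/QEMU hosts; the domain names and host URIs are illustrative, and a Xen deployment would use the corresponding Xen connection URIs instead.

import subprocess

def start_vm(domain):
    # boot a stopped VM (application instance) from its stored image
    subprocess.run(["virsh", "start", domain], check=True)

def migrate_vm(domain, dest_host):
    # live-migrate a running VM to the destination hypervisor
    subprocess.run(["virsh", "migrate", "--live", domain,
                    f"qemu+ssh://{dest_host}/system"], check=True)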
Chapter 4

Thesis status and future work

In this proposal, we presented an approach for power optimization in virtualized server clusters, including a MIP optimization model and a dynamic configuration strategy. In the optimization model, we addressed application workload balancing and the often ignored switching penalty, aimed at avoiding frequent and undesirable server on/off transitions. Our simulations show that our strategy can achieve power savings of 47% compared to an uncontrolled system (used as baseline) [36]. By using a simple but effective optimality gap criterion in the optimization solver, our approach scales well for clusters with up to 350 servers. However, in very large data centers, with thousands of machines, smaller sets (hundreds) of machines can be allocated (dynamically) to support dedicated clusters for specific applications, which could be managed autonomously (as pointed out by [13]). Thus, we believe that our optimization approach can be successfully applied in this context as well. We have also carried out experiments to evaluate the performance impact, in terms of response time and throughput of the applications, during VM migration and replication. The replication steps involved starting a VM replica on the target host and redirecting requests to the new replica. Our preliminary results showed that by using replication we can avoid part of the performance disruption incurred during migration [32].
4.1
Limitations and potential improvements
The preliminary evaluation of this work allowed us to identify limitations in the current approach and to establish the following technical challenges to be addressed in the further development of this thesis research.

Power and performance modeling

Information about power and performance for the physical servers in the cluster is an important input to our optimization models. Imagine a scenario with heterogeneous servers (with many different CPU frequencies and performance levels) that handle different types of requests (with distinct processing demands). The task of building a comprehensive model to characterize these power and performance characteristics is not simple, since large clusters include many heterogeneous machines, and each machine can have different capacity and power-consumption
measurements according to its number of cores, CPU operating frequencies, specific hardware, and so on. In the same way, the workload of the applications can also have different processing demands, depending on the type of the incoming requests, and can change significantly over time. A complete benchmarking effort would commonly impose a high initial setup cost for implementing our approach in a given operational server cluster. We would like to investigate capacity planning methods to accurately determine server loads, application workload demands and response time guarantees in a simple and effective way. Specifically, we plan to devise a simple but effective way of modeling the power consumption and capacity of servers, even under very heterogeneous and changing workloads, based on linear regression models obtained by analyzing power-performance relationships (a brief sketch appears at the end of this discussion).

Prediction techniques to improve optimization decisions

We consider that the optimization decisions for dynamic configuration can be improved by leveraging predictions about future resource availability and utilization in the server cluster, such as the approaches described in [15, 25, 29, 38]. By leveraging predictive capabilities, the optimization approach can cope with well-known patterns in measurement readings, such as trends, that indicate anticipatory conditions for triggering new optimized configurations. For instance, the Holt forecasting procedure, a variant of exponential smoothing, could be used [1, 45]; it is a simple method that copes with trends in measurements and is suitable for producing short-term forecasts for demand time-series data, such as the incoming workload (a sketch follows below). Examples of enhancements enabled by prediction capabilities are (1) turning on a new server in advance, before resource saturation occurs, and (2) minimizing undesirable disruptive decisions and oscillatory behavior; for example, unnecessarily turning servers on/off based on short-term changes in resource availability wastes processing time (and certainly energy). Specifically, we plan to apply prediction techniques and statistical methods to the application workload vector to enable the optimization control strategy to make anticipatory optimization decisions. Experiments comparing prediction-based optimization decisions against decisions made without predictions (including prediction error analysis) would also be carried out.

Speed-up of the optimization execution time

Currently, our optimization models are solved using a black-box exact MIP solver (IBM ILOG CPLEX [24]), without exploiting any knowledge about the optimization problem. One drawback of solving MIP models to optimality is the high computation time for large instances of the problem. A promising approach to mitigate such scalability issues is to employ an integer programming column generation technique, which may give significant speed-ups in finding optimal solutions for variable sized bin-packing problems (see [2]). To achieve that objective, column generation can be implemented within the branch-and-bound approach using state-of-the-art MIP solvers, such as CPLEX. This would enable our optimization approach to solve large instances of the cluster configuration problem much faster. Since we are solving a sequence of problems, by modifying and re-solving the model in a control loop, another promising technique is to exploit a feature provided by solvers like CPLEX, called MIP start, which leverages solutions obtained in previous control steps. These user-provided initial solutions give hints to the solver on how to solve a new instance of the optimization problem; for example, heuristics might be employed to explore the solution neighborhood of the initial solutions. This facilitates finding new feasible solutions to be used as upper bounds (primal limits) in the branch-and-bound process, allowing the new optimal solution to be found in much less computational time (a hedged sketch is given below). In addition, valid inequalities can be added to the optimization model to strengthen the linear programming relaxation bound, which would speed up the overall optimization process [14].
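As an illustration of the regression-based modeling discussed under "Power and performance modeling" above, the following Python sketch fits a linear power model by ordinary least squares; the sample measurements and the linear form power = b0 + b1*utilization + b2*frequency are assumptions for illustration, not measured data or the final model:

import numpy as np

# Benchmark samples: (CPU utilization in [0, 1], frequency in GHz, power in W).
samples = np.array([
    [0.10, 1.0, 105.0],
    [0.50, 1.0, 128.0],
    [0.90, 1.0, 150.0],
    [0.10, 2.0, 118.0],
    [0.50, 2.0, 155.0],
    [0.90, 2.0, 192.0],
])

# Ordinary least squares for power = b0 + b1*utilization + b2*frequency.
X = np.column_stack([np.ones(len(samples)), samples[:, 0], samples[:, 1]])
y = samples[:, 2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predicted_power(utilization, frequency):
    return coef @ np.array([1.0, utilization, frequency])

print(predicted_power(0.7, 2.0))  # estimated power draw in watts

Per-machine (or per machine class) coefficients would be fitted in the same way to cope with heterogeneity.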
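The Holt procedure itself is simple enough to sketch directly; the smoothing parameters and the toy workload series below are arbitrary illustrative choices:

def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    # Double exponential smoothing: track a level and a trend component.
    # Requires at least two observations to initialize.
    level, trend = series[1], series[1] - series[0]
    for x in series[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extrapolates the current level and trend.
    return level + horizon * trend

# Toy workload with an upward trend (requests/s per control period).
workload = [100, 110, 123, 131, 144, 152]
print(holt_forecast(workload, horizon=2))  # anticipated demand two steps ahead

Feeding such forecasts, rather than raw measurements, into the workload vector would let the control loop act before saturation occurs.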
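To illustrate the MIP-start idea, the sketch below builds a deliberately tiny allocation model and warm-starts it from the previous control step's configuration using the docplex Python API for CPLEX [24]; the model, the data, and the exact call names are assumptions for illustration and omit frequencies, power costs and the other elements of the Chapter 2 formulation:

from docplex.mp.model import Model
from docplex.mp.solution import SolveSolution

demands = [30, 20, 40]   # application CPU demands (illustrative)
capacity = 60            # identical server capacity (illustrative)
servers = range(3)
apps = range(len(demands))

mdl = Model(name="cluster-config")
on = mdl.binary_var_list(servers, name="on")        # server i powered on
x = mdl.binary_var_matrix(apps, servers, name="x")  # app k placed on server i
for k in apps:
    mdl.add_constraint(mdl.sum(x[k, i] for i in servers) == 1)
for i in servers:
    mdl.add_constraint(
        mdl.sum(demands[k] * x[k, i] for k in apps) <= capacity * on[i])
mdl.minimize(mdl.sum(on))

# Warm start from the placement chosen in the previous control step.
previous = {(0, 0): 1, (1, 0): 1, (2, 1): 1}  # (app, server) -> placed
warm = SolveSolution(mdl)
for (k, i), value in previous.items():
    warm.add_var_value(x[k, i], value)
warm.add_var_value(on[0], 1)
warm.add_var_value(on[1], 1)
mdl.add_mip_start(warm)

# Optimality gap criterion: accept solutions within 1% of optimal.
mdl.parameters.mip.tolerances.mipgap = 0.01
solution = mdl.solve()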
Response time guarantees

We consider that it is also important to guarantee certain quality-of-service (QoS) requirements of a server application. Since the response time of the web applications in the cluster provides a good QoS metric for qualifying the end-user experience, we would like to incorporate response time monitoring and control into our approach. Specifically, the response time metric refers to the time interval measured between the instant that a request arrives at the cluster and the instant that the associated response leaves the cluster. This way, we can model a server cluster as a soft real-time system, in which the requests have a specified deadline or target response time. To address this, we need to monitor the request response time and adapt the cluster capacity accordingly, while leveraging our optimization strategy described in Chapter 2. Specifically, we plan to implement an extension to the Monitor module to collect run-time properties for the request response time, aimed at providing QoS information to manage the server cluster capacity, as proposed in [7, 54]. Another way to meet response time requirements is to relate CPU utilization to the request response time metric. For example, as observed in [32], when the CPU utilization of an application is low, the average response time is also low. This is expected, since no time is spent queuing behind other requests. On the other hand, when the utilization is high, the response time goes up abruptly as the CPU utilization gets close to its maximum. Thus, the optimization approach could perform dynamic configurations before the machine saturates by monitoring only CPU utilization, as sketched below.
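As a concrete illustration of this utilization-based trigger (not part of the current implementation), the sketch below flags servers approaching saturation and uses a textbook M/M/1-style approximation, introduced here only for illustration, to show why response time grows sharply near full utilization; the threshold and service time are assumed values:

UTILIZATION_THRESHOLD = 0.8  # assumed trigger point, to be calibrated

def needs_reconfiguration(cpu_utilizations, threshold=UTILIZATION_THRESHOLD):
    # cpu_utilizations: mapping server -> utilization in [0, 1].
    return any(u > threshold for u in cpu_utilizations.values())

def approx_response_time(service_time, utilization):
    # M/M/1-style approximation: mean response time = service_time / (1 - u),
    # which blows up as utilization approaches 1.
    return service_time / (1.0 - utilization)

print(approx_response_time(0.02, 0.50))  # 0.04 s, comfortable
print(approx_response_time(0.02, 0.95))  # 0.4 s, near saturation
print(needs_reconfiguration({"srv1": 0.55, "srv2": 0.92}))  # True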
Configuration support framework

In a previous work, we proposed and developed an adaptation framework [33], which works on top of an abstract configuration model of the system and provides general configuration mechanisms: (1) to specify and monitor runtime properties of an executing system, (2) to evaluate the model for violations of the system's requirements, and (3) to perform adaptations on the system configuration to keep the system behavior within acceptable bounds. The dynamic configurations are performed under the guidance of scripts written in a high-level configuration language, designed separately from the target system. This configuration framework may be adopted to provide the general monitoring and adaptation mechanisms used to implement the optimization proposal in a cost-effective manner [34]. Specifically, the monitoring and dynamic configuration mechanisms (used by our optimization strategy in Section 2.2) can be implemented in terms of an application programming interface (API) provided by the system support level. For example, the Apache web server [49] supports an API that enables developers to extend the server with their own extension modules. The Xen hypervisor [4] also provides capabilities and mechanisms to monitor and manage virtual machines in a server cluster by means of an API.
The key idea of an API is that it specifies an abstract and well-defined interface to control the behavior of the system, building on lower-level mechanisms at the system level. A desired feature for an API is that it can be called from several programming languages and is available as a remote procedure call, for example through the XML-RPC protocol. The configuration framework would encapsulate most of the functionality required for dynamic configuration in terms of an API and provide generic configuration operators. This, in turn, enables the dynamic optimization control strategy to be described in a more appropriate way, using a number of high-level constructs. For example, our API includes a call to a configuration operator, termed bestConfig, which encapsulates an optimization algorithm for solving the cluster configuration problem.
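The fragment below sketches how such an operator could be invoked remotely over XML-RPC from Python; the service address, the deploy method, and the argument formats are hypothetical, intended only to convey the style of the resulting control scripts (bestConfig stands for the configuration operator named above):

import xmlrpc.client

# Hypothetical XML-RPC endpoint exposed by the configuration framework.
proxy = xmlrpc.client.ServerProxy("http://cluster-manager:8000/")

workload = {"app1": 120.0, "app2": 45.0}  # predicted demands (requests/s)
config = proxy.bestConfig(workload)       # solve the configuration problem
for server, freq, app in config:          # apply the (i, j, k) tuples
    proxy.deploy(app, server, freq)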
4.2
Research schedule
Here we outline the research schedule towards the completion of this thesis. In general terms, we aim to implement the optimization approach as a software infrastructure and validate it in a practical virtualized computing environment using a standard multi-tier web benchmark, such as RUBiS, which simulates an auction web site similar to eBay [43]. We also aim to overcome the limitations of the optimization approach described previously in Section 4.1. The proposed schedule for the thesis research is given in Table 4.1, which will be updated as the work progresses. At present, the target completion date is December 2012. At the conclusion of this thesis, we expect to improve the understanding of power and performance management in server clusters, and of virtualization techniques for implementing power management optimizations in a real server cluster system.

Phase             Date                  Main task
Implementation    aug/2010 – feb/2011   Optimization control strategy
Experiments       mar/2011 – sep/2011   Web server benchmark (RUBiS)
Results analysis  oct/2011 – apr/2012   Power/performance analysis
Completion        may/2012 – dec/2012   Dissertation write-up / Oral defense

Table 4.1: Research schedule
Bibliography

[1] Introduction to time series analysis, Section 6.4, NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/, 2008.

[2] Cláudio Alves and J. M. Valério de Carvalho. Accelerating column generation for variable sized bin-packing problems. European Journal of Operational Research, 183(3):1333–1352, 2007. ISSN 0377-2217. doi:10.1016/j.ejor.2005.07.033.

[3] David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. FAWN: a fast array of wimpy nodes. In SOSP '09: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 1–14, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-752-3. doi:10.1145/1629575.1629577.

[4] Paul Barham et al. Xen and the art of virtualization. In SOSP '03, pages 164–177. ACM, 2003. ISBN 1-58113-757-5. doi:10.1145/945445.945462.

[5] Luiz André Barroso. The price of performance. ACM Queue, 3(7):48–53, 2005. ISSN 1542-7730. doi:10.1145/1095408.1095420.

[6] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing. Computer, 40(12):33–37, 2007. ISSN 0018-9162. doi:10.1109/MC.2007.443.

[7] Luciano Bertini, Julius Leite, and Daniel Mossé. Dynamic configuration of web server clusters with QoS control. In WIP Session of the 20th Euromicro Conference on Real-Time Systems, 2008.

[8] Luciano Bertini et al. Statistical QoS guarantee and energy-efficiency in web server clusters. In ECRTS '07, pages 83–92, 2007. ISBN 0-7695-2914-3. doi:10.1109/ECRTS.2007.31.

[9] Ricardo Bianchini and Ram Rajamony. Power and energy management for server systems. Computer, 37(11):68–74, 2004. ISSN 0018-9162. doi:10.1109/MC.2004.217.

[10] Martin Bichler, Thomas Setzer, and Benjamin Speitkamp. Capacity planning for virtualized servers. Workshop on Information Technologies and Systems (WITS), Milwaukee, Wisconsin, USA, 2006.
[11] W. Lloyd Bircher and Lizy K. John. Analysis of dynamic power management on multi-core processors. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 327–338, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-158-3. doi:10.1145/1375527.1375575.

[12] Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat, and Ronald P. Doyle. Managing energy and server resources in hosting centers. SIGOPS Oper. Syst. Rev., 35(5):103–116, 2001. ISSN 0163-5980. doi:10.1145/502059.502045.

[13] Kenneth Church, Albert Greenberg, and James Hamilton. On delivering embarrassingly distributed cloud services. In HotNets, 2008. URL http://conferences.sigcomm.org/hotnets/2008/papers/10.pdf.

[14] Isabel Correia, Luís Gouveia, and Francisco Saldanha-da-Gama. Solving the variable size bin packing problem with discretized formulations. Comput. Oper. Res., 35(6):2103–2113, 2008. ISSN 0305-0548. doi:10.1016/j.cor.2006.10.014.

[15] Peter A. Dinda and David R. O'Hallaron. Host load prediction using linear models. Cluster Computing, 3(4):265–280, 2000. ISSN 1386-7857. doi:10.1023/A:1019048724544.

[16] E. N. Elnozahy, Michael Kistler, and Ramakrishnan Rajamony. Energy-efficient server clusters. In Power-Aware Computer Systems, volume 2325 of Lecture Notes in Computer Science, pages 179–197, 2003. ISBN 978-3-540-01028-9. doi:10.1007/3-540-36612-1_12.

[17] Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso. Power provisioning for a warehouse-sized computer. In ISCA '07: Proceedings of the 34th Annual International Symposium on Computer Architecture, pages 13–23, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-706-3. doi:10.1145/1250662.1250665.

[18] Anshul Gandhi, Mor Harchol-Balter, Rajarshi Das, and Charles Lefurgy. Optimal power allocation in server farms. In SIGMETRICS '09: Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, pages 157–168, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-511-6. doi:10.1145/1555349.1555368.

[19] Edward L. Haletky. VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers. Prentice Hall, 2008. ISBN 0132302071.

[20] James Hamilton. Energy proportional datacenter networks. http://perspectives.mvdirona.com/2010/08/01/EnergyProportionalDatacenterNetworks.aspx, 2010.

[21] Mohamed Haouari and Mehdi Serairi. Heuristics for the variable sized bin-packing problem. Comput. Oper. Res., 36(10):2877–2884, 2009. ISSN 0305-0548. doi:10.1016/j.cor.2008.12.016.

[22] Brian Hayes. Cloud computing. Commun. ACM, 51(7):9–11, 2008. ISSN 0001-0782. doi:10.1145/1364782.1364786.
[23] Tibor Horvath et al. Dynamic voltage scaling in multitier web servers with end-to-end delay control. IEEE Transactions on Computers, 56(4):444–458, 2007. ISSN 0018-9340. doi:10.1109/TC.2007.1003.

[24] ILOG, Inc. CPLEX, 2009. URL http://www.ilog.com/products/cplex/.

[25] Evangelia Kalyvianaki, Themistoklis Charalambous, and Steven Hand. Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In ICAC '09, pages 117–126, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-564-2. doi:10.1145/1555228.1555261.

[26] Nagarajan Kandasamy et al. Self-optimization in computer systems via on-line control: application to power management. In ICAC '04, pages 54–61, 2004. ISBN 0-7695-2114-2.

[27] G. Khanna et al. Application performance management in virtualized server environments. In 10th IEEE/IFIP Network Operations and Management Symposium, pages 373–381, 2006. ISSN 1542-1201. doi:10.1109/NOMS.2006.1687567.

[28] B. Khargharia et al. Autonomic power and performance management for computing systems. Cluster Computing, 11(2):167–181, 2008.

[29] Dara Kusic et al. Power and performance management of virtualized computing environments via lookahead control. Cluster Computing, 12(1):1–15, 2009. ISSN 1386-7857. doi:10.1007/s10586-008-0070-y.

[30] K. Le, R. Bianchini, M. Martonosi, and T. D. Nguyen. Cost- and energy-aware load distribution across data centers, 2009.

[31] McKinsey & Company. Revolutionizing data center efficiency. http://uptimeinstitute.org, 2008.

[32] Carlos Oliveira, Vinicius Petrucci, and Orlando Loques. Impact of server dynamic allocation on the response time for energy-efficient virtualized web clusters. In 12th Brazilian Workshop on Real-Time and Embedded Systems (WTR), 2010.

[33] Vinicius Petrucci, Orlando Loques, and Daniel Mossé. A framework for dynamic adaptation of power-aware server clusters. In SAC '09: Proceedings of the 24th ACM Symposium on Applied Computing. ACM, 2009.

[34] Vinicius Petrucci, Orlando Loques, and Daniel Mossé. Dynamic configuration support for power-aware virtualized server clusters. In WIP Session of the 21st Euromicro Conference on Real-Time Systems, 2009.

[35] Vinicius Petrucci, Orlando Loques, and Daniel Mossé. A dynamic configuration model for power-efficient virtualized server clusters. In 11th Brazilian Workshop on Real-Time and Embedded Systems (WTR), 2009.

[36] Vinicius Petrucci, Orlando Loques, and Daniel Mossé. A dynamic optimization model for power and performance management of virtualized clusters. In 1st Int'l Conf. on Energy-Efficient Computing and Networking (in cooperation with ACM SIGCOMM), 2010.
[37] Vinicius Petrucci, Orlando Loques, and Daniel Mossé. Dynamic optimization of power and performance for virtualized server clusters. In SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing, pages 263–264, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-639-7. doi:10.1145/1774088.1774144.

[38] Vahe Poladian, David Garlan, Mary Shaw, Bradley Schmerl, Joao Pedro Sousa, and Mahadev Satyanarayanan. Leveraging resource prediction for anticipatory dynamic configuration. In First IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO 2007), pages 214–223, July 2007.

[39] Asfandyar Qureshi, Rick Weber, Hari Balakrishnan, John Guttag, and Bruce Maggs. Cutting the electric bill for internet-scale systems. In ACM SIGCOMM, Barcelona, Spain, August 2009.

[40] Ted Ralphs. Branch Cut and Price Resource Web. http://www.branchandcut.org/, 2009.

[41] Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. Thread motion: fine-grained power management for multi-core systems. In ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture, pages 302–313, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-526-0. doi:10.1145/1555754.1555793.

[42] Parthasarathy Ranganathan. Recipe for efficiency: principles of power-aware computing. Commun. ACM, 53(4):60–67, 2010. ISSN 0001-0782. doi:10.1145/1721654.1721673.

[43] RUBiS. RUBiS: Rice University Bidding System. http://rubis.ow2.org/, 2010.

[44] Cosmin Rusu et al. Energy-efficient real-time heterogeneous server clusters. In RTAS '06, pages 418–428, 2006. ISBN 0-7695-2516-4. doi:10.1109/RTAS.2006.16.

[45] Carlos Santana, Luciano Bertini, Julius Leite, and Daniel Mossé. Applying forecasting to interval based DVS. In 10th Brazilian Workshop on Real-Time and Embedded Systems (WTR), 2008.

[46] Carlos Santana, J. C. B. Leite, and Daniel Mossé. Load forecasting applied to soft real-time web clusters. In ACM Symposium on Applied Computing, Sierre, Switzerland, March 2010.

[47] Vivek Sharma et al. Power-aware QoS management in web servers. In RTSS '03, pages 63–72, 2003.

[48] Shekhar Srikantaiah et al. Energy aware consolidation for cloud computing. In USENIX Workshop on Power Aware Computing and Systems, 2008.

[49] The Apache Software Foundation. Apache HTTP server version 2.2. http://httpd.apache.org/docs/2.2/, 2008.
[50] V. Pallipadi and A. Starikovskiy. The ondemand governor: past, present and future. In Proceedings of the Linux Symposium, volume 2, pages 223–238, 2006.

[51] Akshat Verma, Puneet Ahuja, and Anindya Neogi. Power-aware dynamic placement of HPC applications. In ICS '08, pages 175–184, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-158-3. doi:10.1145/1375527.1375555.

[52] Akshat Verma et al. pMapper: power and migration cost aware application placement in virtualized systems. In Middleware '08, pages 243–264, 2008.

[53] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. Cost of virtual machine live migration in clouds: a performance evaluation. In CloudCom '09: Proceedings of the 1st International Conference on Cloud Computing, pages 254–265, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-10664-4. doi:10.1007/978-3-642-10665-1_23.

[54] Yefu Wang et al. Power-efficient response time guarantees for virtualized enterprise servers. In RTSS '08, pages 303–312, 2008. ISBN 978-0-7695-3477-0. doi:10.1109/RTSS.2008.20.