11th Brazilian Workshop on Real-Time and Embedded Systems

A dynamic configuration model for power-efficient virtualized server clusters

Vinicius Petrucci1∗, Orlando Loques1, Daniel Mossé2†

1 Instituto de Computação, Universidade Federal Fluminense (UFF), Brasil
{vpetrucci,loques}@ic.uff.br

2 Department of Computer Science, University of Pittsburgh, USA
[email protected]

Abstract. The rising energy costs for keeping up large-scale server clusters are becoming an increasingly important concern for many businesses. In this paper, we present a dynamic configuration model for consolidating multiple services or applications in virtualized server clusters, aimed at optimizing power consumption. Dynamic configurations in the cluster include turning servers off in periods of low activity and turning them back on when the demand increases. In our model, we incorporate the often ignored switching penalty in order to avoid frequently and undesirably turning servers on and off. In addition, our model takes advantage of the dynamic voltage/frequency scaling capability of current processor architectures to further optimize power consumption. We present simulation results to evaluate the quality of the solutions and the scalability of the proposed model.

1. Introduction

There is currently a growing trend to make large-scale server clusters (or data centers) self-adaptive [Bodik et al. 2008] and capable of supporting many different web-based services or applications in a seamless, transparent fashion. An example is the emergence of utility/cloud computing platforms [Hayes 2008], such as Amazon EC2 and Google Apps. In these platforms, the services are mostly hosted on several dedicated physical servers and can have different workloads which vary with time.

In general, to allow hosting multiple services, these platforms rely on virtualization techniques, which enable running different virtual servers (i.e., an operating system plus software applications) on a single physical server. Virtualization provides means for server consolidation and allows for migration and dynamic allocation of these virtual servers to physical servers on demand [Verma et al. 2008].

In this scenario, the rising energy costs for keeping up those server clusters are becoming an increasingly important concern for many businesses. This in turn requires a major investigation into the energy efficiency of their computing infrastructure [Wang et al. 2008, Bertini et al. 2008].

∗ This work has been partially supported by CNPq and Faperj. The author would like to thank Anand Subramanian for many helpful discussions and comments concerning this work.
† This material is based on work supported by NSF under grants ANI-0325353, CCF-0811295, CCF-0803180 and CNS-0524634.


It is recognized that consolidating service workloads through virtualization techniques offers a great opportunity for increased server utilization and power optimization [Wang et al. 2008, Srikantaiah et al. 2008].

In this paper, we present a dynamic configuration model for virtualized server systems. The problem is to determine the most power-efficient server cluster configuration that can handle multiple varying service workloads. Dynamic configuration operations in the cluster include turning off servers in periods of low server activity and turning them back on when the demand increases. To avoid frequently and undesirably switching servers on and off, we incorporate a switching penalty in our configuration model. Additionally, our model takes advantage of the dynamic voltage/frequency scaling capability of current processor architectures to further decrease power consumption.

It should be noted that it is also important to guarantee certain quality-of-service requirements of a server application. For example, we can model a server cluster as a soft real-time system, in which the requests have a specified deadline or target response time. Leveraging our dynamic configuration model, we can implement a controller to monitor the request response time and control the cluster performance accordingly, as proposed in [Wang et al. 2008, Bertini et al. 2008].

2. Related work

A similar configuration model for virtualized servers, based on the bin packing problem, is defined in [Bichler et al. 2006]. However, their model is not designed for power optimization and has some limitations. For example, it does not consider dynamic voltage/frequency scaling (DVFS) of the servers' CPU frequencies. Also, their model is limited in that no service can demand more performance than that supported by a single server. Our proposed model allows for distributing a service workload among multiple different servers.

A heuristic method for the power-aware consolidation problem of virtualized clusters is presented in [Srikantaiah et al. 2008]. Their model is based on a modified bin packing problem, but it is not guaranteed to find solutions that are near optimal. In contrast, we adopt an exact approach based on mixed integer programming. We show by simulation that optimal (or very near optimal) solutions for typical numbers of servers in a cluster can be obtained using state-of-the-art optimization algorithms provided by optimization software, such as CPLEX [ILOG, Inc. 2009]. In addition, their algorithm does not include dynamic configuration of the DVFS settings.

Another related approach is described in [Kusic 2008], wherein a dynamic resource provisioning framework is developed based on lookahead control. Their approach also does not consider DVFS, but the proposed optimization controller addresses some quite attractive issues, such as machine switching costs (i.e., the overhead of turning servers on and off) and a predictive configuration model. Our proposed model also copes with switching costs as in [Kusic 2008]. We plan to incorporate prediction techniques in our model to improve configuration decisions.

In [Wang et al. 2008], the authors present a two-layer control architecture aimed at providing real-time guarantees for virtualized computing environments. The first layer is responsible for load balancing among virtual machines, while the second one uses a control loop to manipulate the CPU frequency for power efficiency and real-time control.


Although their approach has a rigorous basis in control theory, they do not address dynamic migration and on/off mechanisms across multiple servers. Another DVFS-based approach is presented in [Horvath et al. 2007] for power optimization and end-to-end delay control in multi-tier web servers. The works presented in [Rusu et al. 2006] and [Bertini et al. 2007, Bertini et al. 2008] also rely on DVFS techniques and include server on/off mechanisms for power optimization. In contrast to our work, their approaches are not designed (and are not applicable) for virtualized server clusters; that is, they do not consider multiple service workloads on a shared infrastructure.

3. Dynamic configuration model

Dynamic configuration approaches to optimize the power consumption of server clusters can be implemented by selecting the most power-efficient configuration that can handle multiple services with varying workloads. One also needs to control and guarantee certain quality-of-service requirements of the server applications, such as a given rate of requests per second or a maximum allowed request response time. Below we define the cluster configuration problem as an optimization problem.

3.1. Notation

We start by introducing the following notation. N is the set of servers in the cluster. Fi is the set of frequencies of each server i ∈ N. M is the set of services intended to run on the server cluster. pij represents the capacity (e.g., in requests/sec) of server i ∈ N running at CPU frequency j ∈ Fi. cij defines the cost (e.g., power) to maintain server i at frequency j. For each service k ∈ M, dk represents the demand (e.g., in requests/sec) of that service. In practice, the service demand vector can be generated by monitoring the load (in req/s) of each application service at the front-end (or load balancer) machine of the server cluster.

The decision variables yij and xijk are defined as follows: yij ∈ {0, 1} is a binary variable that equals one if server i runs at frequency j and zero otherwise, and xijk ∈ [0, 1] is a continuous variable that represents the utilization factor of service k on server i running at frequency j. The assumption here is that the workload of a service can be distributed (and balanced) among different servers in the cluster. In practice, this means that each service may be associated with multiple virtual machines running on different servers.
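To make the notation concrete, the sketch below shows one possible way to lay out the model inputs as plain Python data structures. The names and values are illustrative placeholders only, not measurements from this paper; real inputs would come from the measurements described in Section 4.

```python
# Illustrative input data for the configuration model (hypothetical values).
N = [1, 2]                      # set of servers
M = ["svc_a", "svc_b"]          # set of services
F = {1: [1, 2], 2: [1, 2, 3]}   # F[i]: available frequency indices of server i

# p[(i, j)]: capacity (req/s) of server i at frequency j
p = {(1, 1): 50.0, (1, 2): 90.0,
     (2, 1): 95.0, (2, 2): 170.0, (2, 3): 185.0}

# c[(i, j)]: power cost (W) of server i at frequency j
c = {(1, 1): 75.0, (1, 2): 90.0,
     (2, 1): 82.0, (2, 2): 100.0, (2, 3): 110.0}

# d[k]: current demand (req/s) of service k, e.g., measured at the load balancer
d = {"svc_a": 40.0, "svc_b": 120.0}
```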

3.2. Problem formulation

The problem formulation is thus given by the following mixed integer program (MIP), which is a variant of the bin packing problem:


Minimize

$$\sum_{i \in N} \sum_{j \in F_i} c_{ij} \cdot y_{ij} \qquad (1)$$

Subject to

$$\sum_{i \in N} \sum_{j \in F_i} p_{ij} \cdot x_{ijk} \ge d_k \qquad \forall k \in M \qquad (2)$$

$$\sum_{k \in M} x_{ijk} \le y_{ij} \qquad \forall i \in N, \forall j \in F_i \qquad (3)$$

$$\sum_{j \in F_i} y_{ij} \le 1 \qquad \forall i \in N \qquad (4)$$

$$x_{ijk} \in [0, 1], \quad y_{ij} \in \{0, 1\} \qquad \forall i \in N, j \in F_i, k \in M \qquad (5)$$

The objective function given by Equation (1) finds a cluster configuration that minimizes the overall server cluster cost, in terms of power consumption, while handling the incoming workload given by constraints (2). To handle the service workloads, for each service k, the sum of the utilization factors xijk weighted by the capacities pij of the selected servers must be greater than or equal to the given service demand dk. Note that this implies that the soft real-time performance requirements are met. Constraints (3) prevent an undesirable solution in which the overall utilization factor of the set of hosted services exceeds the capacity of a given server. These constraints also guarantee that variable yij equals one if server i at frequency j is assigned to handle the demand of some service k; that is, a server will be on if there is at least one service running on it. Constraints (4) ensure that only one frequency j can be chosen for a given server i. The solution is thus given by the decision variables xijk, where i is a server, j is the server's status (i.e., its operating frequency, or the inactive status when j = 0), and k represents the allocated service.

3.3. Coping with switching costs

In a real server cluster environment, dynamic configurations in the cluster may be subject to switching costs in terms of the power consumed while a machine is being turned on or off. To handle this issue, we incorporate a penalty value in our configuration model to represent the overhead of turning machines on/off. Specifically, we modify the objective function of the previous configuration model, Equation (1), as follows:

$$\text{Minimize} \quad \sum_{i \in N} \sum_{j \in F_i} \Big( c_{ij} \cdot y_{ij} + y_{ij} \cdot (1 - \mathit{conf}_{ij}) \cdot \mathit{ON}_P + \mathit{conf}_{ij} \cdot (1 - y_{ij}) \cdot \mathit{OFF}_P \Big) \qquad (6)$$

The modified objective function given by Equation (6) has new terms that add overhead penalties: ON_P for turning a machine on (if it was off) and OFF_P for turning a machine off (if it was on). To calculate a transition cost, we include in the model a new input parameter confij, which denotes the current cluster configuration in terms of which machines are turned on and off. In other words, the decision variable yij from the last optimization execution simply becomes the input parameter confij.
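As an illustration, the sketch below encodes formulation (1)-(6), including the switching penalty, using the open-source PuLP modeling library as a stand-in for the CPLEX setup used in the paper. The function name, data layout (following the Section 3.1 sketch) and default penalty values are our own assumptions for demonstration purposes.

```python
import pulp

def solve_configuration(N, F, M, p, c, d, conf=None, on_p=20.0, off_p=20.0):
    """Build and solve the MIP of Section 3, with the switching penalty of Eq. (6).

    conf[(i, j)] is the previous value of y[(i, j)] (1 if server i was on at
    frequency j); if None, no switching penalty is applied (plain Eq. (1)).
    """
    pairs = [(i, j) for i in N for j in F[i]]
    prob = pulp.LpProblem("cluster_configuration", pulp.LpMinimize)

    y = pulp.LpVariable.dicts("y", pairs, cat=pulp.LpBinary)
    x = pulp.LpVariable.dicts("x", [(i, j, k) for (i, j) in pairs for k in M],
                              lowBound=0, upBound=1)

    # Objective: power cost, plus optional on/off switching penalties (Eq. (1)/(6))
    obj = pulp.lpSum(c[i, j] * y[i, j] for (i, j) in pairs)
    if conf is not None:
        obj += pulp.lpSum(y[i, j] * (1 - conf.get((i, j), 0)) * on_p +
                          conf.get((i, j), 0) * (1 - y[i, j]) * off_p
                          for (i, j) in pairs)
    prob += obj

    # Eq. (2): each service's demand must be covered by the selected capacity
    for k in M:
        prob += pulp.lpSum(p[i, j] * x[i, j, k] for (i, j) in pairs) >= d[k]

    # Eq. (3): do not exceed a server's capacity; also ties x to y
    for (i, j) in pairs:
        prob += pulp.lpSum(x[i, j, k] for k in M) <= y[i, j]

    # Eq. (4): at most one active frequency per server
    for i in N:
        prob += pulp.lpSum(y[i, j] for j in F[i]) <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    active = {(i, j) for (i, j) in pairs if y[i, j].value() > 0.5}
    usage = {(i, j, k): x[i, j, k].value() for (i, j) in pairs for k in M
             if x[i, j, k].value() and x[i, j, k].value() > 1e-6}
    return active, usage
```

For instance, calling `solve_configuration(N, F, M, p, c, d)` with the data from the Section 3.1 sketch returns the active (server, frequency) pairs and the per-service utilization factors.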


Note that we can incorporate other overhead penalties in a similar manner. For example, we may consider the time cost for a machine to be turned on, during which it is unavailable to perform any useful work [Kusic 2008]. Also, application services in the cluster may retain important persistent state, such as web sessions, which leads to additional costs when switching from one server to another.

4. Cluster example

In this section, we present a cluster example to illustrate the effectiveness of our configuration model. Consider m = 3 virtualized services to be hosted by a real cluster with n = 5 servers. We define the vector f = [3, 5, 5, 4, 6], where fi is the number of available frequencies of server i. The processors of the servers are heterogeneous in terms of maximum performance and number of available frequencies, but all are based on the same architecture (AMD Athlon 64). We measured the capacity of the servers, for each frequency, in terms of the maximum number of requests per second (req/s) that they can handle at 100% CPU utilization. To generate the benchmark workload, we used the httperf tool [Mosberger and Jin 1998]. The matrix of the servers' maximum performance (in requests per second), where row i corresponds to server i, column j to frequency j, and a dash marks an unavailable frequency, is:

$$p_{ij} = \begin{pmatrix} 94.7 & 168.4 & 187.6 & - & - & - \\ 47.5 & 84.6 & 94.0 & 102.8 & 111.5 & - \\ 47.0 & 83.9 & 93.0 & 102.3 & 110.5 & - \\ 47.1 & 84.0 & 93.5 & 102.0 & - & - \\ 92.9 & 165.9 & 184.4 & 201.0 & 218.1 & 235.3 \end{pmatrix}$$

We measured the power of each server with a power sensor (a USB-based data acquisition device [National Instruments Corporation 2008]) using the LabVIEW software environment. The matrix of server power costs (in Watts) at 100% utilization is given by:

$$c_{ij} = \begin{pmatrix} 81.5 & 101.8 & 109.8 & - & - & - \\ 75.2 & 89.0 & 94.5 & 100.9 & 107.7 & - \\ 71.6 & 85.5 & 90.7 & 96.5 & 103.2 & - \\ 74.7 & 95.7 & 103.1 & 110.6 & - & - \\ 82.5 & 99.2 & 107.3 & 116.6 & 127.2 & 140.1 \end{pmatrix}$$

4.1. Execution scenarios

At first, we consider a scenario where the service demand vector (in requests/sec) is d = [45, 170, 19]. Given the inputs of demand, power and performance, we solve the optimization problem (see Section 3), which yields a configuration (solution) given by a vector of tuples (k, i, j), where k is a service and i is a server at frequency j: conf = [(1, 5, 6), (2, 5, 6), (3, 5, 6)]. This means that services 1, 2 and 3 are hosted by server 5 at frequency 6, which is its maximum frequency. The values of the utilization factors of the services (given by xijk) are, respectively, 0.19, 0.72 and 0.08; for example, service 3 uses 8% of the capacity of server 5 at frequency 6. Note that the total utilization of server 5 is less than 100% and the other servers are turned off.

At another execution snapshot, with d = [45, 200, 19], the demand of service 2 increases to 200.


The new configuration solution is conf = [(1, 1, 1), (2, 1, 1), (2, 5, 3), (3, 1, 1)]. This means that when the demand of service 2 increases from 170 to 200, we need to turn on a new server and migrate service 1, service 3, and part of service 2 to this new server (server 1), which will run at its minimum frequency 1. In addition, to save power, we decrease the speed of server 5 to frequency 3. Using this configuration, the workload of service 2 is divided between servers 1 and 5; specifically, the values of xijk for service 2 correspond to using 32% of server 1 and 92% of server 5. It should be noted that if the demand of service 2 decreases again, we may turn off server 1 to save power and migrate services 1, 2 and 3 back to server 5. Note also that there is no direct manipulation of frequencies, only demand, performance and power values in the optimization; that is, after the initial measurements, the system is largely independent of its specific characteristics.
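As a quick sanity check (our own sketch, not part of the original experiments), the snippet below recomputes the utilization factors and the power cost of the first configuration above from the measured p and c matrices given in this section.

```python
# Verify the first scenario: all services hosted on server 5 at frequency 6.
p_5_6 = 235.3            # capacity of server 5 at frequency 6 (req/s)
c_5_6 = 140.1            # power of server 5 at frequency 6 (W)
d = [45.0, 170.0, 19.0]  # demands of services 1..3 (req/s)

x = [dk / p_5_6 for dk in d]     # utilization factors x_{5,6,k}
print([round(v, 2) for v in x])  # -> [0.19, 0.72, 0.08]
print(round(sum(x), 2))          # -> 0.99, i.e., server 5 stays below 100% utilization
print(c_5_6)                     # power cost of this configuration (objective value)
```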

4.2. Switching cost scenarios

We now describe two execution scenarios using our configuration model. Scenario A does not account for switching costs, whereas scenario B does. To account for the switching cost, we set the penalties ON_P = OFF_P = 20 in terms of the power consumed for turning machines on/off, as proposed in Section 3.3. We have specified a sufficiently high penalty to discourage switching, but more accurate values for these penalties need to be investigated in a further study. Table 1 shows a comparison between the two scenarios.

Time  Demand        Config. A                           Config. B
1     [32, 40, 5]   [(1, 1, 1), (2, 1, 1), (3, 1, 1)]   [(1, 1, 1), (2, 1, 1), (3, 1, 1)]
2     [32, 60, 5]   [(1, 3, 4), (2, 3, 4), (3, 3, 4)]   [(1, 1, 2), (2, 1, 2), (3, 1, 2)]
3     [32, 80, 5]   [(1, 5, 2), (2, 5, 2), (3, 5, 2)]   [(1, 1, 2), (2, 1, 2), (3, 1, 2)]

Table 1. Switching cost scenarios

At time 1, the optimal configuration solutions are identical for the two scenarios: all services are hosted by server 1 at frequency 1. However, at time 2, the new configuration for scenario A requires that server 3 be turned on (at frequency 4) and server 1 be turned off, which involves overhead costs. On the other hand, including switching costs as in scenario B only requires that server 1 (which is already turned on) increase its frequency to 2, which adds essentially no disruption to the system. At time 3, similarly disruptive actions are taken in scenario A, whereas scenario B keeps the same configuration as at time 2.
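To make the comparison concrete, the short sketch below evaluates the machine on/off penalty that the switching terms of Equation (6) are intended to capture, for the time-2 transition, under the assumed penalties ON_P = OFF_P = 20. It is evaluated at the server (machine) level, so a frequency-only change on a server that stays on is not penalized here; the base power terms of the objective are omitted for brevity.

```python
ON_P = OFF_P = 20.0

def switching_penalty(prev_on, new_on):
    """On/off penalty for moving from one set of active servers to another."""
    turned_on = new_on - prev_on    # servers with y=1 but conf=0
    turned_off = prev_on - new_on   # servers with y=0 but conf=1
    return len(turned_on) * ON_P + len(turned_off) * OFF_P

prev = {1}                           # at time 1, only server 1 is on (both scenarios)
print(switching_penalty(prev, {3}))  # scenario A at time 2: server 3 on, server 1 off -> 40.0
print(switching_penalty(prev, {1}))  # scenario B at time 2: server 1 stays on -> 0.0
```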

5. Model simulation

In this section, we present a simulation study concerned with the scalability of our model. As the configuration problem is intended to be solved at run time and periodically, the optimization algorithm must have a low processing time relative to the usual configuration control period (e.g., seconds or a few minutes). This control period represents a soft real-time constraint for the optimization algorithm. In our simulation, we adopt a maximum control period of 5 minutes, tested for up to 500 servers and 900 services.

We wrote a Python script to generate different scenarios for the virtualized cluster problem. The input data for the script are the number of services m and the number of servers n. The generation scheme works as follows. For each server i ∈ N, we randomly select a number of frequencies between 1 and 8. The vector of hardware frequency values (in MHz) is [1000, 1800, 2000, 2200, 2600, 2800, 3000, 3200]; for example, if a server has 3 frequencies, they are [1000, 1800, 2000]. For each server i ∈ N, we generate a performance value pi1 for its first frequency from a uniform distribution over [47, 95], which is based on the minimum values from the example of Section 4. Next, the performance pij of each remaining frequency j = 2 . . . |Fi| is generated as a function of the first frequency using a performance factor γ = pij/pi1; for example, for a frequency of 2200 MHz (and a first frequency of 1000 MHz), γ corresponds to 2.2. The power cost values of the servers are generated similarly to the performance values, except that the power value ci1 of the first frequency of each server i is drawn from a uniform distribution over [71, 83]. To generate the demand dk of every service k ∈ M, we first calculate the maximum load supported by the server cluster using the maximum performance of all servers. Next, we determine a load limit L given by the maximum cluster load divided by the number of services. Finally, the demand of each service k is generated at random within the range [1, L].
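A minimal sketch of this generation scheme is shown below. It is our own reconstruction, not the authors' script: we assume the same γ scaling is applied to both performance and power (as suggested by "generated similarly"), and we draw demands from a continuous uniform distribution over [1, L].

```python
import random

FREQS_MHZ = [1000, 1800, 2000, 2200, 2600, 2800, 3000, 3200]

def generate_instance(n_servers, n_services, seed=None):
    """Generate a random problem instance following the scheme of Section 5."""
    rng = random.Random(seed)
    N = list(range(1, n_servers + 1))
    M = list(range(1, n_services + 1))
    F, p, c = {}, {}, {}
    for i in N:
        n_freq = rng.randint(1, 8)              # number of available frequencies
        freqs = FREQS_MHZ[:n_freq]
        F[i] = list(range(1, n_freq + 1))
        p_i1 = rng.uniform(47, 95)              # performance at the first frequency
        c_i1 = rng.uniform(71, 83)              # power at the first frequency
        for j, mhz in zip(F[i], freqs):
            gamma = mhz / freqs[0]              # scaling factor, e.g., 2200/1000 = 2.2
            p[i, j] = p_i1 * gamma
            c[i, j] = c_i1 * gamma
    # Demand limit L = (maximum cluster load) / (number of services)
    max_load = sum(max(p[i, j] for j in F[i]) for i in N)
    L = max_load / n_services
    d = {k: rng.uniform(1, L) for k in M}
    return N, F, M, p, c, d
```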


The MIP formulation described in Section 3 was applied using the solver CPLEX 11.2.0 [ILOG, Inc. 2009], which employs very efficient algorithms based on the branch-and-cut method [Ralphs 2009] to search for optimal configuration solutions. The tests were performed on an Intel Core 2 Quad 2.83 GHz with 8 GB of RAM running Ubuntu Linux (kernel version 2.6.27-7). Although the CPLEX version used allows up to four threads, only one was activated during the optimization process.

5.1. Results

A total of 10 server-service pairings were considered. For each pairing, we ran the generator script to build 20 instances, which corresponds to 10 × 20 = 200 test problems. The CPLEX solver was executed for every instance with a time limit of 300 seconds, which corresponds to the maximum control period for the cluster dynamic configuration.

Table 2 shows the results of the simulations with different numbers of servers and services. From 5 to 100 servers, the optimal configuration solutions were found in all runs within the time limit. From 150 to 500 servers, CPLEX could not find the optimal solution within 300 seconds. In these cases, we have considered the solution gap between the best feasible solution found so far and the lower bound (LB) provided by the solver. In minimization problems, the LB can be seen as a reference value which ensures that the optimal solution is greater than or equal to this quantity. It should be noted that the gap values in these instances are quite small, indicating that CPLEX is capable of finding highly acceptable solutions, i.e., close to the optimal lower bound.

Another strategy, employed to speed up the process of obtaining a high-quality solution, was to set a gap tolerance of 1% with respect to the optimal solution, which is a user-defined value. This allows the solver to provide acceptable solutions in a short amount of time.
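For reference, a common definition of the relative solution gap (assumed here; the exact formula used internally by CPLEX may differ slightly) is

$$\text{gap (\%)} = \frac{z_{\text{best}} - z_{\text{LB}}}{z_{\text{best}}} \times 100,$$

where $z_{\text{best}}$ is the objective value of the best feasible configuration found and $z_{\text{LB}}$ is the solver's lower bound.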


Servers  Services  Max. time (s)  Avg. time (s)  Std. dev. time (s)  Sol. gap (%)
  5        10          0.18           0.06              0.04            0.0
 10        20          0.45           0.15              0.09            0.0
 25        50          3.88           1.40              0.89            0.0
 50       100         83.75          15.22             20.07            0.0
 75       150        139.03          49.40             36.24            0.0
100       200        237.56          96.90             78.82            0.0
150       300        300.09         211.48            104.55            0.01
200       500        300.24         231.05            114.38            0.04
350       700        300.46         278.59             45.77            0.02
500       900        300.96         276.86             55.12            0.08

Table 2. Scalability simulation using the solution time limit criteria

Table 3 summarizes the simulation results when the minimum gap tolerance criterion was adopted. It can be observed that, in the scenario with 150 servers and 300 services, one particular instance took about 96 seconds to achieve the stopping criterion, whereas the average time over the 20 generated instances was only 9 seconds. This fact suggests that some instances may be much harder to solve even when we consider the same number of servers and services.

Servers  Services  Max. time (s)  Avg. time (s)  Std. dev. time (s)
  5        10          0.10           0.04              0.03
 10        20          0.20           0.06              0.05
 25        50          0.79           0.14              0.17
 50       100          0.84           0.29              0.16
 75       150          1.16           0.69              0.26
100       200          2.04           1.20              0.45
150       300         96.88           9.73             20.10
200       500         24.75          14.76              5.72
350       700        113.57          61.59             19.84
500       900        281.68         161.01             58.41

Table 3. Scalability simulation using the optimality gap criteria

Even though we generated a number of scenarios involving different server-service pairings, it is not possible to assume that CPLEX will behave similarly on all such instances. The main difficulty is that the branch-and-cut method has worst-case exponential time complexity and, depending on the combination of service workloads, may lead to poor solutions, or may fail to obtain a feasible solution, within an acceptable execution time. Nevertheless, based on the simulations presented here, we have observed that CPLEX performs well in the average case.

6. Conclusion and future work

In this paper, we dealt with the dynamic configuration problem in virtualized server clusters. To solve it, we developed a mixed integer programming (MIP) formulation and described how our model works using a small-scale cluster example. The quality of the solutions obtained and the scalability were evaluated through simulations using an optimization solver. Given typical configuration control periods of a few minutes, the proposed configuration model is suitable and scales well for clusters with up to 500 servers. This seems to be a reasonable size for web server clusters if we consider that, in practice, the bottleneck tends to be the front-end (load balancer).


Thus, one may divide the servers into smaller clusters in a hierarchical fashion to address such scalability issues.

As future work, we intend to apply our model to a real virtualized computing environment leveraging virtual machine technology, which provides capabilities and mechanisms to manage shared server clusters, such as the Xen hypervisor [Barham et al. 2003]. In previous work, we developed a general framework for adaptive server systems [Petrucci et al. 2009], and we plan to use this framework to implement our dynamic configuration model. Moreover, the increasing number of processing cores has become a promising way of improving the performance of servers; this allows for many interesting configuration possibilities for power optimization, such as dynamic voltage/frequency scaling and on/off mechanisms at the core level. We also plan to incorporate prediction techniques in our model to improve dynamic configuration decisions, such as those described in [Kusic 2008].

References

Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. (2003). Xen and the art of virtualization. In SOSP '03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 164–177, New York, NY, USA. ACM.

Bertini, L., Leite, J., and Mossé, D. (2007). Statistical QoS guarantee and energy-efficiency in web server clusters. In 19th Euromicro Conference on Real-Time Systems, pages 83–92.

Bertini, L., Leite, J., and Mossé, D. (2008). Dynamic configuration of web server clusters with QoS control. In WIP Session of the 20th Euromicro Conference on Real-Time Systems.

Bichler, M., Setzer, T., and Speitkamp, B. (2006). Capacity planning for virtualized servers. In Workshop on Information Technologies and Systems (WITS), Milwaukee, Wisconsin, USA.

Bodik, P., Armbrust, M. P., Canini, K., Fox, A., Jordan, M., and Patterson, D. A. (2008). A case for adaptive datacenters to conserve energy and improve reliability. Technical Report UCB/EECS-2008-127, EECS Department, University of California, Berkeley.

Hayes, B. (2008). Cloud computing. Commun. ACM, 51(7):9–11.

Horvath, T., Abdelzaher, T., Skadron, K., and Liu, X. (2007). Dynamic voltage scaling in multitier web servers with end-to-end delay control. IEEE Transactions on Computers, 56(4):444–458.

ILOG, Inc. (2009). CPLEX. http://www.ilog.com/products/cplex/.

Kusic, D. (2008). Combined power and performance management of virtualized computing environments using limited lookahead control. PhD thesis, Drexel University.

Mosberger, D. and Jin, T. (1998). httperf – a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev., 26(3):31–37.


National Instruments Corporation (2008). NI USB-6008/6009 user guide and specification. http://www.ni.com/pdf/manuals/371303e.pdf.

Petrucci, V., Loques, O., and Mossé, D. (2009). A framework for dynamic adaptation of power-aware server clusters. In SAC '09: Proceedings of the 24th ACM Symposium on Applied Computing. ACM.

Ralphs, T. (2009). Branch Cut and Price Resource Web. http://www.branchandcut.org/.

Rusu, C., Ferreira, A., Scordino, C., Watson, A., Melhem, R., and Mossé, D. (2006). Energy-efficient real-time heterogeneous server clusters. In IEEE Real Time Technology and Applications Symposium, pages 418–428, Washington, DC, USA. IEEE Computer Society.

Srikantaiah, S., Kansal, A., and Zhao, F. (2008). Energy aware consolidation for cloud computing. In USENIX Workshop on Power Aware Computing and Systems.

Verma, A., Ahuja, P., and Neogi, A. (2008). Power-aware dynamic placement of HPC applications. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 175–184, New York, NY, USA. ACM.

Wang, Y., Wang, X., Chen, M., and Zhu, X. (2008). Power-efficient response time guarantees for virtualized enterprise servers. In RTSS '08: Proceedings of the 2008 Real-Time Systems Symposium, pages 303–312, Washington, DC, USA. IEEE Computer Society.