2011 Third IEEE International Conference on Cloud Computing Technology and Science

Evaluating the Performance and Power Consumption of Systems with Virtual Machines

Ricardo Lent
Department of Electrical and Electronic Engineering, Imperial College London
Email: [email protected]


Abstract—Virtualization allows multiple applications to run on different execution platforms while sharing the same host machine. Better knowledge of the expected power consumption of computer hosts that run virtualized applications could improve the capacity planning and optimization of cloud systems that use virtualization for resource management. In this paper, power and performance predictions are derived from the utilization of the main computer subsystems (CPU cores, drives, memory, and network ports), which handle the aggregated tasks produced by the virtualized applications. Extensive measurements conducted on two different systems validate the model.

Keywords: Virtualization, cloud computing, green computing, performance evaluation.

I. INTRODUCTION

Virtualization is the key technology that has enabled the development and expansion of most cloud computing services, including Amazon Elastic Compute Cloud (Amazon EC2) and Rackspace Cloud. Cloud providers offer metered computing resources to interested users, who can in turn exploit them to deliver their own services to end users. The use of cloud resources can effectively help service providers reduce costs by minimizing ownership of computing infrastructure and reducing operating costs. Virtualization enables a higher reuse of deployed hardware by allowing multiple operating systems and applications to share common hardware. In most practical cases, virtualization refers to hardware virtualization, whereby a virtual machine (VM) is instantiated by a hypervisor running on a computer host and is able to operate almost like a machine independent of the host computer, with its own operating system and set of services. By consolidating multiple applications onto a smaller set of host machines, the lower number of power consumers could reduce energy requirements, which could be of particular interest to data center or server farm operators. On the other hand, each virtualization host could experience increased power consumption as a result of the higher levels of computation and I/O generated by the aggregated VMs and their hosted applications. Also as a result of resource sharing, the workload of one machine could easily affect the performance of other VMs residing on the same host. In this paper, we develop a novel model that can effectively (i) predict the power consumption of a host system when running one or multiple instances of virtualized applications, and (ii) predict the performance of the hosted applications. We also discuss parameter acquisition and model validation through a measurement study on two types of host systems.

II. RELATED WORKS

The literature offers several works related to the study of the performance of virtual machines or virtualized applications. The main challenges in tackling the problem were discussed by Tickoo et al. [1] and Menasce [2]. Examples are the works of Ye et al. [3], who proposed an analytical framework that estimates the response time of different virtualized applications, and the work of Akoush et al. [4], who proposed two models to predict VM migration performance. Gemikonakli et al. [5] analyzed virtualized servers by using two-dimensional quasi birth and death processes with an algorithmic solution of the steady state. Benevenuto et al. [6] proposed simpler analytic models for Xen VMs. In relation to power models, Krishnan et al. [7] demonstrated the possibility of power metering VMs. Imada et al. [8] looked at power and QoS issues of virtualized servers handling typical workloads. Pedram and Hwang [9] proposed a model for virtualized servers based on Intel Xeon 5400 systems that is mainly driven by CPU load. Meisner et al. proposed yet another power consumption model for the PowerNap server architecture [10], which used an M/G/1 queuing system to derive predictions. Finally, works devoted to instrumenting power and energy readings are also related to the present work. The study by Bedard et al. [11] looked at developing a measurement system for internal devices within a computer system, whereas Kansal and Zhao [12] discussed the development of fine-grained automated tools to profile the energy usage of computer resources, looking to embed such a system into application development and profiling. Kansal et al. [13] also looked at power metering virtual machines. Lent [14] developed a sensor network for power measurement acquisition that is scalable to a large number of computing and networking devices.

III. PERFORMANCE AND POWER MODEL

The model consists of a non-blocking, multi-class open queuing network [15], [16], [17] (depicted in Figure 1) with four different types of service centers, corresponding to instances of CPU cores, disks, network ports, and virtual machines. We assume that each virtualized application runs on a separate virtual machine. The service centers representing the host's principal physical components have an associated infinite queue and are assumed to be FCFS with per-class service rates independent of the load. Virtual machines (and their corresponding virtualized applications) are modeled with infinite-server nodes (delay nodes). The rationale for this assumption is that the execution of most applications on a virtual machine (rather than directly on the host) produces slower response times. The execution slowdown is the result of the extra clock cycles needed to emulate privileged instruction execution. The overhead introduced by the virtual machine hypervisor and the guest operating system also affects execution times. To model the slowdown, we introduce an infinite-server node (the VM node in Figure 1) and associated probabilities that drive the core and drive subsystems.


Any given host system contains C cores, D drives, N ports, and M virtual machines, with corresponding service rates $\mu_C^m(j)$ ($j \in \{0,\ldots,C-1\}$), $\mu_D^m(k)$ ($k \in \{0,\ldots,D-1\}$), $\mu_N^m(i)$ ($i \in \{0,\ldots,N-1\}$), and $\mu_V(m)$ ($m \in \{0,\ldots,M-1\}$). We assume that the loads of the virtual machines are independent of each other and model each of them with a different job class. Multiple virtual machines (and consequently multiple servers) can run on a single physical computer. The superscript m indicates the per-class service rate at each of the corresponding service centers. Jobs for each virtual machine arrive at the system through the network ports. $\Lambda^m(i)$ represents the job arrival rate for virtual machine m through network port i. The total server workload is characterized by the cumulative job arrival rate $\Lambda = \sum_m \sum_i \Lambda^m(i)$. A transition matrix $P_m$ describes how jobs of virtual machine m receive service from the different server components and is specified by the following elements. From network port i, jobs may proceed to core j for processing with probability $c'(m,j)$ or leave the system. Each core may generate a number of disk accesses (given by $d'(m,k)$) and network transmissions (modeled by $n'(m,i)$). Table I summarizes the model parameters.
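The bookkeeping implied by this notation reduces to simple array arithmetic. The following is a minimal sketch under illustrative assumptions (two VMs, one port; the names Lam and mu_N mirror the paper's symbols but all numeric values are our own placeholders):

```python
import numpy as np

M, N = 2, 1                      # virtual machines (job classes), network ports
Lam = np.array([[120.0],         # Lam[m][i]: arrival rate of class m at port i (jobs/s)
                [80.0]])
mu_N = np.array([[1000.0],       # mu_N[m][i]: port i service rate for class m (jobs/s)
                 [1000.0]])

# Cumulative workload: Lambda = sum over m and i of Lam^m(i).
total_rate = Lam.sum()

# Per-port utilization adds the per-class demands; each rho must stay
# below 1 for the open-network model to be valid.
rho_N = (Lam / mu_N).sum(axis=0)
print(total_rate, rho_N)
```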

A. Workload

To define a proper workload for the virtual machines, we use as a reference the performance parameters of the intended application when running on a physical host. For the sake of simplicity, we will assume that those model parameters are available and correspond to the case of a system with a single physical core, drive, and network port. Specifically, $C_R$ and $D_R$ represent the core and disk service rates, respectively, and $P_R$ the routing probability from core to disk. These parameters can be directly estimated from utilization measurements of the CPU cores, disks, and network ports under a specific (and known) application workload, and scaled to estimate the expected utilization on a system with a single component of each type. "Moving" an application from a physical to a virtual environment involves the following steps (a code sketch follows the list):

1) Introduce a new customer class m to the queuing network described above.

2) Calculate the routing probabilities $c'(m,j)$, which can be obtained from the hypervisor configuration. For example, if a single CPU core was allocated to the virtual machine, all server jobs would be handled by the same physical core. The routing probability from VM service center m to the aggregated core subsystem is a design parameter, which we fix to a high value, $0.9 < \sum_j c'(m,j) < 1$, to model the higher CPU utilization that is observed when running virtualized applications.

3) Network port i receives $\Lambda^m(i)$ jobs per second of class m, so the routing probabilities become:
$$v'(i,m) = \frac{\Lambda^m(i)}{\sum_m \Lambda^m(i)}$$

4) Routing from virtual machine m to disk k is given by:
$$d'(m,k) = \eta(k)\left(1 - \sum_j c'(m,j)\right); \qquad \sum_k \eta(k) = 1$$
where $\eta(k)$ depends on the storage configuration of the virtual machine and the disk access patterns of the application.

5) Observing that applications tend to reply to job requests through the same network port, the network transmission probability through port i can be calculated from:
$$n'(m,i) = P_R\, v'(i,m)\left(1 - \sum_j c'(m,j) - \sum_k d'(m,k)\right)$$

Table I
MODEL PARAMETERS

Parameter      Description
C              number of (physical) cores
D              number of (physical) hard drives
N              number of (physical) network ports
C'(m)          number of cores available at virtual machine m
M              number of virtual machines
Λ^m(i)         job arrival rate for VM m through port i
µ_N^m(i)       network port i service rate for jobs of class m
µ_C^m(j)       core j service rate for jobs of class m
µ_D^m(k)       hard disk drive k service rate for jobs of class m
µ_V(m)         virtual server m service rate
v'(i,m)        routing probability from network port i to VM m
c'(m,j)        routing probability from VM m to core j
d'(m,k)        routing probability from VM m to drive k
n'(m,i)        routing probability from VM m to port i
z'(j,m)        routing probability from core j to VM m
w'(k,m)        routing probability from drive k to VM m
C_R            core service rate of the reference system
D_R            drive service rate of the reference system
P_R            core-to-drive probability on the reference system


6) Given that $C_R$ is the reference service rate (when using a single physical core), the use of $C'(m)$ cores by VM m produces
$$\mu_C^m(x) = C_R / C'(m)$$

for each of the cores x used by VM m.

7) The service time of VM subsystem m is naturally proportional to the core subsystem service time:
$$\mu_V(m) = f_v\, \mu_C(m)$$
$f_v$ can be estimated as the ratio obtained from measurements of the response times of a virtualized and a non-virtualized server at one or more operating points.

8) The service rates for virtualized network ports and drives may require a small scaling to take into account the overhead introduced by the hypervisor:


$$\mu_D^m(k) = (f_d\, D_R)^{-1}; \qquad \mu_N^m(i) = (f_n\, N_R)^{-1}$$

9) The routing probabilities from the cores and disks back to the VM nodes are given by:
$$z'(j,m) = \frac{c'(m,j)}{\sum_m c'(m,j)}; \qquad w'(k,m) = \frac{d'(m,k)}{\sum_m d'(m,k)}$$


Figure 1. Model of a host and applications running on virtual machines. (The diagram shows jobs arriving at NIC(i) at rate Λ^m(i), routed with probability v'(i,m) to VM(m), then to CPU(j) with probability c'(m,j) and to DISK(k) with probability d'(m,k), returning via z'(j,m) and w'(k,m), and leaving through the network with probability n'(m,i).)

Figure 2. Observed and predicted power consumption of a dual-core host. The upper figure corresponds to a single VM instance. In the next figure, two VM instances run the same application; one is fully loaded while the other is loaded as indicated by the horizontal axis.

Figure 3. Observed and predicted HTTP request-reply latency for the two cases shown in Figure 2.

B. Power Model

The non-blocking open network assumption enables the use of a known product-form solution for the system. Note that the model is only valid for $\rho_N(i) < 1$, $\rho_C(j) < 1$, and $\rho_D(k) < 1$, for all i, j, and k. The total utilization of each of the physical server components determines the steady-state power consumption $\Pi$ of the system:

$$\Pi = I + \sum_{i=0}^{N-1} \alpha_N \rho_N(i) + \sum_{j=0}^{C-1} \alpha_C \rho_C(j) + \sum_{k=0}^{D-1} \alpha_D \rho_D(k) + \Psi_m\!\left(\sum_{j=0}^{C-1} \rho_C(j)\right) + \Psi_M\!\left(\sum_{j=0}^{C-1} \rho_C(j)\right) \qquad (1)$$

The first term, I, represents the power consumption of the idle system, which can be considered constant for practical purposes. It includes the power required to keep running basic operating system processes and other idle tasks (i.e., without user workload), such as the power required to handle timer interrupts and to keep the hardware clock, network ports, and disk drives active. Different operating conditions may produce a different idle power because of variations in cooling requirements. Running distinct background processes (e.g., logging events) or connecting (or disconnecting) peripherals may also vary the idle level of a system.

We will assume that each of the host subsystems produces a linear power consumption with respect to its individual load. Therefore, the power consumption of the core, disk, or port subsystem is the product of its utilization times a constant factor ($\alpha_C$, $\alpha_D$, and $\alpha_N$, respectively). There could be other subsystems contributing to the total power consumed by a server. Unfortunately, it is non-trivial to parameterize the utilization of other subsystems individually, because it could involve the introduction of additional tools, such as hardware extensions or special metering equipment. Among these subsystems, memory is the most relevant contributor to a system's power. Operating systems can usually report memory allocations to processes, but not their access rates. Access rates to the different levels of the memory hierarchy would be needed to properly parameterize their power consumption. Specific hardware, provided for example by some AMD CPUs (with CodeAnalyst), may provide this information through special performance counters introduced into the CPU design. For generic systems, a memory access analysis through code profiling (for example, with Valgrind) may provide some insight into the memory access patterns of applications. However, these two alternatives are of limited use for our purpose. To some extent, these "hard to measure" operations can be associated with CPU load, given that an instruction fetch and execution can generate a number of memory accesses. Even in the case of direct memory access (DMA), which can execute memory accesses independently of a CPU, a memory transfer can be associated with CPU load because either before or after (or both) a DMA transfer, any workflow will require CPU usage. For example, after a DMA transfer from a network card to memory as a result of a packet arrival, the card will signal the operating system with an interrupt to handle the packet, which will of course cause CPU usage. Another problem is that the time used by certain CPU activities may not be considered "busy" time by the operating system; nevertheless, these activities do consume CPU cycles and produce memory accesses.

The last two terms of equation 1, $\Psi_m(\cdot)$ and $\Psi_M(\cdot)$, model the power consumption of particular usage patterns of the memory subsystem, both parameterized by CPU load. The first term, $\Psi_m(\cdot)$, accounts for the power increase caused by context switching due to timer interrupts. Since these interrupts occur at regular times (normally 100 times per second), their power contribution will vary with the workload at light intensities (i.e., when some service processes are interrupted) but will remain about constant at medium to high intensities (when a service process will almost always be present at the time of the interrupt). Since a busy loop executes almost all of the time, timer interrupts will quite frequently produce a context switch for the running process. The net effect of this action is visible in the apparent lack of linearity in CPU power. The second term, $\Psi_M(\cdot)$, models the power consumption of the memory accesses issued by the server to handle the service workload. Given that memory latency and bus bandwidth can cause a throughput bottleneck as the workload increases, $\Psi_M(\cdot)$ is likely to have a similar form to $\Psi_m(\cdot)$ but at a different scale; that is, $\Psi_M(\cdot)$ should also rapidly increase with workload but eventually saturate. These two memory-access patterns are very similar to what is described by the logistic growth model, which has been used to describe changes in populations whose growth declines over time. We assume both $\Psi_m(\cdot)$ and $\Psi_M(\cdot)$ to be governed by the same function (with different parameters), so we discuss a generic function $\Psi(\cdot)$ for both:

Figure 4. Observed and predicted HTTP throughput for the two cases shown in Figure 2.


$$\frac{d}{dx}\Psi(x) = K_0\, \Psi(x)\,(1 - \Psi(x)) \qquad (2)$$

where $x = \rho/\xi$, $K_0$ is a vertical scaling constant, and $\xi$ is a horizontal scaling constant that determines the value at which the logistic function reaches its plateau. The directly proportional term in equation 2, $\Psi(x)$, models a rapid increase of the power level as a function of x, controlled by the parameter $\xi$. The second term, $1 - \Psi(x)$, describes the population decline and models the reduction in power growth as $x \to 1$. The solution of equation 2 is the logistic function:

$$\Psi(\rho) = \frac{K_0}{1 + e^{-\rho/\xi}} + K_1 \qquad (3)$$

Observing that $\Psi(0)$ must be zero, let us define $K_0 = 2K$ and $K_1 = -K$.
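To make the power model concrete, here is a minimal sketch of equation 1 combined with the logistic Ψ of equation 3. The K and ξ parameters are shaped like the estimates later reported in Table II (System B), while the utilization values are made-up inputs for illustration only:

```python
import numpy as np

def psi(rho, K, xi):
    """Logistic memory-power term, eq. (3) with K0 = 2K, K1 = -K (so psi(0) = 0)."""
    return 2.0 * K / (1.0 + np.exp(-rho / xi)) - K

def power(I, a_N, a_C, a_D, rho_N, rho_C, rho_D, Km, xim, KM, xiM):
    """Steady-state power, eq. (1): idle level + linear terms + memory terms."""
    rho_C_total = np.sum(rho_C)
    return (I
            + a_N * np.sum(rho_N) + a_C * np.sum(rho_C) + a_D * np.sum(rho_D)
            + psi(rho_C_total, Km, xim)      # timer-interrupt context switching
            + psi(rho_C_total, KM, xiM))     # workload-driven memory accesses

# Illustrative call: parameters patterned after Table II (System B) and
# hypothetical utilizations; KM = 0 disables the second memory term,
# as the table reports for both systems.
print(power(I=61.60, a_N=1.79, a_C=12.82, a_D=4.65,
            rho_N=[0.05], rho_C=[0.4, 0.4], rho_D=[0.1],
            Km=5.56, xim=0.00921, KM=0.0, xiM=1.0))
```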

IV. VALIDATION

To test the model, we compare performance and power predictions with measurements obtained from two different server-class systems. System A consists of a quad-core Intel Xeon 3430 (8M cache, 2.4 GHz), 2 GB RAM, a single 150 GB SATA hard drive, 2 on-board Gigabit Ethernet interfaces, and an additional 4-port

Gigabit Ethernet LAN card. System B consists of a dual-core Intel Pentium G6950 (3M cache, 2.80 GHz), 4 GB RAM, a 250 GB SATA hard drive, and the same Ethernet configuration as System A. Except for the additional 4-port Gigabit Ethernet card, the systems are baseline systems offered commercially by a well-known computer manufacturer. To automate the power measurements, we used the Electrical Power Usage Monitoring System (EPUMS) [14], a sensor network approach to power monitoring and information collection. The system gives fine-grained power readings that are transmitted over the network to collection sinks. The power observations become available at the host through the /proc filesystem (in Linux), which simplifies the collection and correlation of power observations with regular performance metrics. The power parameters for the physical systems were also obtained with EPUMS by loading and measuring each of their components (CPU cores, disks, and network ports); the estimates are listed in Table II.

Table II
ESTIMATED POWER MODEL PARAMETERS FOR BOTH SYSTEMS

Parameter   Comment                         System A   System B
I           Idle power consumption          60.30      61.60
α_C         Scaling factor: core            25.70      12.82
α_N         Scaling factor: network port    0.66       1.79
α_D         Scaling factor: drive           7.21       4.65
K_m         Ψ_m(.) parameter 1              7.85       5.56
ξ_m         Ψ_m(.) parameter 2              0.0031     0.00921
K_M         Ψ_M(.) parameter 1              0          0
ξ_M         Ψ_M(.) parameter 2              0          0
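The linear coefficients in Table II can be estimated by loading one subsystem at a time and regressing the measured power against its utilization. A minimal sketch, using hypothetical (utilization, power) samples rather than real EPUMS readings:

```python
import numpy as np

# Hypothetical samples for one subsystem (e.g., the cores of System B):
# total utilization on the x-axis, measured system power (W) on the y-axis.
rho = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
watts = np.array([61.6, 64.8, 68.0, 71.2, 74.4])

# Least-squares fit of watts = I + alpha * rho; the intercept estimates
# the idle power I and the slope the scaling factor (here alpha_C).
alpha, idle = np.polyfit(rho, watts, 1)
print(idle, alpha)   # ~61.6 W idle, ~12.8 W per unit of core utilization
```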

We used the Apache 2 web server as the benchmarking application for the system, virtualized with VirtualBox running on a 64-bit Ubuntu host. Each virtual machine was configured with a single CPU core, 512 MB of RAM, and a single network port. httperf was used to generate the inputs to the system in the form of HTTP requests at a controllable rate. Each request was set to retrieve a file of length 100 KB. The Apache 2 module mod_log_config was configured to write the request-response time of each request, to obtain an accurate measurement of the residence time of each job. To obtain the reference parameters for the web server running on the physical host, we used a single (arbitrary) operational point to derive the model's parameters. To illustrate, the following parameters (Table III) were derived from System A for the case Λ = 100 (i.e., 100 HTTP requests per second).

Table III
APPLICATION MODEL PARAMETERS

Parameter                       Value
C'                              1
c'(0,0)                         0.97
c'(0,1), c'(0,2), c'(0,3)       0
n'(0,0)                         0.029
n'(0,i)                         0; i ∈ {1, 2, 3, 4, 5}
d'(0,0)                         0.001
f_C                             5.2
f_D                             5.0
f_N                             1.15
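Performance predictions follow from the per-device residence times of the open network. The sketch below is a rough single-class approximation of that computation, treating each FCFS device as an M/M/1 queue and the VM node as a pure delay; the visit counts, service rates, and VM delay are hypothetical stand-ins, not the paper's measured values:

```python
def mm1_residence(lam, mu):
    """Mean residence time of an M/M/1 queue (requires lam < mu)."""
    assert lam < mu, "open-network model only valid below saturation"
    return 1.0 / (mu - lam)

lam = 100.0                  # HTTP request rate (jobs/s)
visits = {"port": 1.0, "core": 0.97, "disk": 0.001}      # visits per job
mu = {"port": 1200.0, "core": 350.0, "disk": 900.0}      # service rates (jobs/s)
vm_delay = 0.004             # infinite-server VM node: constant delay (s)

# Response time = VM delay plus visit-weighted residence at each device.
resp = vm_delay + sum(v * mm1_residence(lam * v, mu[dev])
                      for dev, v in visits.items())
print(f"predicted mean response time: {resp*1000:.1f} ms")
```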

Figure 2 depicts the power consumption of System B predicted by the model and contrasted with measurements obtained with EPUMS. The first case shows a single virtual machine instance. The second case illustrates the system's power with two VM instances, keeping one of them fully loaded; the second VM is progressively loaded by the value represented on the horizontal axis. The corresponding HTTP response times for this set of experiments are shown in Figure 3 and the HTTP throughput in Figure 4. The reported values are averages of at least 20 samples. Figures 5, 6, and 7 show the corresponding results for power, response time, and throughput for System A.

Figure 5. Observed and predicted power consumption of a quad-core host. The leftmost figure corresponds to the case of a single VM instance. The following figures show results for a VM instance while 1, 2, or 3 extra instances execute the same application under full load.

Figure 6. Observed and predicted HTTP request-reply latency for the cases shown in Figure 5.

Figure 7. Observed and predicted HTTP throughput for the cases shown in Figure 5.

V. CONCLUSION

We have proposed a multi-class queueing model to estimate the power consumption of a computer system that hosts a number of virtualized applications. The use of multiple classes enables a better modeling of heterogeneous workloads in the system, for instance, when running different types of virtualized applications. To validate the model, we conducted extensive measurements on two different types of host computers running a virtualized web server. We observed that both the performance of the virtualized application (response time, throughput) and the system power measurements closely follow the model predictions.

ACKNOWLEDGMENT

This work was partially supported by FP7 FIT4Green.

REFERENCES

[1] Omesh Tickoo, Ravi Iyer, Ramesh Illikkal, and Don Newell, "Modeling virtual machine performance: challenges and approaches", SIGMETRICS Perform. Eval. Rev., vol. 37, pp. 55–60, January 2010.


[2] Daniel A. Menasce, "Virtualization: Concepts, applications, and performance modeling", 2005.

[3] Deshi Ye, Qinming He, Hua Chen, and Jianhua Che, "A framework to evaluate and predict performances in virtual machines environment", in Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, Volume 02 (EUC '08), Washington, DC, USA, 2008, pp. 375–380, IEEE Computer Society.

[4] Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore, and Andy Hopper, "Predicting the performance of virtual machine migration", in Modeling, Analysis, and Simulation of Computer Systems, International Symposium on, pp. 37–46, 2010.

[5] O. Gemikonakli, E. Ever, and E. Gemikonakli, "Performance modelling of virtualized servers", in Computer Modelling and Simulation (UKSim), 2010 12th International Conference on, March 2010, pp. 434–438.


[6] Fabrício Benevenuto, César Fernandes, Matheus Santos, Virgílio A. F. Almeida, Jussara M. Almeida, G. John Janakiraman, and Jose Renato Santos, "Performance models for virtualized applications", in ISPA Workshops, 2006, pp. 427–439.

[7] Bhavani Krishnan, Hrishikesh Amur, Ada Gavrilovska, and Karsten Schwan, "VM power metering: feasibility and challenges", SIGMETRICS Perform. Eval. Rev., vol. 38, pp. 56–60, January 2011.




[8] Hiroshi Imada, Mitsuhisa Sato, and Hideaki Kimura, "Power and QoS performance characteristics of virtualized servers", in GRID, 2009, pp. 232–240.

[9] M. Pedram and I. Hwang, "Power and performance modeling in a virtualized server system", in Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, September 2010, pp. 520–526.

[10] David Meisner, Brian T. Gold, and Thomas F. Wenisch, "The PowerNap server architecture", ACM Trans. Comput. Syst., vol. 29, pp. 3:1–3:24, February 2011.

[11] D. Bedard, Min Yeol Lim, R. Fowler, and A. Porterfield, "PowerMon: Fine-grained and integrated power monitoring for commodity computer systems", in Proceedings of IEEE SoutheastCon 2010, March 2010, pp. 479–484.

[12] Aman Kansal and Feng Zhao, "Fine-grained energy profiling for power-aware application design", SIGMETRICS Perform. Eval. Rev., vol. 36, no. 2, pp. 26–31, 2008.

[13] Aman Kansal, Feng Zhao, Jie Liu, Nupur Kothari, and Arka A. Bhattacharya, "Virtual machine power metering and provisioning", in Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10), New York, NY, USA, 2010, pp. 39–50, ACM.

[14] R. Lent, "A sensor network to profile the electrical power consumption of computer networks", in GLOBECOM Workshops (GC Wkshps), 2010 IEEE, December 2010, pp. 1433–1437.

[15] Erol Gelenbe and Isi Mitrani, Analysis and Synthesis of Computer Systems, Imperial College Press, London, UK, 2nd edition, 2010.

[16] Gunter Bolch, Stefan Greiner, Hermann de Meer, and Kishor S. Trivedi, Queueing Networks and Markov Chains, John Wiley & Sons, Inc., Hoboken, New Jersey, 2nd edition, 2006.

[17] Daniel A. Menasce, Lawrence W. Dowdy, and Virgilio A. F. Almeida, Performance by Design: Computer Capacity Planning By Example, Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004.

