Power and QoS Performance Characteristics of Virtualized Servers


Takayuki Imada, Mitsuhisa Sato and Hideaki Kimura
University of Tsukuba, Japan


Outline
• Motivation and Objective
• Power and QoS performance characteristics of virtualized servers
  - With a single virtualized server node
  - With a load migration method between two virtualized servers
  - With different numbers of processor cores allocated to VMs and different running frequencies
• Related work
• Conclusion


Motivation
• Increasing power consumption in datacenters
  - For server nodes, network equipment and cooling facilities
  - A problem that needs to be solved as soon as possible
  - From 2000 to 2005, electricity use by servers roughly doubled in the U.S. (Koomey, 2007)
• Server resource management using Virtual Machines (VMs)
  - Merits: VM migration and server consolidation, flexible provisioning, Green IT!!

[Flow diagram] Energy reduction on virtualized servers requires an energy reduction scheme, which in turn requires knowledge of the power-optimized server configuration. Is a conventional scheme applicable? The goal is an intelligent server reconfiguration based on that knowledge.

New topics to be considered for virtualized servers
• Server consolidation
• Multiple VMs running on multi-core processors
[Diagram: VMs (VM0 to VM2) are migrated from Server2 and consolidated onto the processor cores of Server1.]
  - How does this affect performance and power?
  - How should the servers be configured for power saving?

Objective
• To characterize the power and QoS performance of virtualized servers, toward developing an energy-saving scheme
1. Baseline characteristics
  - Comparison of power consumption before and after server consolidation
  - Effects of the processor's DVFS control in a virtualized server node
2. Effects of workload migration
  - Different migration schemes and workload levels
3. How to allocate processor cores to VMs
  - Different numbers of processor cores allocated to VMs and different running frequencies
[Overview diagram: (1) consolidation phase, before: VMs running separately on Server1 and Server2; after: the VMs consolidated onto Server1; (2) workload migration between the servers; (3) VM management phase, mapping the consolidated VMs (VM0 to VM2) onto the processor cores of Server1.]

Baseline evaluation
[Overview diagram repeated, highlighting (1): the baseline characteristics of the consolidation phase are evaluated first.]

Evaluation of power reduction on a virtualized server
• 1 or 2 VMs running on a single server node
  - 1-VM: Workload 0 served by VM0 on one core of the 4-core server
  - 2-VM: Workload 0 and Workload 1 served by VM0 and VM1, each on its own core
  - Two different frequencies: fmin = 1.6 GHz, fmax = 2.93 GHz
• Benchmark: SPECweb2005
  - Banking (B), E-commerce (E) and Support (S) workloads
  - Required QoS in SPECweb2005: the ratio of requested pages each returned within a defined time (TIME_GOOD) must exceed 95% (the 95% TIME_GOOD QoS; see the sketch after this list)
  - Workload sets
    - 1-VM evaluation: each of the B, E and S workloads
    - 2-VM evaluation: B-E and S-E simultaneous workloads
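The TIME_GOOD criterion above is simply a ratio over the observed response times. A minimal sketch of the check follows (not part of SPECweb2005 itself; the 3-second limit and the sample response times are illustrative assumptions):

# Minimal sketch of the 95% TIME_GOOD QoS check referred to throughout the slides.
# The 3-second TIME_GOOD limit and the sample response times are illustrative
# assumptions, not values taken from SPECweb2005.

def time_good_ratio(response_times_sec, time_good_limit_sec=3.0):
    """Fraction of requests whose pages were returned within the TIME_GOOD limit."""
    good = sum(1 for t in response_times_sec if t <= time_good_limit_sec)
    return good / len(response_times_sec)

def qos_satisfied(response_times_sec, required_ratio=0.95):
    """True if the 95% TIME_GOOD QoS requirement is met."""
    return time_good_ratio(response_times_sec) >= required_ratio

if __name__ == "__main__":
    sample = [0.4, 1.2, 0.8, 3.5, 0.6, 2.9, 0.7, 1.1, 0.9, 0.5]
    print(f"TIME_GOOD ratio: {time_good_ratio(sample):.0%}")
    print("95% TIME_GOOD QoS satisfied:", qos_satisfied(sample))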

Test-bed environment
• Clients issue requests and receive responses through a load balancer over the access network
  - Load balancer: IPVS 1.2.1 with ipvsadm (Layer-4 software load balancing)
• Servers hosting the VMs (VM0 to VM5)
  - Paravirtualized VM: 1 GB allocated memory, kernel-xen-2.6.18-128.1.10.el5 (64-bit), Apache 2.2.3
  - Server specification: Intel Core i7 940 (2.93 GHz, HT off), DDR3-1333 12 GB, kernel-xen-2.6.18-128.1.10.el5 (64-bit), Xen 3.3.1
• Separate networks for client accesses, management and contents; web contents are served from iSCSI storage

Power reduction by DVFS control
• Overview
  - S(freq): the maximum number of simultaneous sessions that can satisfy the 95% TIME_GOOD QoS at frequency freq (freq = fmin or fmax)
  - For loads up to S(fmin), fmin alone is enough to meet the QoS requirement
• Power reduction measured at point A (light load, well below S(fmin)) and point B (middle load, around S(fmin))
  - 1-VM: A = 100 simultaneous sessions, B = S(fmin) for each workload
  - 2-VM: A = the B-E1 or S-E1 workload set, B = the B-E2 or S-E2 workload set
  - (Please see our paper for the detailed values.)
• Results of power reduction, computed as {1 - P(fmin)/P(fmax)} * 100 [%] (see the sketch below)
  - 1-VM evaluation: 1% power reduction at A, 5% at B
  - 2-VM evaluation: 2% power reduction at A, 8% at B
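The power-reduction figure defined above is a direct ratio of measured average power draws; a minimal sketch of the computation (the wattage arguments in the example call are placeholders, not measurements from this work):

def power_reduction_percent(p_fmin_watts, p_fmax_watts):
    """Power reduction {1 - P(fmin)/P(fmax)} * 100 [%], as defined on this slide."""
    return (1.0 - p_fmin_watts / p_fmax_watts) * 100.0

# Placeholder wattages for illustration only.
print(f"{power_reduction_percent(133.0, 140.0):.1f}% power reduction")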

Power reduction by server consolidation
• Comparison: (2 VMs on 1 node) vs. 2 * (1 VM on 1 node), using the B-E3 and S-E3 workload sets
• Measured power consumption (W):
  - 1-VM, 1-node: 139.59 to 140.97 W per server
  - 2-VM, 1-node: 144.22 to 144.86 W, only a slight increase over the 1-VM case
  - 2-VM, 2-node: 279.56 to 280.57 W in total
• Consolidating the two VMs onto one node cuts power from about 280 W to about 145 W, a roughly 48% power reduction

Power and performance evaluation of load migration schemes
[Overview diagram repeated, highlighting (2): the effects of workload migration between the two servers are evaluated next.]

Migration schemes
• Two different schemes
  - LM (Live Migration): the live migration method provided by Xen; VM0 is moved from Server 0 to Server 1 during the run
  - Switching: a destination VM (VM1) already exists on Server 1 at the beginning of the benchmark run, and the workload on VM0 is switched over to it by the load balancer
• In both schemes, the migration (or switchover) is started 30 seconds into the benchmark run
• Workload placement in situations A and B (defined on the next slide; a switchover sketch follows below)
  - LM: VM0 runs Banking in both A and B; VM1 on the destination server runs nothing in A and E-commerce in B
  - Switching: VM0 and VM1 both run Banking in A and B; VM2, co-located with VM1, runs nothing in A and E-commerce in B
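To make the Switching scheme more concrete, the sketch below shows how such a Layer-4 switchover could be driven through ipvsadm from Python. This is not the authors' actual tooling: the virtual-service and real-server addresses are hypothetical, and the exact ipvsadm invocation depends on how the load balancer was configured; it only illustrates the idea of steering new sessions to the standby VM and draining the original one.

import subprocess

# Hypothetical addresses for illustration; the test-bed addresses are not given in the slides.
VIRTUAL_SERVICE = "192.0.2.10:80"   # virtual IP exposed by the IPVS load balancer
VM0_REAL_SERVER = "10.0.0.11:80"    # VM0 on Server 0, currently serving the Banking workload
VM1_REAL_SERVER = "10.0.0.12:80"    # VM1 on Server 1, the switchover destination

def ipvsadm(*args):
    """Run an ipvsadm command, raising if it fails."""
    subprocess.run(["ipvsadm", *args], check=True)

def switch_workload():
    # Give the destination VM a positive weight so new sessions are scheduled to it
    # (in the Switching scheme VM1 is already registered and running from the start).
    ipvsadm("-e", "-t", VIRTUAL_SERVICE, "-r", VM1_REAL_SERVER, "-m", "-w", "1")
    # Drain the source VM: weight 0 lets existing connections finish but stops new ones,
    # which is why long-lasting connections can delay the effective switchover.
    ipvsadm("-e", "-t", VIRTUAL_SERVICE, "-r", VM0_REAL_SERVER, "-m", "-w", "0")

if __name__ == "__main__":
    switch_workload()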

Evaluation overview
• Two different background load levels on the destination server
  - Situation A: the Banking workload is migrated to an unloaded server
  - Situation B: the Banking workload is migrated to a server that is already handling an E-commerce workload
• Workload sets (with different Banking workload levels)
  - Running at fmin: Sbank = 100 or 500, Secom = 600 (static)
  - Running at fmax: Sbank = 100, 500 or 1000, Secom = 1000 (static)
• Load migration is started 30 seconds after the benchmark starts (benchmark duration: 120 seconds)

Results: QoS performance (ratio of requested pages served within the TIME_GOOD response time)
[Bar charts: % of QoS satisfied for the LM, Switching and No-migration cases in situations A and B, with Sbank = 100 or 500 when running at fmin and Sbank = 100, 500 or 1000 when running at fmax. "No migration" means two servers running without load migration; A and B are the destination-load situations defined above.]
• Key observation: when running at fmax, the LM scheme could not achieve the 95% TIME_GOOD QoS

Results: Power consumption
• Relative power consumption compared to the No-migration case
  - Switching: almost the same in all workload sets
  - LM: about a 1-4% increase, caused by the transfer of the VM's memory data
• Power profiles during the Banking workload (LM, running at fmax)
  [Plots: power (W) over time for LM vs. No migration with the migration window marked, for Sbank = 500 (4% increase) and Sbank = 1000 (1% increase).]
  - At the heavier load the migration time roughly doubles: Xen's LM scheme does not move frequently modified memory pages during the pre-copy phase

Discussion
• File-size characteristics to be considered
  - The LM scheme is applicable to large files such as movie or music files
  - The Switching scheme is more suitable for small files (if long-lasting connections exist when Switching is invoked, its start will be delayed, because those connections must not be lost)
• Resources used
  - The LM scheme needs only the minimum number of required VMs
  - The Switching scheme needs redundant VMs (although powering VMs on and off can be avoided if Suspend-to-RAM is available on the host server)

Evaluation with different processor-core allocation schemes
[Overview diagram repeated, highlighting (3): the VM management phase, i.e. how processor cores are allocated to the consolidated VMs.]

VM management in a server
• For further power optimization, we should consider VM scheduling within each server node
• Factors: the number of processor cores allocated to each VM and the running frequency
• Example: a quad-core processor with fmin = fmax/2
  - Available resource = 4 * fmax; required resource = 2 * fmax
  - Option 1: 2 cores at fmax (VM0 and VM1 each on one dedicated core, the other two cores idle)
  - Option 2: 4 cores at fmin (VM0 and VM1 share all four cores running at fmin)
  - Which is better for power and QoS? (See the sketch below for the capacity arithmetic.)
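The "which is better?" question starts from the fact that the two options provide the same nominal capacity; a minimal sketch of that arithmetic (capacity modeled naively as cores times frequency, ignoring scheduling overhead and per-request latency) might be:

# Illustrative capacity comparison for the quad-core example on this slide.
# Capacity is modeled naively as cores * frequency; the measurements on the
# following slides show that power and QoS still differ between the options.

F_MAX_GHZ = 2.93
F_MIN_GHZ = F_MAX_GHZ / 2   # this slide's example assumes fmin = fmax / 2

def capacity(cores, freq_ghz):
    return cores * freq_ghz

required = 2 * F_MAX_GHZ    # required resource = 2 * fmax

for label, cores, freq in [("2 cores @ fmax", 2, F_MAX_GHZ),
                           ("4 cores @ fmin", 4, F_MIN_GHZ)]:
    cap = capacity(cores, freq)
    print(f"{label}: {cap:.2f} GHz-cores, meets the 2*fmax requirement: {cap >= required}")

Both options meet the requirement on paper; the 1-VM and 2-VM evaluations that follow show how they actually differ in power draw and QoS.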

Evaluation overview
• Configurations (1-VM and 2-VM)

  Workload set                 | # of VMs | (# of processor cores, frequency)
  -----------------------------|----------|----------------------------------
  Sbank = 1000                 | 1        | (1, 2.93 GHz) vs. (2, 1.6 GHz)
  Secom = 1000                 | 1        | (1, 2.93 GHz) vs. (2, 1.6 GHz)
  Ssupp = 800                  | 1        | (1, 2.93 GHz) vs. (2, 1.6 GHz)
  Sbank = 1000, Secom = 1000   | 2        | (1, 2.93 GHz) vs. (3, 2 GHz)
  Sbank = 1000, Ssupp = 800    | 2        | (1, 2.93 GHz) vs. (3, 2 GHz)

• 1-VM evaluation: 1 core at 2.93 GHz (VM0 on a dedicated core) vs. 2 cores at 1.6 GHz (VM0 spread over two cores)
• 2-VM evaluation: 1 core at 2.93 GHz per VM (independent cores) vs. 3 cores at 2 GHz shared by both VMs

Results: 1-VM evaluation
[Bar charts: QoS performance (% of QoS satisfied) and relative power for "1 core - 2.93GHz" vs. "2 cores - 1.6GHz" on the Banking, E-commerce and Support workloads.]
• Slight degradation of QoS performance with 2 cores at 1.6 GHz
  - Caused by the relatively increased load of VM scheduling across an additional processor core and by the decreased running frequency
• About 10% power reduction in all workloads

Results: 2-VM evaluation
[Bar charts: QoS performance and relative power for "1 core - 2.93GHz", "3 cores - 2GHz (Credit)" and "3 cores - 2 or 2.26GHz (SEDF)" on the Banking/E-commerce and Banking/Support workload sets.]
• Large QoS degradation with 3 cores at 2 GHz under the Credit scheduler
  - Xen's default Credit scheduler (non-preemptive, 30 ms scheduling period) could not achieve the real-time processing these workloads need
• 4-9% power reduction when the processor frequency is boosted (to 2.26 GHz where needed) and the preemptive SEDF scheduler is used

Application to existing methods
• How should P-state transition requests from different VMs be reconciled?
  - Setting an independently different P-state on each core may not be allowed by the processor
  - Define virtual P-states VP(j) at regular intervals of total resource; with physical frequencies f(0) > f(1) > f(2) > f(3) (P-states 0 to 3) on N cores, VP(j) can be realized with the frequency f(i) satisfying N * f(i+1) < VP(j) <= N * f(i)
  - In the example, VP(2) can satisfy the QoS requirement imposed on each corresponding VM (a frequency-selection sketch follows)
[Diagram: total resource levels N * f(0), N * f(1), N * f(2), N * f(3) mapped to virtual P-states VP(0) to VP(4), with VM0 and VM1 spread across the cores.]
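A minimal sketch of this frequency-selection rule is shown below. The physical P-state frequencies, the core count and the per-VM demands are made-up example values; the sketch only illustrates picking the slowest physical frequency f(i) whose total capacity N * f(i) still covers the aggregated virtual P-state request.

# Sketch of reconciling per-VM virtual P-state requests onto one shared physical
# P-state across N cores, following the rule N * f(i+1) < VP(j) <= N * f(i).
# Frequencies and demands are illustrative, not values from the paper.

N_CORES = 4
PSTATE_FREQS_GHZ = [2.93, 2.26, 2.0, 1.6]   # f(0) > f(1) > f(2) > f(3)

def select_pstate(vp_demand_ghz_cores):
    """Return (index, frequency) of the slowest P-state whose total capacity
    N_CORES * f(i) still meets the aggregated virtual P-state demand."""
    chosen = 0   # fall back to the fastest P-state if nothing slower suffices
    for i, f in enumerate(PSTATE_FREQS_GHZ):
        if N_CORES * f >= vp_demand_ghz_cores:
            chosen = i           # still sufficient; keep looking for a slower P-state
        else:
            break
    return chosen, PSTATE_FREQS_GHZ[chosen]

# Example: two VMs whose requested shares of total capacity (in GHz-cores) are summed.
vm_requests = {"VM0": 3.0, "VM1": 4.5}
total_demand = sum(vm_requests.values())
idx, freq = select_pstate(total_demand)
print(f"Aggregate demand {total_demand:.1f} GHz-cores -> P-state {idx} ({freq} GHz)")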

Related work
• VirtualPower: virtualized processor P-states for VMs [Nathuji et al., 2007]
  - Realized by controlling processor DVFS and CPU time slices
  - Does not consider a VM running on multiple processor cores
• A virtualized-server reconfiguration scheme for energy reduction [Kusic et al., 2008]
  - Server resource management based on queueing theory
  - Live migration of VMs is not considered
  - DVFS control for a VM spanning multiple processor cores is not considered

Conclusion
• We characterized the power and QoS performance of virtualized servers toward developing an energy-saving scheme
  - Drastic power reduction by server consolidation is possible
    - An added VM increases a server's power consumption only slightly
    - DVFS control of the processor provides further power reduction
  - VM live migration needs care with respect to both power and performance
    - The live migration time varies with the workload level of the VM being moved
  - Further power reduction can be achieved by allocating multiple processor cores to VMs and running them at a lower frequency
    - Such processor management can be implemented by extending an existing method
• Future work
  - Develop an intelligent reconfiguration algorithm based on the obtained results and knowledge

Thanks!!
