Power and QoS Performance Characteristics of Virtualized Servers
Takayuki Imada, Mitsuhisa Sato and Hideaki Kimura
University of Tsukuba, Japan
Outline
• Motivation and Objective
• Power and QoS performance characteristics on virtualized servers
  - With one virtualized server node
  - With a load migration method between two virtualized servers
  - With different numbers of processor cores allocated to VMs and different running frequencies
• Related work
• Conclusion
Motivation
• Increasing power consumption in datacenters
  - For server nodes, network equipment and cooling facilities
  - A problem to be solved as soon as possible
  - From 2000 to 2005, electricity use by servers roughly doubled in the U.S. (Koomey, 2007)
• Server resource management using Virtual Machines (VMs)
  - Merits: VM migration and server consolidation, flexible provisioning
  - Green IT!!
• Toward energy reduction on virtualized servers
  - An energy reduction scheme requires knowledge of power-optimized server configurations
  - Is a conventional scheme applicable?
  - Goal: an intelligent server reconfiguration based on that knowledge
New topics to be considered in virtualized servers
• Server consolidation
• Multiple VMs running on multi-core processors
[Figure: VMs running on Server1 and Server2 are consolidated by migration onto Server1, where VM0-VM2 share four processor cores]
• How does this affect performance and power?
• How should the servers be configured for power saving?
Objective
• To characterize power and QoS performance on virtualized servers for developing an energy saving scheme
  1. Baseline characteristics
     - Comparison of power consumption before/after server consolidation
     - Effects of the processor's DVFS control in a virtualized server node
  2. Effects of workload migration
     - Different migration schemes and workload levels
  3. How to allocate processor cores to VMs
     - Different numbers of processor cores allocated to VMs and different running frequencies
[Roadmap figure: (1) consolidation phase - before: VM0 on Server1, VM1 on Server2; after: VM0-VM2 on Server1; (2) load migration between servers; (3) VM management phase - VM0-VM2 scheduled on the four cores of one server]
Baseline evaluation
[Roadmap figure, repeated: phase 1 (the consolidation phase) is highlighted - "Here!!"]
Evaluation of power reduction on a virtualized server
• 1 or 2 VMs running on a server node
  - 1-VM: Workload 0 on VM 0 (one core of the quad-core server)
  - 2-VM: Workload 0 on VM 0 and Workload 1 on VM 1 (two cores)
• Two different frequencies: fmin = 1.6GHz, fmax = 2.93GHz
• Benchmark: SPECweb2005
  - Banking (B), E-commerce (E) and Support (S) workloads
  - Required QoS in SPECweb2005: the ratio of requested pages each returned within a defined time (TIME_GOOD) should be more than 95% (95% TIME_GOOD QoS)
• Workload sets
  - 1-VM evaluation: each of the B, E and S workloads
  - 2-VM evaluation: B-E and S-E simultaneous workloads
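To make the QoS criterion concrete, a minimal Python sketch (not part of the SPECweb2005 harness) of the 95% TIME_GOOD check; the 3-second threshold is an assumption for illustration only:

    # Check the 95% TIME_GOOD QoS over a list of response times.
    def qos_satisfied(response_times_s, time_good_s=3.0):
        # time_good_s is an assumed value; SPECweb2005 defines TIME_GOOD
        # in its run rules.
        good = sum(1 for t in response_times_s if t <= time_good_s)
        return good / len(response_times_s) >= 0.95

    print(qos_satisfied([0.4, 1.2, 2.8, 3.5]))  # 3 of 4 within 3s -> False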
Test-bed environment
• Clients send requests and receive responses through a load balancer
• Load balancer: IPVS-1.2.1, ipvsadm (Layer-4 software load balancing)
• Servers for VMs (VM 0 - VM 5); each paravirtualized VM:
  - 1GB allocated memory
  - kernel-xen-2.6.18-128.1.10.el5 (64-bit)
  - Apache 2.2.3
• Server specification:
  - Intel Core i7 940 (2.93GHz, HT off), DDR3-1333 12GB
  - kernel-xen-2.6.18-128.1.10.el5 (64-bit), Xen 3.3.1
• Separate networks for accesses, management and contents; iSCSI storage
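For concreteness, a hypothetical Xen 3.x domU configuration consistent with the test-bed above; Xen config files use Python syntax, and the file paths, bridge name and disk device here are assumptions:

    name    = "vm0"
    memory  = 1024                       # 1GB allocated memory
    vcpus   = 1
    kernel  = "/boot/vmlinuz-2.6.18-128.1.10.el5xen"     # assumed path
    ramdisk = "/boot/initrd-2.6.18-128.1.10.el5xen.img"  # assumed path
    vif     = ["bridge=xenbr0"]          # access network (assumed bridge)
    disk    = ["phy:/dev/sdb1,xvda,w"]   # iSCSI-backed volume (assumed device)
    root    = "/dev/xvda ro"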
Power reduction by DVFS control
• Overview
  - S(freq): the maximum number of simultaneous sessions that can satisfy the 95% TIME_GOOD QoS at a given frequency (freq = fmin or fmax)
  [Figure: maximum # of sessions at fmax and fmin; load point A (light-weight) lies below S(fmin), load point B (middle-weight) at S(fmin) - fmin is enough for the QoS requirement up to S(fmin)]
• Power reduction measured at A (light-weight) and B (middle-weight)
  - 1-VM: A = 100 simultaneous sessions, B = S(fmin) of each workload
  - 2-VM: A = the B-E1 or S-E1 workload set, B = the B-E2 or S-E2 workload set
  - (Sorry, please see our paper for detailed values)
• Results of power reduction: {1 - P(fmin)/P(fmax)} * 100 [%]
  - 1-VM evaluation: 1% power reduction @ A, 5% power reduction @ B
  - 2-VM evaluation: 2% power reduction @ A, 8% power reduction @ B
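The reduction metric is a one-liner in Python; the wattages below are hypothetical, chosen only to show the calculation:

    def power_reduction_pct(p_fmin, p_fmax):
        # {1 - P(fmin)/P(fmax)} * 100 [%]
        return (1.0 - p_fmin / p_fmax) * 100.0

    print(round(power_reduction_pct(133.0, 140.0)))  # hypothetical -> 5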
Power reduction by server consolidation
• (2-VM on 1 node) vs. 2 * (1-VM on 1 node)
• B-E3 and S-E3 workload sets
[Bar chart: power consumption (W) - 2-VM 2-node totals: 280.57 and 279.56; 1-VM 1-node and 2-VM 1-node bars: 140.97, 139.59, 139.97, 144.86 and 144.22]
• Only a slight increase of power consumption from the added VM on one node
• 48% power reduction by server consolidation
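As a rough check of the 48% figure, reading approximate values off the chart (the bar-to-configuration mapping is an assumption):

    p_two_nodes = 280.57   # 2 * (1-VM on 1 node), total watts (from chart)
    p_one_node  = 144.86   # 2-VM consolidated on 1 node, watts (assumed bar)
    print(round((1 - p_one_node / p_two_nodes) * 100))  # -> 48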
Power and performance evaluation of load migration schemes
[Roadmap figure, repeated: phase 2 (load migration between servers) is highlighted - "Here!!"]
Migration schemes
• Two different schemes
  - LM (Live Migration): the live migration method provided by Xen
  - Switching: the destination VM already exists at the beginning of the benchmark run, and the workload on VM 0 is switched over to it by the load balancer
• Timelines (load moved from Server 0 to Server 1; migration/switching starts at 30[sec], with the live migration / switching interval marked between 30[sec] and 90[sec]); workloads in situations A / B:
  - LM: VM 0 (Server 0 -> Server 1): Banking / Banking; VM 1 (Server 1): None / E-commerce
  - Switching: VM 0 (Server 0): Banking / Banking; VM 1 (Server 1, takes over VM 0's load): Banking / Banking; VM 2 (Server 1): None / E-commerce
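A minimal sketch of how the Switching scheme could be driven from the IPVS load balancer (a Python wrapper around the ipvsadm CLI; the addresses are hypothetical, and both real servers are assumed to be registered with the virtual service already):

    import subprocess

    VIP = "192.168.0.100:80"  # hypothetical virtual service address
    OLD = "192.168.0.11:80"   # real server backed by VM 0
    NEW = "192.168.0.12:80"   # real server backed by the standby VM 1

    # Steer new connections to the standby VM; weight 0 drains the old
    # server instead of cutting its lasting connections.
    subprocess.run(["ipvsadm", "-e", "-t", VIP, "-r", NEW, "-w", "1"], check=True)
    subprocess.run(["ipvsadm", "-e", "-t", VIP, "-r", OLD, "-w", "0"], check=True)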
Evaluation overview
• Two different background load levels
  - Situation A: the Banking workload migrates to a server with no load
  - Situation B: the Banking workload migrates to a server which handles an E-commerce workload
• Workload sets (with different workload levels of Banking)
  - Running @ fmin: Sbank = 100 or 500, Secom = 600 (static)
  - Running @ fmax: Sbank = 100, 500 or 1000, Secom = 1000 (static)
• Load migration starts 30 seconds after the benchmark start (benchmark duration: 120 seconds)
Results: QoS performance (ratio of requested pages with TIME_GOOD response time)
[Bar charts: % of QoS satisfied for LM, Switching and No migration in situations A and B - @ fmin with Sbank = 100 and 500; @ fmax with Sbank = 100, 500 and 1000]
• No migration: 2 servers without load migration
• A: migration to a server with no load; B: migration to a server handling an E-commerce workload
• The LM scheme couldn't achieve the 95% TIME_GOOD QoS
Results: Power consumption
• Relative power consumption compared to No migration
  - Switching: almost the same in all workload sets
  - LM: about a 1-4% increase because of VM memory data transfer
• Power profiles during the Banking workload (LM, fmax)
[Power (W) vs. time (sec) profiles of LM and No migration: Sbank = 500 (4% increase) and Sbank = 1000 (1% increase)]
  - The migration time for Sbank = 500 is roughly doubled: the LM scheme in Xen doesn't move frequently modified memory pages during the pre-copy phase
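A toy model (our sketch, not the authors') of why LM takes longer under the heavier load: pre-copy resends the pages dirtied during each round, so a higher dirty rate means more rounds before stop-and-copy:

    def precopy_time_s(mem_mb, dirty_mb_per_s, bw_mb_per_s,
                       stop_copy_mb=32.0, max_rounds=30):
        total, to_send = 0.0, mem_mb
        for _ in range(max_rounds):
            t = to_send / bw_mb_per_s     # time to send this round
            total += t
            to_send = dirty_mb_per_s * t  # pages dirtied meanwhile
            if to_send <= stop_copy_mb:   # small enough: stop-and-copy
                break
        return total

    # hypothetical numbers: a 1GB guest over a 100MB/s link
    print(round(precopy_time_s(1024, 20, 100)))  # light load -> ~13s
    print(round(precopy_time_s(1024, 60, 100)))  # heavy load -> ~25s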
Discussion
• File size characteristics to be considered
  - The LM scheme is applicable to large files such as movie or music files
  - The Switching scheme is more suitable for small files (if long-lasting connections exist when Switching is invoked, its start will be delayed, because those connections must not be lost)
• Used resources
  - The LM scheme needs only the minimum number of required VMs
  - The Switching scheme needs redundant VMs (but turning VMs on and off can be avoided if Suspend To RAM can be employed on a host server)
Evaluation with different processor-core allocation schemes
[Roadmap figure, repeated: phase 3 (the VM management phase) is highlighted - "Here!!"]
VM management in a server
• For further power optimization, we should consider VM scheduling in each server node
• Factors: the number of allocated processor cores and the running frequency
• Example: a quad-core processor with fmin = fmax/2, and two VMs each requiring fmax worth of processing capacity
  - Required resource = 2 * fmax; available resource = 4 * fmax
  - Option 1: 2 cores @ fmax (VM 0 and VM 1 on one core each, two cores idle)
  - Option 2: 4 cores @ fmin (4 * fmax/2 = 2 * fmax, all cores used at the lower frequency)
  - Which is better?
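The arithmetic behind the two options, with capacity naively modeled as cores * frequency (a simplification that ignores scheduling overhead):

    FMAX = 2.93
    FMIN = FMAX / 2          # fmin = fmax/2 in this example
    required = 2 * FMAX      # two VMs, each needing fmax worth of capacity

    for cores, freq in [(2, FMAX), (4, FMIN)]:
        cap = cores * freq
        print(f"{cores} cores @ {freq:.3g}GHz: capacity {cap:.3g}, "
              f"sufficient: {cap >= required}")  # both print True

Both options meet the requirement, so the question the slide asks is purely about which one consumes less power.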
Evaluation overview
• Configurations (1-VM and 2-VM)

  Workload set                 # of VMs   (# of processor cores, frequency)
  Sbank = 1000                 1          (1, 2.93GHz) or (2, 1.6GHz)
  Secom = 1000                 1          (1, 2.93GHz) or (2, 1.6GHz)
  Ssupp = 800                  1          (1, 2.93GHz) or (2, 1.6GHz)
  Sbank = 1000, Secom = 1000   2          (1, 2.93GHz) or (3, 2GHz)
  Sbank = 1000, Ssupp = 800    2          (1, 2.93GHz) or (3, 2GHz)

• 1-VM evaluation: 1 core @ 2.93GHz vs. 2 cores @ 1.6GHz
• 2-VM evaluation: 1 core @ 2.93GHz per VM (independent) vs. 3 cores @ 2GHz shared by both VMs
Results: 1-VM evaluation
[Bar charts: % of QoS satisfied and relative power for 1 core @ 2.93GHz vs. 2 cores @ 1.6GHz, on the Banking, E-commerce and Support workloads]
• Slight degradation of the QoS performance
  - Due to the load added by VM scheduling on an additional processor core and the decreased running frequency
• About 10% power reduction in all workloads
Results: 2-VM evaluation
[Bar charts: % of QoS satisfied and relative power for 1 core @ 2.93GHz, 3 cores @ 2GHz (Credit) and 3 cores @ 2 or 2.26GHz (SEDF), on the Bank.-E-comm. and Bank.-Supp. workload sets]
• Large performance degradation with 3 cores @ 2GHz under the Credit scheduler
  - Xen's default scheduler (Credit: non-preemptive, 30[msec] scheduling period) couldn't achieve real-time processing
• 4-9% power reduction with a boosted processor frequency (2 or 2.26GHz) and a preemptive scheduler (SEDF)
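To reproduce the SEDF setup, Xen 3.x exposes per-domain parameters through xm; a hedged sketch (the domain name and values are assumptions, SEDF must be selected at hypervisor boot, and the period/slice units should be checked against the xm man page for the Xen version in use):

    import subprocess

    # Guarantee the domain "vm0" (hypothetical) a 10ms slice per 30ms period,
    # matching the 30[msec] scheduling granularity discussed above.
    subprocess.run(["xm", "sched-sedf", "vm0", "-p", "30", "-s", "10"],
                   check=True)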
Application to existing methods
• How to reconcile multiple P-state transition calls from different VMs?
  - Independently setting different P-states on each core may not be allowed
• Idea: define virtual power levels VP(j), spaced regularly; a requested level VP(j) can be realized with the physical frequency f(i) that satisfies N * f(i+1) < VP(j) <= N * f(i), where N is the number of cores and f(0) > f(1) > f(2) > f(3) are the frequencies of P-states 0-3
[Figure: total resource levels N * f(0) ... N * f(3) against virtual levels VP(0) ... VP(4); e.g., VP(2) can satisfy the QoS requirement imposed on each corresponding VM]
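A minimal sketch of the level-selection rule (illustrative frequencies; the rule picks the slowest P-state whose total capacity still covers the requested virtual power level):

    N = 4                              # number of processor cores
    freqs = [2.93, 2.26, 2.0, 1.6]     # f(0) > f(1) > f(2) > f(3), in GHz

    def realize(vp):
        # Slowest f(i) with N * f(i) >= vp, i.e. N * f(i+1) < vp <= N * f(i)
        for f in reversed(freqs):      # try the slowest P-state first
            if N * f >= vp:
                return f
        return freqs[0]                # demand exceeds capacity: fastest

    print(realize(7.0))  # -> 2.0, since N * 2.0 = 8.0 covers VP = 7.0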
Related work
• VirtualPower: virtualized processor P-states for VMs [Nathuji et al., 2007]
  - Realized by controlling DVFS on a processor and CPU time slices
  - Doesn't consider a VM running on multiple processor cores
• A virtualized server reconfiguration scheme for energy reduction [Kusic et al., 2008]
  - Server resource management based on queuing theory
  - Live migration of VMs is not considered
  - DVFS control for a VM across multiple processor cores is not considered
Conclusion
• Characterized power and QoS performance on virtualized servers toward an energy saving scheme
  - Drastic power reduction by server consolidation is possible
    - Only slightly increased power consumption from an added VM on a server
    - DVFS control of the processor provides further power reduction
  - VM live migration needs consideration of both power and performance
    - Live migration time varies with the workload level of the VM being moved
  - Further power reduction can be achieved by allocating multiple cores to VMs combined with a lower running frequency
    - Such processor management can be implemented by extending an existing method
• Future work
  - To develop an intelligent reconfiguration algorithm based on the obtained results and knowledge
Thanks!!