Energy-Proportional Networked Systems
Dejan Kostić
EPFL, Switzerland
Networked Systems Laboratory
Our Mission
• Make distributed systems more reliable and easier to develop and manage
• Build networked systems that mimic the energy-proportionality of biological systems
Networking energy in ICT
• 20% of total server energy consumption (3 TWh in US in 2006)
• Tens of TWh/year by 2015 for broadband equipment
• Several TWh/year for major telcos (Telefonica 4.5 TWh, Verizon 9.9 TWh)
[Figure: datacenter, datacenter network, core network, access network.]
Causes of networking energy consumption
• Network redundancy
  – Achieving high availability
• Bandwidth overprovisioning
  – Tolerate traffic variations (address lack of QoS)
Energy-(un)proportionality
[Figure: power (% of peak) vs. utilization (%). Existing networking hardware (core router, home gateway) compared with ideal energy-proportionality; typical utilization levels are marked.]
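The gap between existing hardware and the ideal curve can be illustrated with a simple linear power model. This is a minimal sketch, not a model taken from the talk: the function names and the 80% idle-power figure are hypothetical, chosen only to show how little power tracks load when idle power is high.

```python
def power_fraction(utilization, idle_fraction):
    """Power draw as a fraction of peak under an assumed linear model.

    utilization: load in [0, 1].
    idle_fraction: power at zero load divided by peak power (assumed constant).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def wasted_fraction(utilization, idle_fraction):
    """Share of the drawn power that an ideal energy-proportional device would save."""
    drawn = power_fraction(utilization, idle_fraction)
    if drawn == 0.0:
        return 0.0
    ideal = utilization  # ideal device: power tracks load exactly
    return (drawn - ideal) / drawn


if __name__ == "__main__":
    # Hypothetical device that idles at 80% of peak and runs at 10% utilization.
    print(round(power_fraction(0.10, 0.80), 3))
    print(round(wasted_fraction(0.10, 0.80), 3))
```

With `idle_fraction=0.0` the model reduces to the ideal energy-proportional line, where the wasted share is zero at every load.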
Networking energy outlook
• More demands will result in further increases
  – Video streaming, Cloud computing
• CMOS reaching a plateau in power-efficiency
  – Cooling costs of new equipment will increase
    • 1 MW for latest Cisco platform, CRS-1
• Power is not limitless
  – 60 Amps per rack
Rate of traffic increase > rate at which underlying technologies improve their energy efficiency
Datacenter network (yesterday’s tree)
Datacenter network (today’s fat tree)
Reduced or no cooling
Threats to Internet’s growth
• Energy delivery/power problems
• Power consumption, cooling problems
• Excessive energy consumption
[Figure: datacenter, datacenter network, core network, access network.]
Goal: Energy-Proportional Networked Systems
[Figure: power (% of peak) vs. utilization (%). The goal curve approaches ideal energy-proportionality.]
Make all devices energy-proportional?
… but it is hard:
• CMOS energy-efficiency limits
• Performance penalties
• Always-on components
[Figure: power (% of peak) vs. utilization (%), with the ideal energy-proportionality line. Inset: “Highly efficient” Google server, power (% of peak) vs. compute load (%) from idle to 100%, broken down into CPU, DRAM, Disk, and Other.]
Network-wide energy-proportionality
• Sleeping saves energy
• Dynamically matching resources to the demand makes the network energy-proportional
A simple optimization problem?
• Computational intensity
• Maintaining SLOs
• Avoiding oscillations
• Responsiveness to traffic variations
• Ease of deployment
[Figure: power (% of peak) vs. utilization (%). The goal curve approaches ideal energy-proportionality.]
Overview
• REsPoNse: backbone (core) and datacenter networks
• BH2: access network
[Figure: the network diagram from “Threats to Internet’s growth”, annotated with REsPoNse and BH2.]
Access dominates energy consumption
• Backbone/Metro/Transport: 20-30%
• Access: 70-80%
A typical DSL access network
[Figure: core, metro, and access. The access network has an ISP part (central office with DSL Access Multiplexers, DSLAMs, and the cable bundle) and a user part (gateway).]
WHY DOES THE ACCESS CONSUME SO MUCH?
#1: Huge number of devices
Individually, they do not consume a lot
But collectively …
• 2 orders of magnitude more gateways than DSLAMs
• 1 order of magnitude more DSLAMs than metro devices
• 2 orders of magnitude more DSLAMs than backbone devices
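#2: High per-bit energy consumption
[Figure: per-bit energy consumption of backbone/metro devices vs. access devices.]
At full load, the per-bit energy consumption of access devices is 2-3 orders of magnitude higher than that of metro/backbone devices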
#3: Utilization < 10%
[Figure: daily utilization of 10K access links in a commercial ADSL provider. Average utilization [%] (0-10%) vs. time [h], for uplink and downlink.]
Sleeping saves energy
Sleep-on-Idle (SoI): devices enter sleep mode upon periods of inactivity
SoI fails in access networks
[Figure: access network, ISP part and user part.]
An ADSL line needs 1 minute to wake up … but cannot enjoy a minute’s sleep
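Why a one-minute wake-up defeats Sleep-on-Idle can be seen with a small calculation over packet arrival times. The sketch below is an illustration under stated assumptions, not the evaluation method of the talk: the function name, the 10-second idle timeout, and the rule that a gap only yields useful sleep once it exceeds the idle timeout plus the wake-up time are all hypothetical.

```python
def soi_sleep_seconds(packet_times, horizon, idle_timeout, wake_up):
    """Useful sleep time of one line under an assumed Sleep-on-Idle model.

    packet_times: arrival times in seconds, all within [0, horizon].
    idle_timeout: seconds of silence before the line goes to sleep (assumed).
    wake_up: seconds the line needs to resynchronize; this time is assumed
             to be spent at full power, so it is not counted as sleep.
    """
    events = [0.0] + sorted(packet_times) + [horizon]
    total = 0.0
    for prev, nxt in zip(events, events[1:]):
        gap = nxt - prev
        # Only the part of the gap beyond timeout + wake-up is real sleep.
        total += max(0.0, gap - idle_timeout - wake_up)
    return total


if __name__ == "__main__":
    # Hypothetical light but continuous traffic: one packet every 30 s for 1 h.
    chatter = list(range(30, 3600, 30))
    print(soi_sleep_seconds(chatter, 3600, idle_timeout=10, wake_up=60))  # slow wake-up
    print(soi_sleep_seconds(chatter, 3600, idle_timeout=10, wake_up=1))   # fast wake-up
```

Under these assumptions the line with a 60-second wake-up never sleeps, even though it is almost always idle, while a device that wakes in one second sleeps for most of the hour.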
What if we could put 80% of gateways to sleep?
[Figure labels: 100 W, 15 W, 1 W per modem.]
• Saves a big fraction at the user side
• ISPs … not so much
Line cards are very unlikely to sleep under SoI
[Figure: DSLAM with line cards; each port is connected to a modem that is either on or off.]
Static assignment of lines to DSLAM ports is a problem
OUR APPROACH [SIGCOMM ’11]
⟹ Greening the user part: aggregation
⟹ Greening the ISP part: line switching
Greening the user part – Aggregation
Broadband Hitch-Hiking (BH2)
• Threshold-based heuristic algorithm: direct traffic to neighbor gateways during light traffic conditions
• On average 5-6 WiFi networks overlap in typical urban settings
Broadband Hitch-Hiking (BH2)
[State diagram. Conditions: load on home gateway is low; load on neighbor gateway is high or low. Actions: direct light traffic to a neighbor gateway and let the home gateway sleep; look for another neighbor gateway or go back to the home gateway; go back to the home gateway.]
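The threshold-based heuristic can be sketched as a small decision function run by each client. This is a hedged illustration only: the `Gateway` class, the `choose_gateway` function, the threshold values, and the tie-breaking rule (prefer the busiest neighbor that is still below the high threshold, to concentrate traffic) are assumptions and are not taken from the BH2 design.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    load: float  # current utilization in [0, 1]


LOW = 0.10   # hypothetical "light traffic" threshold
HIGH = 0.50  # hypothetical "neighbor is too busy" threshold


def choose_gateway(home, current, neighbors, low=LOW, high=HIGH):
    """Return the gateway a client should use next (assumed decision rule)."""
    usable = [g for g in neighbors if g is not current and g.load < high]
    if current is home:
        # Light traffic at home: hitch-hike so the home gateway can sleep.
        if home.load < low and usable:
            return max(usable, key=lambda g: g.load)
        return home
    if current.load >= high:
        # Neighbor got busy: try another neighbor, else go back home.
        if usable:
            return max(usable, key=lambda g: g.load)
        return home
    return current  # keep hitch-hiking


if __name__ == "__main__":
    home = Gateway("home", 0.02)
    n1, n2 = Gateway("n1", 0.30), Gateway("n2", 0.70)
    print(choose_gateway(home, home, [n1, n2]).name)
    n1.load = 0.80
    print(choose_gateway(home, n1, [n1, n2]).name)
```

Using two thresholds rather than one leaves a band in which the client stays put, which is one simple way to limit flapping between gateways.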
Greening the ISP part – Line switching
[Figure: DSLAM line cards behind a 40-way switch.]
Full switching maximizes savings … but cost quickly grows with the number of ways
Small 4-way switches are enough
[Figure: DSLAM line cards behind 4-way switches.]
• Put line cards to sleep
• Each k-switch packs active lines to the top
• Simple micro-electro-mechanical switches with near-zero power consumption
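The effect of packing active lines can be illustrated by counting how many line cards must stay awake with and without k-switches. The wiring model below is an assumption made for the sketch: lines are grouped k at a time, each group feeds one k-switch, and output j of every switch is wired to line card j. The function names are hypothetical.

```python
def awake_cards_static(active, k):
    """Cards that must stay on when line i is statically wired to card i % k."""
    return len({i % k for i, on in enumerate(active) if on})


def awake_cards_kswitch(active, k):
    """Cards that must stay on when each group of k lines sits behind a k-switch.

    Every switch moves its active lines to its first outputs, so card j is
    needed only if some group has more than j active lines.
    """
    groups = [active[i:i + k] for i in range(0, len(active), k)]
    return max((sum(group) for group in groups), default=0)


if __name__ == "__main__":
    # Hypothetical DSLAM slice: 8 lines, 4 line cards, 2 active lines.
    active = [False, True, False, False, False, False, False, True]
    print(awake_cards_static(active, 4))   # active lines land on two different cards
    print(awake_cards_kswitch(active, 4))  # both are packed onto the first card
```

In this model the number of awake cards equals the largest number of active lines behind any single switch, which is why low utilization translates directly into sleeping line cards.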
How much energy can we save? [trace-based simulation]
[Figure: energy savings vs. no-sleep [%] over time [h] (0-24), for Optimal, BH2 + k-switch, and SoI.]
• BH2 + k-switch saves 66%
• Optimal savings are 80%
A performance bonus: reduced crosstalk
[Figure: cross-section of a cable bundle with lines numbered 1-24. Plot: avg. speedup [%] (0-50) vs. number of inactive lines (0-20); 62 Mbps; loop lengths 50-600 m.]
Powering off lines makes the remaining … go faster due to reduced crosstalk!
Routing table computation
• Goal: match network resources to traffic
• Routing that minimizes energy consumption
  – Multi-commodity flow problem, but with additional constraints for energy objective: links + routers (switches) on/off
  – Problem is computationally intensive
  – Heuristics take 5-15 minutes for small topologies
When traffic demand changes, optimal routing changes!
How often is recomputation needed?
[Geant2 - European academic network, 15-day trace]
[Figure, top: recomputation rate [per hour] (0-6) vs. time (May 28 to Jun 9), with the trace granularity as an upper bound. Bottom: traffic demands [Gbps] (0-5) over the same period.]
Routing table recomputed 3-4 times per hour! (state-of-the-art)
Issues with recomputation
1) Recomputation wastes energy or causes congestion
2) Oscillations
3) Complexity
[Figure: traffic volume vs. time, marking periods where recomputation causes congestion and periods where it causes energy waste.]
Can we precompute routing tables?
[Figure: fraction of time an energy-optimal routing configuration is used (Geant2 trace).]
• One routing table used 60% of the time
• Too many routing configurations
Insight
[Figure: CDF of optimal paths included (%) vs. number of alternative paths (1-5), for Geant and FatTree.]
Just a few precomputed paths offer near-optimal energy savings
REsPoNse (Responsive Energy-Proportional Networks)
• Always-on paths provide a routing that can carry low to medium amounts of traffic at the lowest energy consumption
• On-demand paths start carrying traffic when the load is beyond the capacity offered by the always-on paths
• Failover paths are designed to minimize the impact of single failures
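How the three kinds of precomputed paths divide the work can be sketched for a single origin-destination pair. This is a simplified illustration, not REsPoNse's actual traffic engineering logic: the `split_traffic` function, the scalar capacities, and the rule that a failure of the always-on path moves all traffic to the failover path are assumptions.

```python
def split_traffic(demand, always_on_capacity, on_demand_capacity,
                  always_on_failed=False):
    """Split one demand across precomputed paths (assumed policy).

    Returns the traffic placed on each kind of path, plus any demand that
    exceeds the combined capacity.
    """
    shares = {"always_on": 0.0, "on_demand": 0.0,
              "failover": 0.0, "unserved": 0.0}
    if always_on_failed:
        # Assumed reaction to a single failure: use the failover path.
        shares["failover"] = float(demand)
        return shares
    # Low to medium load stays on the energy-efficient always-on path.
    shares["always_on"] = float(min(demand, always_on_capacity))
    overflow = demand - shares["always_on"]
    # Load beyond the always-on capacity activates the on-demand path.
    shares["on_demand"] = float(min(overflow, on_demand_capacity))
    shares["unserved"] = float(overflow - shares["on_demand"])
    return shares


if __name__ == "__main__":
    print(split_traffic(3, always_on_capacity=5, on_demand_capacity=10))
    print(split_traffic(8, always_on_capacity=5, on_demand_capacity=10))
    print(split_traffic(8, 5, 10, always_on_failed=True))
```

Because the paths are computed offline, reacting to a demand change in this model is only a lookup and a split, with no routing recomputation on the critical path.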
REsPoNse Overview
• Online components
  – Energy-aware Traffic Engineering (EATe) [e-Energy ’10]
  – Runtime Traffic Measurement [COMSNETS ’09]
• Offline components
  – Energy-proportional Routing Table Computation [CoNEXT ’11]
  – Traffic Estimation
• Input: Service-Level Objectives
REsPoNse routing example
[Figure: example topology with nodes A, B, C, D, E, F, G, H, J, K.]
REsPoNse in action
[Figure: traffic volume vs. time. Online adaptation follows the demand, in contrast to recomputation causing congestion and recomputation causing energy waste.]
REsPoNse benefits
• Energy savings match state-of-the-art
• Quick, stable adaptation to traffic changes
• Deployable
• Power/cooling provisioning for common case
Responsiveness/Energy-Proportionality (Geant)
• Replayed a 15-day trace
• 2 power models
  – Today
  – Future (static power is significantly reduced)
[Figure, top: power [% of original network] (0-120) vs. time (May 28 to Jun 9), for ospf, REsPoNse, and REsPoNse (alternative HW model). Bottom: traffic demands [Gbps] (0-6) over the same period.]
REsPoNse saves 30% - 45% by adding only 1 carefully precomputed routing table
Responsiveness/stability
10 Click open-source routers in a diamond topology (16 ms per-hop latency)
[Figure: rate (Mbps) vs. time elapsed (s), from 4 s to 6.5 s, for the middle, lower, and upper paths. Marked events: EATe starts running; link failure.]
EATe shifts traffic quickly and in a stable manner as needed (either to save energy or to avoid failed links)
Map cloud workloads to minimal resources for power and cooling
[Figure: three workloads (volume/type over time), each mapped to its own resource allocation over time (resources 1, 2, 3).]
• State-of-the-art performance tuning takes several minutes, every time the workload changes
• DejaVu [ASPLOS ’12]: 10-15 seconds to adapt
Conclusions
• Energy can substantially limit growth of networked systems
• BH2 (Aggregation) + switching saves 66% of access energy
• Turning DSL modems off increases performance
• REsPoNse: hybrid approach in backbone and data centers
• Enables provisioning power/cooling for the common case
Thanks!
BH2: Marco Canini, Eduard Goma, Alberto Lopez, Nikolaos Laoutaris, Pablo Rodriguez, Rade Stanojević, Pablo Yague
REsPoNse: Nedeljko Vasić, Dejan Novaković, Satyam Shekhar, Prateek Bhurat, Marco Canini
DejaVu: Nedeljko Vasić, Dejan Novaković, Svetozar Miucin, Ricardo Bianchini