2 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues |
August 17, 2011 | Stephen Kosonocky. OUTLINE. ▫ Motivation for DVFS and
Power ...
PRACTICAL POWER GATING AND DYNAMIC VOLTAGE/FREQUENCY SCALING
Stephen Kosonocky
AMD Fellow
1 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
OUTLINE Motivation for DVFS and Power Gating Dynamic Voltage/Frequency Scaling Power Gating Conclusion
2 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MOTIVATION
3 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
SERVER COST TRENDS
Server compute capacity/$ is increasing steadily – Driving up total power/$ spent on servers – Also driving up electrical power spending Green initiative and focus on energy independence creates additional pressure to lower energy costs Increased energy efficiency directly reduces costs
• Source Data: Uptime Institute, 2009
4 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MOBILE AND DESKTOP Now: Parallel/Data-Dense
Diverse workloads – Data serial, data parallel
16:9 @ 7 megapixels HD video flipcams, phones, webcams (1GB) 3D Internet apps and HD video online, social networking w/HD files
– Streaming
3D Blu-ray HD
– Compute intensive in bursts
Multi-touch, facial/gesture/voice recognition + mouse & keyboard
– Long periods of inactivity
All day computing (8+ Hours)
Small form factor Long battery life
Immersive and interactive performance
Instant access of the device
Autonomous power management - Samuel Naffziger, VLSI Symposium 2011, Kyoto
5 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Workloads
High degree of integration of special functions
POWER/PERFORMANCE TRADE-OFF
Processor Power-Frequency/Voltage 35
Perf/Watt
30
Watts/Core
25
20
10
Power
Performance/Watt maximized at the lowest voltage operating point
8
6
15
4 10
Fmax/Voltage
5
0
2
0 0.725 0.775 0.825 0.875 0.925 0.95
1.025 1.075 1.125 1.175 1.225
Voltage Performance/Watt maximized at VMIN 6 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Frequency (GHz)
12
HIGH PERFORMANCE CORE POWER
Depending on process node, leakage ~ 40% total power 7 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Two major knobs have emerged for controlling power 1. Dynamic Voltage and Frequency Scaling – Optimize performance for the application while it’s running 2. Power Gating – Gate power during idle periods Each present unique challenges for
implementation and optimization
8 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DYNAMIC VOLTAGE/FREQUENCY SCALING
9 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LLANO ACCELERATED PROCESSING UNIT (APU) • Integration Provides Improvement • Eliminate power and latency of extra chip crossing • 3X bandwidth between GPU and Memory • Same sized GPU is substantially more effective • Power efficient, advanced technology for both CPU and GPU • Thermal management optimization between CPU and GPU Key features for Mobile Mark 07 – Lower Idle, Active power – DVFS / Clock-gating / Power Gating – Robust and Flexible Power Management Control
~ Power Breakdown Analog
IO PHY CPU
NB
10 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DDR PHY
GPU
APU POWER MANAGEMENT
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009 11 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DYNAMIC VOLTAGE SCALING
Dynamic frequency and voltage scaling (DVFS)
Voltage
P-states: Core executing at some frequency
Match the frequency of operation with the workload
VP-states
C1/C2: Core halted and frequency ramped down
VAltvid
C3/C4: Core halted, clocks ramped to even lower frequency, maintain caches, maintain core
Lower voltage to sustain required frequency
Examples: P-states for active control of voltage/frequency
Vdeeper Altvid
C5: Core halted, clocks ramped to even lower frequency, flush caches, maintain core
C-states for degrees of idle control 0V
C6: Core off, flush caches, core state saved to DRAM
12 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DESIGN FOR DVFS Optimization of timing across entire voltage range necessary for efficiency DVFS Required two separate timing corners with equally aggressive cycle time goals – 0.8V & 1.3V Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design Enables the core to thoroughly exploit boost capability
13 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU DVFS
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009 14 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
110.0
105.0
100.0
95.0
90.0
85.0
CPU-centric workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
15 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
110.0
105.0
100.0
95.0
90.0
85.0
GPU-centric data parallel workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
16 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
110.0
105.0
100.0
95.0
90.0
85.0
Balanced workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
17 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT HAPPENS WHEN WE ADD MORE CORES?
CPU 0
Already at 6, 2 high current supplies
CPU 1
PLL
PLL
VDD
CPU 2
CPU 3
PLL
VRM
DIMMs SVI
PLL
VDDIO, VT T PLLs
Misc
VLDT
Graphics Northbridge
PLL
VDDN B 18 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PLLs
VDDA
PACKAGE LIMITS Packages add constraints to increasing number of discrete supplies
Typical low cost package structure
Typically only use 4 thick layers for high current power distribution 3-4 thin build-up layers on each side – Used for secondary supply and signal routing to pins Difficult to add many more supply rails
19 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Chip
VDD
VDDR
VDDNB
VDDA
TYPICAL PACKAGE LAYER ROUTING LAYERS
VDDIO
M1 – PWR (top metal layer) Package capacitors
M2 – routing layer 1 Routing congestion
Space limited for capacitors 20 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CORE VOLTAGE POWER DELIVERY IN PACKAGE
VSS Max Package Droop
VDD Max Package Droop 0.041 0.04
2.4x higher
-0.025 -0.0255 -0.026
0.039 0.038
-0.0265
0.037 -0.027
0.036 0.035
1
-0.0275
0.034
S29
-0.0285
S22 S15
31
S1
34
25
28
19
22
S8
16
13
-0.029
7
S22 S25
S16 S19
S13
S7
S4
S1
31
S10
25
10
0.032
19
-0.028
1
0.033
13
4
7
Package Routing Constraints forces compromises for VDD plane – More difficult with increasing VDD planes More resources dedicated to VSS – Shows less droop
21 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT DOES THE ACTUAL VDD LOOK LIKE? 25ns trace of DGEMM benchmark 1.35
1.3
1.2
~50mV 1
VDD
1.25 0.8 1.2
0.6 0.4
1.15
0.2 1.1
0
Time X86 core Supply simulation w/ package board model Increased local decoupling cap could help 22 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Normalized Current
1.4
PACKAGE CAPS FOR SUPPLY DECOUPLING
Most effective
Marginally effective
Hard to find places for additional components 23 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION OF SUPPLY NOISE Trench Capacitor Shift resonant frequency from 80MHz -> 7MHz
- D. Wendel, et.al. “The Implementation of POWER7: A Highly Parallel and Scalable Multi-Core High-End Server Processor”, ISSCC 2010
100fF/um2 ~ 25x thick-ox
Adds significant process cost
24 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
METHODS FOR INTEGRATED VOLTAGE REGULATION
Switched capacitor regulator Buck converter
Linear regulator Switched voltages
25 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED SWITCHED CAPACITOR REGULATOR Efficiency dependent on load current
2:1 Conversion
– Most efficient at 2:1 voltage conversion ratio – Limits number of switches in the path – ~90% for 2:1, ~70% for other values Requires large capacitors – Can use trench capacitor for better area efficiency
Large voltage and current ripple – Can mitigate with large decap and/or interleaved converters
Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, “A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2”, VLSI Symposium , 2010
26 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED BUCK CONVERTER Integrated inductor options ~ 0.33nH possible
High frequency operation can reduce inductance requirements L proportional 1/Fs Can reasonably approach 1GHz when fully integrated
Parasitic series resistance is key problem I2R loss large Multi-phase or parallel regulator architecture can help – (I/n) 2 R
(n=number phases)
Interleaved Outputs
– Current ripple is averaged by output cap Too many phases will reduce efficiency for low current loads
27 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
W. Kim, et.al, June, 2007
INTEGRATED BUCK CONVERTER KEY LIMITATION: DCR VS. INDUCTANCE
• Embedded inductors needs lower resistance or higher inductance/turn • Multi-phase or parallel can approach reasonable efficiency only with very high number of paths • Reduces current in each inductor
CMOS Inductor vs. Small Package size components
• Fully embedded approach really needs a specialized integrated inductor technology Peak Efficiency considering inductor resistive loss alone
>70%
Discrete inductors consume valuable package resources 28 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED LINEAR REGULATION
Error between divided output and reference voltages is and fed to the pass transistor to correct the error NMOS type higher performance PMOS type (Low Drop Out)
Efficiency good only for small (Vin – Vout) Eff
Vin Vref Vout ErrAmp
= (Power Out)/(Power In) = ~ Vout/Vin
LDO design can be more efficient (PMOS output) – Stability & speed of feedback become an issue
29 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Iload
LINEAR REGULATOR (PMOS)
PMOS based linear regulators offer low dropout voltage while operating within (0-Vin) range
Vin
Vref
Required gain-bandwidth product challenging high for processor loads
+ Vout Iload
Reff
Cdecap
BW ~ 1GHz needed Hurts efficiency, power wasted in Op-amp
30 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LINEAR REGULATOR (NMOS) Charge Pump
Vref
Vin
Source-follower topology
+ -
Will require charge pump to provide gate overdrive. Accurate voltage setting. Supply rejection – dropout voltage tradeoff
Vout Iload
Reff
Cdecap
High BW response makes it a good candidate for integration w/o large Capacitors Intrinsically good frequency response
Charge-pump (or low current supply) allows Low drop out
Requires charge pump or high voltage supply >Vin for good efficiency
Higher Efficiency
Amplifier is high gain, low bandwidth. Bandwidth and efficiency depends on desired slew rate 0Vth devices offer additional efficiency
31 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DVFS SYSTEM IMPLEMENTED WITH DUAL VOLTAGES Fully integrated167 processor system Each processor tile contains a core that operates at: – A fully-independent clock frequency Any frequency below maximum Halts, restarts, and changes arbitrarily
– Dynamically-changeable supply voltage VddHigh or VddLow Disconnected for leakage reduction
Dean N. Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh T. Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul V. Mejia, Bevan M. Baas, “167-Processor Computational Platform in 65 nm CMOS”, JSSC, Vol. 44, No. 4, April 2009
Each power gate comprises 48 individually-controllable parallel transistors
32 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING
33 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT'S THE PROBLEM? JUST TURN IT OFF WHEN YOUR NOT USING IT!
Many choices and options Need to analyze your particular application for the optimal solution Considerations: – – – – – – –
Design complexity Power savings Frequency/performance impact Wake-up time Area overhead Verification complexity Return on investment (ROI)
34 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (ACTIVE)
During active mode Natural low gate activity factors inside of logic block allow sharing of power gate device for many logic functions with gated logic block – Reduces resistance requirements of power gate device
RON
IMAX
– Works to improve Ion/Ioff ratio requirements
Design objective – Minimize resistance of cut-off device
Penalties:
Logic Block
– Increased logic delay – Requires higher external supply voltage for same frequency Consumes more power during active mode
VLOGIC Vdd VIRPG VIRPG RON I MAX
Header Switch
35 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (SLEEP)
During sleep mode Steady state – Want to maximize OFF resistance of cut-off device – Reduce stand-by power with lower leakage
ROFF
ISLEEP
(10x -> 1000x possible)
Transition to cutoff
Logic Block
– Creates supply di/dt for small ISLEEP vs. large IL
PSLEEP Vdd I SLEEP
dI CUTOFF I SLEEP I L dt tCUTOFF
Header Switch
36 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (WAKE)
During wake-up mode – Need to control in-rush current to charge logic devices to VDD
RWAKE
Logic capacitance is discharged to nVdd during sleep mode
IWAKE
IWAKE
CLogic
1 n Vdd RWAKE
Header Switch
1st order analysis
37 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LEAKAGE SAVINGS WITH POWER GATING
>20x leakage savings
Early Llano Measured Results Very effective method for controlling power during idle periods Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI CMOS”, JSSC, January, 2011 38 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DETERMINATION OF FREQUENCY IMPACT DURING ACTIVE MODE Determine tolerable voltage drop for active mode operation – De-rate timing models by fixed maximum value to account for voltage loss in power switch – Increase external voltage to compensate for voltage drop
Use critical path simulations or representative RO data for voltage/frequency trade-off Design worst-case power gate and grid resistance to meet specification target Analyze drop at highest operating voltage and highest power consumption Keep voltage drop due to power gating small to minimize uncertainty at power island boundary crossings 2nd order effects: – Circuit area growth due to wire blockages caused by power switch – Decreased available quiet decoupling capacitance due to increased resistance to implicit and explicit charge reservoirs on the die
39 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
FREQUENCY IMPACT ESTIMATED WITH RING OSCILLATOR 32 nm FO1 RO Data (RVT, min W) 3.0%
2.5%
Frequency Delta
1% VDD loss ~ 1% frequency loss 2.0%
0.5% Delta VDD 1.0% Delta VDD 1.5% Delta VDD
1.5%
1.0%
Target IR drop: VIRPG~11mV @1.1V
0.5%
0.0%
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
VDD (V)
40 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
1.4
POWER GATE CELL I-V CHARACTERISTICS
76.50
7.0E-04
6.0E-04
FET operates in the linear region
76.00
RDS 75.50
4.0E-04
3.0E-04
Can be approximated as a resistor
2.0E-04
75.00
74.50 1.0E-04
Target region of operation
0.0E+00
0.00
74.00
0.01
0.02
0.03
0.04
VDS (V)
41 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
0.05
0.06
RDS (ohms)
IDS (A)
5.0E-04
IDS
EMBEDDED ROW-BASED POWER SWITCH PLACEMENT VirVDD VDD
PS PS PS PS PS PS PS PS PS PS PS
VSS
Current Flow
VirVDD VDD
PS PS PS PS PS PS PS PS PS PS PS
VSS
Pre-populate logic area with power gate cells at uniform locations prior to place and route 42 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MAXIMUM POWER GATE SPACING (ROW-BASED PLACEMENT)
Imax
VIRW(0)
VIRW(1)
VIRW(2)
VIRW(3)
VIRW(4)
VIRPG
Assume uniform current density Can estimate power gate spacing with respect to IR and EM limits
VIRPG
= 2 * rows * IROW * RPG
VIR_WC = VIRPG + ∑ VIRW
43 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
IR DROP EXAMPLE (UNIFORM POWER DENSITY) IR Drop Along Virtual VSS Perpendicular to Standard Cell Rows 70.0 60.0 1X Metal Virtual Supply IR drop 1.3X Metal Virtual Supply IR drop
IR Drop (mV)
50.0 40.0 30.0 20.0
Max current density = 1.0A/mm2 75.57ohm footer
10.0 30rows
0.0 0
20
40
60
80
100
Standard Cell Rows Between Footers 44 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
120
140
EM EXAMPLE (UNIFORM POWER DENSITY) EM Margin Along Virtual VSS Perpendicular to Standard Cell Rows 25%
EM Limit
20%
1X Metal EM Limit 1.3X Metal EM Limit
Max current density =1.0A/mm2
15%
10%
5% 30rows
Thick metal will provide more EM margin
0% 0
20
40
60
80
100
Standard Cell Rows Between Footers 45 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
120
140
VERTICAL CROSS SECTION C4 Bump M11
Full VDD Grid
M10
thicker metal
M9 M8 M7
VDD Via Tower
Virtual VDD Grid
M6 M5 M4 M3
EM
M2 VDD
PS
VVDD
Logic
EM hazards can occur in high power density regions 46 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EMBEDDED POWER GATE EXAMPLE STYLE USED IN LLANO GPU AMD GPU Functional Unit Power Gate Tile Power gate: Checkerboard pattern
SRAM Always-on cells: FeedThru cells, Power control logic, Clamp cells for macro input pins and tile output ports. Input I/O buffers
Power gate cells must squeeze around arrays Providing power to always-on cells can be problematic IP I/O protection typically needed to isolate individual regions SRAMs need their own custom power gating 47 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EMBEDDED POWER GATE EXAMPLE AMD GPU Functional Unit Power Gate Tile
Simulated with Apache Redhawk in vectorless dynamic analysis mode
Power gating increases distribution of IR drop on grid Can cause problems with critical circuits and paths 48 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EXTERNAL RING STYLE GATING
Interrupt global supply grid to power-gated IP block Requires careful chip floorplanning to accommodate supply interruption
Increased sharing of power gate devices can ease power gate sizing – Large IP power gating can greatly ease worst-case current analysis
Global Grid
GND VGND
Macro/Core M1 metal
Virtual Grid
Can be difficult to create always power-on islands with power-gated region
49 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
M2 metal
Footer Switch Location
LLANO CPU CORE RING GATING WITH PACKAGE LAYER ASSIST
Virtual voltage spreads uniformly Bumps near hot spots can exceed max limits
Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI CMOS”, JSSC, January, 2011
High RVSS can create noise issues
50 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
STRAIGHT EDGE & ZIG/ZAG EDGE PACKAGE PLANE BOUNDARIES
Peak currents flowing into bumps at periphery can be alleviated using zig/zag approach 51 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MAXIMIZING POWER GATING OPPORTUNITIES To maximize power savings, we would like to go into a power gating mode as soon as possible to reduce leakage power Practical constraints limit opportunity – Energy break-even point – In-rush current induced noise – Time to restore state of power gated region
Power
Application Mode
A c t i v e
Idle
A c t i v e
Idle
A c t i v e
Idle
time
52 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
ENERGY BREAK-EVEN POINT
Power Gate Example: T0:
Logic block starts inactivity
T1:
Control circuit decides to power gate
T2:
Power gate signal propagation complete
T3:
Energy break-even point
T4:
Full discharge of virtual supply
T5:
Control circuit decides to repower logic
T6:
Power gate signal propagation complete
T7:
Gated logic is fully charged and ready for activity
Z. Hu et al., "Microarchitectural Techniques for Power Gating of Execution Units," ISLPED'04, August 9–11, 2004, Newport Beach, California, USA.
53 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
IN-RUSH CURRENT CONTROL During wake-up of the power gated region it’s critical to control the inrush current
The effective resistance of parallel combination of footers is very small – Can create excessive currents
Transition to cutoff mode can also create similar di/dt events Hazards – Supply bounce from large di/dt – EM violations – Short circuit currents from imbalanced nodes in power gated region during wake-up
S. Kosonocky
54 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
54
UNDERSTANDING SUPPLY BOUNCE
In-rush current can cause supply bounce on neighboring IP
Suhwan Kim, Chang Jun Choi, Deog-Kyoon Jeong, Stephen Kosonocky, Sung Bae Park, “Reducing Ground-Bounce Noise and Stabilizing the Data-Retention Voltage of Power-Gating Structures”, IEEE Transactions on Electron Devices, Vol. 55, NO. 1, January 2008
55 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
55
IN-RUSH CURRENT SIDE-EFFECTS
Disturb neighboring circuits – Retention flops, adjacent IP logic, SRAM Supply bounce, ground bounce
Over stress gate-oxides due to voltage excursions Un-even distribution of wake signals during power-up – Creates excessive short circuit currents when some gates are charged and others are not
Large di/dt can cause EM failures
S. Kosonocky
56 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
56
PROGRAMMABLE IN-RUSH CURRENT CONTROL When exiting power gating, power gates are gradually enabled in groups with progressively larger effective device widths – Controlled by FSM – Fuse programmable strengths and duration for post-silicon tuning – Analytical R-C model w/o inductance demonstrating control
group1 group2 group3
group4 group5
57 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LLANO CPU CORE IN-RUSH CURRENT ON CC6 EXIT
Current (A)
3.50
1.4
3.00
1.2
2.50
1
2.00
0.8
1.50
0.6
1.00
0.4
0.50
0.2
0.00
0 0
20000
40000
60000
80000
VSSCORE (V)
Apache Redhawk In-rush Current Simulation (Conservative Programming)
IVSSCORE0 VSSCORE0
100000
Time (ps)
Fuse programmable in-rush control built-in Sample setting shows a 3A peak during CC6 exit 58 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
HW DATA SHOWING IN-RUSH EFFECT ON OTHER CORES
1 cores in/out CC6 3 core max freq
3 cores in/out CC6 1 core max freq
Large in-rush setting Transient droop limited by CC6 exit
1 cores in/out CC6 3 core max freq
3 cores in/out CC6 1 core max freq
Small in-rush setting Transient droop limited by # cores running max frequency
59 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
STATE SAVE AND RESTORE Regulate virtual supply during sleep mode to low voltage – Moderate leakage reduction possible – Globally applies to all retention elements in gated region – Example: K. Kumagai et al., "A Novel Powering-down Scheme for Low Vt CMOS Circuits," Symposium on VLSI Circuits, 1998 (Virtual Rail CMOS). L. Clark, S. Demmons, "Standby Power Management for a 0.18μm Microprocessor," ISLPED 2002 (Intel XScale processor).
Retention flops with always-on latch cell – Many variants possible – Some with explicit control or automatic restore – Good practice to avoid adding power domain crossing in functional path of flop – Can increase timing uncertainty with virtual supply bounce
– Reduces requirements of always-on supply distribution Write-out critical state to memory using serial or parallel paths – Can utilize scan mechanism to avoid a dedicate bus – Low area overhead
– Can be a large time overhead depending on total size of state restore memory
60 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CONCLUSION
61 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CONCLUSION DVFS provides a nice capability for optimization of CPU/GPU/APU performance per application
– As we add more cores and IP to the SOC, it’s will be difficult to continue to provide unique voltage rails with external regulators – Integrated regulation has it’s own challenges, more details will be covered by the other presenters Power gating can mitigate the need for additional voltage rails – Maximum frequency impact is minimal – Integration by embedding in IP or surrounding in a ring each have their own issues – Opportunity is limited by energy breakeven, in-rush noise, and state restore overhead – In-rush current control is necessary to prevent frequency impacts
62 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky