Practical power gating and dynamic voltage/frequency scaling issues

PRACTICAL POWER GATING AND DYNAMIC VOLTAGE/FREQUENCY SCALING

Stephen Kosonocky

AMD Fellow

1 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky

OUTLINE
• Motivation for DVFS and Power Gating
• Dynamic Voltage/Frequency Scaling
• Power Gating
• Conclusion


MOTIVATION


SERVER COST TRENDS

• Server compute capacity/$ is increasing steadily
– Driving up total power/$ spent on servers
– Also driving up electrical power spending
• Green initiative and focus on energy independence creates additional pressure to lower energy costs
• Increased energy efficiency directly reduces costs

• Source Data: Uptime Institute, 2009


MOBILE AND DESKTOP

Now: Parallel/Data-Dense

Workloads
• Diverse workloads
– Data serial, data parallel
– Streaming
– Compute intensive in bursts
– Long periods of inactivity
• Small form factor
• Long battery life
• Instant access of the device
• High degree of integration of special functions
• Autonomous power management

[Figure: usage examples: 16:9 @ 7 megapixels; HD video flipcams, phones, webcams (1GB); 3D Internet apps and HD video online, social networking w/HD files; 3D Blu-ray HD; multi-touch, facial/gesture/voice recognition + mouse & keyboard; all day computing (8+ hours); immersive and interactive performance.]

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

POWER/PERFORMANCE TRADE-OFF

[Figure: Processor Power-Frequency/Voltage. Power (Watts/Core), Perf/Watt, and Fmax (Frequency, GHz) plotted against Voltage from 0.725 to 1.225.]

Performance/Watt is maximized at the lowest voltage operating point (VMIN)
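A rough sketch of why performance per watt tends to peak at the lowest operating voltage. It assumes a toy model that is not taken from the slides: maximum frequency rises linearly with voltage above an effective threshold, dynamic power follows C·V²·f, and leakage grows with the cube of voltage. Every constant below is hypothetical.

```python
# Toy model only: all constants are hypothetical, not data from the talk.
V_TH = 0.35        # assumed effective threshold voltage (V)
K_FREQ = 4.0       # assumed GHz gained per volt above V_TH
C_EFF = 8.0        # assumed switched capacitance (nF); nF * V^2 * GHz gives W
LEAK_AT_1V = 4.0   # assumed leakage power at 1.0 V (W)


def fmax_ghz(v):
    """Assumed linear frequency/voltage relation."""
    return K_FREQ * (v - V_TH)


def power_w(v):
    """Dynamic power (C * V^2 * f) plus an assumed cubic leakage term."""
    dynamic = C_EFF * v ** 2 * fmax_ghz(v)
    leakage = LEAK_AT_1V * v ** 3
    return dynamic + leakage


def perf_per_watt(v):
    """Use frequency as a stand-in for performance."""
    return fmax_ghz(v) / power_w(v)


# Sweep the voltage range shown on the plot axis.
voltages = [0.725 + 0.05 * i for i in range(11)]
best_v = max(voltages, key=perf_per_watt)

if __name__ == "__main__":
    for v in voltages:
        print(f"{v:.3f} V  {fmax_ghz(v):.2f} GHz  {power_w(v):6.2f} W  "
              f"{perf_per_watt(v):.4f} GHz/W")
    print("best perf/W at", round(best_v, 3), "V")
```

Under these assumptions power grows faster than frequency as voltage rises, so the sweep selects the lowest voltage in the range.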

HIGH PERFORMANCE CORE POWER

Depending on process node, leakage ~ 40% of total power

Two major knobs have emerged for controlling power
1. Dynamic Voltage and Frequency Scaling
– Optimize performance for the application while it's running
2. Power Gating
– Gate power during idle periods

Each presents unique challenges for implementation and optimization

DYNAMIC VOLTAGE/FREQUENCY SCALING


LLANO ACCELERATED PROCESSING UNIT (APU)
• Integration provides improvement
– Eliminate power and latency of extra chip crossing
– 3X bandwidth between GPU and memory
– Same sized GPU is substantially more effective
– Power efficient, advanced technology for both CPU and GPU
– Thermal management optimization between CPU and GPU
• Key features for Mobile Mark 07
– Lower idle, active power
– DVFS / clock-gating / power gating
– Robust and flexible power management control

[Figure: approximate power breakdown across CPU, GPU, NB, DDR PHY, IO PHY, and Analog.]

APU POWER MANAGEMENT

- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22, 2009

DYNAMIC VOLTAGE SCALING
• Dynamic frequency and voltage scaling (DVFS)
– Match the frequency of operation with the workload
– Lower voltage to sustain required frequency
• Examples:
– P-states for active control of voltage/frequency
– C-states for degrees of idle control

[Figure: voltage axis with levels, from highest to lowest, VP-states, VAltvid, Vdeeper Altvid, and 0V, alongside the states below.]

P-states: Core executing at some frequency
C1/C2: Core halted and frequency ramped down
C3/C4: Core halted, clocks ramped to even lower frequency, maintain caches, maintain core
C5: Core halted, clocks ramped to even lower frequency, flush caches, maintain core
C6: Core off, flush caches, core state saved to DRAM
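To make "match the frequency of operation with the workload" concrete, here is a minimal sketch of a P-state selector. The P-state table, its names, and the `select_pstate` helper are hypothetical illustrations and do not describe the Llano power management firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    name: str
    freq_ghz: float
    vdd: float


# Hypothetical table: each frequency is paired with the lowest voltage
# assumed to sustain it. Values are illustrative only.
P_STATES = [
    PState("P0", 2.4, 1.30),
    PState("P1", 1.8, 1.10),
    PState("P2", 1.2, 0.95),
    PState("P3", 0.8, 0.80),
]


def select_pstate(required_ghz, table=P_STATES):
    """Pick the slowest state that still meets the demanded frequency.

    If no state is fast enough, fall back to the fastest one.
    """
    candidates = [p for p in table if p.freq_ghz >= required_ghz]
    if not candidates:
        return max(table, key=lambda p: p.freq_ghz)
    return min(candidates, key=lambda p: p.freq_ghz)


if __name__ == "__main__":
    for demand in (0.1, 1.0, 2.0, 3.0):
        state = select_pstate(demand)
        print(f"demand {demand} GHz -> {state.name} "
              f"({state.freq_ghz} GHz @ {state.vdd} V)")
```

Choosing the slowest adequate state is what lets the voltage drop along with the frequency; a real governor would also account for transition latency and thermal limits.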

DESIGN FOR DVFS
• Optimization of timing across the entire voltage range is necessary for efficient DVFS
• Required two separate timing corners with equally aggressive cycle time goals
– 0.8V & 1.3V
• Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design
• Enables the core to thoroughly exploit boost capability

APU DVFS

- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22, 2009

APU THERMAL PROFILE

[Figure: three APU thermal maps on a common scale from 85.0 to 110.0, one per workload: CPU-centric workload, GPU-centric data parallel workload, and balanced workload.]

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

WHAT HAPPENS WHEN WE ADD MORE CORES?

Already at 6, 2 high current supplies

[Figure: supply block diagram. Blocks: CPU 0 to CPU 3 (each with a PLL), Graphics Northbridge (with a PLL), DIMMs, VRM with SVI, PLLs, Misc. Supply rails: VDD, VDDNB, VDDIO, VTT, VLDT, VDDA.]

PACKAGE LIMITS
• Packages add constraints to increasing the number of discrete supplies
• Typically only use 4 thick layers for high current power distribution
• 3-4 thin build-up layers on each side
– Used for secondary supply and signal routing to pins
• Difficult to add many more supply rails

[Figure: typical low cost package structure.]

TYPICAL PACKAGE LAYER ROUTING LAYERS

[Figure: package layers M1 – PWR (top metal layer) and M2 – routing layer 1, showing the chip, package capacitors, and supply regions VDD, VDDR, VDDNB, VDDA, and VDDIO.]

• Routing congestion
• Space limited for capacitors

CORE VOLTAGE POWER DELIVERY IN PACKAGE

[Figure: surface plots of "VSS Max Package Droop" (axis ticks from -0.025 to -0.029) and "VDD Max Package Droop" (axis ticks from 0.032 to 0.041), annotated "2.4x higher".]

• Package routing constraints force compromises for the VDD plane
– More difficult with an increasing number of VDD planes
• More resources dedicated to VSS
– Shows less droop

WHAT DOES THE ACTUAL VDD LOOK LIKE?

[Figure: 25ns trace of DGEMM benchmark. VDD and Normalized Current vs. Time, with a ~50mV annotation on the VDD trace.]

• X86 core supply simulation w/ package board model
• Increased local decoupling cap could help

PACKAGE CAPS FOR SUPPLY DECOUPLING

[Figure: package capacitor locations, marked "Most effective" and "Marginally effective".]

Hard to find places for additional components

LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION OF SUPPLY NOISE
• Trench capacitor
– Shifts resonant frequency from 80MHz -> 7MHz
– 100fF/um2 ~ 25x thick-ox
– Adds significant process cost

- D. Wendel, et al., "The Implementation of POWER7: A Highly Parallel and Scalable Multi-Core High-End Server Processor", ISSCC 2010
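As a back-of-the-envelope check on the 80MHz to 7MHz shift, the sketch below assumes the supply network behaves like a single LC tank with resonance f = 1/(2π√(LC)) and that the loop inductance stays the same. That model and the inductance value are assumptions for illustration; the slide does not state how the shift was obtained.

```python
import math


def resonant_freq_hz(l_h, c_f):
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def cap_ratio_for_shift(f_old_hz, f_new_hz):
    """Capacitance multiplier needed to move the resonance, with L fixed.

    Since f is proportional to 1/sqrt(C), C scales with (f_old/f_new)^2.
    """
    return (f_old_hz / f_new_hz) ** 2


ratio = cap_ratio_for_shift(80e6, 7e6)

# Consistency check with a hypothetical loop inductance of 10 pH.
L_LOOP = 10e-12
c_old = 1.0 / ((2.0 * math.pi * 80e6) ** 2 * L_LOOP)
f_new = resonant_freq_hz(L_LOOP, c_old * ratio)

if __name__ == "__main__":
    print(f"capacitance must grow by about {ratio:.0f}x")
    print(f"resulting resonance: {f_new / 1e6:.2f} MHz")
```

Under this simple model the shift corresponds to roughly two orders of magnitude more decoupling capacitance, which is the kind of increase a dense trench capacitor makes possible.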

METHODS FOR INTEGRATED VOLTAGE REGULATION

• Switched capacitor regulator
• Buck converter
• Linear regulator
• Switched voltages

INTEGRATED SWITCHED CAPACITOR REGULATOR
• Efficiency dependent on load current
– Most efficient at 2:1 voltage conversion ratio
– Limits number of switches in the path
– ~90% for 2:1, ~70% for other values
• Requires large capacitors
– Can use trench capacitor for better area efficiency
• Large voltage and current ripple
– Can mitigate with large decap and/or interleaved converters

[Figure: 2:1 conversion.]

Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, "A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2", VLSI Symposium, 2010

INTEGRATED BUCK CONVERTER
• Integrated inductor options
– ~0.33nH possible
• High frequency operation can reduce inductance requirements
– L proportional to 1/Fs
– Can reasonably approach 1GHz when fully integrated
• Parasitic series resistance is the key problem
– $I^2 R$ loss is large
– Multi-phase or parallel regulator architecture can help: $(I/n)^2 R$ (n = number of phases)
– Current ripple is averaged by output cap
• Too many phases will reduce efficiency for low current loads

[Figure: interleaved outputs.]

W. Kim, et al., June, 2007
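The multi-phase argument can be sketched numerically. The code below assumes each of the n phases carries I/n through the same series resistance R, so the total conduction loss is n·(I/n)²·R = I²R/n. To illustrate why too many phases hurt at light load, it also adds a fixed per-phase overhead; that overhead term and all numeric values are hypothetical.

```python
def conduction_loss_w(i_load_a, r_series_ohm, n_phases):
    """Total I^2*R loss when the load current splits evenly across phases."""
    per_phase = (i_load_a / n_phases) ** 2 * r_series_ohm
    return n_phases * per_phase


def efficiency(v_out, i_load_a, r_series_ohm, n_phases, fixed_loss_per_phase_w):
    """Efficiency with conduction loss plus an assumed fixed loss per phase."""
    p_out = v_out * i_load_a
    loss = (conduction_loss_w(i_load_a, r_series_ohm, n_phases)
            + n_phases * fixed_loss_per_phase_w)
    return p_out / (p_out + loss)


if __name__ == "__main__":
    R = 0.05        # hypothetical series resistance per phase (ohm)
    FIXED = 0.02    # hypothetical switching/control loss per phase (W)
    for i_load in (0.5, 10.0):
        for n in (1, 4, 8):
            eff = efficiency(1.0, i_load, R, n, FIXED)
            print(f"I={i_load:4.1f} A  n={n}  efficiency={eff:.3f}")
```

With these numbers, eight phases win at 10 A because conduction loss dominates, while a single phase wins at 0.5 A because the fixed per-phase overhead dominates.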

INTEGRATED BUCK CONVERTER KEY LIMITATION: DCR VS. INDUCTANCE
• Embedded inductors need lower resistance or higher inductance/turn
• Multi-phase or parallel can approach reasonable efficiency only with a very high number of paths
– Reduces current in each inductor
• Fully embedded approach really needs a specialized integrated inductor technology
• Discrete inductors consume valuable package resources

[Figure: CMOS inductor vs. small package size components. Peak efficiency considering inductor resistive loss alone, with a >70% region marked.]

INTEGRATED LINEAR REGULATION
• Error between divided output and reference voltages is amplified and fed to the pass transistor to correct the error
– NMOS type: higher performance
– PMOS type (Low Drop Out)
• Efficiency good only for small (Vin – Vout)
– $\mathrm{Eff} = P_{out} / P_{in} \approx V_{out} / V_{in}$
• LDO design can be more efficient (PMOS output)
– Stability & speed of feedback become an issue

[Figure: linear regulator schematic with Vin, Vref, ErrAmp, Vout, and Iload.]
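A short sketch of the efficiency relation for a linear regulator. It assumes the pass device carries the full load current from Vin, so efficiency is Vout/Vin when the quiescent current is ignored; the optional `i_quiescent_a` parameter and the example voltages are illustrative assumptions.

```python
def linear_regulator_efficiency(v_in, v_out, i_load_a, i_quiescent_a=0.0):
    """Efficiency = P_out / P_in for a series pass regulator.

    All load current is drawn from v_in, so with zero quiescent
    current this reduces to v_out / v_in.
    """
    if not 0.0 < v_out <= v_in:
        raise ValueError("need 0 < v_out <= v_in")
    if i_load_a <= 0.0:
        raise ValueError("load current must be positive")
    p_out = v_out * i_load_a
    p_in = v_in * (i_load_a + i_quiescent_a)
    return p_out / p_in


if __name__ == "__main__":
    # Hypothetical case: a 1.3 V input rail regulated down to lower outputs.
    for v_out in (1.2, 1.0, 0.8):
        eff = linear_regulator_efficiency(1.3, v_out, 1.0)
        print(f"Vin=1.3 V  Vout={v_out} V  efficiency={eff:.1%}")
```

The loop shows efficiency falling in step with Vout/Vin, which is why linear regulation suits only small input-to-output differences.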

LINEAR REGULATOR (PMOS)
• PMOS based linear regulators offer low dropout voltage while operating within the (0-Vin) range
• Required gain-bandwidth product is challengingly high for processor loads
– BW ~ 1GHz needed
– Hurts efficiency, power wasted in op-amp

[Figure: PMOS regulator schematic with Vin, Vref, error amplifier (+/-), Vout, Iload, Reff, and Cdecap.]

LINEAR REGULATOR (NMOS)
• Source-follower topology
• Will require charge pump to provide gate overdrive
• Accurate voltage setting
• Supply rejection – dropout voltage tradeoff
• High BW response makes it a good candidate for integration w/o large capacitors
– Intrinsically good frequency response
• Charge pump (or low current supply) allows low drop out
– Requires charge pump or high voltage supply >Vin for good efficiency
• Higher efficiency
– Amplifier is high gain, low bandwidth
– Bandwidth and efficiency depend on desired slew rate
– 0Vth devices offer additional efficiency

[Figure: NMOS regulator schematic with charge pump, Vref, Vin, error amplifier (+/-), Vout, Iload, Reff, and Cdecap.]

DVFS SYSTEM IMPLEMENTED WITH DUAL VOLTAGES
• Fully integrated 167-processor system
• Each processor tile contains a core that operates at:
– A fully-independent clock frequency
  • Any frequency below maximum
  • Halts, restarts, and changes arbitrarily
– Dynamically-changeable supply voltage
  • VddHigh or VddLow
  • Disconnected for leakage reduction
• Each power gate comprises 48 individually-controllable parallel transistors

Dean N. Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh T. Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul V. Mejia, Bevan M. Baas, "167-Processor Computational Platform in 65 nm CMOS", JSSC, Vol. 44, No. 4, April 2009

POWER GATING


WHAT'S THE PROBLEM? JUST TURN IT OFF WHEN YOU'RE NOT USING IT!
• Many choices and options
• Need to analyze your particular application for the optimal solution
• Considerations:
– Design complexity
– Power savings
– Frequency/performance impact
– Wake-up time
– Area overhead
– Verification complexity
– Return on investment (ROI)

POWER GATING MODES (ACTIVE)

During active mode
• Natural low gate activity factors inside of logic block allow sharing of power gate device for many logic functions with gated logic block
– Reduces resistance requirements of power gate device
– Works to improve Ion/Ioff ratio requirements
• Design objective
– Minimize resistance of cut-off device
• Penalties:
– Increased logic delay
– Requires higher external supply voltage for same frequency
  • Consumes more power during active mode

$V_{LOGIC} = V_{dd} - V_{IRPG}$

$V_{IRPG} = R_{ON} \cdot I_{MAX}$

[Figure: header switch with resistance RON carrying IMAX into the logic block.]
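The active-mode relations can be turned into a quick sizing estimate: the IR budget across the switch sets the largest allowed on-resistance, and dividing a single cell's resistance by that value gives the number of cells to place in parallel. The 11mV budget at 1.1V and the 75.57 ohm cell resistance are borrowed from later slides in this deck; the 2 A peak current and the helper names are hypothetical.

```python
import math


def max_on_resistance_ohm(v_irpg_budget_v, i_max_a):
    """Largest R_ON that keeps V_IRPG = R_ON * I_MAX within budget."""
    return v_irpg_budget_v / i_max_a


def cells_needed(r_cell_ohm, r_on_max_ohm):
    """Parallel cells required; N identical cells give r_cell / N."""
    return math.ceil(r_cell_ohm / r_on_max_ohm)


V_DD = 1.1             # supply voltage (V), from the RO data slide
V_IRPG_BUDGET = 0.011  # target IR drop across the gate (V), from the RO data slide
R_CELL = 75.57         # one power gate cell (ohm), from the IR drop example slide
I_MAX = 2.0            # hypothetical peak current of the gated block (A)

r_on_max = max_on_resistance_ohm(V_IRPG_BUDGET, I_MAX)
n_cells = cells_needed(R_CELL, r_on_max)
v_logic = V_DD - (R_CELL / n_cells) * I_MAX   # V_LOGIC = Vdd - V_IRPG

if __name__ == "__main__":
    print(f"R_ON must stay below {r_on_max * 1e3:.2f} mohm")
    print(f"about {n_cells} cells in parallel")
    print(f"V_LOGIC is about {v_logic:.4f} V")
```

This ignores grid resistance and non-uniform current, both of which the later slides treat separately.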

POWER GATING MODES (SLEEP)

During sleep mode
• Steady state
– Want to maximize OFF resistance of cut-off device
– Reduce stand-by power with lower leakage (10x -> 1000x possible)
• Transition to cutoff
– Creates supply di/dt for small ISLEEP vs. large IL

$P_{SLEEP} = V_{dd} \cdot I_{SLEEP}$

$\frac{dI_{CUTOFF}}{dt} = \frac{I_{SLEEP} - I_{L}}{t_{CUTOFF}}$

[Figure: header switch with resistance ROFF carrying ISLEEP into the logic block.]

POWER GATING MODES (WAKE)

During wake-up mode
– Need to control in-rush current to charge logic devices to VDD
• Logic capacitance is discharged to nVdd during sleep mode

$I_{WAKE} = \frac{(1 - n) V_{dd}}{R_{WAKE}}$ (1st order analysis)

[Figure: header switch with resistance RWAKE carrying IWAKE into CLogic.]
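A minimal sketch of the first-order wake-up estimate. The peak follows the slide's relation, with n the fraction of Vdd remaining on the logic capacitance after sleep. The exponential decay afterwards assumes a plain RC charge through a fixed RWAKE, which is an added simplification, and the numeric values are hypothetical.

```python
import math


def wake_peak_current_a(vdd, n, r_wake_ohm):
    """Peak in-rush current: I_WAKE = (1 - n) * Vdd / R_WAKE."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("n is the fraction of Vdd left on the logic, 0..1")
    return (1.0 - n) * vdd / r_wake_ohm


def wake_current_a(t_s, vdd, n, r_wake_ohm, c_logic_f):
    """Assumed RC decay of the in-rush current after the switch turns on."""
    tau = r_wake_ohm * c_logic_f
    return wake_peak_current_a(vdd, n, r_wake_ohm) * math.exp(-t_s / tau)


if __name__ == "__main__":
    VDD, C_LOGIC = 1.1, 10e-9          # hypothetical block: 10 nF of logic
    for r_wake in (0.1, 1.0, 10.0):    # weaker switch means larger R_WAKE
        peak = wake_peak_current_a(VDD, 0.0, r_wake)
        tau_ns = r_wake * C_LOGIC * 1e9
        print(f"R_WAKE={r_wake:5.1f} ohm  peak={peak:6.2f} A  tau={tau_ns:6.1f} ns")
```

The sweep shows the basic trade-off: a larger wake resistance lowers the peak current but lengthens the charging time.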

LEAKAGE SAVINGS WITH POWER GATING

[Figure: early Llano measured results showing >20x leakage savings.]

Very effective method for controlling power during idle periods

Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, "An x86-64 Core in 32 nm SOI CMOS", JSSC, January, 2011

DETERMINATION OF FREQUENCY IMPACT DURING ACTIVE MODE
• Determine tolerable voltage drop for active mode operation
– De-rate timing models by fixed maximum value to account for voltage loss in power switch
– Increase external voltage to compensate for voltage drop
• Use critical path simulations or representative RO data for voltage/frequency trade-off
• Design worst-case power gate and grid resistance to meet specification target
• Analyze drop at highest operating voltage and highest power consumption
• Keep voltage drop due to power gating small to minimize uncertainty at power island boundary crossings
• 2nd order effects:
– Circuit area growth due to wire blockages caused by power switch
– Decreased available quiet decoupling capacitance due to increased resistance to implicit and explicit charge reservoirs on the die

FREQUENCY IMPACT ESTIMATED WITH RING OSCILLATOR

[Figure: 32 nm FO1 RO data (RVT, min W). Frequency Delta (0.0% to 3.0%) vs. VDD (0.6 V to 1.4 V) for 0.5%, 1.0%, and 1.5% Delta VDD.]

• 1% VDD loss ~ 1% frequency loss
• Target IR drop: VIRPG ~ 11mV @ 1.1V
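The rule of thumb on this slide can be applied directly. The sketch below assumes frequency loss tracks fractional VDD loss one-for-one, as the slide's "1% VDD loss ~ 1% frequency loss" note suggests near 1.1V; the `sensitivity` parameter is a hypothetical knob for operating points where the ring oscillator data shows a steeper or shallower slope.

```python
def frequency_loss_fraction(v_drop_v, vdd_v, sensitivity=1.0):
    """Estimated fractional frequency loss from an IR drop.

    sensitivity is the assumed % frequency change per % VDD change
    (about 1.0 per the rule of thumb on the slide).
    """
    return sensitivity * v_drop_v / vdd_v


def compensated_supply_v(vdd_v, v_drop_v):
    """External voltage needed so the gated logic still sees vdd_v."""
    return vdd_v + v_drop_v


if __name__ == "__main__":
    loss = frequency_loss_fraction(0.011, 1.1)
    print(f"11 mV at 1.1 V costs about {loss:.1%} of frequency")
    print(f"or raise the external supply to {compensated_supply_v(1.1, 0.011):.3f} V")
```

Either outcome matches the penalties listed for active mode: accept the slower clock, or spend extra supply voltage to recover it.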

POWER GATE CELL I-V CHARACTERISTICS

[Figure: IDS (A, 0.0E+00 to 7.0E-04) and RDS (ohms, 74.00 to 76.50) vs. VDS (V, 0.00 to 0.06), with the target region of operation marked.]

• FET operates in the linear region
• Can be approximated as a resistor

EMBEDDED ROW-BASED POWER SWITCH PLACEMENT

[Figure: standard cell rows with VirVDD, VDD, and VSS rails; power switch (PS) cells at uniform intervals along the rows, with current flow indicated.]

Pre-populate logic area with power gate cells at uniform locations prior to place and route

MAXIMUM POWER GATE SPACING (ROW-BASED PLACEMENT)
• Assume uniform current density
• Can estimate power gate spacing with respect to IR and EM limits

$V_{IRPG} = 2 \cdot \mathrm{rows} \cdot I_{ROW} \cdot R_{PG}$

$V_{IR\_WC} = V_{IRPG} + \sum V_{IRW}$

[Figure: current Imax collected along the virtual rail through wire segments with drops VIRW(0) to VIRW(4), then through the power gate with drop VIRPG.]
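One way to use these two relations is to sweep the row count and keep the largest spacing that fits an IR budget. The sketch below keeps the slide's expression for VIRPG and assumes a simple ladder for the wire drops: the segment k rows from the power gate carries the current of the remaining rows - k rows. The ladder model, the per-row current, and the wire resistance are assumptions for illustration.

```python
def worst_case_ir_drop_v(rows, i_row_a, r_pg_ohm, r_wire_per_row_ohm):
    """V_IR_WC = V_IRPG + sum(V_IRW) under a uniform current density."""
    # Power gate drop, as given on the slide.
    v_irpg = 2 * rows * i_row_a * r_pg_ohm
    # Assumed ladder: the segment k rows from the gate carries the
    # current of the (rows - k) rows that lie beyond it.
    v_irw = [(rows - k) * i_row_a * r_wire_per_row_ohm for k in range(rows)]
    return v_irpg + sum(v_irw)


def max_rows_within_budget(budget_v, i_row_a, r_pg_ohm, r_wire_per_row_ohm,
                           limit=1000):
    """Largest row count whose worst-case drop stays within the budget."""
    best = 0
    for rows in range(1, limit + 1):
        drop = worst_case_ir_drop_v(rows, i_row_a, r_pg_ohm, r_wire_per_row_ohm)
        if drop > budget_v:
            break
        best = rows
    return best


if __name__ == "__main__":
    I_ROW = 1e-5     # hypothetical current drawn per row segment (A)
    R_PG = 75.57     # power gate cell resistance (ohm), from the example slide
    R_WIRE = 0.5     # hypothetical virtual-rail resistance per row (ohm)
    rows = max_rows_within_budget(0.011, I_ROW, R_PG, R_WIRE)
    print("max rows between power gates:", rows)
```

The same sweep could be repeated against an EM current limit, taking the smaller of the two spacings.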

IR DROP EXAMPLE (UNIFORM POWER DENSITY)

[Figure: IR Drop Along Virtual VSS Perpendicular to Standard Cell Rows. IR Drop (mV, 0.0 to 70.0) vs. Standard Cell Rows Between Footers (0 to 140), for 1X Metal and 1.3X Metal virtual supply IR drop, with 30 rows marked.]

• Max current density = 1.0A/mm2
• 75.57ohm footer

EM EXAMPLE (UNIFORM POWER DENSITY)

[Figure: EM Margin Along Virtual VSS Perpendicular to Standard Cell Rows. EM Limit (0% to 25%) vs. Standard Cell Rows Between Footers (0 to 140), for 1X Metal and 1.3X Metal EM limit, with 30 rows marked.]

• Max current density = 1.0A/mm2
• Thick metal will provide more EM margin

VERTICAL CROSS SECTION

[Figure: metal stack cross section. Labels: C4 Bump, M11 to M2, Full VDD Grid (thicker metal), VDD Via Tower, Virtual VDD Grid, EM, VDD, PS, VVDD, Logic.]

EM hazards can occur in high power density regions

EMBEDDED POWER GATE EXAMPLE: STYLE USED IN LLANO GPU

[Figure: AMD GPU functional unit power gate tile. Power gate: checkerboard pattern. Labels: SRAM; always-on cells (FeedThru cells, power control logic, clamp cells for macro input pins and tile output ports); input I/O buffers.]

• Power gate cells must squeeze around arrays
• Providing power to always-on cells can be problematic
• IP I/O protection typically needed to isolate individual regions
• SRAMs need their own custom power gating

EMBEDDED POWER GATE EXAMPLE

[Figure: AMD GPU functional unit power gate tile, simulated with Apache Redhawk in vectorless dynamic analysis mode.]

• Power gating increases distribution of IR drop on grid
• Can cause problems with critical circuits and paths

EXTERNAL RING STYLE GATING
• Interrupt global supply grid to power-gated IP block
• Requires careful chip floorplanning to accommodate supply interruption
• Increased sharing of power gate devices can ease power gate sizing
– Large IP power gating can greatly ease worst-case current analysis
• Can be difficult to create always-on power islands within the power-gated region

[Figure: footer switch location between the global grid (GND) and the virtual grid (VGND) around the macro/core, with M1 metal and M2 metal routing.]

LLANO CPU CORE RING GATING WITH PACKAGE LAYER ASSIST

Virtual voltage spreads uniformly Bumps near hot spots can exceed max limits

Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI CMOS”, JSSC, January, 2011

High RVSS can create noise issues


STRAIGHT EDGE & ZIG/ZAG EDGE PACKAGE PLANE BOUNDARIES

Peak currents flowing into bumps at periphery can be alleviated using zig/zag approach

MAXIMIZING POWER GATING OPPORTUNITIES
• To maximize power savings, we would like to go into a power gating mode as soon as possible to reduce leakage power
• Practical constraints limit opportunity
– Energy break-even point
– In-rush current induced noise
– Time to restore state of power gated region

[Figure: power vs. time as the application mode alternates between Active and Idle.]

ENERGY BREAK-EVEN POINT

Power Gate Example:
T0: Logic block starts inactivity
T1: Control circuit decides to power gate
T2: Power gate signal propagation complete
T3: Energy break-even point
T4: Full discharge of virtual supply
T5: Control circuit decides to repower logic
T6: Power gate signal propagation complete
T7: Gated logic is fully charged and ready for activity

Z. Hu et al., "Microarchitectural Techniques for Power Gating of Execution Units," ISLPED'04, August 9–11, 2004, Newport Beach, California, USA.
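The break-even point can be estimated with a simple energy balance: gating pays off once the leakage energy saved exceeds the energy spent entering and leaving the gated state. The sketch below lumps all switching, recharge, and state save/restore costs into one overhead term, which is a simplification; the overhead and leakage values are hypothetical, with the gated leakage set to 1/20 of the ungated value to echo the >20x savings reported earlier in the deck.

```python
def break_even_time_s(e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """Idle time at which saved leakage energy equals the gating overhead."""
    saved_power = p_leak_on_w - p_leak_gated_w
    if saved_power <= 0.0:
        raise ValueError("gating must reduce leakage power")
    return e_overhead_j / saved_power


def net_energy_saved_j(t_idle_s, e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """Positive when gating for t_idle_s saves energy overall."""
    return (p_leak_on_w - p_leak_gated_w) * t_idle_s - e_overhead_j


if __name__ == "__main__":
    E_OVERHEAD = 1e-6            # hypothetical energy cost of one gate/wake cycle (J)
    P_ON = 1.0                   # hypothetical ungated leakage power (W)
    P_GATED = P_ON / 20.0        # assumes the 20x leakage reduction
    t_be = break_even_time_s(E_OVERHEAD, P_ON, P_GATED)
    print(f"break-even idle time: {t_be * 1e6:.2f} us")
    for t_idle in (0.5e-6, 2e-6, 10e-6):
        net = net_energy_saved_j(t_idle, E_OVERHEAD, P_ON, P_GATED)
        print(f"idle {t_idle * 1e6:5.1f} us -> net {net * 1e6:+.2f} uJ")
```

Idle periods shorter than the break-even time lose energy, which is why a controller typically waits or predicts idle length before gating.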

IN-RUSH CURRENT CONTROL
• During wake-up of the power gated region it's critical to control the in-rush current
• The effective resistance of parallel combination of footers is very small
– Can create excessive currents
• Transition to cutoff mode can also create similar di/dt events
• Hazards
– Supply bounce from large di/dt
– EM violations
– Short circuit currents from imbalanced nodes in power gated region during wake-up

UNDERSTANDING SUPPLY BOUNCE

In-rush current can cause supply bounce on neighboring IP

Suhwan Kim, Chang Jun Choi, Deog-Kyoon Jeong, Stephen Kosonocky, Sung Bae Park, “Reducing Ground-Bounce Noise and Stabilizing the Data-Retention Voltage of Power-Gating Structures”, IEEE Transactions on Electron Devices, Vol. 55, NO. 1, January 2008


IN-RUSH CURRENT SIDE-EFFECTS
• Disturb neighboring circuits
– Retention flops, adjacent IP logic, SRAM
• Supply bounce, ground bounce
• Over stress gate-oxides due to voltage excursions
• Un-even distribution of wake signals during power-up
– Creates excessive short circuit currents when some gates are charged and others are not
• Large di/dt can cause EM failures

PROGRAMMABLE IN-RUSH CURRENT CONTROL
• When exiting power gating, power gates are gradually enabled in groups with progressively larger effective device widths
– Controlled by FSM
– Fuse programmable strengths and duration for post-silicon tuning
– Analytical R-C model w/o inductance demonstrating control

[Figure: enable sequence for group1 through group5.]
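The staged enable scheme can be illustrated with a small numerical R-C model without inductance, in the spirit of the analytical model the slide mentions. Each group is treated as a resistor that joins the parallel combination at its enable time, and the logic capacitance charges from 0 V. The group resistances, enable times, and capacitance are hypothetical and do not reflect Llano's fuse settings.

```python
def simulate_wake(vdd, c_logic_f, group_r_ohm, group_enable_s, t_end_s, dt_s):
    """Forward-Euler charge of the logic capacitance through staged gates.

    Returns (peak_current_a, final_voltage_v).
    """
    v = 0.0
    peak = 0.0
    for step in range(int(round(t_end_s / dt_s))):
        t = step * dt_s
        # Conductance of every group already enabled at time t.
        g = sum(1.0 / r for r, t_on in zip(group_r_ohm, group_enable_s)
                if t >= t_on)
        i = (vdd - v) * g
        peak = max(peak, i)
        v += i * dt_s / c_logic_f
    return peak, v


# Hypothetical groups with progressively larger widths (lower resistance).
GROUP_R = [2.0, 1.0, 0.5, 0.25, 0.125]
STAGED = [0.0, 20e-9, 40e-9, 60e-9, 80e-9]
ALL_AT_ONCE = [0.0] * len(GROUP_R)

staged_peak, staged_v = simulate_wake(1.1, 10e-9, GROUP_R, STAGED,
                                      200e-9, 0.01e-9)
burst_peak, burst_v = simulate_wake(1.1, 10e-9, GROUP_R, ALL_AT_ONCE,
                                    200e-9, 0.01e-9)

if __name__ == "__main__":
    print(f"staged enable : peak {staged_peak:.2f} A, final {staged_v:.3f} V")
    print(f"all at once   : peak {burst_peak:.2f} A, final {burst_v:.3f} V")
```

In this model the weakest group does most of the charging, so by the time the strong groups turn on the remaining voltage difference is small and the peak current stays far below the all-at-once case.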

LLANO CPU CORE IN-RUSH CURRENT ON CC6 EXIT

[Figure: Apache Redhawk in-rush current simulation (conservative programming). IVSSCORE0 Current (A, 0.00 to 3.50) and VSSCORE0 voltage (V, 0 to 1.4) vs. Time (ps, 0 to 100000).]

• Fuse programmable in-rush control built-in
• Sample setting shows a 3A peak during CC6 exit

HW DATA SHOWING IN-RUSH EFFECT ON OTHER CORES

[Figure: two measurements, each comparing "1 core in/out of CC6, 3 cores at max freq" against "3 cores in/out of CC6, 1 core at max freq".]

• Large in-rush setting: transient droop limited by CC6 exit
• Small in-rush setting: transient droop limited by # cores running max frequency

STATE SAVE AND RESTORE
• Regulate virtual supply during sleep mode to low voltage
– Moderate leakage reduction possible
– Globally applies to all retention elements in gated region
– Example:
  • K. Kumagai et al., "A Novel Powering-down Scheme for Low Vt CMOS Circuits," Symposium on VLSI Circuits, 1998 (Virtual Rail CMOS).
  • L. Clark, S. Demmons, "Standby Power Management for a 0.18μm Microprocessor," ISLPED 2002 (Intel XScale processor).
• Retention flops with always-on latch cell
– Many variants possible
– Some with explicit control or automatic restore
– Good practice to avoid adding power domain crossing in functional path of flop
– Can increase timing uncertainty with virtual supply bounce
– Reduces requirements of always-on supply distribution
• Write-out critical state to memory using serial or parallel paths
– Can utilize scan mechanism to avoid a dedicated bus
– Low area overhead
– Can be a large time overhead depending on total size of state restore memory

CONCLUSION


CONCLUSION
• DVFS provides a nice capability for optimization of CPU/GPU/APU performance per application
– As we add more cores and IP to the SOC, it will be difficult to continue to provide unique voltage rails with external regulators
– Integrated regulation has its own challenges; more details will be covered by the other presenters
• Power gating can mitigate the need for additional voltage rails
– Maximum frequency impact is minimal
– Integration by embedding in IP or surrounding in a ring each have their own issues
– Opportunity is limited by energy break-even, in-rush noise, and state restore overhead
– In-rush current control is necessary to prevent frequency impacts
