Design optimizations of a high performance microprocessor using ...

16.3

Design Optimizations of a High Performance Microprocessor Using Combinations of Dual-VT Allocation and Transistor Sizing

James Tschanz, Yibin Ye, Liqiong Wei¹, Venkatesh Govindarajulu, Nitin Borkar, Steven Burns², Tanay Karnik, Shekhar Borkar and Vivek De

Microprocessor Research, ¹Mobile Architecture, ²Strategic CAD, Intel Labs, Hillsboro, OR, USA

Abstract

Joint optimizations of dual-VT allocation and transistor sizing reduce low-VT usage by 36%-45% and leakage power by 20% with minimal impact on total active power and die area. An enhancement of the optimum design allows processor frequency to be increased efficiently during manufacturing.

I. Introduction

High performance microprocessor designs are driven to achieve maximum clock frequency without exceeding limits on leakage power during burn-in and total power during normal operation. In addition, leakage-sensitive circuits such as domino logic gates must remain robust, and die area should be minimized. In this paper, we compare the effectiveness of different dual-VT allocation and transistor sizing strategies for a high performance microprocessor in a 1.4V, 130nm dual-VT technology [1]. We propose an enhancement to the optimum dual-VT design that allows processor frequency to be increased efficiently during manufacturing by pushing leakage of the low-VT transistor only.

II. Dual-VT Allocation and Sizing Optimizations

Four different design optimizations (Fig. 1) are performed on 100 functional unit blocks of a high performance microprocessor for target clock frequencies ranging from 1.96GHz to 2.3GHz: (1) dual-VT allocation without transistor resizing (DVT); (2) iterative dual-VT allocation and sizing for minimum total power (DVT+S); (3) sizing with all high-VT, followed by selective low-VT insertion (H-SDVT); and (4) sizing with all low-VT, followed by selective high-VT insertion (L-SDVT). Each circuit block contains 500-50000 transistors.

An industry-standard sizing tool (AMPS) and an internally developed tool for transistor-level allocation of dual-VT (TA-DVT) are used for the optimizations. TA-DVT uses a priority-based transistor selection algorithm that minimizes leakage power for a specific clock frequency target. Power estimations include both gate oxide leakage and subthreshold leakage, and their temperature dependences. Impacts of within-die channel length variations, stack effect and noise on leakage power are accounted for empirically. Selected transistors in leakage-sensitive circuits are forced to be high-VT for noise immunity.

During each iteration of the DVT+S optimization for minimum power, transistor sizing is first done with all high-VT to meet a relaxed frequency specification, and then low-VT devices are inserted to meet the actual performance target. As the frequency specification becomes more relaxed, transistor sizes shrink and more low-VT devices are needed to meet the actual target performance. Therefore, leakage power increases while switching power decreases. The iteration that minimizes the sum of switching and leakage powers is then selected as the optimum (Fig. 2). DVT+S optimization results are insensitive to the activity factor α in the 0.01-0.5 range (Fig. 3a), thus tolerating uncertainty in the a priori α-value estimates needed for the optimization.
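The selection step of the DVT+S flow can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool flow: the `Iteration` record, the `select_optimum` helper and every power value are hypothetical, chosen only to show leakage rising and switching power falling as the sizing-only frequency target is relaxed.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    """One DVT+S iteration (all fields are illustrative, arbitrary units)."""
    relaxation: float       # fractional frequency relaxation used for sizing only
    switching_power: float  # switching power after sizing and low-VT insertion
    leakage_power: float    # leakage power after low-VT insertion

    @property
    def total_power(self) -> float:
        return self.switching_power + self.leakage_power


def select_optimum(iterations):
    """Return the iteration with the smallest switching + leakage power."""
    if not iterations:
        raise ValueError("no iterations to choose from")
    return min(iterations, key=lambda it: it.total_power)


# Made-up sweep: more relaxation gives smaller devices (less switching power)
# but needs more low-VT devices to recover the target (more leakage).
sweep = [
    Iteration(0.00, 100.0, 12.0),
    Iteration(0.05, 92.0, 14.0),
    Iteration(0.10, 86.0, 17.0),
    Iteration(0.15, 83.0, 23.0),
    Iteration(0.20, 81.0, 31.0),
]

best = select_optimum(sweep)
print(best.relaxation, best.total_power)
```

In this made-up sweep the total power is lowest at an intermediate relaxation, which is the qualitative behavior the text attributes to Fig. 2.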
Low-VT usage and average device size also vary widely across the circuit blocks (Fig. 3b), as dictated by their individual logic topologies and timing requirements.

0-7803-7310-3/02/$17.00 ©2002 IEEE

III. Power, Performance and Die Area Comparisons

Both low-VT usage and transistor sizes increase at higher frequency targets (Fig. 4). However, leakage power remains approximately 14%-18% of total power across the entire frequency range. L-SDVT minimizes die area, H-SDVT minimizes leakage power, and DVT+S minimizes total power (Fig. 5). Although total power (for α of 0.01-0.1) and leakage power are approximately the same for all three techniques, H-SDVT incurs 13%-15% larger die area than DVT+S and L-SDVT in order to minimize low-VT usage.

Compared to DVT, these techniques reduce low-VT usage from 22% to 8%-14%, and leakage power by 20%-23%. However, differences in total power and die area are not significant, except for H-SDVT, which has 15% larger die area and minimum low-VT usage. Clearly, joint optimization of dual-VT allocation and sizing can reduce leakage power significantly, with minimal impact on total power and die area. This is achieved by upsizing additional low-VT transistors to create more delay slack, and then converting them to high-VT. Thus, more paths have delays close to the clock cycle time (Fig. 6).

DVT offers 7% frequency improvement over a single-VT design as both high-VT and low-VT device leakages are pushed to the same burn-in leakage power limit (Fig. 7a). DVT+S or L-SDVT improves frequency by an additional 2% with virtually no impact on total power and die area. Frequency gains are similar for the same total active power as the best single-VT design (Fig. 7b).
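The slack-for-leakage trade described in this section can be illustrated with a toy single-path model. The sketch below is an assumption-laden illustration, not the TA-DVT algorithm: the `Device` record, the `convert_to_high_vt` helper, the priority metric (leakage saved per picosecond of added delay) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A low-VT device on one timing path (illustrative values only)."""
    name: str
    delay_penalty_ps: float  # extra path delay if converted to high-VT (> 0)
    leakage_saving: float    # leakage saved by the conversion, arbitrary units


def convert_to_high_vt(devices, slack_ps):
    """Greedily convert low-VT devices to high-VT while slack remains.

    Devices are ranked by leakage saved per picosecond of added delay.
    This ranking is an assumed priority, used here for illustration.
    """
    ranked = sorted(
        devices,
        key=lambda d: d.leakage_saving / d.delay_penalty_ps,
        reverse=True,
    )
    converted = []
    for dev in ranked:
        if dev.delay_penalty_ps <= slack_ps:
            slack_ps -= dev.delay_penalty_ps
            converted.append(dev.name)
    return converted, slack_ps


path = [
    Device("m1", 4.0, 8.0),
    Device("m2", 2.0, 6.0),
    Device("m3", 5.0, 5.0),
    Device("m4", 1.0, 1.5),
]

# Slack before upsizing vs. the larger slack created by upsizing.
no_upsizing = convert_to_high_vt(path, slack_ps=3.0)
after_upsizing = convert_to_high_vt(path, slack_ps=8.0)
print(no_upsizing, after_upsizing)
```

With the larger slack, one more device is converted and less slack is left over relative to what was available, which mirrors the observation that more paths end up with delays close to the cycle time. A real flow would have to track slack across many paths that share devices; this sketch ignores that.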

IV. Enhancement of Optimum Dual-VT Design

Typically, transistor leakage is pushed during manufacturing to improve processor frequency continuously over time without requiring continuous design re-optimizations. Frequency of the optimum DVT+S design does not improve when only low-VT transistor leakage is pushed, since many paths close to the cycle time limit contain a significant number of high-VT transistors (Fig. 8). Therefore, both high-VT and low-VT device leakages must be pushed to improve frequency. However, robustness considerations limit the extent of high-VT device leakage push.

The design can be enhanced (EDVT+S) to allow frequency improvement (up to 20%, for example) through low-VT transistor leakage push only. This is accomplished by using TA-DVT to selectively insert additional low-VT devices in paths close to the cycle time limit. Frequency improvement is then proportional to the increase in drive current and leakage of the low-VT device only. Clearly, EDVT+S achieves 10%-20% frequency improvement with smaller leakage power and total power than DVT+S (Fig. 9), and is more robust as well. However, it incurs 1%-5% overhead in total power and 7%-33% overhead in leakage power at the original frequency and leakage targets (Fig. 10) in order to allow efficient frequency push at manufacturing.

V. Conclusions

Joint optimizations of dual-VT allocation and transistor sizing for a high performance microprocessor reduce low-VT usage by 36%-64%, compared to a design where only dual-VT allocation is optimized. Designs optimized for minimum power (DVT+S) and minimum area (L-SDVT) reduce leakage power by 20%, with minimal impact on total power and die area. An enhancement of the optimum DVT+S design allows processor frequency to be increased efficiently during manufacturing through low-VT device leakage push only, without requiring design re-optimizations.
Total power and leakage power overheads of the enhanced design at the original frequency and leakage targets are acceptably small.

References

[1] S. Tyagi et al., 2000 IEDM, pp. 567-570.

2002 Symposium on VLSI Circuits Digest of Technical Papers

Figure 1 Design optimization flows for different dual-VT and sizing options. Flow labels: original design in 180nm single-VT technology, scaled to 130nm technology with all high-VT; final design in 130nm dual-VT technology. H-L: selective low-VT insertion; L-H: selective high-VT insertion.

Figure 2 Dynamics of DVT+S optimization. Horizontal axis: clock frequency relaxation used for sizing only. Labeled points: L-SDVT and H-SDVT.

Figure 3 (a) Sensitivity of DVT+S optimization to α. (b) Variation in low-VT usage and average device width across 100 circuit blocks. Panel (a) annotation: difference in total power at specific α vs. α: 0.1, 0.26%.

Figure 6 Path delay distributions of dual-VT designs. Horizontal axis: path delay (ps).

Figure 7 Clock frequency, power and die area for different dual-VT and sizing design optimizations compared to the best single-VT design at (a) equal burn-in leakage power, and (b) equal active power.

Figure 8 Sensitivity of path delay distribution and performance to low-VT leakage push. DVT+S: (a) before leakage push & (b) after leakage push. EDVT+S with 20% frequency push capability: (c) before leakage push & (d) after leakage push. Horizontal axis: path delay (ps).

Figure 9 Burn-in leakage power and total power vs. clock frequency increase by leakage push for DVT+S and EDVT+S designs. Horizontal axis: clock frequency (GHz).

Figure 10 Power overheads of EDVT+S designs at original leakage. Columns: EDVT+S: 10% and EDVT+S: 20%; legible values: 5.2% and 33.6%.
