NEM Relays - Semantic Scholar

1 downloads 0 Views 1MB Size Report
Nov 12, 2008 - NEM Relays. Fred Chen2, Hei Kam1, Dejan Markovic3,. Tsu-Jae King Liu1, Vladimir Stojanovic2, Elad Alon1. 1Department of EECS, University ...
Integrated Circuit Design with NEM Relays Fred Chen2, Hei Kam1, Dejan Markovic3, Tsu--Jae King Liu1, Vladimir Stojanovic2, Elad Alon1 Tsu 1Department

of EECS, University of California, Berkeley, CA 2Department of EECS, Massachusetts Institute of Technology, Cambridge, MA 3EE Department, University of California, Los Angeles, CA

CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000

1000



10.1

100

Sun’s Surface Rocket Nozzle Nuclear Reactor

Normalized Energy/op

Power Density (W/cm2)

10000

100

8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar

80

60

40

20

0 0 10

1

10

2

3

10 10 1/throughput (ps/op)

4

10

Vdd and Vt not scaling well  power/area not scaling

ICCAD 2008 – November 12, 2008

2

CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000

1000

 

10.1

100

Sun’s Surface Rocket Nozzle Nuclear Reactor

Normalized Energy/op

Power Density (W/cm2)

10000

100

Core 2 8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar

80

60

Operate at a lower energy point

40

20

0 0 10

Run in parallel to recoup performance 1

10

2

3

10 10 1/throughput (ps/op)

4

10

Vdd and Vt not scaling well  power/area not scaling Parallelism to improve performance within power budget ICCAD 2008 – November 12, 2008

3

Where Parallelism Doesn’t Help 25

100

20

80

15 Etotal 10 Edynamic

5

Eleak

0.1



0.2

0.3 Vdd (V)

0.4

0.5

60

Scale Vdd & VT: 40

20

0 1 10

2

10

3

4

10 10 1/throughput (ps/op)

5

10

CMOS circuits have wellwell-defined minimum energy   

10.1

Energy/op vs. 1/throughput

Normalized Energy/op

Normalized Energy/cycle

Energy/op vs. Vdd

Caused by leakage and finite sub-threshold swing Need to balance leakage and active energy Limits energy-efficiency, regardless how slow the circuit runs ICCAD 2008 – November 12, 2008

4

What if There Was No Leakage? Normalized Energy/cycle in CMOS

Measured relay I-V 1.2

W x L = 2mm x 20mm

1.0

20

15

IDS (mA)

Normalized Energy/cycle

25

Etotal

10

0.6

VDS = 1V

0.0 34

0.1





10.1

0.2

0.3 Vdd (V)

0.4

compliance limit

0.4

0.2

Edynamic

5

H = g = 0.6mm

0.8

0.5

35

36

VGB (V)

37

No longer need to balance increasing component of energy as Vdd decreases Relays offer near infinite subsub-threshold slope and no leakage current ICCAD 2008 – November 12, 2008

5

Outline 



If we could reliably build relays, would circuits implemented using relays offer superior energy--efficiency to CMOS? energy Compact Circuit Model for NEM Relays 



Circuit Design Paradigms for NEM Relays 



How to deal with ―slow‖ relays

Compare vs. CMOS  

10.1

Fast yet key functionality captured

Digital Logic: 32-bit adder Mixed Signal I/O: DAC, ADC ICCAD 2008 – November 12, 2008

6

NEM Relay Structure & Operation Schematic Cross-Section Electrical Gate

Metal Channel

Gate Dielectric

W HH Source

t gap

Closed Relay (“ON”): |Vgb| > Vpi (pull-in voltage)

Air gapH

Drain Drain

tdimple

Body Dielectric Bodyody Body 

Plan-View

L Source

Open Relay (“OFF”): |Vgb| < Vpo (pull-out voltage)

Anchor

Electrical Gate

Body Electrode (under body dielectric)

W

Drain

Channel 

Contact Dimples 10.1

Relatively low on-state resistance (100Ω < Ron < 1kΩ @ 90nm)

Infinite off-state resistance  Zero off-state leakage

ICCAD 2008 – November 12, 2008

7

NEM Relay Model Anchor

Mechanical Model Spring k

Mass m

Damper b C

0.5Rc Cs

S Rsd

0.5Rc Cd

Go Cgc

Rcs So

Cgb

D

Rcd

Rsd

Do Csd_b

Csd_b B 

Compact Verilog Verilog--A model for circuit simulation:  

 10.1

Mechanical dynamics: spring (k), damper (b), mass (m) Electrical parasitics: non-linear gate-body (Cgb), gate-channel (Cgc), and source/drain-body cap (Csd_b), contact resistance (Rcs,d)

Vpi and delay verified against device measurements ICCAD 2008 – November 12, 2008

8

NEM Relay Scaling 

Constant EE-field scaling has similar benefits as CMOS Pull-in Voltage with Beam Scaling



Measured pullpull-in voltages scale linearly If we can overcome reliability issues & surface forces scale: {W,L,tgap} = {90nm,2.3um,10nm}  Vpi = 200mV 



10.1

Mechanical delay also scales linearly (~10ns @90nm) ICCAD 2008 – November 12, 2008

9

NEM Relay as a Logic Element



4-terminal design mimics MOSFET operation  Electrostatic actuation is ambipolar  Non Non--inverting logic possible: actuation independent of drain/source voltages



Mechanical delay (~10ns) >> electrical t (~1ps)

 Can stack 200 devices (with 1kW on-resistance) before electrical delay is comparable to mechanical delay 10.1

ICCAD 2008 – November 12, 2008

10

Digital Circuit Design with Relays



CMOS: delay set by electrical time constant  



Relays: delay dominated by mechanical movement  

10.1

Quadratic delay penalty for stacking devices (pass transistor logic) Buffer & distribute logical/electrical effort over many stages Want all relays to switch simultaneously Implement logic as a single complex gate ICCAD 2008 – November 12, 2008

11

Digital Circuit Design with Relays

4 gate delays 

Delay Comparison vs. CMOS  



Single mechanical delay vs. several electrical gate delays For reasonable loads, relay delay unaffected by fan-out/fan-in

Area Comparison vs. CMOS  

10.1

1 mechanical delay

Larger individual devices Fewer devices needed to implement the same logic function ICCAD 2008 – November 12, 2008

12

CMOS vs. Relays: Digital Logic Normalized Energy/op

100

80

60

40

20

0 1 10



2

10

3

4

10 10 1/throughput (ps/op)

5

10

Most processor components exhibit the energy vs. performance tradeoff of static CMOS  Control, Datapath Datapath,, Clock



10.1

Adder energy performance tradeoff is representative ICCAD 2008 – November 12, 2008

13

Relay--based Adder Relay 

Full adder cell: 

 



10.1

12 relays vs. 24 transistors XOR for free Use fully complementary signals to avoid additional mechanical delay (to invert)

Relays all sized minimally

ICCAD 2008 – November 12, 2008

14

N-bit RelayRelay-based Adder 





10.1

Ripple carry configuration Cascade full adder cells to create larger complex gate Stack of N relays— relays— still 1 mechanical delay

ICCAD 2008 – November 12, 2008

15

Snapshot: NEM Relays vs. CMOS Energy/op vs. Delay/op across Vdd

9x 10x

32-bit adder

CMOS

Relay

Supply Voltage

0.5 V

0.32 - 0.9 V

Load Cap per Output

25 fF

25 fF

Total Gate Cap

4.0 pF

125 fF

600 mm2

480 mm2

Area

Energy is for adder only

 

Compare vs. Sklansky CMOS adder1 For similar area:  >9x lower energy/op, >10x greater delay

1D.

10.1

Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).

ICCAD 2008 – November 12, 2008

16

Energy--Delay Results Energy 

Relays always provide a more energy efficient solution 

> 10x better, even in the GOPS range Energy/op vs. Delay/op across Vdd & CL



30x capacitance gain  



2.4x Vdd gain 

10.1

ICCAD 2008 – November 12, 2008

Lower device Cg, Cd Fewer devices No leakage energy

17

Area vs. Throughput Tradeoff 30 W Relay (buffered, 25fF)

A

Relay

/A

CMOS

25 20

W Relay (buffered, 100fF)

15

Au Relay (25fF) 10 Au Relay (100fF) 5

Erelay = 20fJ/op 0.5





2.5

3

For high throughputs, relays require area overhead to achieve similar throughputs as CMOS Area overhead is bounded (max. 6x6x-25x) : 

10.1

1 1.5 2 Throughput (GOPS)

To maintain minimum energy/op for throughputs > 1GOPS, CMOS needs parallelism too ICCAD 2008 – November 12, 2008

18

CMOS vs. Relays: Mixed Signal



I/O energy dominant if core is ~30x more efficient  See if I/O circuits can benefit as well  Representative examples: DAC & ADC



How to process/generate analog signals?  No ―linear gain‖ in relays – rely on switching

10.1

ICCAD 2008 – November 12, 2008

19

NEM Relay Based DAC DAC Architecture

Voltage Output

AC Impedance

EBIT  

 

Need to add passive R to set impedance Don’t have to buffer up or size relays to drive output Relays’ gate voltage independent of I/O voltage

Energy dominated by I/O energy: 

10.1

2

VIO2 CL

Same topology as CMOS except: 





@VIO=200mV, CL=1pF: 62.8fJ/bit vs. ~780fJ/bit for CMOS ICCAD 2008 – November 12, 2008

20

NEM Relay Based Flash ADC



Topologically similar to CMOS  



Key differences… 



10.1

Sample/Hold input Flash Converter – bank of comparators

Sample & Hold: Unbalanced ―on‖ vs. ―off‖ switching times Comparators: Body terminal used as threshold, ambipolar actuation, hysteretic switching

ICCAD 2008 – November 12, 2008

21

Relay Based Comparators 

Body terminal used to set threshold

VT



Hysteretic switching impact 



10.1

Comparator will have different threshold if previous state varies Need to reset all comparators to the same state before each evaluation  dynamic logic ICCAD 2008 – November 12, 2008

22

Relay Based Comparators 60 50

Output Code

Max 40 30

VT

Min

20 10 0



-0.2

0.8

1

Comparators above & below input will evaluate  need a different decoder

Each comparator has a ―dead―dead-zone‖ where it stays off 

10.1

0.2 0.4 0.6 Input Voltage (V)

Ambipolar actuation impact 



0

Can use ―Max‖ or ―Min‖ of this range to determine code ICCAD 2008 – November 12, 2008

23

ADC Results Vcore = 0.3V C = 500fF R = 4kW VREF = 1V fsamp = 10 MS/s 6-bit res: 5.5fJ/conv



Energy dominated by resistor string (320fJ out of 350fJ)  



10.1

Largely dependent on input dynamic range (VREF), not core voltage FOM improves with more resolution as resistor string still dominates

Most advantages stem from lower Ron at smaller Cg’s ICCAD 2008 – November 12, 2008

24

Conclusions 

NEM relays offer unique characteristics 

Superior Ion/Ioff ratio vs. CMOS Superior Ron*Cg vs. CMOS



Switching delay largely independent of electrical t





Relay circuits show order of magnitude energy efficiency improvements over CMOS 





10.1

Use parallelism (and area) to achieve comparable throughputs in digital circuits Mixed signal circuits see similar benefits from low Ron at small Cgs

Key challenge is scaling devices reliably

ICCAD 2008 – November 12, 2008

25

Acknowledgements     

10.1

Berkeley Wireless Research Center NSF DARPA FCRP (C2S2) Intel Foundation Graduate Fellowship

ICCAD 2008 – November 12, 2008

26