Nov 12, 2008 - NEM Relays. Fred Chen2, Hei Kam1, Dejan Markovic3,. Tsu-Jae King Liu1, Vladimir Stojanovic2, Elad Alon1. 1Department of EECS, University ...
Integrated Circuit Design with NEM Relays Fred Chen2, Hei Kam1, Dejan Markovic3, Tsu--Jae King Liu1, Vladimir Stojanovic2, Elad Alon1 Tsu 1Department
of EECS, University of California, Berkeley, CA 2Department of EECS, Massachusetts Institute of Technology, Cambridge, MA 3EE Department, University of California, Los Angeles, CA
CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000
1000
10.1
100
Sun’s Surface Rocket Nozzle Nuclear Reactor
Normalized Energy/op
Power Density (W/cm2)
10000
100
8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar
80
60
40
20
0 0 10
1
10
2
3
10 10 1/throughput (ps/op)
4
10
Vdd and Vt not scaling well power/area not scaling
ICCAD 2008 – November 12, 2008
2
CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000
1000
10.1
100
Sun’s Surface Rocket Nozzle Nuclear Reactor
Normalized Energy/op
Power Density (W/cm2)
10000
100
Core 2 8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar
80
60
Operate at a lower energy point
40
20
0 0 10
Run in parallel to recoup performance 1
10
2
3
10 10 1/throughput (ps/op)
4
10
Vdd and Vt not scaling well power/area not scaling Parallelism to improve performance within power budget ICCAD 2008 – November 12, 2008
3
Where Parallelism Doesn’t Help 25
100
20
80
15 Etotal 10 Edynamic
5
Eleak
0.1
0.2
0.3 Vdd (V)
0.4
0.5
60
Scale Vdd & VT: 40
20
0 1 10
2
10
3
4
10 10 1/throughput (ps/op)
5
10
CMOS circuits have wellwell-defined minimum energy
10.1
Energy/op vs. 1/throughput
Normalized Energy/op
Normalized Energy/cycle
Energy/op vs. Vdd
Caused by leakage and finite sub-threshold swing Need to balance leakage and active energy Limits energy-efficiency, regardless how slow the circuit runs ICCAD 2008 – November 12, 2008
4
What if There Was No Leakage? Normalized Energy/cycle in CMOS
Measured relay I-V 1.2
W x L = 2mm x 20mm
1.0
20
15
IDS (mA)
Normalized Energy/cycle
25
Etotal
10
0.6
VDS = 1V
0.0 34
0.1
10.1
0.2
0.3 Vdd (V)
0.4
compliance limit
0.4
0.2
Edynamic
5
H = g = 0.6mm
0.8
0.5
35
36
VGB (V)
37
No longer need to balance increasing component of energy as Vdd decreases Relays offer near infinite subsub-threshold slope and no leakage current ICCAD 2008 – November 12, 2008
5
Outline
If we could reliably build relays, would circuits implemented using relays offer superior energy--efficiency to CMOS? energy Compact Circuit Model for NEM Relays
Circuit Design Paradigms for NEM Relays
How to deal with ―slow‖ relays
Compare vs. CMOS
10.1
Fast yet key functionality captured
Digital Logic: 32-bit adder Mixed Signal I/O: DAC, ADC ICCAD 2008 – November 12, 2008
6
NEM Relay Structure & Operation Schematic Cross-Section Electrical Gate
Metal Channel
Gate Dielectric
W HH Source
t gap
Closed Relay (“ON”): |Vgb| > Vpi (pull-in voltage)
Air gapH
Drain Drain
tdimple
Body Dielectric Bodyody Body
Plan-View
L Source
Open Relay (“OFF”): |Vgb| < Vpo (pull-out voltage)
Anchor
Electrical Gate
Body Electrode (under body dielectric)
W
Drain
Channel
Contact Dimples 10.1
Relatively low on-state resistance (100Ω < Ron < 1kΩ @ 90nm)
Infinite off-state resistance Zero off-state leakage
ICCAD 2008 – November 12, 2008
7
NEM Relay Model Anchor
Mechanical Model Spring k
Mass m
Damper b C
0.5Rc Cs
S Rsd
0.5Rc Cd
Go Cgc
Rcs So
Cgb
D
Rcd
Rsd
Do Csd_b
Csd_b B
Compact Verilog Verilog--A model for circuit simulation:
10.1
Mechanical dynamics: spring (k), damper (b), mass (m) Electrical parasitics: non-linear gate-body (Cgb), gate-channel (Cgc), and source/drain-body cap (Csd_b), contact resistance (Rcs,d)
Vpi and delay verified against device measurements ICCAD 2008 – November 12, 2008
8
NEM Relay Scaling
Constant EE-field scaling has similar benefits as CMOS Pull-in Voltage with Beam Scaling
Measured pullpull-in voltages scale linearly If we can overcome reliability issues & surface forces scale: {W,L,tgap} = {90nm,2.3um,10nm} Vpi = 200mV
10.1
Mechanical delay also scales linearly (~10ns @90nm) ICCAD 2008 – November 12, 2008
9
NEM Relay as a Logic Element
4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar Non Non--inverting logic possible: actuation independent of drain/source voltages
Mechanical delay (~10ns) >> electrical t (~1ps)
Can stack 200 devices (with 1kW on-resistance) before electrical delay is comparable to mechanical delay 10.1
ICCAD 2008 – November 12, 2008
10
Digital Circuit Design with Relays
CMOS: delay set by electrical time constant
Relays: delay dominated by mechanical movement
10.1
Quadratic delay penalty for stacking devices (pass transistor logic) Buffer & distribute logical/electrical effort over many stages Want all relays to switch simultaneously Implement logic as a single complex gate ICCAD 2008 – November 12, 2008
11
Digital Circuit Design with Relays
4 gate delays
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays For reasonable loads, relay delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
10.1
1 mechanical delay
Larger individual devices Fewer devices needed to implement the same logic function ICCAD 2008 – November 12, 2008
12
CMOS vs. Relays: Digital Logic Normalized Energy/op
100
80
60
40
20
0 1 10
2
10
3
4
10 10 1/throughput (ps/op)
5
10
Most processor components exhibit the energy vs. performance tradeoff of static CMOS Control, Datapath Datapath,, Clock
10.1
Adder energy performance tradeoff is representative ICCAD 2008 – November 12, 2008
13
Relay--based Adder Relay
Full adder cell:
10.1
12 relays vs. 24 transistors XOR for free Use fully complementary signals to avoid additional mechanical delay (to invert)
Relays all sized minimally
ICCAD 2008 – November 12, 2008
14
N-bit RelayRelay-based Adder
10.1
Ripple carry configuration Cascade full adder cells to create larger complex gate Stack of N relays— relays— still 1 mechanical delay
ICCAD 2008 – November 12, 2008
15
Snapshot: NEM Relays vs. CMOS Energy/op vs. Delay/op across Vdd
9x 10x
32-bit adder
CMOS
Relay
Supply Voltage
0.5 V
0.32 - 0.9 V
Load Cap per Output
25 fF
25 fF
Total Gate Cap
4.0 pF
125 fF
600 mm2
480 mm2
Area
Energy is for adder only
Compare vs. Sklansky CMOS adder1 For similar area: >9x lower energy/op, >10x greater delay
1D.
10.1
Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).
ICCAD 2008 – November 12, 2008
16
Energy--Delay Results Energy
Relays always provide a more energy efficient solution
> 10x better, even in the GOPS range Energy/op vs. Delay/op across Vdd & CL
30x capacitance gain
2.4x Vdd gain
10.1
ICCAD 2008 – November 12, 2008
Lower device Cg, Cd Fewer devices No leakage energy
17
Area vs. Throughput Tradeoff 30 W Relay (buffered, 25fF)
A
Relay
/A
CMOS
25 20
W Relay (buffered, 100fF)
15
Au Relay (25fF) 10 Au Relay (100fF) 5
Erelay = 20fJ/op 0.5
2.5
3
For high throughputs, relays require area overhead to achieve similar throughputs as CMOS Area overhead is bounded (max. 6x6x-25x) :
10.1
1 1.5 2 Throughput (GOPS)
To maintain minimum energy/op for throughputs > 1GOPS, CMOS needs parallelism too ICCAD 2008 – November 12, 2008
18
CMOS vs. Relays: Mixed Signal
I/O energy dominant if core is ~30x more efficient See if I/O circuits can benefit as well Representative examples: DAC & ADC
How to process/generate analog signals? No ―linear gain‖ in relays – rely on switching
10.1
ICCAD 2008 – November 12, 2008
19
NEM Relay Based DAC DAC Architecture
Voltage Output
AC Impedance
EBIT
Need to add passive R to set impedance Don’t have to buffer up or size relays to drive output Relays’ gate voltage independent of I/O voltage
Energy dominated by I/O energy:
10.1
2
VIO2 CL
Same topology as CMOS except:
@VIO=200mV, CL=1pF: 62.8fJ/bit vs. ~780fJ/bit for CMOS ICCAD 2008 – November 12, 2008
20
NEM Relay Based Flash ADC
Topologically similar to CMOS
Key differences…
10.1
Sample/Hold input Flash Converter – bank of comparators
Sample & Hold: Unbalanced ―on‖ vs. ―off‖ switching times Comparators: Body terminal used as threshold, ambipolar actuation, hysteretic switching
ICCAD 2008 – November 12, 2008
21
Relay Based Comparators
Body terminal used to set threshold
VT
Hysteretic switching impact
10.1
Comparator will have different threshold if previous state varies Need to reset all comparators to the same state before each evaluation dynamic logic ICCAD 2008 – November 12, 2008
22
Relay Based Comparators 60 50
Output Code
Max 40 30
VT
Min
20 10 0
-0.2
0.8
1
Comparators above & below input will evaluate need a different decoder
Each comparator has a ―dead―dead-zone‖ where it stays off
10.1
0.2 0.4 0.6 Input Voltage (V)
Ambipolar actuation impact
0
Can use ―Max‖ or ―Min‖ of this range to determine code ICCAD 2008 – November 12, 2008
23
ADC Results Vcore = 0.3V C = 500fF R = 4kW VREF = 1V fsamp = 10 MS/s 6-bit res: 5.5fJ/conv
Energy dominated by resistor string (320fJ out of 350fJ)
10.1
Largely dependent on input dynamic range (VREF), not core voltage FOM improves with more resolution as resistor string still dominates
Most advantages stem from lower Ron at smaller Cg’s ICCAD 2008 – November 12, 2008
24
Conclusions
NEM relays offer unique characteristics
Superior Ion/Ioff ratio vs. CMOS Superior Ron*Cg vs. CMOS
Switching delay largely independent of electrical t
Relay circuits show order of magnitude energy efficiency improvements over CMOS
10.1
Use parallelism (and area) to achieve comparable throughputs in digital circuits Mixed signal circuits see similar benefits from low Ron at small Cgs
Key challenge is scaling devices reliably
ICCAD 2008 – November 12, 2008
25
Acknowledgements
10.1
Berkeley Wireless Research Center NSF DARPA FCRP (C2S2) Intel Foundation Graduate Fellowship
ICCAD 2008 – November 12, 2008
26