Technology Scaling Trends and Accelerated Testing for Soft Errors in Commercial Silicon Devices
© 2003 Robert Baumann
Mythology
• Myth 1: Soft errors are only a problem for DRAMs
• Myth 2: Soft errors don’t cause customer problems
• Myth 3: As we scale technology, soft errors go away
• Myth 4: Error correction will solve the problem
Business Impact of SEU
“Sun Screen,” Daniel Lyons, Forbes Global, 11.13.00
“…mysterious glitch has been popping up since late last year…web companies, telecommunications companies, a Baby Bell in Atlanta, an Internet domain registry on the East Coast, high-end servers made by Sun Microsystems have, for no apparent reason, suddenly crashed…It has caused problems for America Online, Ebay and dozens of other major corporate accounts…The Sun has caused crashes at dozens of customer sites. An odd problem involving stray cosmic rays and memory chips in the flagship Enterprise server line…Sun found it had been shipping servers whose cache modules contained faulty SRAM (static random access memory) chips from a supplier it won't name.”
Loss of customer confidence
Loss of revenue
Motivation
• Soft errors induce a higher failure rate than all other reliability mechanisms combined.
• Soft errors impact customer perception of reliability. Undetected errors are viewed as the biggest threat since their impact on applications cannot be predicted.
• The problem will only get worse as circuit densities are increased and voltages are decreased. Most design tweaks for high speed/low power make soft error susceptibility much worse!
What is a FIT? Really…
$1\ \mathrm{FIT} = \dfrac{1\ \text{failure}}{10^{9}\ \text{device-hours}}$
People usually hear about FITs in the context of HARD errors, with typically acceptable values being 1-100 FIT. SOFT error rates, typically tens of kFIT, can cause some surprise! BUT remember…
An SER of 100 kFIT is ~ 1 error/year
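The FIT-to-errors-per-year rule of thumb can be checked with a short calculation. This is a minimal sketch: the function name and the 8,760-hour year (continuous operation) are illustrative assumptions, not taken from the slides.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h; assumes the device runs continuously


def fit_to_errors_per_year(fit: float) -> float:
    """Convert a rate in FIT (failures per 1e9 device-hours)
    to expected failures per device-year."""
    return fit * 1e-9 * HOURS_PER_YEAR


if __name__ == "__main__":
    # A hard-error budget, a mid-range value, and a soft-error rate of 100 kFIT
    for fit in (100, 1_000, 100_000):
        rate = fit_to_errors_per_year(fit)
        print(f"{fit:>7} FIT -> {rate:.6f} errors per device-year")
```

For 100 kFIT this gives about 0.88 errors per device-year, which is the "~ 1 error/year" figure quoted on the slide.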
Terminology
[Figure: classification of single-event effects (SEE). Terms shown: SET, latched SET, SBU, MBU, soft error, SEFI, SELU, SEGR, hard error.]
Causes of SEU
• Electromagnetic Interference (EMI)
  – EMI can easily be avoided with standard procedures
  – Can be verified by adding shielding (Faraday cage) or eliminating source
• Circuit-level electrical noise
  – Care in layout reduces parasitic coupling capacitance
  – Errors from this issue will be repeatable for specific patterns/algorithms
• Board-level electrical noise
  – Mitigated through the use of bypass capacitors and multilayer PCBs
  – Generally limited to specific components/paths
• Ionizing Radiation
  – Random in time and location
  – Dominant “noise” in well designed systems
Charge Generation in Silicon
[Figure: ion track crossing an n-Si / depletion region / p-Si junction at t=0 and t=0+, showing the electrons and holes generated along the track and the charge-collection funnel.]
The 3 Sources of Radiation in Commercial Applications
Alpha Particles
From radioactive impurities (232Th, 238U, etc.)
Alpha particle energy ~ 4 - 9 MeV
[Figure: alpha particle range (um) and dQ/dx (fC/um) vs. particle energy (MeV), 0 to 10 MeV.]
May and Woods, IRPS, Trans. ED 1979
Alpha Emission Rates

Material            Emission rate (α/cm2-hr)
Processed wafers    0.0002
Cu metal (thick)    0.0004
Al metal (thick)    0.0014
Mold compound       0.024 - 0.0006
Underfill           0.002 - 0.0007
Pb-solders          7.200 - 0.0014

Grade               Emission rate (α/cm2-hr)
Standard            10 - 0.01
Low Alpha           < 0.01
Ultra Low Alpha     ~ 0.001
Hyper Low Alpha     < 0.0005

[Figure: thick Th-232 spectrum, counts vs. alpha energy (MeV), 0 to 10 MeV.]
The Atmospheric “Filter”
[Figure: incident protons (>> 1 GeV) at the top of the atmosphere; 3rd-7th generation neutrons, electrons, muons, and protons (< 1 GeV) below.]
Cosmic Particle Spectra
[Figure: flux (/cm2-sec-MeV), 10^-8 to 10^-2, vs. particle energy (MeV), 1 to 10000, for neutrons (solid), electrons, muons, and protons.]
Adapted from Ziegler, IBM Journal of R&D, 40(1), 1996
High Energy Neutrons and Silicon

n + 28Si reaction products    Threshold energy
25Mg + α                      2.75 MeV
28Al + p                      4.00 MeV
27Al + d                      9.70 MeV
24Mg + n + α                  10.34 MeV
27Al + n + p                  12.00 MeV
26Mg + 3He                    12.58 MeV
21Ne + 2α                     12.99 MeV

Reaction table from F. Wrobel et al., IEEE Trans. Nucl. Sci., Vol. 47, No. 6, Dec. 2000
[Figure: dQ/dx (fC/um), 0 to 200, vs. ion energy (MeV), 0 to 10, for the reaction products. From SRIM.]
Effect of Altitude

Altitude (feet)   Altitude (meters)   Relative flux   % Urban Pop
0                 0                   1.0             35%
1200              366                 1.5             80%
1700              518                 1.7             90%
3600              1097                2.4             95%
5000              1524                2.9             99%
10000             3048                11.4            >99.99%

[Figure: relative neutron flux (sea level = 1), 0 to 400, vs. altitude (feet), 0 to 80000, from the “Boeing model”, with terrestrial altitudes and flight altitudes marked.]
Terrestrial environment from 0 to 10,000 feet corresponds to a relative neutron flux range of 1.0 to 11.4x
Flight environment from 28,000 to 60,000 feet corresponds to a relative neutron flux range of 72 to 356x
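To estimate the relative neutron flux at an altitude between the tabulated points, one option is to interpolate the altitude table. The sketch below uses the six terrestrial rows as given; the linear interpolation between rows and the function name are assumptions made for illustration and are not part of the “Boeing model” itself.

```python
import bisect

# (altitude in feet, relative neutron flux with sea level = 1), from the slide's table
ALTITUDE_FLUX = [
    (0, 1.0),
    (1200, 1.5),
    (1700, 1.7),
    (3600, 2.4),
    (5000, 2.9),
    (10000, 11.4),
]


def relative_flux(altitude_ft: float) -> float:
    """Relative neutron flux at altitude_ft, linearly interpolated
    between tabulated rows (an illustrative assumption)."""
    feet = [row[0] for row in ALTITUDE_FLUX]
    if not feet[0] <= altitude_ft <= feet[-1]:
        raise ValueError("altitude outside the tabulated 0-10,000 ft range")
    i = bisect.bisect_left(feet, altitude_ft)
    if feet[i] == altitude_ft:
        return ALTITUDE_FLUX[i][1]  # exact table entry
    (x0, y0), (x1, y1) = ALTITUDE_FLUX[i - 1], ALTITUDE_FLUX[i]
    return y0 + (y1 - y0) * (altitude_ft - x0) / (x1 - x0)


if __name__ == "__main__":
    for alt in (0, 600, 5000, 10000):
        print(f"{alt:>6} ft -> {relative_flux(alt):.2f}x sea-level flux")
```

A sea-level cosmic SER could then be scaled by `relative_flux(altitude)` to approximate the rate at another site, before any correction for building shielding.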
What’s the big deal about 10B?
[Figure: Thermal Neutron Cross Section for Various Elements. Neutron cross-section (barns), log scale from 10^-3 to 10^4, by element/isotope: boron-11, oxygen, silicon, aluminum, nitrogen, phosphorus, copper, arsenic, titanium, tungsten, and BORON 10.]
What 10B activation does in Si
90% of all 10B fissions are induced by neutrons below 15 eV
[Figure: a low energy neutron strikes a 10B nucleus, producing a 7Li recoil (0.84 MeV) and an α-particle (1.47 MeV).]
[Figure: alpha & lithium dQ/dx (fC/um), 0 to 30, vs. particle energy (MeV), 0 to 2, with curves for lithium recoils and alpha particles.]
Eliminating 10B SER
[Figure: chip cross-section showing Metal 1 through Metal 6 above n+ and p+ diffusions, an n well, and the silicon substrate.]
1. Replace the first few µm of BPSG in the process.
2. Use 11B precursors for the BPSG process.
3. Use boron-rich shielding materials in the packaging.
Baumann et al., IEEE IRPS, VLSI 1995; Baumann and Smith, 2000 IRPS, 2001 Microelectronics Reliability
Summary of Mechanisms
Alpha Particles
• Emitted from U, Th impurities in materials
• Peak stopping power ~ 16 fC/um
• Limited range (< 40 um)
• Dominant in processes that do not screen for alpha emission
10B and Low Energy Neutrons
• σth of 10B is huge & emissions are highly ionizing
• High 10B concentration in BPSG (4 - 7%)
• Peak stopping power ~ 16 & 25 fC/um
• Effect localized
• Effect dominant in parts using BPSG
High Energy Neutrons
• Complex reactions
• Stopping power > 100 fC/um
• Effect increases with altitude
• Cannot easily be shielded
Design-in Reliability for SEU
[Figure: flow chart across Phase I (Development), Phase II (Prototype), and Phase III (Qualification). Boxes: New Technology (Process/design), Previous Technology, New materials characterization, N-ASER, SSER*, SERSIM/Spice Modeling, a-ASER, Model Verification, Area Scaling Model, 10B-ASER*, SER Estimator Model.]
SEU Scaling Trends
DRAM SER Scaling Trend
[Figure: DRAM SER (a.u.), showing system SER and bit SER, and voltage Vdd (V) vs. DRAM generation (Mbits), 1 to 1000.]
SRAM Bit SER Scaling Trend
Based on embedded high-performance SRAM
[Figure: SRAM bit SER (A.U.) and voltage (V) vs. SRAM integration level (Mbits) for the 0.7µm, 0.5µm, 0.35µm, 0.25µm, 0.18µm, 0.13µm, and 0.09µm nodes, with a curve labeled “w BPSG”.]
SRAM SER vs. Technology
[Figure: soft error rate (A.U.) vs. technology. Legend: Hazucha & Svensson, IBM, TI SRAM.]
After Shivakumar et al. (2002 IEEE Dependable Systems & Networks)
SRAM System SER Trend
Based on embedded high-performance SRAM
[Figure: system SER (A.U.) and voltage (V) vs. SRAM integration level (Mbits), 1 to 1000, for the 0.7µm, 0.5µm, 0.35µm, 0.25µm, 0.18µm, 0.13µm, and 0.09µm nodes, with a curve labeled “w BPSG”.]
Basic Non-SRAM SEU
• Type I: Direct Logic SEU (a few key latches/FFs)
  – SEU in output latch
  – Frequency independent
  – Similar mechanism to direct hit in SRAM cell
• Type II: Propagation SEU (all FF/latches)
  – SEU in latch/FF upstream from the output
  – Propagation of corrupt data requires time (defined by the propagation delay of the circuit path), so this static error can only be latched into an output if it arrives at the output latch before the clock edge.
  – SER decreases with increasing frequency since the event has less time before the clock edge to be propagated through to the output latch/FF.
• Type III: Latched SET (all logic)
  – SET in combinatorial logic
  – Propagation efficiency of SET pulse through logic increases with scaling
  – Increasing clock frequency offers more opportunities to latch an SET, thus SER from this effect increases with increasing frequency
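The opposite frequency trends of Type II and Type III events can be illustrated with a toy timing-window model. This is only a sketch under stated assumptions and is not taken from the slides: an event is assumed to occur at a uniformly random time within the clock period, a Type II upset is captured only if it occurs at least one propagation delay before the clock edge, and a Type III transient is captured only if the pulse overlaps the clock edge. All function names and numeric values are hypothetical.

```python
def type2_capture_probability(prop_delay_s: float, clock_hz: float) -> float:
    """Toy model: fraction of the clock period in which an upstream upset
    still has time to propagate to the output latch before the clock edge."""
    return max(0.0, 1.0 - prop_delay_s * clock_hz)


def type3_capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """Toy model: chance that a transient of the given width overlaps
    a clock edge (pulse width divided by clock period, capped at 1)."""
    return min(1.0, pulse_width_s * clock_hz)


if __name__ == "__main__":
    prop_delay = 1e-9      # hypothetical 1 ns path delay
    pulse_width = 200e-12  # hypothetical 200 ps transient
    for f in (100e6, 500e6, 1e9):
        p2 = type2_capture_probability(prop_delay, f)
        p3 = type3_capture_probability(pulse_width, f)
        print(f"{f / 1e6:6.0f} MHz: Type II {p2:.2f}, Type III {p3:.2f}")
```

In this simplified picture the Type II capture probability falls and the Type III capture probability rises as the clock frequency increases, matching the qualitative trends listed above. Type I events, being direct hits on the output latch, have no frequency term.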
Logic SER is Catching Up!
[Figure: SER (A.U.), 10^-5 to 10, vs. integration level (Mbits), 1 to 100, for the 0.25µm, 0.18µm, 0.13µm, and 0.09µm nodes, comparing SRAM bit SER, logic bit SER, and SRAM bit SER w ECC.]
Logic SER is Catching Up!
[Figure: soft error rate (A.U.).]
After Shivakumar et al. (2002 IEEE Dependable Systems & Networks)
Vendor Comparison* (accelerated)
6T CMOS SRAM Comparison
1. Vendor C (@1.5V: 0.25µm: 6.0µm): cosmic SER = 1399 FIT/Mbit
2. Vendor A (@1.4V: 0.16µm: 5.7µm): cosmic SER = 995 FIT/Mbit
3. Vendor A’ (@1.4V: 0.14µm: 4.6µm): cosmic SER = 1016 FIT/Mbit
Private conversations with contacts at other major companies revealed that their bulk CMOS SRAMs are 800-1500 FIT/Mbit for cosmic rays, with alpha particles a fraction of this (10-50%).
* Discrete 6T CMOS SRAM nSER data from Dodd et al. [Sandia], IEEE IEDM, Dec. 2002, p. 334
Field Test Verification
Product: applications processor with 2 Mbit embedded SRAM (0.25 um)
Ultra low alpha flip chip (~ 0.001 α/cm2-hr)
Slow mode (10 MHz), 1.7 V, 90 C
Algorithm: memory test (every error detected)
Sample size: 600 identical chips
Test time: 99 days, or 600 x 99 x 24 => 1.42 million device hours
Altitude: 5300 feet (4.1x flux increase from sea level)
Errors observed = 4

Estimated field SER was 2817 FIT, or 1409 FIT/Mbit @ 5200 ft
Concrete shielding effect ~ 2.8x (8” of concrete @ attenuation of ~1.36x /10 cm for n > 10 MeV)
Net effect (altitude/shielding) ~ 4.1x/2.8x ~ 1.5x
SER from alphas: 113 FIT/Mbit (from SER Estimator), thus 1409 – 113 = 1296 FIT/Mbit for cosmic SER
Extrapolating to sea level and accounting for shielding: 1296/1.5 = 864 FIT/Mbit @ sea level
So total SER at sea level => 864 + 113 = 977 FIT/Mbit

Equivalent sea-level field SER ~ 977 FIT/Mbit
Accelerated SER tests predict 982 FIT/Mbit (@ 1.7V)
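The chain of arithmetic in this field test can be retraced step by step. The sketch below is illustrative only: the function and key names are hypothetical, and it takes the slide's rounded inputs (1.42 million device hours, a net altitude/shielding factor of 1.5) as given rather than recomputing them. Note that 600 x 99 x 24 is exactly 1,425,600 device hours, so using the unrounded figure would shift the results slightly.

```python
def field_to_sea_level_ser(errors, device_hours, mbits,
                           alpha_fit_per_mbit, net_flux_factor):
    """Retrace the slide's steps: observed errors -> field FIT -> FIT/Mbit,
    subtract the alpha contribution, scale the cosmic part to sea level,
    then add the alpha contribution back."""
    field_fit = errors / device_hours * 1e9           # failures per 1e9 device-hours
    field_fit_per_mbit = field_fit / mbits
    cosmic_field = field_fit_per_mbit - alpha_fit_per_mbit
    cosmic_sea_level = cosmic_field / net_flux_factor  # undo altitude/shielding
    return {
        "field_fit": field_fit,
        "field_fit_per_mbit": field_fit_per_mbit,
        "cosmic_field": cosmic_field,
        "cosmic_sea_level": cosmic_sea_level,
        "total_sea_level": cosmic_sea_level + alpha_fit_per_mbit,
    }


if __name__ == "__main__":
    result = field_to_sea_level_ser(
        errors=4,
        device_hours=1.42e6,     # rounded value quoted on the slide
        mbits=2,
        alpha_fit_per_mbit=113,  # from the SER Estimator, per the slide
        net_flux_factor=1.5,     # slide's rounded 4.1x / 2.8x
    )
    for name, value in result.items():
        print(f"{name:>20}: {value:8.1f}")
```

With these inputs the total comes out at about 977 FIT/Mbit, in line with the slide; the intermediate values differ from the slide's by less than 1 FIT/Mbit because the slide rounds at each step.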
Neutron Induced Latch-up!!!
• In the past, single-event latch-up (SELU) was a problem for electronics in space.
  – However, recent data* suggest that terrestrial neutrons can induce latch-up in CMOS SRAMs.
• Ionization produced by neutron interaction with silicon induces parasitic bipolar action, which in turn latches the device.
• Even if it is not a hard error, the latch-up condition requires at least a full chip power-down, so this failure mode has a big impact on reliability.
With an equivalent exposure > 68 million years of cosmic neutron irradiation, we have NEVER observed a latch-up in TI SRAMs (C10 – C035).

Device                       Latch-up rate
Vendor C* (0.25 µm, 1.8V)    70 FIT/Mbit (@25C)     600 FIT/Mbit (@125C)
Vendor A* (0.16 µm, 1.5V)    0.3 FIT/Mbit (@25C)    10 FIT/Mbit (@125C)
TI (0.25 µm, 1.8V)           0 FIT/Mbit (@30C)
TI (0.18 µm, 1.5V)           0 FIT/Mbit (@30C)
TI (0.13 µm, 1.2V)           0 FIT/Mbit (@30C)

* From Paul Dodd et al., IEEE 2003 IRPS, p. 51-55
SEU Sensitivity Evaluation
[Figure: flow chart with boxes Radiation Characterization, RAM Sensitivity Characterization, Logic Sensitivity Characterization, SET Characterization, 3D Device Simulation, Product Characterization, Circuit/Chip Simulation, Customer Application Models, Accurate Product SER, and Accurate Product MTTF. Annotations: at-speed testing; real application; SRAM vs. logic; SET => SEU; non-SRAM SER; algorithm dependence; MTTF^-1 < SER.]
Summary – Looking Ahead
• SEU are a growing concern as technology is scaled down for SOME applications.
• SRAM is currently very sensitive to radiation (1000-2000 FIT/Mbit in commercial 6T processes using ultra low alpha materials).
• ECC/Parity is a viable way of protecting memory from SEU.
• Logic failure rates will erode the efficacy of ECC in all designs.
• Hardened logic libraries will be needed, or schemes to mask logic sensitivity (redundancy on critical paths).
• Neutron induced latch-up is a major concern for some vendors.