Soft-Error Rate Testing of Deep-Submicron Integrated Circuits

Tino Heijmen and André Nieuwland
Philips Research Laboratories (WAY41), High Tech Campus 5, 5656 AE Eindhoven, The Netherlands
email: {tino.heijmen,andre.nieuwland}@philips.com
Abstract

Soft errors induced by radiation pose a major challenge for the reliability of complex chips processed in state-of-the-art technologies. This paper reviews soft-error rate (SER) characterization by real-time system-SER testing and by accelerated testing. Additionally, we present scaling trends, simulation approaches, and improvement techniques. Special attention is given to soft errors in combinational logic.
1. Introduction

Radiation-induced soft errors are an increasingly important reliability issue in deep-submicron CMOS integrated-circuit (IC) technologies. A bit error is called a soft error if the data is corrupted but the device itself is not damaged. In contrast, a permanent device failure is called a hard error. Both cosmic rays and radioactive impurities in process and package materials are capable of producing ionizing particles. These particles induce charges in silicon that can disturb logic states in circuits. In some applications, such as high-end servers with large memory caches, soft errors could cause a system crash more than once a month if no protection is applied. Soft errors are also very relevant for automotive and medical applications, which require high product reliability. See [1] for a recent survey of the soft-error subject.

The outline of this paper is as follows. Section 2 briefly treats the physical background of soft errors. Then, we discuss in Sec. 3 the methods that are applied to measure the soft-error rate (SER) under real-life and under accelerated conditions. Typical failure rates for circuits in current technologies are presented in Sec. 4. Methods to model soft errors and techniques to reduce the SER at different levels are treated in Secs. 5 and 6, respectively. Finally, Section 7 discusses SER in combinational logic, which requires a dedicated approach, both in analysis and in SER reduction.
2. Background

Two dominant radiation sources cause soft errors in ICs operating at sea level [2]:
- Alpha particles emitted by radioactive impurities present in the IC package and in the IC itself,
- Cosmic neutrons originating from the interaction of high-energy cosmic rays with atoms in the earth's atmosphere.

An alpha particle is capable of ionizing silicon by generating electron-hole pairs, as is depicted in Figure 1. Neutrons do not directly produce charges in silicon, but can interact with silicon and other atoms; the products of this interaction then ionize the material. When an ionizing particle, such as an alpha particle, intersects a reverse-biased pn-junction (e.g., the drain junction of a transistor in the OFF state), this junction can collect the charges that are generated along the particle track. Both diffusion and drift (in the disturbed electric fields) play a role in the collection of charge carriers. An example is shown in Figure 1. The result is a current pulse with a typical width of 1-100 ps. In current technologies SRAM is the most sensitive component, because of the small feature sizes and high memory densities. Therefore, we focus on SRAM in the following, unless indicated otherwise.
Figure 1. Charge generation by an alpha particle in an NMOS transistor that is in the OFF-state. The intersected pn-junction will collect the generated electrons by drift and diffusion.
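As a back-of-the-envelope illustration of the charge generation discussed above, the following sketch converts deposited energy into liberated charge. The 3.6 eV pair-creation energy for silicon is a standard textbook value and is not taken from this paper; only part of the liberated charge is actually collected by the struck junction.

# Sketch: estimate the charge generated in silicon by an ionizing particle.
# Assumes ~3.6 eV of deposited energy per electron-hole pair (textbook value for Si).

ELECTRON_CHARGE_C = 1.602e-19   # elementary charge [C]
EV_PER_PAIR = 3.6               # energy needed to create one e-h pair in Si [eV]

def generated_charge_fc(deposited_energy_mev: float) -> float:
    """Charge liberated along the particle track, in femtocoulombs."""
    pairs = deposited_energy_mev * 1.0e6 / EV_PER_PAIR
    return pairs * ELECTRON_CHARGE_C * 1.0e15  # convert C to fC

if __name__ == "__main__":
    # Example: 1 MeV deposited in the sensitive volume liberates roughly 44-45 fC.
    print(f"{generated_charge_fc(1.0):.1f} fC per MeV deposited")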
The charges generated by ionizing particles can have several effects on circuits. The most common radiation effect is that the data state of a single memory cell is changed. This is usually called a soft error or a single-event upset (SEU). In some cases the induced charges cause more than one bit flip; such an event is called a multiple-bit upset (MBU). A different effect is that radiation may cause a single-event latchup (SEL). The collected charges then trigger a parasitic bipolar transistor, which is inherently present in CMOS technologies. This results in a high current drawn from the supply, which can only be corrected by a power-down. Other radiation effects may occur, especially in space applications and in high-voltage circuits, but these are outside the scope of this paper.
3. SER testing

3.1. System-SER test

The most direct way to determine the SER of an IC is to measure it under nominal conditions. However, the SER of a well-designed chip is low. Typically, such real-time testing, generally called system-SER (SSER) testing, requires on the order of one million device hours. For example, if a chip contains 3 Mbit of SRAM with a SER of 3,000 FIT/Mbit (1 FIT equals one failure per 10^9 device hours), then 1,000 samples of this chip have to be tested for 1,000 hours to observe about nine soft errors on average.

The neutron flux, and consequently the neutron-induced component of the SER, increases by about a factor of 10 with every 3 km of increase in altitude (this trend saturates at about 15 km). Because of this, SSER experiments are often performed in mountain laboratories at high altitude. Examples in Europe are the research facility at Jungfraujoch (3580 m), Switzerland [3], and the recently opened Altitude SEE Test European Platform (ASTEP) [4] at Pic-de-Bure (2552 m), France. The neutron flux as a function of altitude is accurately known and the neutron-induced SER is linearly proportional to this neutron flux. At high altitudes the neutron component dominates over the alpha-induced SER. Therefore, the SER measured at high altitude can easily be extrapolated to obtain the neutron-induced SER level at sea level. However, if one wishes to accurately measure the alpha-induced contribution to the SER, experiments have to be done at underground locations that are shielded from cosmic neutrons, e.g., in the Fréjus tunnel between France and Italy [5]. In the 1980s IBM researchers demonstrated with measurements at different locations that both alpha particles and cosmic rays contribute to SER [6].
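The device-hours arithmetic of the example above can be made explicit with a minimal sketch; the chip parameters are the example values quoted in the text.

# Sketch: expected number of soft errors in a system-SER (SSER) experiment.
# 1 FIT = 1 failure per 1e9 device hours.

def expected_soft_errors(ser_fit_per_mbit: float, mbit_per_chip: float,
                         n_samples: int, test_hours: float) -> float:
    chip_fit = ser_fit_per_mbit * mbit_per_chip   # FIT per chip
    device_hours = n_samples * test_hours         # total accumulated device hours
    return chip_fit * device_hours / 1.0e9

if __name__ == "__main__":
    # Example from the text: 3 Mbit of SRAM at 3,000 FIT/Mbit,
    # 1,000 samples tested for 1,000 hours.
    print(expected_soft_errors(3000.0, 3.0, 1000, 1000.0))  # -> 9.0 errors on average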
3.2. Accelerated SER test

Alternatively, SER can be measured under accelerated conditions, i.e., in the presence of a strong extra radiation source. For a complete accelerated SER (ASER) characterization, the alpha- and the neutron-induced contributions have to be measured separately. Alpha-ASER testing can be performed relatively easily when automated test equipment (ATE) is available. The device-under-test (DUT) is irradiated with alpha particles from a radioactive isotope. Generally, this isotope is deposited as a thin layer on a metal substrate, see Figure 2, which is placed at a minimum distance (preferably less than 1 mm) above the IC with the radiating surface facing the DUT.
Figure 2. Typical DUT configuration during alpha-accelerated SER testing.

The most widely used alpha sources are isotopes of thorium (Th-228 and Th-232), americium (Am-241), and curium (Cm-244). These alpha sources differ in their energy spectra. Thorium sources have a broad spectrum between 2 and 9 MeV. Am-241 and Cm-244 have spectra with mono-energetic peaks at 5.4 and 5.7 MeV, respectively. It has been shown that the use of a different alpha source results in a different voltage dependency of the measured chip SER [7]. Preferably, an alpha source with a broad spectrum is applied, because this closely resembles the real-life situation, where most alpha particles originate from thorium and uranium isotopes. Unfortunately, it is difficult to get access to thorium (or uranium) sources with a sufficiently high flux. Because of this, mono-energetic sources, such as Am-241 and Cm-244, are most commonly used.

For neutron-accelerated SER testing, preferably a neutron beam is used that has a continuous (white) energy spectrum resembling as closely as possible the spectrum observed under nominal conditions. One of the preferred facilities is the Weapons Neutron Research (WNR) beam of the Los Alamos Neutron Science Center (LANSCE) in New Mexico, USA [8].
The energy distribution of this beam closely matches the neutron spectrum at sea level, but its intensity is eight orders of magnitude higher. This means that one hour in the beam is equivalent to about 15,000 years under nominal conditions at sea level. For several years another neutron beam with similar characteristics has been available for SER characterization at the TRIUMF facility in Vancouver, Canada [9]. SER data measured at TRIUMF agree within 25% with results from the WNR beam, using the same devices. Differences in the SER observed at the two locations can be explained by the fact that the WNR neutron spectrum extends to higher energies. In contrast with WNR, the TRIUMF beam contains thermal neutrons, i.e., low-energy neutrons that are in thermal equilibrium with their surroundings. Testing with and without shielding layers of cadmium, which remove these thermal neutrons from the beam, and comparing the results gives the sensitivity of the DUT to thermal neutrons. This is not possible with the WNR beam. Soft errors induced by thermal neutrons are relevant if boro-phosphosilicate glass (BPSG) is used in the chip processing [2].

As an alternative to neutron-SER testing with white-spectrum beams, mono-energetic neutron sources can be applied. In these beams all neutrons have the same, well-defined energy. Possibly, the energy spectrum has a low-energy tail, in which case the neutron beam is called quasi-mono-energetic. By combining the experimental data measured at different energies, the SER under nominal conditions can be determined. The Svedberg Laboratory (TSL) in Uppsala, Sweden, is one of the facilities in Europe that offers a quasi-mono-energetic neutron beam for accelerated SER testing. Soft-error rates obtained from measurements at TSL have been shown to agree within the experimental accuracy (about 40%) with SER data from the WNR white-spectrum neutron beam [10].
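As a rough cross-check of the quoted acceleration, the sketch below converts beam time into equivalent field time. The acceleration factor of 1.3e8 is back-calculated from the "one hour is equivalent to about 15,000 years" statement and is an assumption, not a published beam parameter.

# Sketch: convert accelerated beam time into equivalent field time at sea level.
# The acceleration factor is the ratio of beam flux to the nominal sea-level
# neutron flux; ~1.3e8 is assumed here, consistent with "eight orders of magnitude".

HOURS_PER_YEAR = 24 * 365.25

def equivalent_field_years(beam_hours: float, acceleration_factor: float = 1.3e8) -> float:
    return beam_hours * acceleration_factor / HOURS_PER_YEAR

if __name__ == "__main__":
    print(f"{equivalent_field_years(1.0):,.0f} years")  # roughly 15,000 years per beam hour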
3.3. Accelerated test strategy and setup

Because accelerated neutron-SER testing takes place at a neutron facility, a portable test setup is needed. In general, one or more DUTs are mounted on a DUT board that is steered by a test and control board. This board is controlled remotely with a PC outside of the test room. A schematic overview of the test setup is shown in Figure 3.

Generally, the test plan for alpha- or neutron-accelerated SER testing contains multiple runs in which the following parameters are varied (a sketch that enumerates such a run matrix is given below):
- Supply voltage (VDD)
- Data pattern (all-1, all-0, or checkerboard)
- Operational frequency (static or dynamic)
- Temperature
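Such a run matrix can be enumerated mechanically; the following is a minimal sketch in which all parameter values are placeholders rather than the conditions actually used in the experiments.

# Sketch: enumerate accelerated-SER test runs over the varied parameters.
from itertools import product

supply_voltages = [1.08, 1.20, 1.32]          # V (placeholder corners)
data_patterns   = ["all-0", "all-1", "checkerboard"]
operation_modes = ["static", "dynamic"]
temperatures_c  = [25, 85]                    # placeholder temperatures

runs = [
    {"vdd": v, "pattern": p, "mode": m, "temp_c": t}
    for v, p, m, t in product(supply_voltages, data_patterns,
                              operation_modes, temperatures_c)
]
print(f"{len(runs)} runs planned")  # 3 * 3 * 2 * 2 = 36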
Generally, test vehicles are used for measurements and the results are extrapolated to similar circuits on different chips. In this way, an acceptable balance can be found between test cost and time on the one hand and the accuracy of the SER estimation on the other. Neutron-ASER testing in particular is costly, because beam time at the neutron facilities is expensive and dedicated test setups have to be developed. Because non-accelerated SSER testing requires hundreds or thousands of samples and months of testing, SSER is generally used only to calibrate the results from ASER testing, which requires only a few samples and hours or days of testing. For calibration purposes, at most one test vehicle per technology node is sufficient.
Figure 3. Typical test setup (hardware) for neutron-accelerated SER testing.

The standard requirements and procedures for terrestrial SER testing of ICs and for reporting the results have been specified in the JEDEC standard JESD89 [11]. For example, the standard specifies that SER data obtained from alpha-accelerated SER tests should be extrapolated to an alpha flux of 0.001 particles/(cm²·hr), and the neutron-ASER data to the typical neutron flux observed at New York City (NYC). Recently, three addenda have been published that describe SSER, alpha-ASER, and neutron-ASER testing in more detail [11].
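The extrapolation prescribed by JESD89 is essentially a cross-section calculation: the number of observed errors per unit particle fluence, multiplied by the nominal flux, gives the field SER. A minimal sketch follows; the input numbers are arbitrary placeholders and JESD89 itself should be consulted for the exact procedure and reference fluxes.

# Sketch: extrapolate an accelerated-SER measurement to nominal conditions.
# cross_section = errors / fluence;  SER [FIT] = cross_section * nominal_flux * 1e9 hours

def ser_fit(n_errors: int, fluence_per_cm2: float,
            nominal_flux_per_cm2_hr: float) -> float:
    cross_section_cm2 = n_errors / fluence_per_cm2           # cm^2 per device
    failures_per_hour = cross_section_cm2 * nominal_flux_per_cm2_hr
    return failures_per_hour * 1.0e9                          # convert to FIT

if __name__ == "__main__":
    # Placeholder example: 120 errors observed after an alpha fluence of 1e7 /cm^2,
    # extrapolated to the nominal alpha flux of 0.001 particles/(cm^2*hr).
    print(f"{ser_fit(120, 1.0e7, 0.001):.0f} FIT")  # -> 12 FIT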
4. Typical SER levels

Typical SER results are shown in Figure 4, using a 0.13 µm embedded SRAM as an example [12]. The SER decreases exponentially with increasing VDD, because both the node charges and the drive currents of the transistors increase, which makes it more difficult to induce an upset. Also, Figure 4 shows that significant SER variations are observed between similar SRAMs on different chips, between different batches of the same chip, and even between different samples from the same batch. These differences are attributed to variations in the process parameters [12].
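The exponential supply-voltage dependence is often summarized with an empirical fit of the form SER(VDD) = A * exp(-VDD / V0). The sketch below fits such a model; both the fit form and the sample data points are illustrative assumptions, not the measured data of Figure 4.

# Sketch: fit an exponential model SER(VDD) = A * exp(-VDD / V0) to measured points.
import math

def fit_exponential(vdd, ser):
    """Least-squares fit of ln(SER) = ln(A) - VDD/V0 (simple linear regression)."""
    n = len(vdd)
    y = [math.log(s) for s in ser]
    xm, ym = sum(vdd) / n, sum(y) / n
    slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(vdd, y)) / \
            sum((xi - xm) ** 2 for xi in vdd)
    return math.exp(ym - slope * xm), -1.0 / slope   # A, V0

if __name__ == "__main__":
    # Illustrative data points (FIT/Mbit), not taken from the paper.
    a, v0 = fit_exponential([1.0, 1.2, 1.4], [4000.0, 2500.0, 1600.0])
    print(f"A = {a:.0f} FIT/Mbit, V0 = {v0:.2f} V")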
The impact of technology scaling on SER is a subtle balance between two effects. On the one hand, there is a reduction in the critical charge (Qcrit), i.e., the minimum amount of charge that has to be injected into a circuit node to cause a soft error. On the other hand, the efficiency with which charges are collected decreases [1]. The net result for FFs is a small but significant increase in the average SER per bit. For SRAMs, which have smaller feature sizes than FFs, the SER per bit has saturated.
Figure 4. Alpha-SER of 0.13 µm SRAM.

Figure 5 shows typical alpha- and neutron-induced contributions to the SER per bit for both SRAMs and flip-flops (FFs) in a 90-nm technology [13]. The FF SER is strongly dependent on the data ('0' or '1') and clock ('high' or 'low') states and on cell design. Figure 5 gives the highest, average, and lowest FF SER that is measured for the four states and five evaluated designs. Both alpha particles and neutrons contribute significantly to the SER of SRAMs and FFs. The average SER per bit for FFs is even higher than for SRAMs in 90 nm. However, note that the SER per mm² is much higher for SRAMs than for FFs, because the cell area is about one order of magnitude smaller.
Figure 5. Alpha- and neutron-SER of SRAMs and flip-flops in a 90-nm process.

Figure 6 shows the scaling trends for the alpha-induced SER for three recent technology nodes [13]. At the 0.13-µm node the worst FF SER is 4× as large as the average value, because the SER of one FF in this node is practically zero in three of the four FF states.

Figure 6. Scaling trends for alpha-induced SER of SRAMs and FFs (same legend as in Figure 5).

The SER per bit stays constant or increases, while the number of bits in a system that are sensitive to soft errors tends to grow. Therefore, the trend is that the SER at the system level is increasing. In general, the SER of an IC in state-of-the-art technologies is dominated by its embedded SRAM instances, because of the large number of sensitive bits. However, the contribution to the SER from logic, especially from latches and FFs, is growing. Furthermore, protection is easier for SRAMs than for logic, as is discussed in Sec. 6. DRAM is nowadays relatively robust against soft errors [1].

5. SER simulation
Simulation methods are necessary to investigate soft errors in an early design phase, when experimental data are not yet available. Soft errors can be simulated at the device, circuit, and chip levels [1]. Device simulation is challenging, as it requires advanced computational methods and extensive computing time. However, it provides an essential contribution to the understanding of the physical mechanisms leading to soft errors. It is also used to make accurate estimates of parameters that are applied in SER prediction methods.
Circuit simulation is particularly suitable for the calculation of Qcrit. A current pulse with a fixed waveform is used to model the charge-injection process, and Qcrit is determined by varying the magnitude of this pulse [14]. The computed Qcrit can be applied in analytical models to predict the SER. Such models are often calibrated with experimental data [1], [14].

Chip-level simulations take into account not only the SER of the memory elements, such as SRAMs and flip-flops, but also the other factors that make up the SER observed at the chip level:
- Timing derating is the fraction of time during which a circuit is sensitive to soft errors that can propagate.
- Logic derating specifies the probability that a soft error affects the functioning of the chip; it depends on the architecture and the application of the device.
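As an illustration of the Qcrit procedure described above, the sketch below bisects the amplitude of an injected current pulse until the cell flips. The double-exponential pulse shape and its time constants are common modelling assumptions, and the upset criterion is a placeholder for what would in practice be a transient circuit simulation of the struck node.

# Sketch: find the critical charge Qcrit by bisecting the injected-charge amplitude.

from math import exp

def double_exponential_current(t_s: float, q_c: float,
                               tau_rise_s: float = 5e-12,
                               tau_fall_s: float = 50e-12) -> float:
    """Current pulse that integrates to a total charge q_c (generic SEU waveform)."""
    return q_c / (tau_fall_s - tau_rise_s) * (exp(-t_s / tau_fall_s) - exp(-t_s / tau_rise_s))

def cell_upsets(injected_charge_c: float) -> bool:
    """Placeholder for a transient simulation; here the cell flips above 1.5 fC."""
    return injected_charge_c > 1.5e-15

def find_qcrit(q_low_c: float = 0.0, q_high_c: float = 1e-13, tol_c: float = 1e-17) -> float:
    """Bisection on the pulse amplitude: smallest injected charge that flips the cell."""
    while q_high_c - q_low_c > tol_c:
        q_mid = 0.5 * (q_low_c + q_high_c)
        if cell_upsets(q_mid):
            q_high_c = q_mid
        else:
            q_low_c = q_mid
    return q_high_c

if __name__ == "__main__":
    print(f"Qcrit = {find_qcrit() * 1e15:.2f} fC")  # -> ~1.50 fC with the placeholder

At the chip level, the resulting per-cell SER values are then weighted with the timing- and logic-derating factors listed above.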
6. SER reduction techniques

In order to improve the SER, several measures can be taken at the process and design levels.

At the process level, the use of triple-well structures reduces the SER by a factor of 1.5-2.5. Substrate and well engineering can also be used to improve the SER. Buried layers and epitaxial layers are both capable of decreasing the SER and have the additional advantage that the SEL vulnerability is reduced. Silicon-on-insulator (SOI) processes are intrinsically less sensitive to soft errors than bulk processes. Partially depleted SOI circuits typically show a 5× lower SER than their counterparts processed in bulk CMOS, because of the smaller sensitive volume from which induced charges can be collected. Fully depleted SOI is even less vulnerable, especially if body ties are applied. Simulations predict that the SER improvement of double-gate transistors (FinFETs) will be even better. SRAM cells in particular can be radiation-hardened efficiently by adding metal-insulator-metal (MIM) capacitors that are connected to the storage nodes. Because of the increase in the node capacitance, Qcrit is much larger, resulting in a SER that is reduced by orders of magnitude [1].

At the circuit and system level, the SER can be improved by protecting memory instances with error-correction coding (ECC). Single-error-correct/double-error-detect codes are most widely applied. Physical interleaving ("scrambling") reduces the probability that multiple bits in a logical word are upset by the same event; this is especially relevant if an ECC is used that can correct single-bit errors only (see the sketch below). Scrubbing is a technique to prevent accumulation of errors by periodically checking the memory content and correcting bit errors using the ECC.
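The benefit of interleaving for a single-error-correcting code can be illustrated with a simple address-mapping sketch; the interleaving factor and word width below are arbitrary placeholders.

# Sketch: column interleaving ("scrambling") of logical words within an SRAM row.
# With N-way interleaving, an N-cell-wide multiple-bit upset hits N different
# logical words, so a single-error-correcting ECC can still repair every word.

INTERLEAVE = 4          # placeholder interleaving factor
WORD_BITS = 8           # placeholder logical word width

def physical_column(word_index: int, bit_index: int) -> int:
    """Column of bit `bit_index` of logical word `word_index` within one row."""
    return bit_index * INTERLEAVE + word_index

if __name__ == "__main__":
    # A particle strike upsetting physical columns 8..11 (four adjacent cells):
    struck = {8, 9, 10, 11}
    for word in range(INTERLEAVE):
        errors = sum(physical_column(word, b) in struck for b in range(WORD_BITS))
        print(f"word {word}: {errors} bit error(s)")  # each word sees at most one error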
Error-correcting methods for logic are not practical because of the 2-4× increase in area that results from the added design complexity. Instead, many design changes have been reported to improve the SER of logic circuits. These modifications usually apply time-redundancy or spatial-redundancy techniques [1], [15].
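Many of these redundancy schemes ultimately rely on sampling or duplicating a signal and voting on the result. The following minimal sketch shows majority voting over three redundant samples; it is a generic illustration rather than a specific published circuit.

# Sketch: majority vote over three redundant samples of a logic value, as used in
# time-redundant (multiple sampling) or space-redundant (triplication) schemes.

def majority(a: int, b: int, c: int) -> int:
    """Return the value seen by at least two of the three samples."""
    return (a & b) | (b & c) | (a & c)

if __name__ == "__main__":
    # A transient corrupts one of the three samples; the vote masks it.
    print(majority(1, 1, 0))  # -> 1
    print(majority(0, 1, 0))  # -> 0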
7. SER in combinational logic

Combinational logic currently has a small impact on the chip SER, especially at frequencies below 1 GHz, but its contribution will increase in future technology generations. SER in combinational logic requires dedicated approaches in testing, simulation, and design modification [15].

If a circuit node collects the charges that are generated by an ionizing particle, the result is a disturbance of the node voltage. In combinational logic this disturbance ("glitch") may manifest itself as a so-called single-event transient (SET) that propagates through the downstream paths. If the glitch is latched by a memory element, such as an SRAM cell or a flip-flop, it results in a corrupted data bit. The error is called a soft error because it is indistinguishable from a bit error caused by a direct hit of the memory cell. Because of the short duration of the transients and the fact that a SET has to be latched in a memory element to be observable, measuring SER in combinational logic is very difficult. Therefore, simulation is usually preferred over measurement to characterize SER in combinational logic.

The sensitivity of a combinational logic cell depends on its circuit topology. Also, the sizes of the transistors in the cell affect the SET probability. A small transistor has a small gate capacitance and a weak current drive; the result is a relatively low Qcrit and consequently a high vulnerability. The combination of input signals determines which circuit nodes in the logic gate are sensitive to upsets. Therefore, the probability of a SET at the gate's output depends on the input combination, as is illustrated in Figure 7 for a 2-input NAND gate. Although the '01' and '10' combinations are logically equivalent, they result in different SET probabilities. For the '01' combination both NMOS transistors are sensitive to particle hits, while for the '10' combination only the upper one (with B as input) is sensitive.

The SER analysis by simulation for a combinational logic block requires the computation of the following:
- The gate SER is the probability of a SET at the output of a given logic gate.
- The glitch observability is the probability that a logically enabled path exists for the induced SET to propagate to a primary output of the circuit.
- The electrical masking factor (EMF) is the probability that the SET will not be attenuated on its path to a primary output or to an FF, so that it can produce a soft error; it depends both on the path length and on the gate types in the path.
Inputs (A B)    Prob. (a.u.)
0 0             10
0 1             120
1 0             90
1 1             50
Figure 7. Probability of a SET at the output OUT of a 2-input NAND gate.

To compute the gate SER, the probability of each input combination has to be known. Therefore, the gate-SER analysis traverses the circuit from the primary inputs to the outputs. The calculations of the glitch observability and of the EMF require backward-traversing algorithms, because the observabilities of the gates in the downstream paths have to be known. For large circuits not all input combinations but a limited number of random input vectors are applied.

If the calculated SER for the logic block is too high, it can be improved by gate multiplication. In this approach the most sensitive logic gates are duplicated, with the corresponding inputs and outputs connected. The larger drive strength significantly reduces the probability of a SET at the joint output. Typically, gate multiplication reduces the SET probability of a gate by a factor of 10. The two gates are physically separated to prevent both from being upset by the same particle strike. The approach can be applied selectively to the SER-critical parts of the design. Simulations on the example circuit depicted in Figure 8 show that a 50% SER reduction can be obtained at a 30% area penalty [15]. For larger circuits the area overhead is smaller, as the SER-dominant part is only a fraction of the circuit.

Figure 8. Example circuit with 50% SER reduction by duplicating only three logic gates.
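Combining the quantities defined above, the contribution of a single gate to the logic SER can be approximated as its gate SER (weighted over the input-state distribution) multiplied by the glitch observability and the electrical masking factor. The sketch below uses the NAND-gate SET probabilities of Figure 7; the input-state probabilities and the two derating factors are placeholders.

# Sketch: estimate the logic-SER contribution of a single 2-input NAND gate.
# SET probabilities per input state are the arbitrary-unit values of Figure 7;
# input-state probabilities, observability, and electrical masking are placeholders.

set_probability = {          # a.u., from Figure 7
    (0, 0): 10, (0, 1): 120, (1, 0): 90, (1, 1): 50,
}

def gate_ser(input_state_prob: dict, set_prob: dict) -> float:
    """SET rate at the gate output, weighted over the input-state distribution."""
    return sum(input_state_prob[s] * set_prob[s] for s in set_prob)

def logic_ser_contribution(gate_ser_au: float,
                           glitch_observability: float,
                           electrical_masking_factor: float) -> float:
    """Portion of the gate's SETs that can reach and upset a latching element."""
    return gate_ser_au * glitch_observability * electrical_masking_factor

if __name__ == "__main__":
    uniform_inputs = {s: 0.25 for s in set_probability}            # placeholder workload
    g = gate_ser(uniform_inputs, set_probability)                  # 67.5 a.u.
    print(logic_ser_contribution(g, glitch_observability=0.3,      # placeholder deratings
                                 electrical_masking_factor=0.6))   # -> 12.15 a.u.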
Conclusions

The soft-error rates (SER) of ICs are characterized by alpha- and neutron-accelerated testing. The results can be validated with (unaccelerated) system-SER tests. Simulation methods are available at the device, circuit, and chip levels to study soft errors in an early design phase. SRAM dominates the SER at the chip level, but can be efficiently protected with ECC. Logic SER needs special attention, because its impact grows with technology scaling and it is more complicated to simulate and to mitigate. Because the number of bits tends to grow, the SER at the system level is increasing.
References

[1] J. Maiz and N. Seifert (Guest Eds.), "Special Issue on Soft Errors and Data Integrity in Terrestrial Computer Systems", IEEE Trans. Dev. Mater. Reliability, vol. 5, no. 3, Sep. 2005.
[2] R.C. Baumann, "Soft errors in advanced semiconductor devices - Part I: the three radiation sources", IEEE Trans. Dev. Mater. Reliability, vol. 1, no. 1, pp. 17-22, Mar. 2001.
[3] http://www.jungfraujoch.ch
[4] http://www.l2mp.fr/astep
[5] http://www-lsm.in2p3.fr
[6] T.J. O'Gorman, J.M. Ross, A.H. Taber, J.F. Ziegler, H.P. Muhlfeld, C.J. Montrose, H.W. Curtis, and J.L. Walsh, "Field testing for cosmic ray soft errors in semiconductor memories", IBM J. Research and Development, vol. 40, no. 1, pp. 41-49, Jan. 1996.
[7] J.F. Ziegler and H. Puchner, "SER - History, trends and challenges: a guide for designing with memory ICs", Cypress Semiconductor, 2004.
[8] http://lansce.lanl.gov
[9] E.W. Blackmore, P.E. Dodd, and M.R. Shaneyfelt, "Improved capabilities for proton and neutron irradiations at TRIUMF", Proc. IEEE Radiation Effects Data Workshop, 2003, pp. 149-155.
[10] T. Granlund, B. Granbom, and N. Olsson, "A Comparative Study Between Two Neutron Facilities Regarding SEU", IEEE Trans. Nucl. Sci., vol. 51, no. 5, Oct. 2004.
[11] "Measurement and reporting of alpha particles and terrestrial cosmic ray-induced soft errors in semiconductor devices", JEDEC standard JESD89, Aug. 2001; "Test method for alpha source accelerated soft error rate (SER)", JEDEC standard JESD89-2, Nov. 2004; "Test method for beam accelerated soft error rate", JEDEC standard JESD89-3, Sep. 2005.
[12] T. Heijmen and B. Kruseman, "Alpha-particle induced SER of embedded SRAMs affected by variations in process parameters and by the use of process options", Solid-State Electronics, vol. 49, no. 11, pp. 1783-1790, Nov. 2005.
[13] T. Heijmen, P. Roche, G. Gasiot, and K.R. Forbes, "A Comparative Study on the Soft-Error Rate of Flip-Flops from 90-nm Production Libraries", Proc. Int'l Reliability Physics Symp. (IRPS), 2006, accepted for publication.
[14] T. Heijmen, "Analytical semi-empirical model for SER sensitivity estimation of deep-submicron CMOS circuits", Proc. Int'l On-Line Testing Symp. (IOLTS), 2005, pp. 3-8.
[15] A.K. Nieuwland, S. Jasarevic, and G. Jerin, "Combinational Logic Soft Error Analysis and Protection", submitted to Int'l On-Line Testing Symp., 2006.