3018
IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 58, NO. 6, DECEMBER 2011
A Dual Mode Redundant Approach for Microprocessor Soft Error Hardness
Lawrence T. Clark, Senior Member, IEEE, Dan W. Patterson, Nathan D. Hindman, Student Member, IEEE, Keith E. Holbert, Senior Member, IEEE, Satendra Maurya, Student Member, IEEE, and Steven M. Guertin, Member, IEEE
Abstract—A dual mode redundant (DMR) logic data path with instruction restart that detects errors at register file (RF) write-back is presented. The DMR RF allows SEU correction, using parity to detect RF entry nibbles that are correct in one copy but not the other. Detection and backout of incorrect write data are also described. The radiation hardened by design (RHBD) circuits are implemented in 90 nm CMOS. The DMR microarchitecture is described, including pipelining, error handling, and the associated hardware. Heavy ion and proton testing validate the approach. Experimentally measured cross sections and examples of errors due to pipeline SETs or RF SEUs are shown. Critical node spacing and the mitigation of multiple node collection are also described.
Index Terms—Dual mode redundancy, error correction, radiation hardening, register files, sequential logic circuits, single event effects, soft errors, total ionizing dose.
Manuscript received July 22, 2011; revised September 10, 2011; accepted September 11, 2011. Date of publication November 18, 2011; date of current version December 14, 2011. This work was supported by the U.S. Air Force Research Labs in Albuquerque, NM. S. Guertin’s research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration and supported through the Director’s Research and Development Fund.
L. T. Clark, D. W. Patterson, N. D. Hindman, K. E. Holbert, and S. Maurya are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA. S. M. Guertin is with NASA Jet Propulsion Laboratories, Pasadena, CA 91109 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNS.2011.2168828
I. INTRODUCTION
RADIATION hardening is increasingly important both for commercial terrestrial systems and for those aimed at harsh environments, e.g., space electronics.
A. Single Event Effects
Soft errors due to radiation induced charge collection have long been a major concern for memories. They are of increasing concern in scaled VLSI circuits, due to reduced current drive and capacitance at both logic and storage nodes, and relatively constant charge deposition and collection. These factors combine to create longer single event transients (SETs) and a greater likelihood of single event upset (SEU) at storage node latches [1]–[5]. Soft errors have become increasingly problematic for terrestrial processing, and remain a severe issue for space based electronics. Error detection and correction (EDAC) has been used to protect large memories, e.g., microprocessor caches, for many
years, but is ineffective for protecting data path logic pipelines. Additionally, SETs can affect control logic, essentially changing the circuit operation from the specified behavior, which may cause erroneous data or inadvertent operations in a microprocessor. The two primary methods for hardening logic are redundancy in time (temporal approaches) and redundancy in space (dual-modular and triple-modular redundancy, DMR and TMR, respectively). Temporal approaches are straightforward, but have the drawback of placing large delays in the timing critical path, essentially increasing the setup time for the circuit, whether accomplished by special process (usually resistor and capacitor) elements [6] or by radiation hardening by design (RHBD) [7], [8]. The delay must exceed the worst-case SET duration, which for bulk CMOS processes has been measured to exceed 1 ns [2]–[4]. Finally, besides the performance impact, many schemes are still susceptible to clock or control logic errors [9], [10]. TMR has been implemented at multiple granularities, with checking ranging from the sequential element [11] to the board level [12], [13]. Both DMR and TMR can achieve clock rates similar to commercial microcircuits, but at the cost of die area and increased power. DMR has a 33% power and area advantage over TMR, but unlike TMR, where majority voting can be used for correction, DMR cannot readily determine which copy holds the correct data or control signals.
B. Register Files
Register files (RFs) are key components in high performance circuits, especially microprocessors. RFs differ from standard SRAM in that the readout is usually single ended, and they have anywhere from two to dozens of read ports. Moreover, they are much smaller, commonly ranging from 32 to 256 entries. In a microprocessor the RF resides in the key critical timing paths of the ALU/bypass loop, where operands must be read from the RF, modified by the ALU, and subsequently used in one clock cycle.
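To make the TMR/DMR distinction above concrete, the following minimal Python sketch contrasts a bitwise 2-of-3 majority vote, which masks a single corrupted copy, with a two-copy comparison, which can only flag a mismatch without indicating which copy is wrong. The function names are hypothetical and the model is purely behavioral; it is not taken from the design described here.

```python
from typing import Optional


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def dmr_compare(a: int, b: int) -> Optional[int]:
    """Return the agreed value, or None when the two copies differ.

    A mismatch shows that one copy is wrong, but not which one, so the
    caller must recover by other means (e.g., re-executing the operation).
    """
    return a if a == b else None


# One copy hit by an upset in bit 2.
good, bad = 0b1010, 0b1110
print(tmr_vote(good, bad, good))  # corrected: 10
print(dmr_compare(good, bad))     # detected only: None
```

The voter never needs to know which copy failed, which is precisely the information a DMR comparison cannot provide; the rest of this paper addresses that gap with restart and parity-based repair.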
The Alpha 21264 microprocessor used two RFs, each supporting decoupled superscalar data paths [14]. Breaking the RF into two units allowed four (fully differential) readout ports for each data path. Write back operations in the two RFs were independent, with an extra clock for bypassing results from one register file to the other, and no soft error checking mechanism was supported. The addition of parity to the 90-nm Itanium RF illustrates the increasing importance of mitigating soft errors in terrestrial ICs, particularly servers [15]. This design adds a serial parity calculation to each register, with four clocks of latency after a write, during which time the register is unprotected. Checking is static, but no repair mechanism is provided. RF errors are thus detectable but not correctable.
0018-9499/$26.00 © 2011 IEEE
CLARK et al.: A DUAL MODE REDUNDANT APPROACH FOR MICROPROCESSOR SOFT ERROR HARDNESS
Fig. 1. DMR RF configuration showing read and write port connections, including feedback to the Rt/Rd write port for error correction using data in the copy with good parity to overwrite data in the other copy with a parity error. Correction is by nibble, so bits in both RF entries can be upset so long as both copies of any nibble are not. Mismatches in the A and B WB data are caught on the input to the RF, stalling the processor to restart with clean architectural state.
C. Register File Soft Error Mitigation
A key problem in RF soft error protection is the lack of time in the pipeline, i.e., in the critical timing paths, for EDAC calculation in the stages leading to and from the RF. EDAC also requires a 62.5% increase in storage capacity for byte level correction [16], which is closer to DMR than to no redundancy. Wider protection requires deeper parity circuits, adding two to four inversions to the critical timing path. Consequently, RFs more commonly use parity protection, which is fast to generate but provides only error detection. Mohr presented an XY parity based EDAC scheme fast enough for use in RFs [17]. However, EDAC does not protect against erroneous data or operations caused by SETs, either in the RF or in the ALU/bypass circuitry that produces and consumes data residing in the RF. Analysis of synthesized RFs on a 130 nm CMOS process showed that TMR had a similar area impact to adding EDAC, while the latter approximately doubled the access time [18]. Analysis of FPGA synthesized RFs shows greater TMR area and performance impact, but better hardness [19].
D. Application and Contribution of This Work
The designs and experimental data presented here cover logic intended for an embedded microprocessor with a five-stage pipeline, implemented primarily in DMR logic. The RF is also implemented using DMR, which allows data path results to be checked at the write back (WB) stage, as shown in Fig. 1. The data path results can be erroneous due to SEUs in the DMR register file stored data, or SETs in the RF readout path or DMR data path logic. With DMR, error detection is easy, but correction is difficult, since the logic cannot discern which of the redundant copies is incorrect. SEUs in the RF are repaired on a nibble basis, by copying correct contents from one DMR copy to the other, based on which has correct parity. Background RF scrubbing is also supported. Fortunately, in a microprocessor, pipeline state not committed to architectural state is speculative and can be discarded. The approach used in the DMR circuits presented in this paper is to restart the operation from a known good architectural state. The key is detecting errors before the machine’s speculative state is committed to become architectural state.
The test chip used in this work to validate the approach employs a DMR data path that includes an ALU/bypass path suitable for a high-speed microprocessor and a DMR RF for the same embedded CPU. The ALU/bypass logic and RF are controlled by a test engine implemented completely in TMR on the test die. Studies of basic DMR approaches have been published, cf. [13], but do not include experimental or comprehensive simulation results. Additionally, that scheme only allows a clock cycle time similar to that of circuits protected by temporal techniques. For the first time, this paper describes a complete approach to DMR processor ALU/bypass and RF data path error checking and recovery. DMR checking is performed only at the commit to architectural state, i.e., at the RF write back stage, which minimizes the checking circuitry required and reduces boundary cases, since instructions are restarted from only one point. The microarchitectural and circuit changes allow both SET and SEU protection at high speed, as demonstrated by both proton and heavy ion broad beam testing.
E. Paper Organization
Section I summarizes prior hardening approaches, register file requirements, and the contribution of this work. Section II describes the RF circuit design and how hardness to inadvertent SET-induced writes and MBUs is obtained. Section II also describes the ALU/bypass logic, the circuits and layout used to avoid MBUs, and the checking WB interface. Experimentally measured results, which demonstrate mitigation of both SEU and SET induced errors, are described in Section III. Section IV summarizes the results and concludes.
II. REGISTER FILE CIRCUITS AND MICROARCHITECTURE
The RF design here combines multiple microarchitectural and circuit level approaches, starting with DMR for compatibility with the data path. DMR decoders generate redundant read and write word line signals. Full SEE protection is provided by DMR combined with large critical node separation through bit interleaving and parity. Parity provides error detection; DMR allows one copy to provide clean data for SEU correction; interleaving prevents multi-cell upsets (MCUs) from causing uncorrectable errors; and inadvertent operations are prevented by circuit approaches. The parity protection is nibble based. This allows the greatest possible critical node spacing, at the expense of twice the parity storage of byte parity, but with the advantage of one fewer XOR gate in the parity generation and checking paths.
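The nibble parity check and the copy-to-copy repair can be illustrated with a short behavioral model in Python. It rests on assumptions not stated in the text: each entry holds 32 data bits plus 8 nibble parity bits (consistent with the 40-bit entries of Fig. 5), parity is even, and the names `nibble_parity`, `bad_groups`, and `repair` are hypothetical. The repair rule follows the one described in Section II-B: per parity group, copy A is overwritten from copy B only when A’s parity is wrong; otherwise B is overwritten from A.

```python
"""Behavioral sketch of nibble parity and A/B copy repair (assumptions noted in comments)."""
from typing import Tuple

NIBBLES = 8  # assumed: 32 data bits per entry -> 8 nibbles, 8 parity bits (40 bits total)
Entry = Tuple[int, int]  # (data, parity) for one RF copy


def nibble_parity(data: int) -> int:
    """Even parity per nibble (assumed convention): bit i is the XOR of data bits 4i..4i+3."""
    parity = 0
    for i in range(NIBBLES):
        nibble = (data >> (4 * i)) & 0xF
        parity |= (bin(nibble).count("1") & 1) << i
    return parity


def bad_groups(entry: Entry) -> int:
    """Bit mask of parity groups whose stored parity disagrees with the recomputed one."""
    data, parity = entry
    return nibble_parity(data) ^ parity


def repair(a: Entry, b: Entry) -> Tuple[Entry, Entry]:
    """Per parity group: overwrite A from B only if A's parity is wrong; otherwise overwrite B from A."""
    (da, pa), (db, pb) = a, b
    bad_a = bad_groups(a)
    for i in range(NIBBLES):
        dmask, pmask = 0xF << (4 * i), 1 << i
        if bad_a & pmask:
            da = (da & ~dmask) | (db & dmask)
            pa = (pa & ~pmask) | (pb & pmask)
        else:
            db = (db & ~dmask) | (da & dmask)
            pb = (pb & ~pmask) | (pa & pmask)
    return (da, pa), (db, pb)


# Upsets in different nibbles of the two copies are both repaired.
word = 0x12345678
clean = (word, nibble_parity(word))
copy_a = (word ^ (1 << 5), clean[1])   # nibble 1 upset in copy A
copy_b = (word ^ (1 << 30), clean[1])  # nibble 7 upset in copy B
fixed_a, fixed_b = repair(copy_a, copy_b)
assert fixed_a == clean and fixed_b == clean
```

Only an upset in the same nibble of both copies defeats this rule, which is why the layout separates the bits of each parity group. The XOR-depth advantage also follows directly: a 4-input parity tree of 2-input XOR gates has two levels, versus three for an 8-input (byte) tree.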
A standard load-store CPU would have two read ports and one write port, supporting the ALU operands and results, respectively. Referring to Fig. 1, this design adds a third read port, the Rt/Rd port, which reads a copy of the destination register contents that are about to be overwritten in the next cycle’s WB stage. Thus, the overwritten RF data can be restored if the RF write is cancelled because a data path error (SEU or SET) is detected while the WB is in progress. Since DMR does not indicate which copy is in error, the original data are returned to the RF and the instruction with non-matching results is then restarted from the beginning of the pipeline. Finally, since the RF state is unknown at power up, the processor control logic tracks whether each entry has been written, i.e., whether its parity is valid. This TMR logic qualifies parity checking on an entry by entry basis.
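The Rt/Rd backup and restart sequence can be sketched as a small Python model. It is an illustration under stated assumptions, not the processor’s control logic: the class and method names (`DmrRegFile`, `read_rt_rd`, `write_back`, `burf`, `execute`) are hypothetical, the RF is assumed to have 32 entries, and each pass through the pipeline is reduced to a pair of A/B results. The `burf` method mirrors the BURF backout described in Section II-A, gated by a flag that records whether the last operation wrote the RF.

```python
"""Sketch of write-back checking with Rt/Rd backup and a BURF-style backout (names are hypothetical)."""
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DmrRegFile:
    a: List[int] = field(default_factory=lambda: [0] * 32)  # copy A (32 entries assumed)
    b: List[int] = field(default_factory=lambda: [0] * 32)  # copy B
    saved: Optional[Tuple[int, int, int]] = None  # (entry, old A, old B) read via the Rt/Rd port
    wrote_last: bool = False  # set on each WB, cleared on non-WB operations; gates the backout

    def read_rt_rd(self, dest: int) -> None:
        """Cycle before WB: save the destination entry that is about to be overwritten."""
        self.saved = (dest, self.a[dest], self.b[dest])

    def write_back(self, dest: int, res_a: int, res_b: int) -> bool:
        """Both copies are written (too late to block the write); return True if A and B matched."""
        self.a[dest], self.b[dest] = res_a, res_b
        self.wrote_last = True
        return res_a == res_b

    def non_wb_op(self) -> None:
        self.wrote_last = False

    def burf(self) -> None:
        """Restore the last-written entry from the Rt/Rd copy, if the last operation wrote the RF."""
        if self.wrote_last and self.saved is not None:
            dest, old_a, old_b = self.saved
            self.a[dest], self.b[dest] = old_a, old_b


def execute(rf: DmrRegFile, dest: int, attempts: List[Tuple[int, int]]) -> int:
    """Re-run one instruction until its A/B results match; return the number of restarts."""
    restarts = 0
    for res_a, res_b in attempts:  # each tuple models one pass through the pipeline
        rf.read_rt_rd(dest)
        if rf.write_back(dest, res_a, res_b):
            return restarts
        rf.burf()  # SEE exception handler backs out the bad write; the instruction restarts
        restarts += 1
    raise RuntimeError("results never matched")


rf = DmrRegFile()
rf.a[3] = rf.b[3] = 7
# First pass: an SET corrupts copy B's result; the second pass is clean.
n = execute(rf, 3, [(12, 44), (12, 12)])
print(n, rf.a[3], rf.b[3])  # 1 12 12
```

The model lets the mismatched write land and then undoes it, matching the observation that there is insufficient time to block the write itself; the restored entry is what makes restarting from the top of the pipeline safe.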
A. Error Detection and Operation
As mentioned, in the target microprocessor, all data path actions that update the RF check the DMR data at the RF. The WB occurs, and if the A and B copy data do not match, the detected SEE causes an exception in the actual processor. There is insufficient time to prevent the write, but the machine is stalled to keep the incorrect state from propagating further into the architectural state, and the erroneously written value is backed out. To allow this, a specific exception type, the SEE exception, is added for detected soft errors. An added instruction, the back up register file (BURF) instruction, replaces the values that may have been corrupted when it is executed within the SEE exception handler. The BURF instruction replaces the RF entry that was last written with the value previously read out of the Rt/Rd port. Many different types of soft errors may cause an SEE exception, and the replacement should only occur if the last instruction wrote the RF, so a processor flag is set on each WB and reset on each non-WB operation. This flag gates BURF execution. In the test chip used for experimental validation, the RF backup is handled by an equivalent TMR test engine instruction, but the action is the same. The saved value read from the Rt/Rd port is held in fine-grained self-correcting logic, like the test engine.
In Fig. 1, the checking circuits are evident in the block labeled ‘A vs. B data comparison’, which checks that the WB data match and thus are not in error. The same WB checking mechanism is used to detect write word line (WWL) mismatch errors, which occur when one of the two WWL copies is asserted but not the other. For example, when the detector circuit determines that only one WWL is asserted on a write operation, the write also fails. The pipeline is then stalled to avoid propagating incorrect state. Then the prior value in the register entry intended to be written is restored, which restores the correct prior RF architectural state. The operation where the error was detected is subsequently restarted with correct data. Key recovery state is stored in self-correcting TMR circuits [11]. Similarly, if one of the WWLs is truncated, which may partially write both copies, the entire operation is started over from the previously correct architectural state. Additionally, in the case where the A and B data do not match, the corruption could have been from any control or data path logic SET or latch SEU. Restarting these operations protects this logic. Soft errors that manifest as read word line (RWL) errors are, if they affect the data read, corrected similarly.
B. Register File SEU Recovery
The RF circuits support new processor instructions where parity groups in RF copy A with correct parity overwrite the version in copy B, and vice versa, in two clock cycles. In the target processor, an added processor instruction, repair general purpose register (RGPR), activates this hardware under software control. To keep the hardware simple, no hardware interlocks are implemented; NOP instructions are required to provide a second cycle for the repair. In this manner, potentially SEU corrupted RF state can be repaired in 64 clock cycles (the time to read each of the 32 registers and then write it back) when an error is detected. The control logic is very simple: the A copy is overwritten only when the A parity is incorrect; otherwise the B copy is. Only accumulated errors affecting the same parity group in both copies result in an uncorrectable error, but due to the large physical separation between cells in the same parity nibble, these are likely due to multiple ionizing radiation particle strikes. These are, in turn, mitigated by background scrub operations. Since in a microprocessor specific registers are reserved for specific tasks, e.g., the stack pointer, some are infrequently used, inviting defeat of parity by upsets accumulated from multiple impinging radiation particles. Accumulated errors affecting both A and B RF copies are avoided by allowing frequent scrubbing. Besides allowing potentially incorrect data to be backed out when a mismatch between the A and B data copies is detected, the Rt/Rd read port also allows opportunistic scrubbing. When the instruction in the pipeline stage prior to the WB stage will not write to the register file in the next pipeline stage (e.g., a store instruction), the Rt/Rd read port is used to read a register, which is automatically checked for errors by the parity checking circuitry.
Thus, all registers are sequentially read in a rotating fashion in parallel with normal operations and checked for errors by the usual parity check mechanisms. This hardware scrubbing mechanism minimizes the probability of accumulated strikes causing uncorrectable errors. The scrubs opportunistically use the Rt/Rd port when it is not required by an instruction writing back to the RF.
C. Circuit Design and Timing
The RF cell design differs from that of a conventional dynamic RF in that a write operation requires simultaneous assertion of two WWLs, which mitigates the possibility of inadvertent writes to RF locations due to control logic SETs (see Fig. 2(a)). One write word line, WWLA, is controlled by the A pipeline copy, while WWLB is controlled by the B pipeline circuits. Consequently, an SEE that manifests as a control error in one copy and propagates to the RF cannot affect the architectural state in the RF. Readout is conventional, discharging the dynamic read bit line (RBL) if the cell state is a logic ‘1’. Two of the read ports are connected to one side of the cell, and one to the other, to increase the minimum storage node capacitance, which dominates the cell SEU hardness.
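The dual-WWL write condition, together with the WWL mismatch check of Fig. 4, can be approximated behaviorally in Python. This is a sketch, not a circuit model: the cell’s requirement for both word lines is reduced to a logical AND, the domino ones-catching detector is reduced to a sticky flag over a sampled evaluation window, and the function names are hypothetical.

```python
"""Sketch of dual-WWL write gating and a sticky WWL-mismatch check (behavioral, not circuit-level)."""
from typing import Sequence


def cell_write(stored: int, data: int, wwla: bool, wwlb: bool) -> int:
    """The cell changes state only when both word lines are asserted (modeled as a logical AND)."""
    return data if (wwla and wwlb) else stored


def wwl_mismatch(wwla: Sequence[bool], wwlb: Sequence[bool]) -> bool:
    """Ones-catching style check over one evaluation window.

    Any sample where the two copies disagree sets the error, which then stays
    set for the rest of the window, so even a brief mismatch is caught.
    """
    error = False
    for a, b in zip(wwla, wwlb):
        error = error or (a != b)
    return error


# An SET asserts only WWLB: the write is blocked and the mismatch is flagged.
print(cell_write(0, 1, wwla=False, wwlb=True))                 # 0
print(wwl_mismatch([False] * 4, [False, True, False, False]))  # True
# A truncated WWLA pulse may still partially write, but the checker flags it.
print(wwl_mismatch([True, True, False, False], [True, True, True, True]))  # True
```

The two functions cover complementary cases: the AND condition stops a single-copy assertion from writing at all, while the sticky check catches the truncated-pulse case, where a write may have occurred and must be backed out and restarted.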
Fig. 2. RF cell circuit showing dual WWL transistors to block SET induced writes (a). Cell layout showing annular NMOS transistors and guard rings for TID immunity; adjacent non-rectangular cells share an N-well (b). The inside nodes of the annular transistors are sensitive to SEE, while their outside nodes are at VSS. Very little of the storage cell area collects charge to cause SEU (c). The conventional two-edge PMOS pull up transistors and N-well are evident.
Fig. 3. Simulated waveforms, using fully extracted layout netlists, showing the RF operation and speed. The first clock cycle shows a correct write. The second clock cycle is a read. The next clock cycle shows WL error detection. The redundant WWL checkers detect a short or missing WL assertion, an incorrect WL assertion to one copy, even due to a missing clock. The error propagates and is detected in time to stall the pipeline at the next clock edge for recovery.
Fig. 4. The redundant write WL checking circuit is essentially two domino (ones-catching) NOR gates. The full keepers allow pseudo-static behavior, and the delay generators (ovals) delay the checking to allow both versions to settle. The output is qualified with RF write operations. Redundant versions are needed since failure to assert AClk or BClk due to a control SET will also fail to assert the controlled write WL.
Set-dominant latches conclude the readout path, presenting a static time-borrowing output to the surrounding circuitry. While fast, the domino RBL nodes are susceptible to SETs, as evidenced in radiation tests (see Section III). Simulated RF operational waveforms, using a netlist with fully extracted parasitics, are shown in Fig. 3. The parasitic extraction flow uses a proprietary Calibre flow that correctly models the transistor neck Miller and gate capacitance. Effective channel length and width are calculated as derived in [20]. Circuit speed is comparable to a commercial design, except for the greater WL loading of the DMR design, caused by longer routes through the interleaved DMR array. Correct write and read operations show the timing in the first two clock cycles of Fig. 3. In the third clock
cycle, an erroneous single WWL assertion on WWLB, without the corresponding WWLA assertion, is shown. The subsequent error detection results in signal WLError rising, as shown. Incorrect WB data and mismatched redundant WWL assertions are detected by a ones-catching domino circuit (see Fig. 4) similar to that used in our prior RHBD cache [21]. Clock chop circuits delay the turn-on edge to allow the checked state to settle after the rising clock edge controlling such state transitions (cf. Fig. 1). The same delays are used to generate a pulsed precharge, which eliminates the need for a dynamic-to-static converter latch. The D2 domino ERRA and ERRB signals are discharged by any transient mismatch between the checked inputs, and the full keeper holds that state even if the mismatch transient is short.
D. Register File Layout
The fully TID protected RF cell layout is shown in Fig. 2(b). Unlike the RF cells in [17], which used a pre-discharge PMOS based cell to minimize RHBD size at the cost of speed, this cell uses conventional NMOS transistors. The greater size of the NMOS transistor dominated layout increases storage node separation. To use the cell width effectively, the PMOS transistors share a short N-well with the adjacent cell, as shown. The read pull down transistors are tapered for maximum speed and minimum RBL capacitance. All key NMOS nodes are driven by the diffusion enclosed by the annular devices, minimizing charge collection area as well as capacitive loading and thus power dissipation, e.g., the RF cell storage nodes in Fig. 2(c). The entire RF layout, showing the bit interleaving, is shown in Fig. 5. The decoders are spatially separated, as shown, to prevent an SEE from corrupting both copies simultaneously, since it is
imperative that the WWLs not be mis-asserted in the same way. Likewise, two identical incorrect RWL assertions would send the same incorrect data through both copies of the ALU, and would not be detected as an error at WB.
Fig. 5. Thirty-two entry 40-bit RHBD DMR RF layout with parity group interleaving (color coded); one parity group’s A and B copy bits are outlined in black over the layout to show the critical node separation of 15 cells (67.2 µm). The similarly spaced DMR (A and B copy) WL decoders are also evident.
E. Test Chip Design
Fig. 6 shows the essential test chip pipeline, with emphasis on the RF ALU/bypass loop that occupies the E-stage of the pipeline. The primary difference between the test chip circuits and an actual processor is that the latter includes a shifter and logic unit. The ALU and RF are controlled by a programmable built-in test engine implemented entirely in TMR logic with fine-grained correction [11]. The test chip is implemented in a trusted foundry 90 nm bulk CMOS fabrication process. Fig. 7 shows the test chip photomicrograph with the test circuits overlaid. The TMR test engine has four pipeline stages and is controlled by a 64 entry 40-bit instruction memory, a 32 entry 40-bit address memory, and a 32 entry 36-bit data memory. All of these memories are implemented as register files with fine-grained local error correction. The test engine limits the circuit speed to about 250 MHz. Clocks are generated by a foundry supplied unhardened PLL.
Fig. 6. The test chip pipeline matches the processor pipeline around the RF, ALU/bypass loop. The circuits are controlled and checked by a test engine using full TMR with local correction at each pipeline stage.
Fig. 7. Test die photomicrograph with test chip layout overlaid; the RF and ALU/bypass circuits are outlined and labeled. The die is 3.75 mm/side.
III. SEE TESTING RESULTS
A. SEE Testing
Broad beam ion tests comprised two basic types: static RF tests and dynamic RF/ALU/bypass tests. The former exercise the RF memory statically, i.e., without the data path in operation, to determine the RF SEU and SET characteristics. The device under test (DUT) is on a daughter card, attached to and controlled by an FPGA main board, which records errors at test speed. The FPGA communicates with a PC via a USB connection, e.g., sending error logs under user command. In the static tests the RF is written with all 0’s, all 1’s, or a checkerboard pattern. The RF is then read continuously by the
test engine. In the event of an error, the upset stored data are corrected and the test continues at the next location. An SET, presumably in the RF dynamic read bit line circuits, addressing, or decoding, is detected if the initially read data are incorrect but correct on a subsequent read. Both copies are read in these tests, reflecting the actual DMR processor operation, where each redundant data path/control copy receives unchecked and uncorrected data words from the RF. However, only one read port is used.
Fig. 8. Representative measured error upset types as logged in the tests: one bit RF SEUs in the all 0’s test (a) and SETs in the all 1’s test, which we assume to be dynamic BL discharges (b). Multi-bit upset of one of the DMR copies while running the ALU/RF bypass loop, due to an ALU SET (c). A multi-bit error to one copy, demonstrating an error that can be fixed using the SEU repair mechanism; the B copy of entry 23 has no errors (d). A ‘.’ indicates a 1 to 0 error, and a ‘*’ denotes a 0 to 1 error. (a) SEU RF errors, (b) SET RF errors, (c) Multi-bit upset, (d) Multiple cell errors in one RF copy that can be repaired.
Fig. 9. Measured RF bit cell heavy ion cross section as determined by static testing, constantly reading the RF locations and writing back erroneous values, at VDD = 1.2 V.
The dynamic tests utilize the RF WB checking circuits to characterize the data path checking functionality. In these tests,
a DMR Kogge-Stone adder, complete with input and output bypass paths as would appear in a standard microprocessor, performs arithmetic operations on the RF data. As a side effect, the types of errors generated by the data path can be examined. These tests allow SETs to corrupt one copy dramatically, e.g., when transients in the adder prefix circuitry corrupt the MSBs through propagating carry errors, or when control SETs in one DMR pipeline copy cause incorrect functions or bypassing. When an error is detected in these tests, which have parity checking turned off to propagate more errors into the data path, the check type is logged. These are either ‘WB error’, indicating that the A and B WB data mismatched, or ‘WL error’, indicating that the WWL checker found a WWL assertion discrepancy. At this point, the test engine is reloaded to dump the RF contents, which allows accumulated RF SEUs to be examined. Then, the RF is repaired and the dynamic test resumed. Examples of the test logs are shown in Fig. 8. The output data are decompressed and decoded at the PC, allowing the bit locations and corresponding entry to be mapped (Fig. 8(a)). All bits, including the parity bits, are treated as data in the static tests. An MCU can be identified as well, since each test pass has an 8-bit time stamp. Fig. 8(b) shows the map of SET failures recorded in one test. An example SET upset affecting many bits in the dynamic testing is shown in Fig. 8(c). Here, since the mismatches are confined to the most significant bits of the word, we speculate that the adder prefix circuitry propagated an SET, although a latch
SEU may have been the culprit. Regardless, the potential impact of an error propagating through the combinational logic is evident. An MCU detected in the full speed testing is shown in Fig. 8(d). Since the stored bits are not adjacent, this is presumed to originate in the A data path DMR copy. This is correctable by the RF parity based repair mechanism, although this write is backed out after the A and B mismatch is detected.
B. Heavy Ion Testing
Heavy ion testing was performed at the Lawrence Berkeley Labs 88-Inch Cyclotron SEE testing facility. Boron, oxygen, neon, argon, and copper ions with nominal (normal incidence) linear energy transfer (LET) of 0.89, 2.19, 3.49, 9.74, and 21.17 MeV-cm²/mg, at angles ranging from 0° (normal) to 70°, were used. Testing was primarily performed at 100 MHz and 200 MHz at VDD = 1, 1.2, and 1.4 V. At angles, the beam was aimed in the direction for which MBUs would be most likely (across the words). The PLL was shielded during ion testing. Fluence was 2(10^…) to 10^… ions/cm² and flux was 10^… to 10^… ions/cm²/s in most of these tests. The experimentally measured RF bit cross section vs. effective LET (LET_EFF) is shown in Fig. 9. Since near minimum PMOS transistors are used in the storage latches, the threshold LET is low. It is, however, larger than for RHBD SRAM cells on the same process [21], which is attributable to the high load capacitance of the transistors driving the read ports. MCU extent was one bit for over 90% of the recorded SEUs, and the longest extent was two bits. The chip level cross sections for argon ions with nominal (normal incidence) LET = 12.89 MeV-cm²/mg are shown in Fig. 10. The bar labeled “WB error” indicates that an A/B copy mismatch detection triggered an RF dump. The SEU bars indicate only RF errors that were not the original cause of the detected error, i.e., accumulated upsets. WB error MBUs are shown and, as expected, are relatively common, since this is how SETs in the data path are detected.
The ‘MBU’ column shows upsets of multiple cells, where 2-bit upsets predominated, as in the static tests.
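The quantities plotted in Figs. 9 and 10 follow standard broad-beam bookkeeping, sketched below in Python. The 1/cos θ relation for effective LET and the cos θ reduction of fluence through a tilted die are common conventions; the paper does not state its exact reduction procedure, so treat these functions (whose names are hypothetical) as an illustration. The example numbers are made up, not measured data; 2560 bits would correspond to both copies of a 32-entry, 40-bit array, which is itself an assumption.

```python
"""Sketch of the usual tilted-beam bookkeeping for heavy-ion cross sections."""
import math


def effective_let(let_normal: float, angle_deg: float) -> float:
    """LET_EFF = LET / cos(theta), the common approximation for a beam tilted by theta."""
    return let_normal / math.cos(math.radians(angle_deg))


def per_bit_cross_section(errors: int, fluence: float, bits: int, angle_deg: float = 0.0) -> float:
    """Errors per (effective fluence x exposed bits), in cm^2/bit.

    fluence is in particles/cm^2 measured normal to the beam; tilting the die
    by theta reduces the fluence through its surface by cos(theta).
    """
    effective_fluence = fluence * math.cos(math.radians(angle_deg))
    return errors / (effective_fluence * bits)


# Hypothetical numbers, not measured data from this paper.
print(round(effective_let(10.0, 60.0), 6))  # 20.0
print(per_bit_cross_section(10, 1e7, 2560))
```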
Fig. 10. Measured RF and ALU/bypass circuit heavy ion cross section from dynamic testing exercising both, at VDD = 1.2 V and LET = 12.89 MeV-cm²/mg (40° angle). WB error and WL error refer to how the initial error was detected; SEU refers to other errors inside the RF when the contents were dumped; scrubbing was not run, but writes naturally update some RF contents.
C. Proton Testing
Proton beam energies of 49.3 and 13.5 MeV were used. While tests with the PLL were performed, most of the testing was performed at 60 MHz using the PLL bypass mode, since PLL upsets can cause test chip malfunctions due to bad clocks and the PLL is difficult to shield from proton strikes. Tests were run with the PLL on at speeds up to 250 MHz, but a number of gross failures (presumably induced by clock upsets) were recorded with the PLL on. For most RF tests the total fluence was 5(10^…) to 8.9(10^…) protons/cm². Flux was 2.3(10^…) protons/cm²/s. For the 49.3 MeV proton energy, the measured cell (per bit) cross sections for static SEU testing of the RF were 3.71(10^…) cm², 2.68(10^…) cm², and 1.12(10^…) cm² at VDD = 1, 1.2, and 1.4 V, respectively. Similar responses were measured for the all 0’s and all 1’s patterns. This was expected, since the RF may be hit on either side and the cell capacitance is relatively balanced. SETs were observed only for the read all 1’s case, and only in one test, and thus with a very low cross section. For the 13.5 MeV proton beam energy, the measured cell cross sections for static SEU testing of the RF were 1.07(10^…) cm², 8.98(10^…) cm², and 1.72(10^…) cm² at VDD = 1, 1.2, and 1.4 V, respectively. MCUs were rare and the longest upset extent was two bits. Cross sections from the dynamic tests are shown in Fig. 11 at different supply voltages (VDD) and proton energies. A slight decrease with supply voltage, and a reduced cross section at lower energy, are evident.
D. Analysis
Only one RF port is used in the static tests, and thus the RBL can only upset when the data would have retained the RBL high. Consequently, in ion testing over 98% of the SETs were detected with logic ‘1’ stored, leading us to conclude that the dynamic RBLs dominate the RF SET cross section (see Fig. 8(b)).
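The SET/SEU attribution used in the static tests (wrong on the first read but right on a re-read means a read-path transient) and the tally by stored value that underlies this analysis can be sketched in Python. This is an illustration of the stated rule only, not the test engine’s implementation; `classify_bits` is a hypothetical helper, and the 40-bit width is assumed from the RF entry size.

```python
"""Sketch of static-test error classification by re-reading (hypothetical helper names)."""
from collections import Counter


def classify_bits(expected: int, first_read: int, reread: int, width: int = 40) -> Counter:
    """Count bit errors as SET or SEU, keyed by the value that was stored in the cell.

    A bit that is wrong on the first read but right on the re-read is attributed
    to a transient in the read path (SET); a bit that stays wrong is a stored-data upset (SEU).
    """
    counts: Counter = Counter()
    for i in range(width):
        mask = 1 << i
        if (first_read ^ expected) & mask:
            kind = "SEU" if (reread ^ expected) & mask else "SET"
            stored = 1 if expected & mask else 0
            counts[(kind, stored)] += 1
    return counts


all_ones = (1 << 40) - 1
# Bits 0 and 2 read wrong; only bit 2 is still wrong on the re-read.
print(classify_bits(all_ones, all_ones ^ 0b101, all_ones ^ 0b100))
```

Aggregating such counts over a run shows directly whether SETs cluster on cells storing ‘1’, which is the observation used above to attribute the RF SET cross section to the dynamic RBLs.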
Fig. 11. Measured RF and ALU/bypass circuit proton cross section from dynamic testing exercising both at multiple proton energies and VDD. WB error and WL error refer to how the initial error was detected; Accumulated SEU refers to other (SEU) errors inside the RF when the contents were dumped.
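The observation in this section that multiple-bit errors were confined to one copy matters because the parity-based repair summarized in the conclusions, described in the abstract as using parity to identify RF entry nibbles that are correct in one copy but not the other, resolves disagreements between the copies nibble by nibble. A minimal Python sketch of that idea follows. It assumes a 32-bit entry, one even-parity bit per nibble stored with each copy, and a simple compare-at-write-back rule; the names (`encode`, `repair`, `writeback`) and the word width are illustrative assumptions, not the test chip's actual logic.

```python
from typing import Optional, Tuple

WORD_BITS = 32            # assumption: 32-bit RF entry
NIBBLES = WORD_BITS // 4  # assumption: one parity bit per 4-bit nibble

Entry = Tuple[int, int]   # (data word, parity vector) held by one RF copy


def nibble(word: int, i: int) -> int:
    """Return nibble i (0 = least significant) of a word."""
    return (word >> (4 * i)) & 0xF


def parity4(n: int) -> int:
    """Even-parity bit of a 4-bit value: the XOR of its four bits."""
    return bin(n & 0xF).count("1") & 1


def encode(word: int) -> Entry:
    """Store a word together with one parity bit per nibble."""
    word &= (1 << WORD_BITS) - 1
    par = 0
    for i in range(NIBBLES):
        par |= parity4(nibble(word, i)) << i
    return word, par


def repair(a: Entry, b: Entry) -> Optional[int]:
    """Rebuild an RF entry from copies A and B, nibble by nibble.

    Where the copies disagree, keep the nibble whose stored parity still
    checks. Return None when a differing nibble cannot be resolved, e.g.,
    an even number of flips inside one nibble leaves both parities valid.
    """
    (aw, ap), (bw, bp) = a, b
    out = 0
    for i in range(NIBBLES):
        na, nb = nibble(aw, i), nibble(bw, i)
        if na == nb:
            out |= na << (4 * i)
            continue
        a_ok = parity4(na) == ((ap >> i) & 1)
        b_ok = parity4(nb) == ((bp >> i) & 1)
        if a_ok and not b_ok:
            out |= na << (4 * i)
        elif b_ok and not a_ok:
            out |= nb << (4 * i)
        else:
            return None  # uncorrectable in this sketch
    return out


def writeback(result_a: int, result_b: int) -> str:
    """DMR write-back check: commit only when both data path copies agree."""
    return "commit" if result_a == result_b else "restart"
```

In this sketch a single upset in either copy is repaired, while two flips in the same nibble of one copy are reported as uncorrectable, which illustrates why an MCU must not place two upsets in the same nibble of one copy for this style of repair to succeed.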
In proton testing, SETs were detected only with a logic '1' in the cell. The time to dump the RF state after a dynamic error was detected was 13.7 s, which may have allowed some additional error accumulation (reported as SEU in Figs. 10 and 11). The data confirm that the proposed DMR hardening techniques are effective. The SEU MCU extent was far below the critical node spacing. Although multiple-bit errors (representing a potentially large equivalent MCU extent) were detected in the dynamic tests, they occurred in only one copy, e.g., in copy A but not in copy B. Regardless, the affected operations are restarted. Uncorrectable errors were noted in some proton tests at high flux. Similar errors were produced in heavy ion testing when the PLL was not shielded, but not when the PLL shield was properly positioned; they were therefore attributed to clock upsets, which cause setup time failures. Because we relied on the non-redundant bypass mechanism within the foundry IP PLL, we could not determine whether these errors were due to clock SETs or to error accumulation.

IV. SUMMARY AND CONCLUSIONS

This paper has described a DMR approach to microprocessor data path and RF hardening that is capable of commercial microprocessor speeds. Error detection is simple because it is applied only at RF write-back (WB) and during scrub operations prior to WB, and it can mitigate any data path SET, including SETs on control logic. Although the design presented here has both a DMR data path and DMR RF logic, duplicating only the RF is applicable to commercial designs. Without a DMR data path and control signals, the SET protection is lost, but the duplicated RF combined with the parity-based repair mechanism still provides SEU protection with minimal timing impact.

ACKNOWLEDGMENT

The authors would like to thank David Pettit and Rahul Shringarpure for their invaluable contributions to the RF implementation.
REFERENCES
[1] P. Dodd, M. R. Shaneyfelt, J. A. Felix, and J. R. Schwank, “Production and propagation of single-event transients in high-speed digital logic ICs,” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3278–3284, Dec. 2004.
[2] M. Gadlage, R. D. Schrimpf, J. M. Benedetto, P. H. Eaton, D. G. Mavis, M. Sibley, K. Avery, and T. L. Turflinger, “Single event transient pulse widths in digital microcircuits,” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3285–3290, Dec. 2004.
[3] J. M. Benedetto, P. H. Eaton, D. G. Mavis, M. Gadlage, and T. Turflinger, “Digital single event transient trends with technology node scaling,” IEEE Trans. Nucl. Sci., vol. 53, no. 6, pp. 3462–3465, Dec. 2006.
[4] P. Dodd and F. Sexton, “Critical charge concepts for CMOS SRAMs,” IEEE Trans. Nucl. Sci., vol. 42, no. 6, pp. 1764–1771, Dec. 1995.
[5] T. Hoang, J. Ross, S. Doyle, D. Rea, E. Chan, W. Neiderer, and A. Bumgarner, “A radiation hardened 16-Mb SRAM for space applications,” in Proc. IEEE Aerospace Conf., 2007, pp. 1–6.
[6] G. Anelli, M. Campbell, M. Delmastro, F. Faccio, S. Floria, A. Giraldo, E. Heijne, P. Jarron, K. Kloukinas, A. Marchioro, P. Moreira, and W. Snoeys, “Radiation tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: practical design aspects,” IEEE Trans. Nucl. Sci., vol. 46, no. 6, pp. 1690–1696, Dec. 1999.
[7] R. Lacoe, J. V. Osborn, R. Koga, S. Brown, and D. C. Mayer, “Application of hardness-by-design methodology to radiation-tolerant ASIC technologies,” IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2334–2341, Dec. 2000.
[8] K. Warren, A. L. Sternberg, J. D. Black, R. A. Weller, R. A. Reed, M. H. Mendenhall, R. D. Schrimpf, and L. W. Massengill, “Heavy ion testing and single event upset rate prediction considerations for a DICE flip-flop,” IEEE Trans. Nucl. Sci., vol. 56, no. 6, pp. 3130–3137, Dec. 2009.
[9] D. Hansen et al., “Clock, flip-flop, and combinatorial logic contributions to the SEU cross section in 90 nm ASIC technology,” IEEE Trans. Nucl. Sci., vol. 56, no. 6, pp. 3542–3550, Dec. 2009.
[10] B. Narasimham, O. A. Amusan, B. L. Bhuva, R. D. Schrimpf, and W. T. Holman, “Extended SET pulses in sequential circuits leading to increased SE vulnerability,” IEEE Trans. Nucl. Sci., vol. 55, no. 6, pp. 3077–3081, Dec. 2008.
[11] N. D. Hindman, D. E. Pettit, D. W. Patterson, K. E. Nielsen, X. Yao, K. E. Holbert, and L. T. Clark, “High speed redundant self-correcting circuits for radiation hardened by design logic,” in Proc. RADECS Conf., 2009, pp. 465–472.
[12] B. Peters, A. Wardrop, D. Lahti, H. Herzog, T. O’Connor, and R. DeCoursey, “Flight SEU performance of the single board computer (SBC) utilizing hardware voted commercial PowerPC processors on-board the CALIPSO satellite,” in Proc. IEEE Radiation Effects Workshop, 2007, pp. 16–25.
[13] J. Teifel, “Self-voting dual-modular-redundancy circuits for single event transient mitigation,” IEEE Trans. Nucl. Sci., vol. 55, no. 6, pp. 3435–3439, Dec. 2008.
[14] B. Gieseke et al., “A 600 MHz superscalar RISC microprocessor with out-of-order execution,” in Proc. IEEE Int. Solid-State Circuits Conf. Tech. Dig., 1997, pp. 176–177.
[15] E. Fetzer, D. Dahle, C. Little, and K. Safford, “The parity protected, multithreaded register files on the 90-nm Itanium microprocessor,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 246–255, Jan. 2006.
[16] M. Hsiao, “A class of optimal minimum odd-weight-column SEC-DED codes,” IBM J. Res. Develop., vol. 14, no. 4, pp. 395–401, Jul. 1970.
[17] K. Mohr, G. Samson, and L. T. Clark, “A radiation hardened by design register file with low latency and area cost error detection and correction,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pp. 1335–1342, Aug. 2007.
[18] R. Naseer, R. Bhatt, and J. Draper, “Analysis of soft error mitigation techniques for register files in IBM Cu-08 90nm technology,” in Proc. IEEE Int. Midwest Symp. Circuits and Systems, 2006, pp. 515–519.
[19] R. Hentschke, F. Marques, F. Lima, L. Carro, A. Susin, and R. Reis, “Analyzing area and performance penalty of protecting different digital modules with Hamming code and triple modular redundancy,” in Proc. Symp. Int. Circuits and Systems Design, 2002, pp. 95–100.
[20] A. Giraldo, A. Paccagnella, and A. Minzoni, “Aspect ratio calculation in n-channel MOSFETs with a gate-enclosed layout,” Solid State Electron., vol. 44, pp. 981–989, Jun. 2000.
[21] X. Yao, D. Patterson, K. Holbert, and L. Clark, “A 90 nm bulk CMOS radiation hardened by design cache memory,” IEEE Trans. Nucl. Sci., vol. 57, no. 4, pp. 2089–2097, Aug. 2010.