Error-Locality-Aware Linear Coding to Correct Multi-bit Upsets in SRAMs
Saeed Shamshiri and Kwang-Ting Cheng
Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106
{saeedshamshiri, timcheng}@ece.ucsb.edu
Abstract
High-energy cosmic radiation is the major source of soft errors in SRAMs and can cause multi-bit upsets around the location of the strike. In this paper, we generalize the coding problem for error detection and correction of both local (burst) and global (random) errors. We suggest using error-locality-aware codes for SRAM memories to correct single-bit or multi-bit upsets as well as physical defects. Solving the coding problem with a SAT solver, we have found codes that correct double global or multiple (>=3) local errors for 8, 12, 16, and 24-bit memories. For 16-bit memories, we propose a code that corrects two global or four local errors; at the same cost, our proposed code provides greater reliability than the double-error-correcting BCH code. For 12-bit memories, we suggest a code that corrects two global or five local errors and has the same cost as the triple-error-correcting Golay code but provides better reliability against multi-bit upsets. For memories of other widths, using syndrome analysis, we demonstrate the possibility of designing codes to correct any arbitrary number of local and global errors.
1. Introduction
Cosmic rays and high-energy particles are the dominant cause of soft errors in SRAMs and in digital circuits [1]. In the mid-1990s, radiation-induced soft errors were first reported in DRAMs [2-3]. However, later changes in DRAM technology made them more robust, while SRAMs have become more vulnerable to soft errors as the technology scales [4]. Soft errors can manifest as Single-Bit Upsets (SBU) of SRAM cells. Error Correction Codes (ECC) have been used for many years to mitigate SBU. The most common types of ECC for memories are Single-Error-Correcting (SEC) Hamming codes [5], Single-Error-Correcting Double-Error-Detecting (SEC-DED) extended Hamming codes, and Hsiao codes [6]. Sometimes a single strike of a high-energy particle affects several adjacent memory cells and causes a Multi-Bit Upset (MBU). MBU was first observed in airplanes and space equipment, which are more exposed to galactic cosmic rays [7]. Later, with technology scaling and the significant reduction in noise margins, MBU was observed in commercial electronics used for terrestrial applications [8-14]. Recent studies show that MBU is the dominant contributor to the overall soft error rate and can be as large as 13 bits in a 90nm SRAM [15].
1.1 Higher-order ECCs
Single error correcting codes cannot correct a codeword with more than one erroneous bit. An ECC with greater correction capability seems inevitable to tackle the MBU problem of SRAMs [15-18]. A parallel implementation of Double-Error-Correcting (DEC) BCH (Bose-Chaudhuri-Hocquenghem) codes was proposed and suggested for SRAMs with 16-, 32-, and 64-bit words [15-16]. A Triple-Error-Correcting Quadruple-Error-Detecting (TEC-QED) Golay code is another ECC suggested to protect SRAMs from multiple errors [17]. The Matrix code, a combination of the parity code and the Hamming codes, was proposed for triple error correction in SRAMs [18]. Another hybrid method, proposed in [19], combines Reed-Solomon codes [20] with Hamming codes for hardware implementations. This hybrid ECC offers single-symbol correction capability for multi-bit upsets confined to one symbol. All of these methods provide better reliability than SEC ECC, at the cost of a lower code rate.
The common issue with all of these higher-order ECC methods is that they don't consider the locality of the erroneous bits and blindly provide protection for all possible multiple errors of the target codeword, resulting in rather inefficient and expensive codes. In other words, these methods assume that every bit of a codeword is equally likely to be faulty and that there is no correlation between the failures of adjacent bits. This is a fair assumption if the major cause of the failure is unknown. However, for the special case of MBU, it is known that the bit flips occur within close proximity to the location of the strike [17]. We can derive better codes if we consider the locality of the faulty bits. This observation leads us toward investigating burst error correcting codes.
1.2 Burst codes
Codes for correcting non-independent errors were proposed in 1959 for double adjacent errors in telephone lines and magnetic tapes [21]. Later, a method of constructing codes for correcting clustered errors of any prescribed duration was introduced by Reiger [22]. In 1960, Gilbert proposed a class of burst error correcting codes generated by $f(X) = (X^p + 1)(X^q + 1)/(X + 1)$, where p and q are relatively prime [23]. Gilbert codes encode (p-1)(q-1) bits of data into pq bits of code, and correct a single burst of length b or less. The value of b depends on p and q, and cannot be an arbitrary number [24].
From 1960 to 1988, many papers discussed different methods to calculate b, given p and q [23-29]. Burst codes have never been used for SRAM memories. In this paper, for the first time, we investigate applying burst codes to correct MBU in SRAMs. However, there are two issues with Gilbert codes and other burst codes that make them less interesting for SRAM applications. First, since the length of the correctable burst error (i.e. b) depends on the size of the codeword (i.e. the product of p and q), it is not feasible to use Gilbert's generator polynomial to produce codes with any desired burst correction capability. Second, Gilbert codes are more suitable for long codewords (i.e. greater than 1000 bits) and long burst errors (i.e. greater than 30 bits), which are characteristic of communication channels and magnetic storage devices [24], and are not very useful for relatively short memory words (e.g. 32 or 64 bits). For example, there exists no Gilbert code for a 32-bit memory that corrects bursts of length two. While Reiger's codes [22] do not suffer from these two problems, they are designed to correct only one burst error and lack the ability to correct random errors. That is, they can't recover data if two erroneous bits lie farther apart than the length of the code's burst correction capability. It has been shown that Gilbert codes and similar burst codes, such as product codes [30] and array codes [31-32], provide some reliability against random errors [33]. However, they can't be designed to correct a given number of random errors. For example, for a 32-bit memory, we can't design a Gilbert code with double or triple error correction capability. In other words, while the available burst error correction methods (including product codes [30], array codes [31-32], Gilbert codes [23], and Reiger's codes [22]) may be able to correct random errors in some special cases, in general they lack correction capability for random errors.
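To make Gilbert's construction concrete, the generator polynomial f(X) can be computed with a few lines of GF(2) polynomial arithmetic. The following Python sketch is our own illustration (the function names are ours, not from any referenced implementation), representing polynomials as integer bit masks:

```python
def pmul(a, b):
    # Multiply two GF(2) polynomials; bit i of an int is the coefficient of X^i.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    # Long division of GF(2) polynomial a by b; returns (quotient, remainder).
    q, db = 0, b.bit_length() - 1
    while a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def gilbert_generator(p, q):
    # f(X) = (X^p + 1)(X^q + 1) / (X + 1); p and q relatively prime.
    g, rem = pdivmod(pmul((1 << p) | 1, (1 << q) | 1), 0b11)
    assert rem == 0  # X = 1 is a root of the numerator, so X + 1 always divides it
    return g

# Example: p = 3, q = 5 gives a degree-7 generator, i.e. a (15, 8) Gilbert code.
print(bin(gilbert_generator(3, 5)))
```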
1.3 Error-locality-aware codes
Figure 1 shows the reliability problems encountered in a typical SRAM. On the one hand, high-energy radiation can cause an MBU in a single memory word (as depicted in Figure 1.a). On the other hand, there may be several isolated errors in a memory word caused by radiation, cross-talk noise, or age-induced defects (as depicted in Figure 1.b). A SEC code cannot solve either of these two problems. Higher-order ECCs, like BCH and Golay codes, can correct the random errors, but not burst errors. A burst code, e.g. a Reiger code, can recover data from a burst error, but not from random errors. In this paper, we propose a new class of codes that can recover data in the presence of either burst or random errors. We provide a generalized definition of the coding problem that addresses both burst and random errors. In this paper, we refer to random and burst errors as global and local errors, respectively. We also refer to a code designed to cover both local and global errors as an error-locality-aware code. For example, an error-locality-aware code that corrects two global errors or three local errors is
a code that corrects two random errors (i.e. errors that can occur anywhere in the codeword), or a single burst error of length no more than three (i.e. three or fewer erroneous bits which are within three adjacent bits). We name such a code a 2G3L (two-global, three-local) code.
Figure 1. Local and global errors in an 8-bit SRAM caused by radiation or physical defects.
Using the proposed definition, we can directly design codes for any desirable number of corrections of local and global errors. There is a significant difference between our proposed approach and the available methods: in Gilbert codes [23], array codes [31-32], and product codes [30], the number of local error corrections depends on the size of the codeword; therefore, it is not feasible to design codes for any arbitrary number of local errors. For example, it is not possible to use any of the available methods to design a code that corrects four local and two global errors for 16-bit memories.

Other than higher-order ECCs, a commonly-accepted engineering solution to the MBU problem is interleaving the codewords in memory cells [8, 17]. Using a sufficient interleaving distance, each of the adjacent faulty memory cells belongs to a different codeword, and thus all erroneous bits can be corrected using a SEC approach. Since the width of the MBU increases as technology shrinks, the interleaving distance should increase accordingly [17]. A minimum interleaving distance of four to eight was recommended for 150nm SRAMs and beyond [8, 12-13]. Although interleaving can address the MBU problem, it requires buffering of memory, imposes delay on the memory read cycle, increases power consumption, and incurs additional implementation complexity [34-35].

Another technique to avoid accumulation of soft errors in memories is memory scrubbing, which periodically reads from memory and immediately writes the data back [36]. Faster scrubbing improves reliability against soft errors but increases power consumption. Moreover, there is a practical upper bound for the scrubbing rate, which is $10^4$ scrub cycles per day, or approximately one every 10 seconds [17]. Error correction methods can be used in conjunction with bit interleaving and memory scrubbing to further improve the overall reliability of the memory against burst and random errors.
1.4 Contributions
The contributions of this paper are as follows:
1. We merge the idea of random error correction with burst error correction and provide a general definition for the coding problem that covers both types of errors.
2. Using a boundary analysis on the number of error syndromes, we show the possibility of designing new codes that provide higher reliability against MBU and random errors for SRAM memories at the same cost as the existing state-of-the-art solutions. We show that the coding problem can be translated into a Boolean satisfiability problem and that a code that suits our purpose can be found using a Boolean SAT solver.
3. We propose some exemplary error-locality-aware codes for 8, 12, 16, and 24-bit SRAM memories to recover data from double global errors and multiple (>=3) local errors. With the same cost as Golay and BCH codes, our proposed codes provide better correctability against MBU.
1.5 Organization
In Section 2, we define the coding problem for random error corrections and analyze the number of error syndromes in this class. We also discuss the fundamental dilemma of the coding theory. In Section 3, we present similar analysis for burst codes and compare them with random error correcting codes. Merging burst codes with random error correcting codes gives us a code of better quality against both random and burst errors. We propose this generalization in Section 4 and illustrate the design space for this class of codes. We also explain how this class of codes overcomes the fundamental dilemma of the coding theory. In Section 5, we suggest some exemplary codes for 8, 12, 16, and 24-bit SRAMs. We also briefly discuss the implementation of these codes in hardware. Section 6 concludes the paper.
2. Random Error Correcting Codes
2.1 Problem definition
All random error correcting codes are based on the following classic definition of coding:
Definition 1: Code(n,k,d) encodes a k-bit data word into an n-bit codeword such that every pair of codewords has a Hamming distance of at least d. The Hamming distance between two codewords is the number of bits in which the codewords differ. For example, the Hamming distance between codewords "10000101" and "10001111" is two. The number of errors a code with a Hamming distance of d can correct is $g = \lfloor (d-1)/2 \rfloor$. We can also represent this code as Code(n,k)^g. For example, Hamming(7,4,3) encodes a 4-bit data word into a 7-bit codeword and is able to correct g=(3-1)/2=1 error in the codeword, and so it can be
represented as Hamming(7,4)^1. There are n-k=3 parity bits used in this code, so the parities can encode up to eight different error syndromes. The Hamming encoder is designed such that these three parity bits generate a unique code for a single error on any of the seven bits, and for the case when there is no error. This makes the Hamming code able to correct any single error in the codeword. Table I shows the design of the Hamming(7,4,3).

Table I: Hamming(7,4,3).
Bit position:  1   2   3   4   5   6   7
Encoded bit:   p1  p2  d1  p4  d2  d3  d4
p1 coverage:   1   0   1   0   1   0   1
p2 coverage:   0   1   1   0   0   1   1
p4 coverage:   0   0   0   1   1   1   1
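As a concrete illustration of how Table I drives encoding and decoding, here is a minimal Hamming(7,4) sketch in Python (our own illustration, using the standard construction in which the syndrome equals the 1-based position of a single flipped bit):

```python
def hamming74_encode(d1, d2, d3, d4):
    # Parity bits per the coverage rows of Table I.
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]  # codeword positions 1..7

def hamming74_decode(c):
    # Recompute each parity over its coverage set; a nonzero syndrome is the
    # 1-based position of the single erroneous bit.
    s = (c[0] ^ c[2] ^ c[4] ^ c[6]) \
        + 2 * (c[1] ^ c[2] ^ c[5] ^ c[6]) \
        + 4 * (c[3] ^ c[4] ^ c[5] ^ c[6])
    if s:
        c = c[:]
        c[s - 1] ^= 1  # correct the single-bit error
    return c[2], c[4], c[5], c[6]  # d1, d2, d3, d4

cw = hamming74_encode(1, 0, 1, 1)
cw[5] ^= 1                          # inject a single-bit upset at position 6
assert hamming74_decode(cw) == (1, 0, 1, 1)
```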
Counting the number of syndromes gives us a lower bound on the number of parity bits required for correcting a specific number of bit errors. For example, if the total number of error syndromes (including the case with no error) is e, we need at least $\lceil \log_2 e \rceil$ parity bits to encode the error syndromes and therefore correct the errors. In the following section, we analyze the number of error syndromes for the general case of g-bit error correction.
2.2 Syndrome analysis
In general, for a code(n,k,d), the number of cases when no error occurs is equal to choosing zero out of n, which results in C(n,0)=1. The number of cases when one error happens is C(n,1)=n. So the total number of syndromes for single error correction is n+1. Therefore, code(n,k,d) can correct a single error if the following inequality is satisfied:
$\binom{n}{0} + \binom{n}{1} \le 2^{n-k}$
For the special case when
$\binom{n}{0} + \binom{n}{1} = 2^{n-k},$
the code is a perfect code, meaning that we utilize the maximum capacity of the parity bits to encode the syndromes. Hamming(7,4,3) is an example of a perfect code. To generalize the syndrome analysis, let A(g) be the total number of error syndromes in the code(n,k,d) for up to g-bit error correction. For a SEC code, we showed that:
$A(1) = \binom{n}{0} + \binom{n}{1}$
For a Double-Error-Correcting (DEC) code, we need to further distinguish all 2-bit errors, so the total number of syndromes would be:
$A(2) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2}$
For the general case of up to g-bit error correction, we have:
$A(g) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{g}$
And finally:
$A(n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$
Figure 2 plots A(g) versus g=0..n for n=7. This curve is a mathematical bound on all possible error correction codes that aim to correct up to g bits of random error in a codeword of size n=7. Figure 3 shows the same curve for n=23. We have indicated the famous SEC Hamming(7,4,3) and TEC Golay(23,12,7) in Figures 2 and 3, respectively. Golay(23,12,7) is another example of a perfect code; the number of error syndromes for up to three errors is exactly equal to the encoding capability of its parity bits:
$\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 1 + 23 + 253 + 1771 = 2048 = 2^{23-12}$
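The syndrome counts and the perfect-code equalities above are easy to check numerically. A short sketch (our own illustration):

```python
from math import comb

def A(n, g):
    # Syndromes needed to correct up to g random errors in an n-bit codeword.
    return sum(comb(n, i) for i in range(g + 1))

# A code needs at least ceil(log2(A(n, g))) parity bits; perfect codes use
# every syndrome of their n-k parity bits exactly:
assert A(7, 1) == 2 ** (7 - 4)     # SEC Hamming(7,4,3)
assert A(23, 3) == 2 ** (23 - 12)  # TEC Golay(23,12,7)
```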
Figure 2. Number of error syndromes versus error correction capability of a code for n=7.
Figure 3. Number of error syndromes versus error correction capability of a code for n=23 on a logarithmic scale.
2.3 Fundamental dilemma of the coding theory
For a code(n,k,d), we would like to maximize:
1. The code rate, defined as k/n;
2. The relative Hamming distance, defined as d/n [37].
Increasing the code rate reduces the amount of redundancy in the code, while increasing the relative Hamming distance increases the number of bit errors that can be corrected for a constant-size codeword. The fundamental dilemma of the coding theory states that improving the code rate works against improving the relative Hamming distance. Figure 4 illustrates an analogy to this dilemma adapted from [37]. In this analogy, packing a large number of codewords in the Hamming space is illustrated as packing balls of radius $g = \lfloor (d-1)/2 \rfloor$ in a finite space. The center of each ball represents a codeword. To be able to correct g errors, these balls must not overlap. In this analogy, improving the code rate is equivalent to increasing the number of balls in the finite space, which works against increasing the radius of the balls (i.e. the relative Hamming distance).
Figure 4. An analogy to the fundamental dilemma of the coding theory adapted from [37]. We can have either lots of small balls, or a few large balls, but we cannot have lots of large balls.
In Section 4, we propose a new definition for the coding problem and overcome the fundamental dilemma of the coding theory.
3. Burst Error Correcting Codes
3.1 Problem definition
Definition 2: BurstCode(n,k)^l encodes a k-bit data word into an n-bit codeword and is able to correct l local errors (i.e. a single burst error no longer than l).
Notice that there is no notion of Hamming distance in Definition 2. We can still measure the Hamming distance between any two codewords of a burst code, but, as opposed to random error correcting codes, the minimum Hamming distance of the codewords doesn't limit the error correction capability of the code. For example, in a burst code that corrects four local errors in a codeword of size n=8, "01001111" and "00001110" could both be valid codewords even though their Hamming distance is just two.
3.2 Syndrome analysis
Similar to the syndrome analysis described in the previous section, we can count the number of syndromes of local errors that have to be distinguished and corrected. Let B(l) be the number of syndromes that have to be encoded by the parity bits in order to correct l local bit errors.
The number of syndromes needed to correct up to two local errors, B(2), is equal to the number of syndromes for single random error correction, A(1), plus the cases in which two adjacent bits are erroneous. There are n-1 ways to choose two adjacent bits in a codeword of size n; therefore:
$B(2) = \binom{n}{0} + \binom{n}{1} + (n-1) = 2n$
In general, B(l) is equal to B(l-1) plus the error patterns that may occur on l adjacent cells and that are not covered by B(l-1). There are (n-l+1) ways to choose l adjacent cells out of n, and for these l adjacent cells to have an error proximity of exactly l, the first and the last bits of the error pattern must be '1'; otherwise, the error proximity would be less than l, and the error pattern would already be covered in the B(l-1) calculation. The number of error patterns on l adjacent cells with both the first and the last bits being '1' is $2^{l-2}$. Therefore,
$B(l) = B(l-1) + 2^{l-2}(n-l+1)$
Using the above inductive analysis, we can calculate B(l) for l=2..n. For the case l=n, we have:
$B(n) = \binom{n}{0} + \binom{n}{1} + (n-1) + 2(n-2) + 2^2(n-3) + \cdots + 2^{n-3}\cdot 2 + 2^{n-2} = 2^n$
This shows the consistency of our calculation: the total number of error patterns of proximity up to n that can occur on a codeword of size n is equal to the number of all possible error patterns, which is $2^n$.
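The recurrence for B(l) and the consistency check B(n) = 2^n are straightforward to verify numerically. A small sketch (our own illustration):

```python
from math import comb

def B(n, l):
    # Syndromes needed to correct a single burst of length at most l.
    b = comb(n, 0) + comb(n, 1)          # B(1): no error, or one error anywhere
    for j in range(2, l + 1):
        b += 2 ** (j - 2) * (n - j + 1)  # patterns whose span is exactly j
    return b

n = 23
assert B(n, 2) == 2 * n     # matches the B(2) expression above
assert B(n, n) == 2 ** n    # the consistency check from the text
print([B(n, l) for l in range(1, 6)])  # e.g. the small-l values plotted in Figure 5
```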
Figure 5 plots B(l) and A(g) versus l=g=0..n for n=23. In this figure, B(l) is compared to A(g), where the horizontal axis is the number of global bit corrections for A(g) but the number of local bit corrections for B(l). Notice that both curves start from the same point and eventually merge at the same point. The difference between the two curves is due to the fact that, as g and l increase, the number of syndromes incrementally added to A(g) is larger than that added to B(l) for small values of g and l. This figure shows that B(l) is significantly smaller than A(g) for g=l. This means that if we design a code that corrects up to l local errors, it can have a significantly higher code rate than a code that corrects g=l global errors.
Figure 5. Comparing B(l) with A(g) for n=23 and g=l.
In Figure 5, we illustrate the possibility of designing a BurstCode(23,16)^3 that corrects one global error and up to three local errors. This code has a code rate of 16/23, which is 33% higher than the code rate of the famous Golay(23,12)^3, which corrects three global errors.
Another alternative is to design a BurstCode(23,12)^5. With the same code rate as Golay(23,12)^3, BurstCode(23,12)^5 can correct one global error and up to five local errors.
The Golay code protects the memory against up to three SBUs, but not against an MBU of four bits or longer. On the other hand, BurstCode(23,12)^5 protects the memory against an MBU of up to five bits, but not against two distant SBUs (i.e. two single-bit errors more than five bits apart). Therefore, we need a code that makes the memory reliable under both error scenarios.
4. Error-Locality-Aware Correcting Codes
We generalize the definition of the coding problem by merging the concept of random (global) error correction with burst (local) error correction, and design error-locality-aware codes to mitigate MBU and SBU in memories. Following Edmonds Approximation, it has been shown that it is very unlikely that two particles hit the same codeword in the same scrub cycle [7, 12]. Therefore, we assume that there is no more than one MBU per codeword per scrub cycle, and thus there is at most one cluster of local errors in a codeword.
4.1 Problem definition
Definition 3: ErrorLocalityAwareCode(n,k)^g_l encodes a k-bit data word into an n-bit codeword and corrects up to g global or l local bit errors. l local bit errors means that there are at most l flipped bits in the codeword and that they are no more than l bits apart from each other. For simplicity, we represent this code as gGlL(n,k).
Note that, by definition, a code that corrects g global errors also corrects g local errors. In fact, local errors are a special case of global errors, and hence l should always be greater than or equal to g. If l=g, Definition 3 degenerates to Definition 1, and if g=1, it degenerates to Definition 2. This shows that Definition 3 is a generalization of Definitions 1 and 2.
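In code, Definition 3's error model reduces to a simple predicate. The sketch below (our own illustration; the function name is ours) classifies an error pattern as correctable by a gGlL code:

```python
def correctable_by_gGlL(error_positions, g, l):
    # error_positions: set of 0-based indices of flipped bits in the codeword.
    if len(error_positions) <= g:
        return True  # up to g global (random) errors, anywhere in the codeword
    span = max(error_positions) - min(error_positions) + 1
    return len(error_positions) <= l and span <= l  # one cluster of local errors

# A 2G5L code: two arbitrary flips are fine, three scattered flips are not,
# but up to five flips within five adjacent bits are correctable.
assert correctable_by_gGlL({0, 22}, 2, 5)
assert not correctable_by_gGlL({0, 10, 22}, 2, 5)
assert correctable_by_gGlL({10, 11, 13, 14}, 2, 5)
```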
4.2 Syndrome analysis
Let B(g,l) be the number of syndromes that have to be encoded by the parity bits in order to correct g global or l local errors.
We can calculate B(2,l) for l=2..n with the same approach as those for A(2) and B(l):
$B(2,n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + (n-2) + (2^2-1)(n-3) + (2^3-1)(n-4) + \cdots + (2^{n-3}-1)\cdot 2 + (2^{n-2}-1) = 2^n$
Similarly:
$B(3,n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \binom{n}{3} + (2^2-1-2)(n-3) + (2^3-1-3)(n-4) + (2^4-1-4)(n-5) + \cdots + (2^{n-3}-1-(n-3))\cdot 2 + (2^{n-2}-1-(n-2)) = 2^n$
$B(4,n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \binom{n}{3} + \binom{n}{4} + (2^3-1-3-\binom{3}{2})(n-4) + (2^4-1-4-\binom{4}{2})(n-5) + \cdots + (2^{n-3}-1-(n-3)-\binom{n-3}{2})\cdot 2 + (2^{n-2}-1-(n-2)-\binom{n-2}{2}) = 2^n$
Finally, for the most general case:
$B(g,n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{g} + \left(2^{g-1} - \binom{g-1}{0} - \binom{g-1}{1} - \cdots - \binom{g-1}{g-2}\right)(n-g) + \left(2^{g} - \binom{g}{0} - \binom{g}{1} - \cdots - \binom{g}{g-2}\right)(n-g-1) + \left(2^{g+1} - \binom{g+1}{0} - \binom{g+1}{1} - \cdots - \binom{g+1}{g-2}\right)(n-g-2) + \cdots + \left(2^{n-2} - \binom{n-2}{0} - \binom{n-2}{1} - \cdots - \binom{n-2}{g-2}\right) = 2^n$
Figure 6 compares B(g,l) with A(g) for g=1..n, l=1..n, and n=23. This figure suggests that instead of using Golay(23,12)^3, which corrects three global errors, or BurstCode(23,16)^3, which corrects three local errors, we can use a 2G5L(23,12) that corrects two global errors or up to five local errors, providing reliability against both random and burst errors. In Section 5, we will show the design of a 2G5L(23,12).
Figure 6. Comparing B(g,l) with A(g) for g=1..n, l=1..n, and n=23.
Figure 7 compares B(g,l) with A(g) for n=26. This figure suggests that we can increase the code rate by 25% if we use BurstCode(26,20)^2 rather than BCH(26,16)^2, or, with the same code rate, we can correct two global or up to four local errors by using 2G4L(26,16). The design of a 2G4L(26,16) will also be discussed in Section 5.
Figure 7. Comparing B(g,l) with A(g) for g=1..n, l=1..n, and n=26.
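The B(g,l) values plotted in Figures 6 and 7 can be reproduced by brute-force enumeration for moderate n. A sketch (our own illustration):

```python
from itertools import combinations

def count_syndromes(n, g, l):
    # B(g,l): error patterns (including the empty one) that a gGlL(n,k) code
    # must map to distinct syndromes. Patterns heavier than max(g, l) are
    # never correctable, so heavier weights need not be enumerated.
    count = 0
    for w in range(max(g, l) + 1):
        for bits in combinations(range(n), w):
            local = w and w <= l and bits[-1] - bits[0] + 1 <= l
            if w <= g or local:
                count += 1
    return count

assert count_syndromes(10, 10, 10) == 2 ** 10  # every pattern is covered
need = count_syndromes(23, 2, 5)               # requirement for a 2G5L(23,12)
assert need <= 2 ** 11                         # fits in its 11 parity bits
```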
4.3 Treatment for the fundamental dilemma
Considering the error significance in the design of the code offers the opportunity to alleviate the fundamental dilemma of the coding theory. It is analogous to replacing the circles in Figure 4 with objects of a different shape, e.g. ovals or rectangles, such that more objects can be fit into the space while each of them is large enough to cover the errors of greater significance. This concept is illustrated in Figure 8: the objects are no longer circles, reflecting that the errors of high significance, which are the targets of the code, are not evenly distributed in the codeword space. In other words, we can improve both the code rate and the correction capability if we limit the code to correcting only the errors that are more likely to occur, as opposed to blindly correcting all errors based only on their Hamming distances. In Figure 8, the width and height of the objects represent the local and global error correction capability of the code, respectively.
Figure 8. Overcoming the fundamental dilemma of the coding theory by replacing circles with ovals. (Axes: local error correction vs. global error correction.)
5. Exemplary Error-Locality-Aware Codes for SRAMs
The feasibility of designing a more efficient error-locality-aware code for SRAMs was explored in the previous section. In this section, we design codes for 8, 12, 16, and 24-bit memories to correct double global errors as well as multiple (>=3) local errors.
5.1 Designing a 2G3L(16,8)
In the coding table of SEC Hamming(7,4,3) (as depicted in Table I), the parity encoding of each column is unique. This ensures that the parities can distinguish all single errors. Similarly, for the coding table of a 2G3L code, the following conditions should be satisfied:
1. Each column is unique; this enables the code to correct one global error.
2. The bitwise XOR of any two columns is unique; this enables the code to correct any two global errors.
3. The bitwise XOR of any three adjacent columns is unique; this enables the code to correct three local errors.
Table II shows an example of a 2G3L code which encodes an 8-bit data word of the memory into a 16-bit codeword. This code is one of the many possible codes that we can design as a 2G3L(16,8).
Finding a set of values for Table II that satisfies all three aforementioned conditions can be translated into a Boolean satisfiability problem and solved using a SAT solver. The search space for finding such a solution is huge, so it is impossible to sweep the entire solution space by exhaustive search. We can use dynamic programming as well as heuristic approaches to speed up the search.
After finding a table that fulfills all requirements, hardware implementation is straightforward. In the encoder, each row of the table can be implemented using an XOR gate (or a tree of XOR gates). As an example, Figure 9 illustrates the corresponding encoder of the 2G3L(16,8) of Table II. This implementation approach is the same as the parallel implementation of Hamming codes (and essentially any other code). Implementation of the decoder is also similar to that of Hamming codes. For a hardware implementation of a code, one more condition about the table should be met:
4. We should be able to sort the parity columns of the coding table (i.e. the columns of the table that correspond to parity bits) into a triangular matrix. This condition ensures that each parity bit can be derived by XORing some of the data bits and previously calculated parity bits.
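Conditions 1-3 can be checked mechanically once the coding table is expressed as its per-position coverage values (the decimal row of Table II, with p1 as the least significant bit). A checker sketch (our own illustration; the decimals below are the Table II columns):

```python
from itertools import combinations

def is_2g3l_table(cols, parity_bits):
    # All syndromes of correctable error patterns must be distinct and nonzero.
    seen = {0}  # the no-error syndrome
    singles = list(cols)                                    # condition 1
    doubles = [a ^ b for a, b in combinations(cols, 2)]     # condition 2
    triples = [cols[i] ^ cols[i + 1] ^ cols[i + 2]          # condition 3
               for i in range(len(cols) - 2)]
    for s in singles + doubles + triples:
        if s in seen:
            return False
        seen.add(s)
    return len(seen) <= 2 ** parity_bits

table2 = [1, 2, 4, 8, 16, 32, 27, 45, 64, 87, 99, 128, 143, 209, 190, 230]
print(is_2g3l_table(table2, parity_bits=8))  # prints whether Table II qualifies
```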
In all coding tables of Tables II-VI, the parity columns form an identity matrix (which is a special case of a triangular matrix). For these cases, all parity bits can be computed directly from the data bits in parallel, which makes the hardware implementation faster.
5.2 Designing a 2G5L(23,12)
Table III shows the coding table for 2G5L(23,12). Similar to Golay(23,12)^3, this code encodes 12 bits of data into a 23-bit codeword. The Golay code is able to correct three global errors, while 2G5L is able to correct two global errors or up to five local errors. All errors that the Golay code can correct but the 2G5L code cannot are very unlikely to occur: those are the cases in which three erroneous bits are not within five adjacent cells. Using Edmonds Approximation [7, 12], we can show that the chance of having three single errors in the same codeword is far less than the chance of having one MBU of size four or five. Therefore, in practice, the 2G5L code provides stronger protection for SRAMs than the Golay code. We propose using 2G5L(23,12) for 12-bit SRAMs. The implementation cost of 2G5L(23,12) in terms of area, power, and delay is the same as that of Golay(23,12)^3. Both can be implemented using trees of XOR gates, similar to the example of Figure 9.
Figure 9. The corresponding encoder of the 2G3L(16,8) of Table II.
5.3 Designing a 2G4L(26,16)
For 16-bit memories, we suggest using a 2G4L(26,16) to protect data from double global and quadruple local errors. Similar to BCH(26,16)^2, 2G4L(26,16) encodes 16 bits of data into a 26-bit codeword. Therefore, these two codes have the same amount of redundancy and equal hardware, power, and delay cost. BCH(26,16)^2 is a double-error-correcting code but provides no protection against burst errors. 2G4L(26,16), in addition to double global error correction, also corrects burst errors up to four bits long. This means 2G4L(26,16) is always better than BCH(26,16)^2, because it provides extra protection at no extra cost.
Tables IV and V show the design of BCH(26,16)^2 and our proposed 2G4L(26,16), respectively. For example, in the BCH code, if a burst error happens within the four adjacent bits d11, d12, d13, and d14 (i.e. bit positions 21-24 in Table IV), the error code that the parities generate would be 532, which is equal to the case of a double error flipping bits d1 and d6 (i.e. bit positions 11 and 16). That is, 247 ⊕ 494 ⊕ 988 ⊕ 209 = 873 ⊕ 381 = 532, where ⊕ is the bitwise XOR operator. In this case, the BCH decoder mistakenly recognizes the burst error as a double error on d1 and d6 and thus makes a wrong correction. In 2G4L(26,16), by contrast, if a burst error occurs on d11, d12, d13, and d14, a unique error code is generated, and the 2G4L decoder recovers the data correctly.
5.4 Designing a 2G5L(36,24)
For a 24-bit memory, we propose a 2G5L(36,24) code to correct double global and up to five local errors. The design of this code is illustrated in Table VI.
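The aliasing in the BCH case is a one-line check against the Table IV coverage values (our own verification of the numbers quoted above):

```python
# Coverage values from Table IV: d11..d14 = 247, 494, 988, 209; d1 = 873; d6 = 381.
burst_syndrome = 247 ^ 494 ^ 988 ^ 209   # 4-bit burst on d11..d14
double_syndrome = 873 ^ 381              # double error on d1 and d6
assert burst_syndrome == double_syndrome == 532  # indistinguishable to the decoder
```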
All codes that we proposed in this section are exemplary codes that provide protection against both random and burst errors. Other codes with similar characteristics can be found using a SAT solver. We suggest using codes that can correct double global errors and local errors of multiplicity three or greater for SRAM memories.
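As a rough illustration of the search problem (not the authors' SAT formulation), one can look for valid data columns by random restarts over the coverage values, keeping the parity columns as an identity block. A hypothetical sketch; it may well return None within the given budget, which is exactly why a SAT solver is preferable:

```python
import random
from itertools import combinations

def all_syndromes_distinct(cols):
    # Same distinctness test as the 2G3L checker in Section 5.1.
    seen = {0}
    for s in (list(cols)
              + [a ^ b for a, b in combinations(cols, 2)]
              + [cols[i] ^ cols[i + 1] ^ cols[i + 2] for i in range(len(cols) - 2)]):
        if s in seen:
            return False
        seen.add(s)
    return True

def random_2g3l_search(n=16, k=8, tries=20000, seed=1):
    # Simplification: parity columns first as an identity block, then k random
    # data columns; a SAT solver explores this space far more effectively.
    rng = random.Random(seed)
    parity = [1 << i for i in range(n - k)]
    for _ in range(tries):
        data = [rng.randrange(1, 1 << (n - k)) for _ in range(k)]
        if all_syndromes_distinct(parity + data):
            return parity + data
    return None  # no table found within the budget

print(random_2g3l_search())
```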
Table II: Design of our proposed 2G3L(16,8).
Position:     1   2   3   4   5   6   7   8   9   10  11  12   13   14   15   16
Encoded bit:  p1  p2  p3  p4  p5  p6  d1  d2  p7  d3  d4  p8   d5   d6   d7   d8
Coverage:     1   2   4   8   16  32  27  45  64  87  99  128  143  209  190  230
Each coverage value is the column of parity-coverage bits read as a binary number with p1 as the least significant bit; e.g. d1's value 27 = 00011011 means d1 is covered by p1, p2, p4, and p5.
Table III: Design of our proposed 2G5L(23,12).
Positions 1-10 and 16 are the parity bits p1-p10 and p11, with identity coverage values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 (p11), respectively (p1 = LSB).
Position:     11  12   13   14   15   17   18   19   20   21    22   23
Encoded bit:  d1  d2   d3   d4   d5   d6   d7   d8   d9   d10   d11  d12
Coverage:     99  165  239  311  575  222  378  393  725  1063  818  1292
Table IV: Parallel implementation of BCH(26,16)^2, adapted from [16].
Positions 1-10 are the parity bits p1-p10, with identity coverage values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 (p1 = LSB).
Position:     11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26
Encoded bit:  d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11  d12  d13  d14  d15  d16
Coverage:     873  443  886  389  778  381  762  669  595  975  247  494  988  209  418  836
Table V: Design of our proposed 2G4L(26,16).
Positions 1-8 are the parity bits p1-p8, with identity coverage values 1, 2, 4, 8, 16, 32, 64, 128 (p1 = LSB).
Position:     9   10  11   12   13   14   15   16   17   18   19   20   21   22   23   24    25   26
Encoded bit:  d1  d2  d3   d4   p9   d5   d6   d7   d8   d9   p10  d10  d11  d12  d13  d14   d15  d16
Coverage:     51  85  127  173  256  283  309  327  413  420  512  579  541  679  910  1003  841  744
Table VI: The Design of our proposed 2G5L(36,24).
6. Conclusion
In this paper, we generalize the definition of the coding problem to design error-locality-aware codes that correct global (random) errors as well as local (burst) errors. The new definition offers a new perspective on the coding design space and enables the design of codes that suit our requirements. The coding problem can be translated into a Boolean satisfiability problem and solved by a standard Boolean SAT solver. For SRAM memories of 8, 12, 16, and 24 bits wide, we propose error-locality-aware codes that correct two global errors or local errors with a multiplicity of three or more. With the same code rate, these error-locality-aware codes offer stronger protection for memories against multi-bit upsets. For example, for a 12-bit memory, 2G5L(23,12) provides stronger protection against MBU than Golay(23,12)^3. For 16-bit memories, we propose using 2G4L(26,16), which is superior to BCH(26,16)^2. With these proposed codes, SRAMs can be protected against a wider range of possible error sources, including single-bit soft errors, multi-bit upsets, and even physical defects missed by manufacturing testing or age-induced defects.
7. Acknowledgement
The authors acknowledge the support of the National Science Foundation Center for Domain-Specific Computing (CDSC) and the Gigascale Systems Research Center (GSRC), one of six research centers funded under the Focus Center Research Program (FCRP), a Semiconductor Research Corporation entity.
8. References
[1] International Technology Roadmap for Semiconductors (ITRS), 2009 Edition, Test and Test Equipment, 2009.
[2] C.A. Gossett, B.W. Hughlock, M. Katoozi, G.S. LaRue, and S.A. Wender, "Single event phenomena in atmospheric neutron environments," in IEEE Transactions on Nuclear Science, vol. 40, no. 6, pp. 1845-1856, 1993.
[3] W.R. McKee et al., "Cosmic ray neutron induced upsets as a major contributor to the soft error rate of current and future generation DRAMs," in proc. IEEE International Reliability Physics Symposium (IRPS), pp. 1-6, 1996.
[4] R. Baumann, "The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction," in proc. IEEE International Electron Devices Meeting (IEDM), pp. 329-332, 2002.
[5] R.W. Hamming, "Error correcting and error detecting codes," in Bell Sys. Tech. Journal, vol. 29, pp. 147-160, April 1950.
[6] M.Y. Hsiao, "A class of optimal minimum odd-weight-column SEC-DED codes," in IBM Journal of R & D, vol. 14, pp. 395-401, July 1970.
[7] G.M. Swift and S.M. Guertin, "In-flight observation of multiple-bit upset in DRAMs," in IEEE Transactions on Nuclear Science, vol. 47, no. 6, pp. 2386-2391, 2001.
[8] C.W. Slayman, "Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations," in IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 397-404, September 2005.
[9] N. Derhacobian, V. Vardanian, and Y. Zorian, "Embedded memory reliability: The SER challenge," in proc. IEEE International Workshop on Memory Technology, Design and Testing, pp. 104-110, Aug. 2004.
[10] M. Spica and T. Mak, "Do we need anything more than single bit error correction (ECC)?," in proc. IEEE International Workshop on Memory Technology, Design and Testing, pp. 111-116, Aug. 2004.
[11] R.C. Baumann, "Radiation-induced soft errors in advanced semiconductor technologies," in IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 305-316, September 2005.
[12] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in proc. IEEE International Electron Devices Meeting (IEDM), pp. 519-522, 2003.
[13] D. Radaelli, H. Puchner, S. Wong, and S. Daniel, "Investigation of multi-bit upsets in a 150 nm technology SRAM device," in IEEE Transactions on Nuclear Science, vol. 52, no. 6, pp. 2433-2437, December 2005.
[14] P.J. Meaney, S.B. Swaney, P.N. Sanda, and L. Spainhower, "IBM z990 soft error detection and recovery," in IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 419-427, September 2005.
[15] R. Naseer and J. Draper, "Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs," in proc. IEEE European Solid-State Circuits Conference (ESSCIRC), pp. 222-225, 2008.
[16] R. Naseer and J. Draper, "DEC ECC design to improve memory reliability in sub-100nm technologies," in proc. IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 586-589, 2008.
[17] M.A. Bajura et al., "Models and algorithmic limits for an ECC-based approach to hardening sub-100-nm SRAMs," in IEEE Transactions on Nuclear Science, vol. 54, no. 4, pp. 935-945, 2007.
[18] C. Argyrides, H.R. Zarandi, and D.K. Pradhan, "Multiple upsets tolerance in SRAM memory," in proc. IEEE International Symposium on Circuits and Systems (ISCAS), pp. 365-368, May 2007.
[19] G. Neuberger, F. De Lima, L. Carro, and R. Reis, "A multiple bit upset tolerant SRAM memory," in ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 8, no. 4, pp. 577-590, October 2003.
[20] I.S. Reed and G. Solomon, "Polynomial codes over certain finite fields," in Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, pp. 300-304, June 1960.
[21] N.M. Abramson, "A class of systematic codes for non-independent errors," in IRE Transactions on Information Theory, vol. IT-5, pp. 150-157, Dec. 1959.
[22] S.H. Reiger, "Codes for the correction of clustered errors," in IRE Transactions on Information Theory, pp. 16-21, March 1960.
[23] E.N. Gilbert, "A problem in binary encoding," in proc. of Applied Math Symposium, pp. 291-297, 1960.
[24] B. Arazi, "The optimal burst-error correcting capability of the codes generated by f(X)=(X^p+1)(X^q+1)/(X+1)," in Information and Control, vol. 39, pp. 303-314, 1978.
[25] P.G. Newmann, "A note on Gilbert burst correcting codes," in IEEE Transactions on Information Theory, vol. IT-11, p. 384, 1965.
[26] R.L. Bahl and R.T. Chien, "On Gilbert burst-error-correcting codes," in IEEE Transactions on Information Theory, vol. IT-15, pp. 431-433, 1969.
[27] G.M. Tenengol'ts, A.A. Davydov, and G.L. Tauglikh, "A burst error-correcting code and its application for information exchange between computers," in Problemy Peredachi Informatsii, vol. 8, no. 3, pp. 27-37, 1972.
[28] C. Fujiwara, M. Kasahara, K. Tezuka, and Y. Kasahara, "On codes for burst-error-correction," in Transactions on IEICE, vol. 53-A, no. 7, pp. 335-342, 1970.
[29] W. Zhang and J.K. Wolf, "A class of binary burst error-correcting quasi-cyclic codes," in IEEE Transactions on Information Theory, vol. 34, no. 3, pp. 463-479, May 1988.
[30] R.M. Roth and G. Seroussi, "Reduced-redundancy product codes for burst error correction," in IEEE Transactions on Information Theory, vol. 44, no. 4, pp. 1395-1406, July 1998.
[31] D. Raphaeli, "The burst error correcting capabilities of a simple array code," in IEEE Transactions on Information Theory, vol. 51, no. 2, pp. 722-728, Feb. 2005.
[32] M. Blaum, "A family of efficient burst-correcting array codes," in IEEE Transactions on Information Theory, vol. 36, no. 3, pp. 671-675, May 1990.
[33] P.G. Farrell, "A survey of array error control codes," in European Transactions on Telecommunications, vol. 3, no. 5, pp. 441-454, Sept.-Oct. 1992.
[34] S. Baeg, S. Wen, and R. Wong, "SRAM interleaving distance selection with a soft error failure model," in IEEE Transactions on Nuclear Science, vol. 56, no. 4, pp. 2111-2118, Aug. 2009.
[35] A. Dutta and N. Touba, "Multiple bit upset tolerant memory using a selective cycle avoidance based SEC-DED-DAEC code," in proc. IEEE VLSI Test Symposium (VTS), pp. 349-354, 2007.
[36] G.-C. Yang, "Reliability of semiconductor RAMs with soft-error scrubbing techniques," in proc. IEE Computers and Digital Techniques, vol. 142, no. 5, pp. 337-344, Sept. 1995.
[37] A. Betten, M. Braun, H. Fripertinger, A. Kerber, A. Kohnert, and A. Wassermann, Error-Correcting Linear Codes: Classification by Isometry and Applications. Springer, ISBN 9783540317036, ISSN 1431-1550, 2006.