SEVA: A Soft-Error- and Variation-Aware Cache Architecture

Luong D. Hung, Masahiro Goshima, Shuichi Sakai
Graduate School of Information Science and Technology, The University of Tokyo
Hongo 7-3-1, Bunkyo, Tokyo 113-8656, Japan
{hung,goshima,sakai}@mtl.t.u-tokyo.ac.jp

Abstract

As SRAM devices are scaled down, the number of variation-induced defective memory cells increases rapidly. Combining ECC, particularly SECDED, with a redundancy technique can effectively tolerate a high number of defects. However, while SECDED can repair a defective cell in a block, the block becomes vulnerable to soft errors. This paper proposes SEVA, an original soft-error- and variation-aware cache architecture. SEVA exploits SECDED to tolerate variation-induced defects while preserving high resilience against soft errors. Information about defectiveness and data dirtiness is maintained for each SECDED block. SEVA allows only clean data to be stored in defective (but still usable) blocks of a cache. An error occurring in a defective block can thus be detected, and the correct data can be obtained from the lower level of the memory hierarchy. SEVA improves yield and reliability with low overheads.

1 Introduction

SRAM designs are confronted with two serious problems: soft errors and variation-induced defects. Soft errors refer to radiation-induced transient errors [4]. The soft error rate (SER) per bit of SRAM is expected to stay steady over process generations [1]. However, device scaling allows the number of bits integrated on a chip to increase exponentially, raising the total SER rapidly. Tolerance against soft errors is therefore highly required. Error Correcting Code (ECC), particularly the Single Error Correction Double Error Detection Hamming code (SECDED), is widely employed to detect and correct soft errors in SRAMs.

Process variation causes spreads in the electrical characteristics of scaled devices. The effect is pronounced in SRAMs, where minimum-geometry transistors are used. The number of variation-induced defective memory cells becomes high with device scaling [6][2], rendering conventional redundancy techniques (e.g., using redundant rows/columns) impractical. Combining a redundancy technique with ECC can tolerate a high degree of random defects [13][14]. A block containing a single defective cell can be repaired by ECC. At a much lower probability, a block may contain multiple defective cells that exceed the error detection and/or correction capability of ECC; only such a block is replaced by a redundancy element. With such a combined approach, a small number of redundancy elements can be sufficient.

It is cost-effective if the same ECC resource can be used to tolerate both defects and soft errors. However, while a defective cell in a block can be tolerated by SECDED, the block becomes vulnerable to soft errors: errors occurring in the block could be left undetected and/or uncorrected. While the error detection and correction capability can be enhanced by using codes that are more powerful than SECDED, such codes incur significant overheads and are impractical for high-speed SRAMs. Previous work therefore improves defect tolerance at the expense of degraded soft error tolerance [13][14]. This paper proposes SEVA, a Soft-Error- and Variation-Aware cache architecture. SEVA exploits SECDED to tolerate variation-induced defects while preserving high resilience against soft errors. Information about defectiveness and data dirtiness is maintained for each SECDED block. SEVA allows only clean data to be stored in defective (but still usable) blocks. Soft errors cannot cause an integrity problem in these blocks because SECDED is still able to detect them; when soft errors are detected, the correct data can be obtained from the lower levels of the memory hierarchy. SEVA improves yield and reliability with modest costs.

The rest of this paper is organized as follows. Section 2 discusses existing techniques for tolerating soft errors and defects in SRAMs, as well as their limitations. Section 3 describes the SEVA architecture. Section 4 presents a defect and yield analysis for a SEVA cache. Section 5 presents the performance and reliability evaluations. Finally, Section 6 concludes the paper.

2 Preliminary

This section covers soft errors and variation-induced defects, as well as existing techniques to mitigate them. In this paper, a defect refers to a permanent hardware fault, while an error refers to a soft error. A block refers to either an ECC dataword (including information bits and check bits) or the storage portion holding such data.

2.1 Soft Error Tolerance in SRAMs

A collision of a high-energy particle (e.g., an alpha particle or a neutron) with silicon atoms induces charges. Enough charge collected at the diffusion nodes of a gate can alter the gate output, and particle strikes in SRAMs can result in bit upsets. While the soft error rate (SER) per bit is expected to stay steady over process generations [1], device scaling allows the number of bits that can be integrated on a chip to increase exponentially, raising the total SER rapidly. Tolerance against soft errors in SRAMs is highly required.

Thanks to the high regularity of memory arrays, soft errors in SRAMs can be effectively tolerated by coding techniques. Parity can be used where error detection is sufficient (e.g., instruction caches, write-through data caches). ECC, particularly SECDED, is widely used to detect and correct soft errors in caches holding dirty data (e.g., write-back caches). SECDED can correct single-bit errors and detect double-bit errors. The encoding and decoding circuitry of SECDED can be constructed as simple XOR trees and is quite fast.





While a single-bit error (SBE) in a block can be detected and corrected successfully by SECDED, there is a possibility of a multi-bit error (MBE) occurring in a block. An MBE can be caused by 1) accumulation of multiple single-bit errors over time or 2) a particle strike corrupting multiple bits at once. We call the former case a temporal MBE and the latter case a spatial MBE. Since soft errors are infrequent events and SRAMs are frequently accessed, an error caused by a previous strike is very likely to be detected and corrected before the next strike occurs; the probability of a temporal MBE is therefore very low. The probability of a particle strike resulting in a spatial MBE increases with shrinking device dimensions and supply voltages [10]. Since the error bits of a spatial MBE are located close to each other, the combination of ECC and an interleaving technique can tolerate spatial MBEs [11]. By interleaving different blocks on the same row, the error bits are dispersed over multiple blocks. Each error bit then belongs to a different block and can be detected and corrected successfully by SECDED.
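To make the interleaving idea concrete, the sketch below (ours, not from the paper) shows a simple column-to-block mapping; the interleaving degree of 4 is chosen only for illustration, since the paper does not fix one.

    # Illustrative sketch of bit interleaving between ECC blocks on a row.
    # With degree 4, physical column c maps to (block = c % 4, bit = c // 4),
    # so a particle strike flipping adjacent columns corrupts at most one bit
    # per SECDED block.
    DEGREE = 4

    def physical_to_logical(col):
        return col % DEGREE, col // DEGREE   # (block index, bit index within block)

    # A 3-bit-wide spatial upset on columns 10..12 touches three different blocks:
    print([physical_to_logical(c) for c in (10, 11, 12)])  # [(2, 2), (3, 2), (0, 3)]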

2.2 Variation-induced Defects and Defect Tolerance in SRAMs

2.2.1 Variation-induced Defects

As processes scale down, precise control of various device parameters in manufacturing, such as gate width, gate length, and the number and locations of dopant atoms, becomes increasingly difficult. This results in considerable variations in the electrical characteristics of devices. Table 1 shows the variation of threshold voltage (Vth) over process generations predicted by the ITRS [1]. In particular, fluctuations in the number and locations of dopant atoms in the channel region of a transistor are known to be the dominant source of Vth variation in scaled devices [8][15].

Table 1. Variation of Vth over process generations
  Technology (nm)   65     57     50     45     36
  Vth (V)           0.18   0.17   0.16   0.15   0.14
  σVth (mV)         19     21     22     24     27

Variation is pronounced in SRAMs, where minimum-geometry transistors are used. Figure 1 shows the schematic of an SRAM cell. The cell consists of two cross-coupled CMOS inverters and two N-type access transistors. Large mismatches in the strengths of the transistors in an SRAM cell can make the cell fail to function. Three types of failures can occur in an SRAM cell: read, write, and access time failures.

[Figure 1. Schematic of an SRAM cell: cross-coupled inverters {T3,T5} and {T4,T6} plus access transistors T1 and T2; storage nodes B (=0) and /B (=1) connect to bitlines BL and /BL.]

Read Failure. A read failure is the flipping of data while reading an SRAM cell. The voltage VB of node B (the node storing 0 in Figure 1) is raised from zero to Vread due to a voltage divider between BL (precharged at VDD) and GND through T1 and T3. If Vread is larger than the trip voltage Vtrip of the {T4,T6} inverter, the cell flips, resulting in a read failure. Variations in the Vth of T1 and T3 (or T6 and T4) lead to large variation in Vread (or Vtrip).

Write Failure. A write failure is the inability to successfully write to a cell. When writing 0 to node /B, which originally stores 1, its voltage develops to Vwrite due to a voltage divider between /BL at GND and VDD through T2 and T6. If Vwrite is larger than the Vtrip of the {T3,T5} inverter, the write will fail. Since T2 and T6 (also T1 and T5) are typically the smallest transistors in the cell, Vth variations in these transistors cause large variation in Vwrite, resulting in a high probability of write failure [6].

Access Time Failure. The access time (Taccess) is the time required to develop a predefined voltage difference between BL and /BL. When node B stores 0, BL discharges through T1 and T3 in a read operation. The discharge speed depends on the strengths of T1 and T3, so Vth variations in these transistors cause a spread in Taccess. An access fails if Taccess is larger than the maximum tolerable limit Tlimit.

Table 2 shows the failure probability of a cell in a 45nm process with different amounts of Vth variation [2]. PRF, PWF, and PAF are respectively the probabilities of read, write, and access failures; PFault is the total failure probability. The failure probability is quite high and highly sensitive to Vth variation.

Table 2. Probability of cell failure versus σVth in a 45nm process
  σVth (mV)   PRF      PWF      PAF      PFault
  20          1x10^-4  1x10^-4  1x10^-4  1x10^-4
  30          2x10^-4  2x10^-4  5x10^-4  8x10^-4
  40          1x10^-3  6x10^-4  3x10^-3  4x10^-3

2.2.2 Defect Tolerance Techniques

Redundancy techniques have been used to tolerate manufacturing defects in SRAMs and improve yield. SRAMs go through a test process after fabrication: a test algorithm applies input data in particular patterns to the SRAMs and diagnoses the output data to locate defects. Common defects in SRAMs are malfunctioning cells, stuck-at faults, and bridging faults. Once the defects have been identified, a repair algorithm is executed to replace defective locations with redundancy elements. Traditionally, external equipment is used extensively to test the chips, localize defects, and drive a laser beam to perform repair or to blow fuses or anti-fuses. However, external test-and-repair is costly. Modern SRAMs typically include Built-In Self Test (BIST) and Built-In Self Repair (BISR) to keep test and repair costs reasonable [17].

Conventional redundancy techniques usually use rows and/or columns as the redundancy elements. However, high defect densities in scaled processes make such coarse-grain redundancy techniques impractical. In [2], a row contains several cache lines, and an individual cache line, rather than an entire row, can be replaced through modifications to the column multiplexers. Maintaining redundancy at the word level (e.g., 32-bit) allows more efficient utilization of redundant resources [5][12]. Nevertheless, such fine-grain redundancy techniques are still inadequate for high defect densities. For instance, assuming a defect probability per cell of 0.001, a 256-KB SRAM will have roughly 2K defective words. The BISR and decoder circuitry needed to support the replacement of such a high number of defective words is complex. In particular, the Content-Addressable Memory (CAM) usually used in BISR to store the addresses of the defective words becomes excessively large and degrades the access latency.

There have been proposals that combine a redundancy technique with ECC to tolerate a high number of random defects [13][14]. ECC, usually SECDED, is maintained for each data block. When a single defective cell is present in a block, it can be corrected by SECDED. Only when more than one defective cell is present in the same block is the block replaced by a redundancy element. The probability of the latter case is much lower than that of the former, so the number of redundancy elements can be kept low.
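As a quick check of the "roughly 2K defective words" estimate above (our arithmetic, not the authors'):

    # A 256-KB SRAM holds 65,536 32-bit words; with per-cell defect probability
    # 0.001, the expected number of words containing at least one defective cell is
    words = 256 * 1024 * 8 // 32            # 65,536 words
    p_defective_word = 1 - (1 - 0.001) ** 32
    print(words * p_defective_word)          # ~2,064, i.e. roughly 2K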

2.3 Limitation of Existing Tolerance Techniques

Defect tolerance and soft-error tolerance have usually been treated as separate topics so far. ECC has proved to be effective for tolerating either high defect densities or soft errors, and it is cost-effective if the same ECC resource can be used for both purposes. However, while a defective cell present in a block can be tolerated by SECDED, the block becomes vulnerable to soft errors: a single-bit error in the block is detectable but uncorrectable, and a multi-bit error in the block can be undetectable, or detectable but incorrectly corrected.

Error detection and correction capability can be improved by using codes that are more powerful than SECDED. For instance, if a Double Error Correction (DEC) code is used, a defective cell present in a block can be tolerated while a single-bit error occurring in the block can still be detected and corrected. Conventional DEC codes described in the literature include Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon codes [16]. However, their overheads are considerably larger than those of SECDED. Table 3 shows the number of check bits required to implement SECDED and BCH DEC for various information-bit lengths; DEC increases the number of check bits significantly. Moreover, the encoding and decoding circuitry of DEC usually employs multi-bit Linear Feedback Shift Registers (LFSRs), which introduce inordinate access delay, making DEC unsuitable for SRAMs requiring fast access. It is therefore important to keep SECDED as the ECC of choice.

Table 3. Number of check bits required for different information-bit lengths
  Information-bit length   32   64   128
  SECDED                   7    8    9
  BCH DEC                  12   14   16

Existing work [13][14] that makes use of SECDED has the limitation that it improves defect tolerance at the expense of degraded tolerance against soft errors. SEVA overcomes this limitation: it utilizes SECDED to tolerate variation-induced defects while preserving high resilience against soft errors.
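As a sanity check of Table 3's SECDED column, the following sketch (ours) computes the minimum number of Hamming SEC check bits r satisfying 2^r >= k + r + 1 and adds one parity bit for double-error detection:

    def secded_check_bits(k):
        # Smallest r with 2**r >= k + r + 1 gives single-error correction;
        # one extra overall parity bit provides double-error detection.
        r = 1
        while 2 ** r < k + r + 1:
            r += 1
        return r + 1

    for k in (32, 64, 128):
        print(k, secded_check_bits(k))  # 7, 8, 9, matching Table 3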

3 SEVA Architecture

A SEVA cache consists of several subarrays. A subarray has several rows, and each row is comprised of several blocks. Each subarray has its own decoders, BIST, BISR, and some redundancy rows. BIST detects the defective cells in the subarray. Blocks are classified based on the number of defective cells they contain: a block that does not have any defective cell is a good block (g-block); a block that has exactly one defective cell is a tolerable block (t-block); a block that has more than one defective cell is a bad block (b-block). A row containing at least one b-block is a bad row. BISR replaces bad rows with redundancy rows.

SEVA associates each block with a g-bit. The g-bit of a block is set (or reset) if the block is a g-block (or t-block). Defect analysis and setting of the g-bits are performed by BIST every time the system is booted. Alternatively, the overheads can be reduced by storing the values of the g-bits in non-volatile storage and reloading them at reboot.

SEVA interleaves multiple blocks in the same row to disperse the error bits of a spatial MBE. We therefore assume that a block contains at most one error bit and focus on how to deal with a single-bit error in a block. If the block is a g-block, SECDED can be used exclusively for soft error tolerance, so the error is detectable and correctable. However, if the block is a t-block, part of the SECDED capability is used for repairing the defective cell, leaving reduced capability for soft error tolerance. The error in this case is detectable but uncorrectable. If the corrupted data is clean, error detection is sufficient, since the correct data can be obtained from the lower levels of the memory hierarchy. A data integrity problem occurs only if the corrupted data is dirty data that has no backup elsewhere.

SEVA prevents such a data integrity problem by storing only clean data in t-blocks. When a data update comes from an upper-level cache (or the processor) to a t-block of a cache, the data is also updated to the next level of the memory hierarchy. Such an update is referred to as an assurance update. An assurance update also occurs if a modification goes to a cache line whose tag is held in a t-block. Assurance updates increase the number of accesses to the next level of the memory hierarchy and thus have an impact on performance and power consumption.

In conventional caches, the dirtiness of data is maintained at the cache line level: when a cache line is written back, all of its constituent blocks are written back whether or not they have been modified. By maintaining dirtiness at the block level, unnecessary writebacks of unmodified blocks can be eliminated. In SEVA, each block is associated with a d-bit. The d-bit of a block is set (or reset) if the block stores dirty (or clean) data. When a cache line is written back, only those blocks having their d-bits set are sent to the next level of the memory hierarchy, and the d-bits are then reset to indicate that the data in the blocks are now clean. By reducing the number of blocks to be written back, the probability of an assurance update being triggered by the next level of the memory hierarchy is also reduced.
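The following pseudocode is our sketch of the write policy described above, not the authors' implementation; names such as write_next_level and the Block fields are hypothetical. It captures the three rules: dirty data may reside only in g-blocks, a write that touches a t-block (data or tag) triggers an assurance update, and only blocks with the d-bit set are written back on eviction.

    def write_block(block, tag_block, data, write_next_level):
        """Handle a store hitting `block` of a line whose tag lives in `tag_block`."""
        block.data = data
        if block.g_bit and tag_block.g_bit:
            # Both the data block and the tag block are defect-free: a dirty
            # copy is safe under SECDED, so keep it local.
            block.d_bit = True
        else:
            # The data block or the tag block is a t-block: perform an assurance
            # update so the next level always holds the data; the block stays clean.
            write_next_level(data)
            block.d_bit = False

    def evict_line(blocks, write_next_level):
        """On eviction, write back only the blocks whose d-bit is set."""
        for blk in blocks:
            if blk.d_bit:
                write_next_level(blk.data)
                blk.d_bit = False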

Figure 2 shows a simplified architecture of a 256-KB, four-way set-associative SEVA cache. The cache line size is 64B. The cache is constructed from a tag subarray and eight 32-KB data subarrays. Each data subarray has 512 rows, and each row is comprised of eight 72b blocks. The tag subarray has 256 rows, and each row is comprised of sixteen 45b blocks. A tag entry has a tag address, some state bits (for LRU replacement and cache coherency), and the g-bits and d-bits of all blocks belonging to the corresponding cache line. The tag access usually precedes the data access in a typical cache. By storing the g-bits and d-bits of the data-array blocks in the corresponding entry of the tag array, the decisions of 1) whether an assurance write is needed upon a cache write, and 2) which blocks are dirty and need to be written back upon a line eviction or an assurance update, can be made at the end of the tag access. Accesses to unmodified blocks in a cache line can be skipped, thereby saving power. Interleaving the tags of multiple sets in the same row of the tag subarray tolerates MBEs while allowing the tags of the same set to be read concurrently on an access.
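A quick bit-budget check of the tag entry (our arithmetic; the paper does not break down the nine g-/d-bits explicitly, but they presumably cover the eight data blocks plus the tag block itself):

    # 14-b tag + 6 state bits + 9 g-bits + 9 d-bits = 38 information bits,
    # which need 7 SECDED check bits (2**6 >= 38 + 6 + 1), giving a 45b tag block.
    info_bits = 14 + 6 + 9 + 9
    check_bits = 7
    print(info_bits, info_bits + check_bits)   # 38, 45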

[Figure 2. Architecture of 256KB SEVA cache. A 45b tag block holds 38 information bits (14-b tag, 6 state bits, 9 g-bits, 9 d-bits) plus 7 check bits; a 72b data SECDED block holds 64 information bits plus 8 check bits. The tags of a set are interleaved within a 720b tag row (256 rows per tag subarray); sub-cachelines are interleaved within a 576b data row (512 rows per data subarray, eight subarrays). Each subarray has its own decoder, BIST, BISR, and spare rows.]

4 Defect and Yield Analysis

In this section, we perform a defect and yield analysis of a SEVA cache. Our analysis focuses on variation-induced defects, which are the dominant type of defects in scaled devices (redundancy rows can also be used to deal with other types of defects, such as stuck-at faults at wordlines or decoders). We assume the defects are randomly distributed and that λ is the probability that a cell is defective.

Since a SEVA cache consists of multiple subarrays, we start with the defect and yield analysis of a generalized subarray. The subarray has N_row rows; each row contains N_blk blocks; each block has BS bits, including information bits and check bits. The subarray has N_rdrow redundancy rows. The probabilities that a block is a g-block, t-block, or b-block are respectively P_g-blk, P_t-blk, and P_b-blk, given by

  P_g-blk = (1 - λ)^BS                         (1)
  P_t-blk = BS (1 - λ)^(BS-1) λ                (2)
  P_b-blk = 1 - P_g-blk - P_t-blk              (3)

The probability that a row is a bad row, P_b-row, is given by

  P_b-row = 1 - (1 - P_b-blk)^N_blk            (4)

The array is passable if the number of bad rows, N_b-row, is at most equal to the number of redundancy rows that are not bad rows, N_nb-rdrow. The yield of the array can therefore be expressed as

  Yield_array = Prob(N_b-row <= N_nb-rdrow)    (5)

We can evaluate the yield of the subarray by assuming that N_b-row and N_nb-rdrow follow Poisson distributions whose means are given by

  N̄_b-row = N_row P_b-row                      (6)
  N̄_nb-rdrow = N_rdrow (1 - P_b-row)           (7)

The yield of the cache is the product of the yields of its subarrays:

  Yield_cache = ∏ (over all subarrays) Yield_subarray    (8)

4.1 Results

We perform yield simulation for the SEVA cache shown in Figure 2. The yields of the tag and data subarrays are calculated separately. The configuration parameters {N_row, N_blk, BS} of the tag and data subarrays are respectively {256, 16, 45} and {512, 8, 72}. The defect probability is varied from 0.00005 to 0.001. We also vary the number of redundancy rows to investigate its impact on the yield of a subarray. The yields of the tag and data subarrays are respectively shown in the left and right halves of Figure 3. When the defect probability is low, the subarrays can achieve high yield with very few redundancy rows. As the defect probability becomes high (0.0005 or 0.001), the number of redundancy rows required to achieve adequate yield increases appreciably.

[Figure 3. The yields of the tag subarray (left-half) and data subarray (right-half)]

Table 4 lists the yield and number of redundancy rows of the tag and data subarrays needed to obtain roughly 95% cache yield. The hardware overhead of SEVA is also shown in the table. The overhead is simply the total number of SECDED check bits, g-bits, d-bits, and bits consumed by redundancy rows, divided by the number of memory bits in the original cache. The overhead of SECDED is constant, while the overhead of redundancy rows grows with the defect probability. Even when the defect probability is as high as 0.001, SEVA can achieve 95% yield with 12.81% overhead.
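To illustrate how Equations (1) through (8) are evaluated, the sketch below (ours, not the authors' code) computes the yield of a single subarray under the Poisson approximation. The parameter names are our own, the two Poisson counts are treated as independent (our simplification), and the final sum simply truncates the Poisson tails.

    from math import exp, factorial

    def subarray_yield(lam, n_row, n_blk, bs, n_rdrow):
        p_g = (1 - lam) ** bs                     # Eq. (1): no defective cell
        p_t = bs * (1 - lam) ** (bs - 1) * lam    # Eq. (2): exactly one defective cell
        p_b = 1 - p_g - p_t                       # Eq. (3): two or more defective cells
        p_brow = 1 - (1 - p_b) ** n_blk           # Eq. (4): row contains a bad block

        mu_b = n_row * p_brow                     # Eq. (6): mean number of bad rows
        mu_nb = n_rdrow * (1 - p_brow)            # Eq. (7): mean number of usable spare rows

        def pois(mu, k):
            return exp(-mu) * mu ** k / factorial(k)

        # Eq. (5): Prob(N_b-row <= N_nb-rdrow), truncating the tails.
        return sum(pois(mu_nb, j) * sum(pois(mu_b, k) for k in range(j + 1))
                   for j in range(4 * n_rdrow + 10))

    # Data subarray of Figure 2 at lambda = 0.001 with 25 spare rows (cf. Table 4).
    print(subarray_yield(0.001, 512, 8, 72, 25))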

Table 4. Yield and overhead of the subarrays and the cache
  λ         Tag subarray (yield/N_rdrow)   Data subarray (yield/N_rdrow)   Cache yield   Overhead (%)
  0.00005   0.993/1                        0.996/2                         0.962         7.74
  0.0001    0.996/3                        0.995/3                         0.950         7.99
  0.0005    0.996/10                       0.997/11                        0.949         9.76
  0.001     0.994/21                       0.994/25                        0.947         12.81

5 Performance and Reliability Evaluation

This section considers the application of SEVA caches in a processor's memory hierarchy. The performance overhead and reliability improvement are evaluated.

5.1 Evaluation Methodology

Table 5. Parameters of the simulated architecture
  Processor Parameters
    Frequency          1 GHz
    Functional Units   4 integer ALUs, 4 FP ALUs, 1 integer multiplier/divider, 1 FP multiplier/divider
    LSQ size           8 instructions
    RUU size           16 instructions
    Issue Width        4 instructions/cycle
  Memory Hierarchy Parameters
    L1 i-cache         16 KB, direct-map, 32B line, 1 cycle latency, 72b SECDED block
    L1 d-cache         16 KB, 4-way, 32B line, 1 cycle latency, writeback, 72b SECDED block, 2-entry writeback buffer
    L2 cache           256KB, unified, 4-way, 64B line, 6 cycle latency, 72b SECDED block, 2-entry writeback buffer
    Memory             100 cycle latency

The simulated system is an out-of-order superscalar processor whose configuration parameters are shown in Table 5. The processor has 16-KB instruction and data caches and a 256-KB unified L2 cache. The data cache and L2 cache are writeback caches, each equipped with a two-entry writeback buffer. Upon a line eviction or an assurance update, the data is temporarily stored in the writeback buffer; the buffer writes the data back when the bus to the next level of the memory hierarchy is free. SECDED is maintained for every 64 information bits in the L1 caches and the L2 cache.

We assume that the defect probability of an SRAM cell is 0.001 and that all bad rows in the caches are replaced by redundancy rows. The probability that a block is a t-block is calculated using Equations (1) and (2), and the g-bits are randomly initialized at the beginning of the simulation based on this probability. We consider only defects in the SRAM caches and assume the memory to be fault-free in the evaluation. The following two systems are evaluated:

• Baseline system: The L1 data cache and L2 cache are normal caches, which do not perform assurance updates and maintain dirtiness per cache line.
• SEVA system: The L1 data cache and L2 cache are SEVA caches, which perform assurance updates and maintain dirtiness per block.

Cycle-accurate processor simulation is performed using the SimpleScalar toolset [7]. SPEC2000 benchmarks are used in the simulation. For each benchmark, we skip the first one billion instructions and simulate the next four billion instructions. We assume that the soft error rate of an unprotected SRAM equals 1.6 KFIT per megabit [3] (one FIT, Failure In Time, corresponds to one failure per 10^9 hours) and that soft errors follow a uniform distribution. Accesses to the cache lines (in the baseline system) or blocks (in the SEVA system) are recorded; this information allows us to calculate the error rates of the caches.
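One plausible reading of this g-bit initialization (our interpretation; the paper does not spell out whether it conditions on the block being usable) is the following:

    # Surviving blocks are either g-blocks or t-blocks, since bad rows were
    # replaced, so mark each block as a t-block with probability P_t / (P_g + P_t).
    import random

    def init_g_bits(num_blocks, lam=0.001, bs=72, seed=0):
        p_g = (1 - lam) ** bs
        p_t = bs * (1 - lam) ** (bs - 1) * lam
        p_t_given_usable = p_t / (p_g + p_t)   # ~6.7% of usable blocks at lam = 0.001
        rng = random.Random(seed)
        return [rng.random() >= p_t_given_usable for _ in range(num_blocks)]  # True = g-block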

5.2 Evaluation Results

Figure 4 shows the relative performance of the baseline system and the SEVA system. The assurance updates of a SEVA cache consume access bandwidth to the next level of the memory hierarchy and thus affect performance. However, the performance degradation is very small (less than 0.1%), thanks to the writeback buffer, which effectively removes an assurance update from the critical path of a cache access. Interestingly, the SEVA system improves performance for the mcf benchmark. The improvement can be attributed to the early writeback effect [9]: if the writeback buffer is full and a dirty line must be evicted on a cache miss, an entry in the buffer must be written back first, and such a writeback adds to the latency of the cache access. Writing dirty lines back early through assurance updates can reduce such overheads.

[Figure 4. Performance degradation of the SEVA system compared to the baseline system]

Figure 5 shows the average number of accesses to the L2 cache and memory per one thousand instructions. On average, the SEVA system increases the number of accesses to the L2 cache and memory by 11.3% and 6.4%, respectively. The breakdowns of the accesses are also shown in the figure. Assurance updates and writebacks account for 25.5% (or 27.2%) of accesses to the L2 cache (or memory). By allowing the L1 data cache (or L2 cache) to send only the dirty blocks to the L2 cache (or memory), the number of blocks sent is reduced by 44% (or 60%) compared with the case where all the blocks in the cache lines are sent back to the L2 cache (or memory). The results confirm the effectiveness of maintaining data dirtiness per block in SEVA caches for eliminating unnecessary block updates and improving power efficiency.

[Figure 5. Accesses to L2 cache and memory per 1,000 instructions]

Table 6 shows the uncorrectable error rate of the L2 caches. For the baseline cache, an uncorrectable error occurs if a strike occurs on a t-block of a dirty cache line and the cache line is read later. For the SEVA cache, an uncorrectable error occurs only if two strikes occur on the same t-block between two consecutive accesses to the block. The error rate of the SEVA cache is many orders of magnitude lower than that of the baseline L2 cache.

Table 6. Uncorrectable error rate of 256KB L2 caches
  Cache      Uncorrectable Error Rate (FIT)
  Baseline   631
  SEVA       3.84e-14
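To see why the SEVA figure in Table 6 is so small, the rough calculation below (ours; the 1 ms access interval is an arbitrary illustrative value, not from the paper) estimates the probability of two strikes hitting the same 72-bit t-block within one access interval:

    # Assumed raw rate: 1.6 KFIT per megabit; FIT = failures per 1e9 hours.
    KFIT_PER_MBIT = 1.6
    per_bit_per_hour = KFIT_PER_MBIT * 1e3 / (2 ** 20) / 1e9
    block_rate = 72 * per_bit_per_hour / 3600     # strikes per second per 72-bit block
    interval = 1e-3                               # assumed mean access interval (s)
    p_double = (block_rate * interval) ** 2 / 2   # Poisson approx. of P(two or more strikes)
    print(p_double)                               # ~5e-34 per interval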

6 Conclusions

Tolerating soft errors for high reliability and tolerating variation-induced defects for yield improvement are both highly required in advanced SRAM designs. The proposed SEVA cache architecture satisfies both requirements. By combining SECDED with a redundancy technique, SEVA can effectively tolerate high defect densities. Enforcement of the assurance-update mechanism, aided by the newly added g-bits and d-bits, efficiently eliminates uncorrectable soft errors in defective (but still usable) blocks. The evaluation results verify the effectiveness of SEVA caches: even with a defect probability as high as 0.001, SEVA caches achieve high yield with low area overheads, and the performance overheads caused by assurance updates are very low.

Acknowledgement

This research is partially supported by Grant-in-Aid for Fundamental Scientific Research B(2) #13480077 and B(2) #16300013 from the Ministry of Education, Culture, Sports, Science and Technology Japan, a CREST project of the Japan Science and Technology Corporation, and by a 21st century COE project of the Japan Society for the Promotion of Science.

References

[1] International Technology Roadmap for Semiconductors. http://public.itrs.net, 2005.
[2] A. Agarwal, B. C. Paul, S. Mukhopadhyay, and K. Roy. Process Variation in Embedded Memories: Failure Analysis and Variation Aware Architecture. IEEE Journal of Solid-State Circuits, 40(9):1804-1814, 2005.
[3] R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 22(3):258-266, 2005.
[4] R. C. Baumann. Soft Errors in Advanced Semiconductor Devices, Part I: Three Radiation Sources. IEEE Transactions on Device and Materials Reliability, 1(1):17-22, 2001.
[5] A. Benso, S. Chiusano, G. D. Natale, P. Prinetto, and M. L. Bodoni. A Family of Self-Repair SRAM Cores. In Proc. IEEE International On-Line Testing Workshop, pages 214-218, 2000.
[6] A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, and K. Chin. Fluctuation Limits & Scaling Opportunities for CMOS SRAM Cells. In Proc. IEEE International Electron Devices Meeting, pages 659-662, 2005.
[7] D. Burger and T. Austin. The SimpleScalar Tool Set. Technical Report CS-TR-1997-1342, University of Wisconsin-Madison, 1997.
[8] D. J. Frank, Y. Taur, M. Ieong, and Hon. Monte Carlo Modelling of Threshold Variation due to Dopant Fluctuations. In Proc. IEEE International Electron Devices Meeting, pages 93-94, 1999.
[9] H.-H. S. Lee, G. S. Tyson, and M. K. Farrens. Eager Writeback: A Technique for Improving Bandwidth Utilization. In Proc. IEEE/ACM International Symposium on Microarchitecture, pages 11-21, 2000.
[10] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong. Characterization of Multi-Bit Soft Error Events in Advanced SRAMs. In Proc. IEEE International Electron Devices Meeting, pages 519-522, 2003.
[11] K. Osada, Y. Saitoh, and K. Ishibashi. 16.7-fA/Cell Tunnel-Leakage-Suppressed 16-Mb SRAM for Handling Cosmic-Ray-Induced Multierrors. IEEE Journal of Solid-State Circuits, 38(11):1952-1957, 2003.
[12] V. Schober, S. Paul, and O. Picot. Memory Built-In Self-Repair Using Redundant Words. In Proc. IEEE International Test Conference, pages 995-1001, 2001.
[13] C. H. Stapper and H.-S. Lee. Synergistic Fault-Tolerance for Memory Chips. IEEE Transactions on Computers, 41(9):1078-1088, 1992.
[14] C.-L. Su and Y.-T. Yeh. An Integrated ECC and Redundancy Repair Scheme for Memory Reliability Enhancement. In Proc. IEEE International Test Conference, pages 81-89, 2005.
[15] X. Tang, V. K. De, and J. D. Meindl. Intrinsic MOSFET Parameter Fluctuations Due to Random Dopant Placement. In Proc. IEEE International Electron Devices Meeting, pages 369-376, 1997.
[16] T. R. N. Rao and E. Fujiwara. Error-Control Coding for Computer Systems. Prentice Hall, Inc., 1989.
[17] Y. Zorian. Embedded Infrastructure IP for SOC Yield Improvement. In Proc. IEEE/ACM Design Automation Conference, pages 709-712, 2002.

