Fault Tolerant FPGA

Fault Tolerant FPGA: A survey

Mazdak Fatahi (1), Arash Ahmadi (2)

(1) Computer Engineering Department, Faculty of Engineering, Razi University, Kermanshah, Iran
(2) Electrical Engineering Department, Faculty of Engineering, Razi University, Kermanshah, Iran

Abstract: The use of Field Programmable Gate Arrays (FPGAs) is growing in many areas of technology. One of the most important applications of FPGAs is the implementation of embedded systems (embedded processors, systems-on-chip, networks-on-chip) in critical applications. Some applications, such as life-support equipment, deep-space missions, and nuclear weapons control systems, are very complex and critical, and reliability and availability are serious issues for them. Fortunately, FPGAs can be reconfigured, which provides opportunities to overcome some of these issues. This survey investigates some of the most popular methods for fault detection and fault tolerance in FPGA-based systems.

Keywords: FPGA, Fault, Fault Tolerant.

1. Introduction

The evolution of integrated circuits (ICs) has reduced system complexity and manufacturing cost and improved performance. Producing Application Specific Integrated Circuits (ASICs) and Application Specific Standard Products (ASSPs) involves cost and risk, and the cost per chip drops substantially only in mass production. Another disadvantage of custom ICs is a longer time to market, caused by longer design time. Developing custom ICs incurs two types of cost [1]:
• Cost of development and design
• Cost of manufacture
Because of these costs, programmable platforms, and specifically Field-Programmable Gate Arrays (FPGAs), are the best alternative to custom ICs. FPGAs offer enough flexibility and integration to satisfy product requirements (cost, power, performance, and density), and with the help of CAD tools a circuit can be implemented in a much shorter time (no physical layout process, no mask making, and no IC manufacturing).

Unfortunately, there are some disadvantages. For example, FPGA-based designs consume more power than other hardware implementations (e.g., ASICs and ASSPs). This power consumption can impose thermal stress on the FPGA's components and consequently cause malfunctions in FPGA-based designs. Also, the parts of an FPGA (such as CLBs and IOBs) are connected by programmable interconnections, so interconnect faults (such as stuck-at, open, and bridging faults) are highly possible. In addition, current nanometer technology decreases operational reliability [15]. Therefore, fault detection and diagnosis tests are essential to ensure that a device is fault-free before it is used in its main application, and it is also necessary to detect, locate, and recover from faults during the lifetime of the application (at runtime). Because of the nature of VLSI systems, fault occurrence is very common in them, and FPGAs, as an example of VLSI systems, are subject to different types of faults.

This survey investigates recent methods for fault detection and fault recovery in FPGA-based systems. The rest of the paper is organized as follows: Section 2 describes FPGAs and FPGA architecture, Section 3 briefly explains faults and fault detection concepts, Section 4 introduces a number of fault recovery and fault tolerance methods, Section 5 presents the conclusion, and Section 6 presents the discussion and future work.

2. FPGAs and FPGA Architecture

A Field Programmable Gate Array (FPGA) is a VLSI product that consists of a number of programmable resources that can be configured and reconfigured to implement any logic function. With runtime configuration, specific parts of the FPGA can be configured without affecting operation in other regions. This ability is the result of partial runtime reconfiguration, which gives FPGAs a high degree of flexibility and efficiency [4, 7].

For an overview of the evolution of the FPGA, we can consider the PROM as the first type of programmable chip that could implement logic circuits: the address lines of the PROM are used as the logic circuit inputs and the data lines as the outputs. The next device developed for implementing logic circuits was the PLA, which contains a programmable AND-plane and a programmable OR-plane. In a PLA, any of the inputs can be ANDed together, so each AND-plane output can correspond to any product term of the inputs, and each OR-plane output can be configured as the sum of any of the AND-plane outputs. PLAs are therefore well suited to the sum-of-products (SOP) form of logic functions. Because of the low speed and high production cost of PLAs (a PLA needs two levels of configuration), Programmable Array Logic (PAL) devices were developed to overcome these weaknesses (Fig. 1).

Fig. 1: A figure fitted in a column

PALs need only a single level of programmability, because the programmable AND-plane is connected to a fixed OR-plane. Several types of PALs, with different numbers of inputs and outputs and different sizes of OR-gates, are produced. PAL devices are the basis for newer digital designs and had a very important effect on digital hardware design. On the road to FPGAs we also find Simple PLDs (SPLDs) and Complex PLDs (CPLDs). These types of programmable integrated circuits have low cost and very high pin-to-pin speed performance. CPLDs have the logic capacity of about 50 typical SPLD devices, but it is difficult to extend them to higher densities. FPGAs were introduced as the best solution for supporting very high logic capacity with fully programmable parts [1].

2.1 FPGA Briefly

Field Programmable Gate Arrays (FPGAs) are devices consisting of logic components, look-up tables (LUTs), multiplexers (MUXes), and flip-flops (FFs) that implement simple and complicated functions [19]. Recent developments in manufacturing technologies have increased the capability and performance of FPGAs and have made them a powerful solution for implementing complex processor cores and systems-on-chip (SoCs) in a single FPGA device. To make the implementation of such SoCs easier and more flexible, FPGA vendors provide hard and soft processor cores. Hard processor cores are embedded in the FPGA die; for example, Xilinx integrates the PowerPC 440 in the Virtex-5 FX family. Soft processor cores are implemented using the FPGA programmable logic. MicroBlaze from Xilinx and the Nios processors from Altera are implemented using the programmable resources and can be embedded in any device family to reduce the cost of SoC implementation. Soft processor models such as Leon4 (http://www.gaisler.com/index.php/products/processors/leon4) and OpenRISC (http://opencores.org/or1k/OR1200_OpenRISC_Processor) are also used for the development of embedded products and for academic research [2, 5, 6].

The functionality of an FPGA is determined by its configuration, which specifies which logic blocks are used, which wire segments connect them, and what function each block provides. There are several technologies for FPGA programming, such as antifuse, SRAM, and EEPROM/Flash, of which SRAM-based FPGAs are the most popular.

2.2 FPGA Architecture

An FPGA consists of two main parts:

1- Configurable logic resources
The first part is the configurable logic resources, called Configurable Logic Blocks (CLBs) (Fig. 2). CLBs are composed of Lookup Tables (LUTs) and flip-flops. Each LUT can implement a logic function. In some references a CLB is called a PLB (Programmable Logic Block) (Fig. 3).

2- Configurable routing resources
The configurable routing resources consist of three parts: the wires, the switch blocks, and the connection blocks.

Fig. 2: Island-style architecture

Fig. 3: Generic SRAM FPGA Architecture [20]

The logic blocks, wires, switch blocks, and connection blocks constitute an architecture called island-style, shown in Fig. 2. This architecture is used in new defect-tolerant architectures. Wires run in routing channels and are indexed by track numbers. A channel (Fig. 2) is bounded by the CLBs. A wire's track number is based on its position relative to the width of the channel.
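The statement that each LUT can implement a logic function can be illustrated with a minimal sketch, in the same spirit as the PROM described earlier: the inputs act as an address and the stored bits act as the output. The helper names `make_lut` and `lut_eval` are hypothetical and chosen only for this illustration; a real LUT is a small SRAM addressed by the input signals.

```python
from itertools import product

def make_lut(func, num_inputs):
    """Build the truth table (configuration bits) of a LUT that implements `func`."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]

def lut_eval(config_bits, inputs):
    """Evaluate a LUT: the inputs form an address into the configuration bits."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit  # first input is the most significant bit
    return config_bits[address]

# Example: a 3-input majority function stored in an 8-entry LUT.
majority = make_lut(lambda a, b, c: (a and b) or (a and c) or (b and c), 3)
```

Changing the stored bits changes the implemented function without touching the surrounding hardware, which is what makes the device reconfigurable.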

3. Fault and Fault Detection

Because of the significant benefits of circuit scaling in capacity and performance, VLSI companies try to use next-generation nanometer silicon technologies. This move to smaller transistor sizes has affected reliability and raises several issues, especially in critical applications [2]. Fault tolerance (FT), as defined in this article, is the ability of a system to operate normally in the presence of malfunctioning resources, or faults.

3.1 Fault in FPGAs

The manufacturing process for new FPGAs (such as Xilinx Virtex-7 and Altera Stratix V) uses the latest semiconductor technologies. Smaller transistors are increasingly prone to various defects [7]. SRAM-based FPGAs (FPGAs that use SRAM memory to store the configuration data), the most popular type of FPGA, offer high capacity and configurability, but these devices suffer from two main types of defect. The first type results from the use of the latest VLSI technology with smaller feature sizes; the second results from the architecture of SRAM-based FPGAs. SRAM-based FPGAs are prone to Single Event Upsets (SEUs) [16]. SEUs are caused by high-energy particles, such as heavy ions and protons, that hit sensitive silicon areas. This effect may change the logic state of memory elements. Since the configuration of an SRAM-based FPGA is determined by the data stored in memory, a change in the content of a memory element may strongly alter the functionality of the FPGA. These unexpected output results are usually called Single Event Functional Interrupts (SEFIs) [2, 17].

SRAM-based FPGAs, as VLSI products, are composed of CMOS transistors, so the defects that affect CMOS circuits must be considered for them too. For example, [23] shows that the hot-carrier effect causes a gradual reduction in channel mobility and an increase in threshold voltage in CMOS transistors; switching speeds in the circuit therefore become slower, leading to delay faults. In [24] the performance degradation of FPGAs over time caused by Hot-Carrier Effects (HCE) and Negative Bias Temperature Instability (NBTI) is investigated. Because of continuing device scaling, the lifetime reliability of chips is decreasing [24, 25]. In [24] the impact on FPGAs of two different types of hard errors, Time-Dependent Dielectric Breakdown (TDDB) and Electromigration (EM), is analyzed, and their significant impact on lifetime is demonstrated. EM is one of the causes of permanent failures. It is a mechanism by which metal ions migrate over time, leading to voids and deposits in interconnects. These can eventually cause faults by creating open and short circuits. The effect of EM and other degradation mechanisms is made worse by increasing current densities and the smaller feature sizes of the interconnect wires.

This paper is concerned with the classification of faults in SRAM-based FPGAs. Previous works report several classifications of defects, based on attributes of the defects or of their sources. Some of the important and common defects of SRAM-based FPGAs are discussed, and some classifications presented, in [2, 10, 20, 21, 24]. The classification in [21] is close to ours. In [21] faults are classified into three classes:

1- Transient failures
Transient phenomena, such as voltage level oscillation and partial radiation, may change the memory content of the FPGA. The effects are transient because the system can recover in a short time.

2- Pseudo-permanent failures
The source of these defects is the same as for the previous class. The effect of the fault, for example of an SEU, may change a LUT's data. The contents of the LUTs are the implemented functions, so such changes may cause errors in the output. These defects behave like permanent failures but can be recovered by reloading the system.

3- Permanent failures
This class is very serious. Permanent faults may be caused by physical damage. Some permanent failures are manufacturing defects, such as stuck-at or bridging faults; electromigration and similar mechanisms can also destroy some resources of the FPGA. After a permanent error, the system can be recovered by loading the configuration into spare resources.

The classification proposed in this paper is presented in TABLE I. In this table the defects are divided into permanent and transient. Each of them may occur in soft form (data or configuration bits) or in hard form (LUTs, logic gates, or other FPGA resources).
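The pseudo-permanent class can be illustrated with a small sketch. It assumes a hypothetical 2-input LUT stored as a list of four configuration bits and models an SEU as a single bit flip; the names `inject_seu` and `GOLDEN_AND` are invented for the example and do not correspond to any vendor tool.

```python
def lut_eval(config_bits, a, b):
    """Evaluate a 2-input LUT; the inputs (a, b) address the configuration bits."""
    return config_bits[(a << 1) | b]

def inject_seu(config_bits, bit_index):
    """Return a copy of the configuration with one bit flipped (a modelled SEU)."""
    upset = list(config_bits)
    upset[bit_index] ^= 1
    return upset

GOLDEN_AND = [0, 0, 0, 1]            # configuration of a 2-input AND
faulty = inject_seu(GOLDEN_AND, 3)   # the upset hits the entry for a=1, b=1

and_ok = lut_eval(GOLDEN_AND, 1, 1)  # correct AND behaviour
and_bad = lut_eval(faulty, 1, 1)     # the implemented function has changed

# The wrong output persists on every evaluation until the configuration
# is reloaded, which is why the fault behaves as pseudo-permanent.
recovered = list(GOLDEN_AND)
```

No hardware is damaged in this scenario, so reloading the stored configuration is enough to restore the original function.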

TABLE I: Classification of faults in SRAM-based FPGA

Type | Form | Faults/degradation | Cause | Example
-----|------|--------------------|-------|--------
Transient | Soft errors (change configuration bits or data bits) | Single Event Upsets (SEUs), Single Event Transients (SETs) | Certain types of radiation | Flipping data in user logic (not in configuration data)
Transient | Hard errors | Single Event Transients (SETs) | - | -
Permanent | Soft errors (change configuration bits or data bits) | Single Event Upsets (SEUs) | Certain types of radiation | Flipping data in configuration bits
Permanent | Hard errors | Manufacturing defects | Stuck-at, bridging | -
Permanent | Hard errors | Delay faults | Hot-Carrier Effects (HCE), Negative Bias Temperature Instability (NBTI) | -
Permanent | Hard errors | Accelerated aging | Electromigration (EM), time-dependent dielectric breakdown (TDDB) | Small MTTF

Faults may occur singly or as multiple faults at a time, and they may occur at runtime or while the system is offline.

3.2 Fault Detecting and Locating in FPGAs

Fault handling requires fault detection. In the fault detection process, the main process is notified of a critical situation, and then the faulty components are identified (locating). Once a fault has been detected and located, it can be recovered. Fault detection and location can be performed concurrently or as a multi-stage process [10].

Approaches to fault detection and recovery are mainly based on redundancy. Redundant schemes can usually achieve concurrent fault detection and prevent the propagation of errors; redundancy-based fault detection schemes also guarantee the detection of both transient and permanent faults. Most fault mitigation approaches proposed in previous works combine a redundancy-based fault detection solution, such as a TMR-based scheme, with a process known as scrubbing, which corrects soft errors in the configuration memory by rewriting the configuration data [2]. In the scrubbing approach, unlike the reconfiguration approach, only the faulty part of the configuration memory is rewritten ('scrubbed') and the system remains fully operational [18].

Fault detection methods can be classified into three classes [10]:

1. Redundant/concurrent error detection uses additional logic to detect incorrect output. Concurrent error detection (CED) schemes [21] are useful for deciding whether the current configuration can make the system work, not only during normal operation but also after recovery (reconfiguration).
2. Off-line test methods perform error detection when the FPGA is not performing its operational function.
3. Roving test methods perform a progressive scan of the FPGA structure by swapping blocks of functionality with a block carrying out a test function.

There are other approaches, such as online test methods. In online test methods, external equipment for test pattern generation or output response analysis is not available, so test pattern generation and output response analysis are based on built-in self-test (BIST) [8].

The different approaches to fault detection are evaluated against a set of metrics (speed of detection, resource overhead, performance overhead, granularity, and coverage) in [10]. Redundant approaches detect faults quickly but need many resources. Off-line test methods need very little area but are slow to detect a faulty state. Roving methods use a moderate amount of area and detect faults at moderate speed.
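The combination of TMR-based detection and scrubbing mentioned in this section can be sketched as follows. This is a simplified software model under stated assumptions: replica outputs are plain integers, the configuration memory is a list of hypothetical frame words, and a golden copy of the configuration is available for comparison. Real scrubbers work on device-specific configuration frames and often rely on ECC or readback rather than a full golden copy.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)

def scrub(config_frames, golden_frames):
    """Rewrite only the frames that differ from the golden copy.

    Returns the indices of the frames that were corrected.
    """
    corrected = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:
            config_frames[index] = golden
            corrected.append(index)
    return corrected

# Three replicas compute the same 4-bit result; one is corrupted by an upset.
replica_outputs = (0b1010, 0b1010, 0b0010)
voted = majority_vote(*replica_outputs)

# Hypothetical configuration memory with an upset in frame 2.
golden = [0x3C, 0xA5, 0xF0, 0x0F]
live = [0x3C, 0xA5, 0xF1, 0x0F]
fixed = scrub(live, golden)
```

The voter masks the error immediately, while scrubbing removes its cause from the configuration memory so that a second upset in another replica does not defeat the vote.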

4. Fault Recovery and Fault Tolerance

After a fault has been detected and located, the recovery process starts. There are many approaches to mitigating the effects of faults and restoring the normal operation of the circuit. In this section we introduce the most common and most applicable methods used in SRAM-based FPGAs to recover from faults and make a system fault tolerant. Many approaches have been proposed to recover a faulty system and increase the reliability of FPGAs. The goals of fault-tolerance techniques are to minimize the hardware, timing, and power overhead and to maximize the reliability of the system [1].

This paper emphasizes that some methods prevent a system from becoming faulty and others recover a faulty system, but both help the system to be fault tolerant.

In general, there are two main classes of approach to making an FPGA fault tolerant:

1- Hardware design
2- Software improvement

Fault-tolerant systems can thus be designed at either the hardware or the software level. In hardware design, two aspects are considered: first, hardening the FPGA physically, for example by using improved ceramics and advanced silicon types; second, designing mechanisms that prevent wrong results, such as CRC circuits in hardware. Hardware-based fault tolerance performs better than software-based implementations, but the fabrication cost increases [3].

In software improvement, the emphasis is on design methods. Some CAD tools that support software-based approaches to designing fault-tolerant systems have been developed, such as the Xilinx Triple Modular Redundancy (TMR) tool [14]. TMR provides fault masking on the FPGA: it uses hardware redundancy to mask any single failure by voting on the results of three identical copies of the circuit. TMR is very costly in chip area and power consumption [3]. A number of methodologies have been introduced to reduce these costs and improve TMR [11, 12, 13]. For example, [13] introduces a software tool that automatically classifies circuit structures and applies TMR selectively based on that classification. These solutions improve the performance and power metrics of designs implemented on FPGAs without affecting the fault masking. In [3] a new software-supported framework is introduced for fault masking against upsets that occur due to reliability degradation and aging phenomena. The authors of [19] join separate layers, each with its individual approach to fault tolerance, which increases the overall radiation tolerance to a maximum and enables use in high-energy physics particle accelerators.

Some of the proposed methods aim to increase FPGA lifetime. According to [24], the MTTF of a metal line depends primarily on the current density and the length of the wire. Four design techniques have been proposed to increase the average lifetime of the devices, and their effectiveness has been demonstrated on different applications mapped onto FPGAs:

1- Improving the Mean Time to Failure due to Time-Dependent Dielectric Breakdown
2- Load Balancing to Mitigate the Hot-Carrier-Effect Impact
3- Selective Alternate Routing Technique (SART) to Increase the Mean Time to Failure due to Electromigration
4- Flipping Configuration Bits to Reduce the Negative Bias Temperature Instability Impact

Some of the main basic approaches to designing fault-tolerant systems are [7]:

• Tile-based fault tolerance:
These approaches partition the design into tiles. The tiles are composed of physical resources.

• Column-based approaches:
When a fault has been located and the module remapped, online routing is needed to re-establish the modules in their new locations.

• The Roving STARs based fault tolerance method [22]:
Roving STARs provide testing for all resources. This is an integrated approach to on-line FPGA testing, diagnosis, and fault tolerance. In this method, spare resources are always present in the neighbourhood of the located fault and can replace faulty operational resources.

In [1, 10] the fault tolerance and recovery methods are considered at a number of different levels:

• Hardware level repair
Performs a correction such that the FPGA remains unchanged for the purposes of the configuration. The device retains its original number and arrangement of usable logic clusters and interconnects. Column/row shifting is a hardware level method [1].

• Configuration level repair
Is achieved using resources that are unused by the design. The spare resources can replace faulty ones in the event of a fault. Strategies can be classified into three subclasses [3]:

  • Alternative Configurations:
  One way of achieving fault tolerance is to pre-compile alternative configurations. The FPGA is split into several tiles, each of which has its own set of configurations with a common functionality and interface to the adjacent tiles [9, 10]. Fault tolerance is achieved by replacing a tile configuration with an alternative one in which the faulty resource is not used. This method requires little run-time computation, because placement and routing have already been done.

  • Incremental Mapping, Placement and Routing:
  To minimize the impact on timing and routing in the area around the fault, cluster reconfiguration is used in combination with a method called pebble shifting. Cluster reconfiguration is carried out in preference, and faulty clusters can be reused by a different function in which the fault does not manifest.

  • Evolutionary Algorithms:
  The evolutionary algorithm approach allows a large degree of flexibility in the number and distribution of faults that can be tolerated, but the area and computational overhead required is very large.

• System level repair:
Works at a higher level. In a modular design, a fault can be tolerated by using a spare functional block.

Some other classifications of fault tolerance methods have been reported. In [20] the authors present fault-tolerant methods for SRAM-based FPGAs, classified by the provider of the method:

• Manufacturer-provided methods:
Require modifications to the current FPGA architectures that end-users cannot perform.

• User-provided methods:
Depend upon the end-user for implementation. These higher-level approaches use the configuration bitstream of the FPGA to integrate redundancy within a user's application.

At a glance, the classification in [20] can be drawn as in Fig. 4. There are several metrics for comparing FT methods. The main metrics are separated into two categories [20]:

1- Overhead-related Metrics
1-1- Physical Resource Overhead
1-2- Throughput Reduction
1-3- Detection Latency
1-4- Recovery Time

2- Sustainability Metrics
2-1- Fault Exploitation
2-2- Recovery Granularity
2-3- Fault Capacity
2-4- Fault Coverage
2-5- Critical Components

Some fault-tolerant methods are compared with respect to these metrics in [1, 7, 20].
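The alternative-configuration repair described earlier in this section keeps run-time computation small because recovery reduces to a lookup among precompiled options, which also keeps the recovery-time metric low. A minimal sketch follows, assuming each precompiled configuration of a tile is described only by the set of resources it uses; the names `select_configuration`, `cfg_a`, and `c0` to `c3` are hypothetical.

```python
def select_configuration(alternatives, faulty_resources):
    """Pick the first precompiled configuration that avoids all faulty resources.

    `alternatives` maps a configuration name to the set of resources it uses.
    Returns the chosen name, or None if every alternative touches a fault.
    """
    for name, used in alternatives.items():
        if not (used & faulty_resources):
            return name
    return None

# Hypothetical tile with four logic clusters (c0..c3) and three precompiled
# configurations, each leaving a different cluster unused as a spare.
tile_alternatives = {
    "cfg_a": {"c0", "c1", "c2"},
    "cfg_b": {"c0", "c1", "c3"},
    "cfg_c": {"c0", "c2", "c3"},
}

chosen = select_configuration(tile_alternatives, {"c2"})
unrepairable = select_configuration(tile_alternatives, {"c0"})
```

In this example a fault in `c0` cannot be tolerated because every alternative uses that cluster, which shows how the set of precompiled configurations bounds the fault coverage of the method.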

5. Conclusion

SRAM-based FPGAs are highly integrated and flexible devices. Their ability to be reconfigured provides reliability and availability in critical applications. Fault categories are based on attributes of the defects or of their sources. Faults can have transient or permanent effects on the system. The effects are transient when the system can recover in a short time, whereas after a permanent error the system can be recovered by loading the configuration into spare resources. There are methods for recovering a faulty system and for making a system fault tolerant, and the main ones were surveyed in this paper. The runtime configurability of SRAM-based FPGAs provides interesting opportunities for both defect detection and fault tolerance. Some classifications of the methods were presented. A number of the methods are hardware methods, embedded in the FPGA at fabrication time, while others can be applied at design time or while the system is online (at runtime). These approaches incur several types of overhead. This paper introduced a number of the main metrics for comparing the efficiency and overhead of each method and cited references that report comparison results. All the reported works have benefits and disadvantages that the developer must consider when designing an SRAM-based FPGA system; at design time, a tradeoff between overheads must be considered.

6. Discussion and future works

There are several methods and many papers on designing fault-tolerant systems, and several types of fault must be considered. The transient effects of SEUs in FPGAs are addressed in many papers, but powerful radiation also has permanent effects that deserve more consideration. Evolutionary methods can also be more useful: because of their optimization ability, they can help find faulty components, especially when the number of faults is increasing.

We hope to explore a new method that combines Simulated Annealing (SA) with a blind method for fault detection. In the proposed method we will divide the resources of the FPGA into tiles of the maximum size (we use the maximum tile size to decrease the number of tiles and therefore the scan time). At runtime a process will scan all the tiles periodically. The scan interval is based on the average MTTF of the device under test (the MTTF must be known for each FPGA family). In this method fault detection adds no overhead and imposes no redundancy on the design. When scanning a tile, we load the current tile into a spare tile and then test the resources of the original tile concurrently. Because FPGAs can operate concurrently, many tiles can be scanned while other tiles are being relocated. When a faulty tile is found, no action is needed to keep the system operational, because the safe copy of this tile is already running.

It then remains only to find the faulty component in that tile. We expect the Simulated Annealing method to find the faulty component with a good response time. In the next step we mark the faulty blocks and create new tiles with a new size; in subsequent scans the faulty blocks in any tile are not scanned. We think this method can be useful for finding any type of permanent fault and can recover from any type of transient fault (because a fault-free tile is always in use and the location of the configuration data changes continually).
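The scanning step of the proposed method can be sketched roughly as follows. This is only an illustration under several assumptions: tiles are identified by integers, a single spare tile is available, and the per-tile test is passed in as a hypothetical `is_faulty` callback. The Simulated Annealing step that locates the faulty component inside a tile is not modelled.

```python
def roving_scan(tiles, spare, is_faulty):
    """One scan pass: relocate each function to the spare, then test the vacated tile.

    `tiles` maps a function name to the physical tile currently hosting it.
    `is_faulty` stands in for the real tile test and is an assumption here.
    Returns the updated mapping, the remaining spare tile, and the faulty tiles.
    """
    mapping = dict(tiles)
    faulty = []
    for function in list(mapping):
        vacated = mapping[function]
        mapping[function] = spare   # the function keeps running on the spare
        if is_faulty(vacated):
            faulty.append(vacated)  # marked so it is not reused or rescanned
            spare = None            # no healthy spare remains in this sketch
            break
        spare = vacated             # a healthy vacated tile becomes the spare
    return mapping, spare, faulty

layout = {"f0": 0, "f1": 1, "f2": 2}
mapping, spare, faulty = roving_scan(layout, 3, lambda tile: tile == 1)
```

Because each function is moved before its old tile is tested, the design is already running on a healthy tile when the fault in tile 1 is discovered.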

Fig. 4: Classification of fault tolerant methods with respect to provider of the methods [20]

References

[1] B. Harikrishna, S. Ravi, “A Survey on Fault Tolerance in FPGAs,” ISBN 978-1-4673-4603-0, IEEE, 2012.
[2] L. Vavousis, A. Apostolakis, M. Psarakis, “A Fault Tolerant Approach for FPGA Embedded Processors Based on Runtime Partial Reconfiguration,” J Electron Test, Springer, New York, 2013.
[3] K. Siozios, D. Soudris, “A low-cost fault tolerant solution targeting commercial FPGA devices,” Journal of Systems Architecture, Elsevier, 2013.
[4] M. Psarakis, A. Apostolakis, “Fault Tolerant FPGA Processor Based on Runtime Reconfigurable Modules,” 17th IEEE European Test Symposium (ETS), 2012.
[5] M. Shahbazi, P. Poure, S. Saadate, M. Reza Zolghadri, “FPGA-Based Reconfigurable Control for Fault-Tolerant Back-to-Back Converter Without Redundancy,” IEEE Transactions on Industrial Electronics, 2013.
[6] P. Arun Kumar, P. Pandian, R. P. Perinbam, “Hardware Reusable Fpga Design,” Middle-East Journal of Scientific Research, 2013.
[7] H. Zhang, L. Bauer, M. A. Kochte, E. Schneider, C. Braun, M. E. Imhof, H-J. Wunderlich, J. Henkel, “Module Diversification: Fault Tolerance and Aging Mitigation for Runtime Reconfigurable Architectures,” International Test Conference, IEEE, 2013.
[8] M. B. Tahoori, E. J. McCluskey, M. Renovell, P. Faure, “A Multi-Configuration Strategy for an Application Dependent Testing of FPGAs,” 22nd IEEE VLSI Test Symposium, IEEE, 2004.
[9] J. M. Emmert, C. E. Stroud, M. Abramovici, “Online Fault Tolerance for FPGA Logic Blocks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 2, IEEE, 2007.
[10] A. Miele, P. D. Torino, “A software framework for dynamic self-repair in embedded SoCs exploiting reconfigurable devices,” Automation Quality and Testing Robotics (AQTR), IEEE International Conference on, IEEE, 2010.
[11] O. Ruano, J. A. Maestro, P. Reviriego, “A Methodology for Automatic Insertion of Selective TMR in Digital Circuits Affected by SEUs,” Nuclear Science, IEEE Transactions on, vol. 56, no. 4, pp. 2091-2102, Aug. 2009.
[12] K. Siozios, D. Soudris, “A Methodology for Alleviating the Performance Degradation of TMR Solutions,” Embedded Systems Letters, IEEE, vol. 2, no. 4, pp. 111-114, Dec. 2010.
[13] B. Pratt, M. Caffrey, P. Graham, K. Morgan, M. Wirthlin, “Improving FPGA Design Robustness with Partial TMR,” Reliability Physics Symposium Proceedings, 2006. 44th Annual, IEEE International, pp. 226-232, 26-30 March 2006.
[14] Xilinx, TMRTool User Guide, Xilinx User Guide UG 156, 2004.
[15] T. Nandha Kumar, Haider A. F. Almurib, New Chin-Ee, “Fine grain faults diagnosis of FPGA interconnect,” Microprocessors and Microsystems, Elsevier, 2012.
[16] F. G. D. L. Kastensmidt, G. Neuberger, R. F. Hentschke, L. Carro, R. Reis, “Designing fault-tolerant techniques for SRAM-based FPGAs,” IEEE Des Test Comput 21(6):552–562, 2004.
[17] M. Bellato, P. Bernardi, D. Bortolato, A. Candelori, M. Ceschia, A. Paccagnella, M. Rebaudengo, M. S. Reorda, M. Violante, P. Zambolin, “Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA,” IEEE/ACM Des Autom Test Eur (DATE), pp. 188–193, 2004.
[18] M. Berg, C. Poivey, D. Petrick, D. Espinosa, Austin Lesea, K. LaBel, M. Friendlich, H. Kim, A. Phan, “Effectiveness of Internal vs. External SEU Scrubbing Mitigation Strategies in a Xilinx FPGA: Design, Test, and Analysis,” 9th European Conference Radiation and Its Effects on Components and Systems (RADECS07), IEEE, 2008.
[19] J. Gebelein, H. Engel, U. Kebschull, “An Approach to System-Wide Fault Tolerance for FPGAs,” IEEE, 2009.
[20] M. G. Parris, C. A. Sharma, R. F. DeMara, “Progress in Autonomous Fault Recovery of Field Programmable Gate Arrays,” ACM Computing Surveys, Vol. V, No. N, Article A, 2010.
[21] L. Shang, M. Zhou, Y. Hu, “A Fault-Tolerant System-on-Programmable-Chip Based on Domain-Partition and Blind Reconfiguration,” NASA/ESA Conference on Adaptive Hardware and Systems, IEEE, 2010.
[22] M. Abramovici, J. M. Emmert, C. E. Stroud, “Roving STARs: an integrated approach to on-line testing, diagnosis, and fault tolerance for FPGAs in adaptive computing systems,” Evolvable Hardware, 2001. Proceedings. The Third NASA/DoD Workshop on, pp. 73-92, IEEE, 2001.
[23] C. Guérin, V. Huard, A. Bravaix, “The Energy-Driven Hot-Carrier Degradation Modes of nMOSFETs,” IEEE Transactions on Device and Materials Reliability, vol. 7, no. 2, IEEE, June 2007.
[24] S. Srinivasan, R. Krishnan, P. Mangalagiri, Y. Xie, V. Narayanan, M. J. Irwin, K. Sarpatwari, “Toward Increasing FPGA Lifetime,” IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 2, IEEE, April-June 2008.
[25] V. Betz, S. Brown, “FPGA Challenges and Opportunities at 40 nm and Beyond,” Altera Corporation, IEEE, 2009.