Investigation of Transient Effects on FPGA-based Embedded Systems

Ali Bakhoda, Seyed Ghassem Miremadi, Hamid R. Zarandi
Dependable Systems Laboratory, Department of Computer Engineering
Sharif University of Technology, Azadi Ave., Tehran, Iran

Abstract
In this paper, we present an experimental evaluation of transient effects on an embedded system which uses SRAM-based FPGAs. A total of 7500 transient faults were injected into the target FPGA using Power Supply Disturbances (PSD) and a simple 8-bit microprocessor was implemented on the FPGA as the testbench. The results show that nearly 64 percent of faults cause system failures and about 63 percent of the faults lead to corruption of the configuration data of the FPGA chip.
1. Introduction
Nowadays, embedded systems are extensively used in space and industrial applications [21], [12]. Examples include aircraft flight control systems, patient monitoring systems in hospitals, and instrumentation and control systems in nuclear power plants. Owing to advances in silicon technology, embedded systems now integrate not only digital hardware but also embedded software on a single programmable device such as a system-on-chip (SoC) [27] or a field programmable gate array (FPGA) [28]. Compared with custom ASIC designs, FPGA designs are a good solution for implementing embedded systems for several reasons [22]: 1) high gate density, which allows large designs to be implemented with small size and low power; 2) no non-recurring engineering (NRE) cost, in contrast to ASIC designs; 3) reduced manufacturing time and, therefore, fast time-to-market; 4) low hardware cost; and 5) reconfigurability and remote programming, which are valuable for remote missions. However, dependability concerns limit their widespread adoption in safety-critical applications [6], [10], [25]. One approach to the problem of faults is to include fault-tolerant mechanisms such as triple modular redundancy (TMR) in designs implemented on FPGAs [13]. However, this solution imposes a high area overhead, three times more input and output pins, and high performance penalties [22]. Moreover, it may not be affordable to use redundancy in every module (or component), especially in embedded systems [8], where power and area are important constraints.

One way to study the effect of faults on these programmable devices, with the aim of achieving a fault-tolerant embedded system, is to use fault injection [5]. Fault injection into emulated models of a design using FPGA chips is presented in [4], [15], [16], [17]. These works use FPGAs to inject stuck-at faults for test pattern generation purposes [18]; however, they do not address the evaluation of fault-tolerant systems. In [9], [11], [25], some attempts have been made to exploit FPGA-based evaluation of fault-tolerant systems. These methods use scan-chain hardware to inject faults into the target systems and are intended to use FPGAs to speed up the fault injection experiments. Fault injection evaluation of SRAM-based FPGAs using radiation has been presented in [11], [20], [26], where radiation experiments are used to test designs implemented on FPGAs. Fault injection into the configuration memory of FPGAs has been introduced in [1], [6], [7], [10], [14], [29]; these works consider transient faults that might occur in the configuration memory of the device.

This paper presents the results of experimental fault injection using Power Supply Disturbances (PSD) on a typical SRAM-based FPGA, in this case an Altera Flex10k. It should be noted that PSD might affect other cores operating in the system in addition to SRAM-based FPGAs, but in this paper the focus is on SRAM-based FPGAs. A total of 7500 transient faults, 2500 for each workload, were injected into the target FPGA. The results show that between 63 and 65 percent of the injected faults caused system failure, and between 62 and 64 percent of the injected faults led to corruption of the configuration data of the device.
The rest of the paper is organized as follows. Section 2 describes an overview of the fault injection environment used in our experimental evaluation. Section 3 describes the fault injection experiments and reports the experimental results obtained from the fault injection environment. A brief description of the FLEX10K device architecture is also presented in Section 3 in order to give a better comprehension of the fault injection process and the error analysis. Finally, the conclusions of this study are given in Section 4.

Proceedings of the Second International Conference on Embedded Software and Systems (ICESS’05) 0-7695-2512-1/05 $20.00 © 2005 IEEE
2. Overview of fault injection environment
In order to evaluate the effects of PSD faults on the target system implemented on the FPGA, an experimental environment was developed. It was also used to accelerate and automate fault injection campaign.

2.1. Architecture
As shown in Figure 1, the experimental system consists of four main parts: 1) an FPGA board, 2) a logic analyzer, 3) a PSD fault injection board, and 4) a host computer.
Figure 1. Overview of fault injection environment

FPGA Board: The board is equipped with an Altera Flex10K50RC240, which is the device subjected to fault injection. It runs on a 10 MHz clock and its power is supplied through the PSD injection board. To configure the FPGA with a functional circuit, the HDL model of a simple 8-bit processor, namely the Motorola 6809, was selected as the testbench. The FPGA board is equipped with several parallel ports connected to the logic analyzer; important internal states and observation points must be routed to FPGA pins so that the logic analyzer can observe them.

Logic Analyzer: The HP 16702A Logic Analysis System is used in the fault injection campaigns. A unit called the Command Dispatcher issues fault activation/deactivation commands to the fault injection board. The host computer and the logic analyzer communicate over TCP/IP through a network adapter installed in each of them. Another unit, called the pin-level data collector, monitors and collects the data on the FPGA pins, including the clock and the power supply lines, during the fault injection experiments. It saves the collected data on a local disk and then invokes the experiment manager process on the host computer to transfer this file to the host.

The Host computer: Fault injections are managed by software called the experiment manager, which runs on the host computer. This software generates the fault activation time and the fault duration, and sends them along with a start signal to the logic analyzer to invoke the Command Dispatcher process. The fault injection procedure is shown in Figure 2. To analyze and evaluate the results obtained from the fault injection experiments, the results of a fault-free circuit are also required. Hence, the procedure shown in Figure 2 is run once without any fault injection to obtain the fault-free circuit behavior (golden run). In addition to running the experiment manager, the host computer performs two other tasks: 1) it configures the FPGA chip through its parallel port during each round of fault injection, and 2) it analyzes the experimental results.
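The experiment manager's task of drawing a fault activation time and a fault duration for each experiment can be sketched as follows. This is only an illustration under stated assumptions, not the authors' software: the names `Fault` and `generate_fault_list`, the uniform distributions, and the 1 to 5 microsecond duration range are hypothetical (the paper states only that faults are injected at random times and last a few microseconds). The 10 MHz clock is the board clock given above, and the example workload length is the QS cycle count reported in the paper.

```python
import random
from dataclasses import dataclass

CLOCK_HZ = 10_000_000  # board clock stated in the paper (10 MHz)


@dataclass(frozen=True)
class Fault:
    activation_cycle: int  # clock cycle at which the disturbance starts
    duration_us: float     # disturbance length in microseconds (assumed range)

    @property
    def activation_time_us(self) -> float:
        """Activation time in microseconds, derived from the clock rate."""
        return self.activation_cycle * 1e6 / CLOCK_HZ


def generate_fault_list(workload_cycles, n_faults, min_us=1.0, max_us=5.0, seed=None):
    """Draw n_faults (activation, duration) pairs uniformly at random."""
    rng = random.Random(seed)
    return [
        Fault(rng.randrange(workload_cycles), rng.uniform(min_us, max_us))
        for _ in range(n_faults)
    ]


# Example: 2500 faults for a workload that runs for 144453 clock cycles.
faults = generate_fault_list(144453, 2500, seed=1)
print(len(faults), faults[0])
```

Fixing the seed makes a campaign repeatable, which helps when a particular faulty run has to be reproduced for closer inspection.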
PSD fault injector: The fault injection board, which is built around a PIC 18F252 microcontroller, injects PSD faults into the power supply pins of the FPGA chip. The board is configured to receive its commands from the logic analyzer. A PSD is injected by turning off a MOS transistor placed between VCC and the FPGA’s power supply lines and turning on an NPN power transistor placed between ground and the power supply lines of the FPGA. The power lines are thereby disconnected from VCC and the FPGA’s capacitance is discharged through the NPN transistor. As a result, the PSD fault appears as a voltage drop in the power supply lasting a few microseconds.

In our experiments, no effects were observed as long as the supply voltage of the FPGA stayed at or above 2.7 V. When the voltage dropped below 2.7 V, the FPGA no longer worked properly. When the voltage dropped to 2.4 V, the crash probability increased, and at 2.1 V the FPGA’s configuration memory was completely lost. In these experiments, faults that dropped the supply voltage to between 2.1 V and 2.7 V were used for fault injection.

PSD Fault Injection Procedure (fault list FList, testBench B) {
  For each fault injection time and duration (t, d) in FList {
    Reset the FPGA board;
    Reconfigure the FPGA based on the testBench B;
    Run the implemented circuit up to time t;
    Activate the PSD fault injector: alter the voltage of the FPGA power supply lines;
    Run the implemented circuit for the duration of d time units;
    Deactivate the PSD fault injector: return the voltage of the FPGA power supply lines to normal;
    Run the implemented circuit to the end time;
    Store the circuit internal states and results into the trace file;
  }
}
Figure 2. The fault injection procedure
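The procedure of Figure 2 can be sketched in Python as a loop over (time, duration) pairs. The sketch below is a hypothetical illustration: `RecordingBoard` and its method names are assumptions standing in for the FPGA board, the PSD injector, and the logic analyzer, and the class only records the order of operations instead of driving real hardware.

```python
class RecordingBoard:
    """Hypothetical test double for the FPGA board, PSD injector, and
    logic analyzer. It only records the order of requested operations."""

    def __init__(self):
        self.log = []

    def reset(self):
        self.log.append("reset")

    def configure(self, testbench):
        self.log.append(f"configure {testbench}")

    def run_until(self, time):
        self.log.append(f"run until {time}")

    def set_psd(self, active):
        self.log.append("psd on" if active else "psd off")

    def run_for(self, duration):
        self.log.append(f"run for {duration}")

    def run_to_end(self):
        self.log.append("run to end")

    def store_trace(self, index):
        self.log.append(f"store trace {index}")


def run_campaign(board, fault_list, testbench):
    """Apply the steps of Figure 2 once per (time, duration) pair."""
    for index, (t, d) in enumerate(fault_list):
        board.reset()
        board.configure(testbench)
        board.run_until(t)
        board.set_psd(True)    # supply voltage is pulled down
        board.run_for(d)
        board.set_psd(False)   # supply voltage returns to normal
        board.run_to_end()
        board.store_trace(index)


board = RecordingBoard()
run_campaign(board, [(120, 3)], "cpu6809")
print(board.log)
```

Reconfiguring the device at the start of every iteration matters here because a PSD can corrupt the configuration memory, so each experiment has to start from a known-good configuration.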
2.2. Data analysis and effect classification
The comparator and data analysis unit extracts the experimental results from the saved signal traces. It compares the signal traces of each faulty run with those of the fault-free run and reports as error manifestations the signals whose values differ between the two. The unit reports the effect of faults according to the following categories [25]:
• Effective error: the results produced by the faulty circuit are different from those produced by the fault-free one.
• Non-effective error, which includes two types:
  o Effect-less: the results produced by the faulty circuit are equal to those produced by the fault-free one, and so is the final system state.
  o Latent: the results produced by the faulty circuit are equal to those produced by the fault-free one, but at the end of the experiment the contents of the system state differ from those of the fault-free one.
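The classification rule above can be written as a short function. This is a minimal sketch, assuming that the outputs and the final system state of each run have already been extracted from the trace files into comparable Python values; the names `Effect` and `classify` are hypothetical.

```python
from enum import Enum


class Effect(Enum):
    EFFECTIVE = "effective"
    EFFECT_LESS = "effect-less"
    LATENT = "latent"


def classify(golden_outputs, faulty_outputs, golden_state, faulty_state):
    """Classify one faulty run against the golden run."""
    if faulty_outputs != golden_outputs:
        return Effect.EFFECTIVE      # wrong answer
    if faulty_state != golden_state:
        return Effect.LATENT         # correct answer, corrupted state
    return Effect.EFFECT_LESS        # correct answer, identical state


# Example with small made-up traces.
print(classify([1, 2, 3], [1, 2, 3], {"acc": 0}, {"acc": 4}))
```

The output comparison comes first, so a run that produces a wrong answer is always counted as effective, regardless of its final state.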
3. Experimental study
As mentioned before, the adopted FPGA was an Altera FLEX10K50 and the HDL model of a simple 6809 processor was selected as the testbench to configure the FPGA with a functional circuit. The processor, which was described in VHDL, was synthesized using the Quartus II development tool [2]. Place and route operations were also performed using the same tool, with the FLEX10K50 as the target device and the 240-pin PQFP as the package. The host computer was a Pentium IV system with 512 MB RAM and the Windows XP Professional OS.
3.1. Exploited FPGA: FLEX10K50
The FPGA used in the environment was an Altera FLEX10K50RC240-4 device. It contains 2880 logic elements (LEs) and, in this package, 189 I/O pins [3] (see Table 1). We used this programmable device for the following reasons:
• It has the same architecture as the other FLEX devices among the Altera FPGAs.
• It is possible to implement a testbench on this FPGA that consumes significant resources.
• It is an SRAM-based FPGA, so it can be reconfigured as many times as required for the fault injection experiments.
• Because it is SRAM-based, a further result can be extracted: the effect of PSD fault injection on the configuration memory can be analyzed.

Table 1. FLEX10K50 device features
Feature                   FLEX10K50
Typical gates             50,000
Maximum system gates      116,000
Logic elements            2,880
Logic array blocks        360
Embedded array blocks     10
Total RAM bits            20,480
Maximum user I/O pins     310

The internal block diagram and the logic element structure of the FLEX10K50 are shown in Figures 3 and 4, respectively. The FLEX10K family is a high-speed SRAM-based FPGA family, i.e. the configuration information is stored in internal SRAM memory cells. Based on reconfigurable CMOS SRAM elements, the Flexible Logic Element MatriX (FLEX) architecture incorporates all features necessary to implement common gate array megafunctions [3]. The family covers a wide range of densities, from 10K to more than 200K gates, and it is the industry’s first embedded programmable logic device (PLD) providing system-on-a-programmable-chip integration. Depending on the design, up to 200 MHz performance can be achieved [3].

Figure 3. FLEX10K block diagram
Figure 4. FLEX10K logic element

FLEX10K consists of a two-dimensional array of Logic Array Blocks (LABs) and Embedded Array Blocks (EABs) surrounded by Input/Output Elements (IOEs) [3]. EABs are used for implementing megafunctions, such as efficient memory and specialized logic functions, and complex logic functions such as digital signal processing, wide-data-path manipulation, and data-transformation functions. LABs, on the other hand, are used to implement general logic functions such as counters, adders, state machines, and multiplexers. The combination of embedded and logic arrays provides the high performance and high density of embedded gate arrays, enabling designers to implement an entire system on a single device [3]. All LABs and EABs are interconnected by a highly flexible network of routing resources, which are categorized as Row Interconnects (RI) and Column Interconnects (CI). The FLEX10K family is supported by the Quartus II and MAX+PLUS II development systems, which are integrated packages that offer HDL and schematic design entry, compilation and logic synthesis, full simulation, worst-case timing analysis, and device configuration.
3.2. Implemented testbench
The simple Motorola 6809 8-bit processor was selected for several reasons: 1) it can be implemented quickly; 2) its results are easy to evaluate and analyze; 3) it has no built-in error detection, which is necessary for studying the unprotected fault behavior; and 4) it is a typical processor used in implementing embedded systems [24]. Table 2 shows the resources consumed in the FLEX10K50. As shown in this table, the testbench occupies almost 92% of the logic elements of the FPGA.

Table 2. Testbench resource consumption in FPGA
VHDL lines   FFs         IOEs       LEs
5244         725 (25%)   54 (28%)   2,666 (92%)
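The percentages in Table 2 can be reproduced from the device figures quoted in Section 3.1. The sketch below rests on two assumptions that are not stated in the paper: the flip-flop share is taken relative to the 2880 logic elements (one register per LE), and the percentages are truncated rather than rounded. Under these assumptions the computed values match the printed ones.

```python
def utilization(used, available):
    """Return the used share of a resource as a truncated whole percent."""
    return used * 100 // available


LOGIC_ELEMENTS = 2880  # logic elements of the FLEX10K50 (Table 1)
IO_PINS = 189          # I/O pins of the package used (Section 3.1)

print(utilization(2666, LOGIC_ELEMENTS))  # LEs
print(utilization(725, LOGIC_ELEMENTS))   # FFs, assuming one flip-flop per LE
print(utilization(54, IO_PINS))           # IOEs
```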
3.3. Workloads
The 6809 processor, configured in the FPGA, runs three different programs written in assembly language as workloads: 1) QS, a recursive quicksort program; 2) MM, a matrix multiplication program; and 3) LL, a linked-list insertion program. QS sorts an array of pointers to 150 records in ascending order of the key values in the data records. MM multiplies 8 by 8 matrices; it is a typical program that uses the ALU resources of the CPU. LL inserts 110 8-bit integer values into a sorted linked list, which is initially empty. These workloads are intended to represent the various types of operations found in real application programs [23]. Table 3 shows the properties of the workload programs. For all of the workloads, faults were injected at random times during workload execution.

Table 3. Workload program properties
Workload                QS       MM       LL
Assembly instructions   298      345      513
Clock cycles            144453   127474   225507
Memory usage            363      401      549

3.4. Fault injection results
For each of the workloads, a total of 2500 PSD faults were injected and the system behavior was observed. All observation points and circuit I/O pins were captured by the logic analyzer, and the results were analyzed and compared with the golden run. Table 4 shows the effects of PSD fault injection on the 6809 processor implemented on the FPGA, broken down into three behaviors of the system. As shown in this table, approximately 64 percent of the faults lead to effective errors, 3 percent lead to latent errors, and about 33 percent are effect-less.

Table 4. Effects of PSD faults
Workload                               Quick sort      Matrix multiplication   Linked list program   Average
Injected faults                        2500            2500                    2500
Effect-less errors (correct answers)   856 (34.24%)    769 (30.76%)            823 (32.92%)          816 (32.64%)
Latent errors (correct answers)        67 (2.68%)      90 (3.60%)              67 (2.68%)            7 (2.99%)
Effective errors (wrong answers)       1577 (63.08%)   1641 (65.64%)           1610 (64.40%)         1609 (64.37%)

The effects of PSD faults on the configuration memory of the FPGA are shown in Table 5. As shown in this table, 63 percent of the faults injected into the FPGA caused errors in the FPGA configuration bits, resulting in either complete loss of the configuration data (about 18 percent) or partial loss of the configuration data (about 45 percent). A considerable share of the wrong answers was therefore caused by corrupted configuration memory. This is a consequence of the type of FPGA used in the experiment, an SRAM-based FLEX10K50: since SRAM-based FPGAs hold the configuration of the target system in SRAM memory cells, the configuration itself may be corrupted. Further, since the results differ only slightly between workloads, the behavior of the FPGA in the presence of PSD faults appears to be relatively independent of the workload.

Table 5. Effect of PSD faults on configuration memory of FLEX10K50
Workload                            Quick sort      Matrix multiplication   Linked list program   Average
Injected faults                     2500            2500                    2500
Configuration completely lost       422 (16.88%)    479 (19.16%)            460 (18.40%)          453 (18.15%)
Configuration partially corrupted   1152 (46.08%)   1129 (45.16%)           1099 (43.96%)         1126 (45.07%)
Correct configuration               926 (37.04%)    892 (35.68%)            941 (37.64%)          919 (36.79%)

4. Conclusions
In this paper, we presented an experimental dependability evaluation of an embedded system which uses an Altera FLEX10K50 FPGA. Faults were injected using power supply disturbances and the system behavior was observed and analyzed. The testbench circuit implemented on the FPGA was a simple 8-bit 6809 processor, and three workloads, namely quick-sort, matrix multiplication, and linked-list insertion programs, were executed on the processor. A total of 7500 transient faults were injected into the target FPGA and the experimental results were gathered and averaged. The results showed that, on average, nearly 64 percent of the injected PSD faults caused system failure, while nearly 63 percent of the injected faults resulted in corruption or loss of the configuration memory of the FPGA chip.

5. References
[1] M. Alderighi, S. D’Angelo, M. Mancini, G. R. Sechi, “A Fault Injection Tool for SRAM-based FPGAs,” Proc. of IEEE International On-Line Testing Symp., 2003, p. 129.
[2] Altera Corp., http://www.altera.com.
[3] Altera FLEX10K FPGA Datasheet, “FLEX10K Embedded Programmable Logic Family,” Altera Co., 1999.
[4] L. Antoni, R. Leveugle, B. Feher, “Using Run-time Reconfiguration for Fault Injection in Hardware Prototypes,” Proc. of the IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems, USA, 2000, pp. 405-413.
[5] J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J. C. Fabre, J. C. Laprie, E. Martins, D. Powell, “Fault Injection for Dependability Validation: A Methodology and Some Applications,” IEEE Trans. on Software Engineering, Vol. 16, No. 2, Feb. 1990, pp. 166-182.
[6] G. Asadi, S. G. Miremadi, H. R. Zarandi, A. Ejlali, “Fault Injection into SRAM-based FPGAs for the Analysis of SEU Effects,” Proc. Field Programmable Technology, Japan, 2003, pp. 428-430.
[7] G. Asadi, S. G. Miremadi, H. R. Zarandi, A. Ejlali, “Evaluation of Fault-Tolerant Designs Implemented on SRAM-based FPGAs,” Proc. Pacific Rim Dependable Computing, French Polynesia, 2004, pp. 327-332.
[8] G. Asadi, M. B. Tahoori, “An Analytical Approach for Soft Error Rate Estimation of SRAM-based FPGAs,” MAPLD International Conference, September 2004.
[9] L. Berrojo, I. Gonzalez, F. Corno, M. S. Reorda, G. Squillero, L. Entrena, C. Lopez, “New Techniques for Speeding-up Fault-Injection Campaigns,” Proc. Design, Automation and Test in Europe Conf. and Exhibition, 2002, pp. 847-852.
[10] P. Bernardi, M. Sonza Reorda, L. Sterpone, M. Violante, “On the Evaluation of SEU Sensitiveness in SRAM-based FPGAs,” Proc. 10th IEEE International On-Line Testing Symposium, Portugal, 2004, pp. 115-120.
[11] D. Brodrick, A. Dawood, N. Bergmann, M. Wark, “Error Detection for Adaptive Computing Architectures in Spacecraft Applications,” 6th Australasian Computer Architecture Conference, 2001, pp. 19-26.
[12] C. Carmichael, E. Fuller, P. Blain, M. Caffrey, “SEU Mitigation Techniques for Virtex FPGAs in Space Applications,” Proc. of the Military and Aerospace Applications of Programmable Logic Devices, 1999.
[13] C. Carmichael, “Triple Module Redundancy Design Techniques for Virtex FPGAs,” Xilinx Application Note XAPP197, November 2001.
[14] R. Ceschia, M. Violante, M. Sonza Reorda, A. Paccagnella, P. Bernardi, M. Rebaudengo, D. Bortolato, M. Bellato, P. Zambolin, A. Candelori, “Identification and Classification of Single-Event Upsets in the Configuration Memory of SRAM-based FPGAs,” IEEE Transactions on Nuclear Science, Vol. 50, No. 6, December 2003, pp. 2088-2094.
[15] T. J. Chakraborty, C. H. Chiang, “A Novel Fault Injection Method for System Verification based on FPGA Boundary Scan Architecture,” Proc. Int’l Test Conference, 2002, pp. 923-929.
[16] P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, M. Violante, “Exploiting Circuit Emulation for Fast Hardness Evaluation,” IEEE Transactions on Nuclear Science, Vol. 48, Dec. 2001, pp. 2210-2216.
[17] P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, M. Violante, “Exploiting FPGA-based Techniques for Fault Injection Campaigns on VLSI Circuits,” Proc. 2001 IEEE Int’l Symp. Defect and Fault Tolerance in VLSI Systems, 2001, pp. 250-258.
[18] M. Gokhale, P. Graham, E. Johnson, N. Rollins, M. Wirthlin, “Dynamic Reconfiguration for Management of Radiation Induced Faults in FPGAs,” Proc. of the 18th International Parallel and Distributed Processing Symposium, Santa Fe, New Mexico, April 2004, pp. 145-150.
[19] P. Folkesson, S. Svensson, J. Karlsson, “A Comparison of Simulation Based and Scan Chain Implemented Fault Injection,” Proc. 28th Int. Symp. on Fault-Tolerant Computing (FTCS-28), Munich, Germany, June 1998, pp. 284-293.
[20] E. Fuller, M. Caffrey, P. Blain, C. Carmichael, N. Khalsa, A. Salazar, “Radiation Test Results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing,” MAPLD Int. Conference, 1999.
[21] L. Kaufman, J. B. Dugan, R. Manian, “System Reliability Analysis of an Embedded Hardware/Software System using Fault Trees,” Proc. Ann. Reliability & Maintainability Symp., Jan. 1999, pp. 135-141.
[22] F. Lima, L. Carro, R. Reis, “Designing Fault Tolerant Systems into SRAM-based FPGAs,” Design Automation Conference, 2003, pp. 650-655.
[23] S. G. Miremadi, J. Torin, “Evaluating Processor-Behavior and Three Error Detection Mechanisms Using Physical Fault-Injection,” IEEE Transaction on Reliability, Vol. 44, No. 8, September 1995, pp. 441-454.
[24] M. Pflanz, H. T. Vierhaus, “Generating Reliable Embedded Processors,” IEEE Micro, Vol. 18, No. 5, September 1998, pp. 33-41.
[25] M. Rebaudengo, M. Sonza Reorda, M. Violante, B. Nicolescu, R. Velazco, “Coping With SEUs/SETs in Microprocessors by Means of Low-Cost Solutions: A Comparison Study,” IEEE Transactions on Nuclear Science, Vol. 49, No. 3, June 2002, pp. 1491-1495.
[26] J. J. Wang, R. B. Katz, J. S. Sun, B. E. Conquist, T. M. Speers, W. C. Plants, “SRAM Based Re-programmable FPGA for Space Applications,” IEEE Transactions on Nuclear Science, Vol. 46, No. 6, Dec. 1999, pp. 1728-1735.
[27] Xilinx Inc., http://www.xilinx.com
[28] Xilinx, “Virtex 2.5V Field Programmable Gate Arrays,” Data Sheet DS003-1, Xilinx, San Jose, CA, April 2001.
[29] C. C. Yui, G. M. Swift, C. Carmichael, R. Koga, J. S. George, “SEU Mitigation Testing of Xilinx Virtex II FPGAs,” IEEE Radiation Effects Data Workshop Record, July 2003, pp. 92-97.
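As a cross-check on the per-workload counts reported in Tables 4 and 5, the following sketch verifies that the categories of each workload add up to the 2500 injected faults and recomputes the average percentages. The counts are copied from the tables; the function name `check_and_summarize` and the dictionary layout are hypothetical choices made for this illustration.

```python
INJECTED = 2500  # faults injected per workload

# Counts per workload (QS, MM, LL), copied from Tables 4 and 5.
TABLE_4 = {
    "effect-less": (856, 769, 823),
    "latent": (67, 90, 67),
    "effective": (1577, 1641, 1610),
}
TABLE_5 = {
    "completely lost": (422, 479, 460),
    "partially corrupted": (1152, 1129, 1099),
    "correct": (926, 892, 941),
}


def check_and_summarize(table):
    """Check that every workload column sums to INJECTED, then return
    the mean percentage of each category over the three workloads."""
    for column in zip(*table.values()):
        assert sum(column) == INJECTED
    return {
        name: round(100 * sum(counts) / (len(counts) * INJECTED), 2)
        for name, counts in table.items()
    }


print(check_and_summarize(TABLE_4))
print(check_and_summarize(TABLE_5))
```

The recomputed means agree with the percentages in the Average columns, and the two corrupted-configuration categories together account for about 63 percent of the injected faults, as stated in the text.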