TOOLS AND METHODOLOGY DEVELOPMENT FOR PULSED LASER FAULT INJECTION IN SRAM-BASED FPGAS

V. Pouget, A. Douin, D. Lewis, P. Fouillat
IXL, Université Bordeaux 1, 33405 Talence, France

G. Foucard, P. Peronnard, V. Maingot, J. B. Ferron, L. Anghel, R. Leveugle, R. Velazco
TIMA, 46 av Félix Viallet, 38031 Grenoble, France

Abstract: This paper presents the development of a set of tools and the associated methodology for performing pulsed laser fault injection experiments in SRAM-based FPGAs. The new platform allows reliable evaluation of the impact of SEUs and MBUs in the configuration memory.

I. INTRODUCTION
SRAM-based FPGAs are increasingly used in systems requiring a high level of safety and/or dependability, owing to the many advantages associated with their reconfigurability. The mission profiles of these systems may include harsh environments, such as exposure to ionizing radiation. Single-event effects (SEE) induced by the interaction of heavy ions with integrated circuits are a well-known threat for space systems directly exposed to cosmic rays and solar flares. They are also a concern for applications in the atmosphere, at high altitude or at sea level, because devices fabricated in modern process technologies are sensitive to atmospheric neutrons. For applications using SRAM-based FPGAs at ground level, the most probable effect is the occurrence of single-event upsets (SEUs), i.e. bit flips, in the embedded or configuration memories [1]. Particle-induced faults in the configuration memory directly alter the function implemented in the FPGA, so they can have a tremendous impact on the ability of a system to operate properly [2-5]. Moreover, these errors usually persist as long as the configuration memory is not refreshed, and they can be difficult to detect, since systematic monitoring of the configuration would impose too high a penalty in terms of system availability. Multiple bit upsets (MBUs) are probably among the most critical kinds of event, because the multiplicity and distribution of the faults are additional factors of complexity when trying to understand and mitigate the sensitivity of an architecture. Protection against SEUs in the configuration memory is left to the designer. Design-level solutions, like the triple modular redundancy (TMR) technique, have been developed in order to build fault-tolerant architectures from SRAM-based FPGAs [6-7], but they always require a compromise between fault tolerance and resource overhead or performance penalty.

At design time, both the accurate evaluation of the impact of different kinds of faults on device operation and the fast selection of the most efficient hardening strategies require realistic fault models. Indeed, SRAM-based FPGAs are composed of many different structures, each with its own static and dynamic susceptibility. Thus, accurately predicting the error rate of a complete design for a given application with a pure simulation-based approach is very difficult, if not impossible. Physical fault injection campaigns are necessary, first for developing and calibrating fault models, then for practical testing of design options and for qualifying final fault-tolerant solutions. Heavy-ion or neutron beam testing in particle accelerators is the preferred method for measuring the sensitivity of a system to SEUs and MBUs and for reliably predicting in-the-field error rates [8]. The pulsed laser testing method is a complementary approach that is more flexible for in-laboratory investigations during the design, screening or analysis phases [9]. In this paper, we present an ongoing collaborative effort to develop hardware and software tools and associated methodologies for performing pulsed laser fault injection in FPGA devices. We first present the experimental platform used in this work. The test set-up and methodology are then described, and the first results obtained on the Xilinx Virtex-II device are presented.

II. EXPERIMENTAL PLATFORM
The fault-tolerant architecture qualification platform developed in this work is based on the association of two existing state-of-the-art instruments. The first one is the electronic tester that drives the device under test (DUT) and checks its status. The second is the optical instrumentation system for pulsed laser fault injection.

A. THESIC+ testbed
To perform fault injection, an upgraded version of the test platform presented in [8] was used. Figure 1 depicts the architecture of the resulting system, called THESIC+ (Testbed for Harsh Environment Studies on Integrated Circuits). This enhanced architecture is built around two FPGAs. The first one, called the COM FPGA, contains a LEON2 processor. It handles the communication between the user's computer and the resources available on the THESIC motherboard. It also monitors the DUT current in order to protect it against latchups. Data transfers are performed over Ethernet, which provides a good data rate. The second FPGA, called the Chipset FPGA, contains the user design, mainly used to interface the DUT with the tester's resources. For the experiment presented in this work, a state machine was implemented in the Chipset FPGA in order to control the configuration of the DUT, a Xilinx XC2V1000. The Chipset FPGA may also control the laser triggering. Figure 2 is a photograph giving an overview of the test set-up.

Figure 1: THESIC+ block diagram

Figure 2: experimental set-up

B. ATLAS laser testing facility
ATLAS is the pulsed laser facility of the University of Bordeaux, dedicated to laser testing and analysis of integrated circuits [9]. It is composed of two ultrashort pulsed laser sources that supply several optical benches for different kinds of measurements. It also comprises a complete set of instrumentation for IC emulation and test. The so-called “PLS” optical set-up used for laser fault injection is presented in Figure 3. The laser source used in this work is a Ti:sapphire oscillator delivering 1 ps pulses at a repetition rate of 80 MHz. This rate can be reduced down to single-shot operation using a pulse picker. Additional mechanical and electro-optic modulators can be used to improve the optical pulse-to-noise ratio. The laser wavelength is tunable from 780 nm to 1000 nm. This allows the beam penetration depth into silicon to be adjusted, which is particularly useful when testing flip-chip devices through the backside of the substrate. Pulses are focused on the DUT by microscope objectives giving adjustable spot sizes ranging from 1 µm to 20 µm. The laser pulse energy available on the DUT is typically adjustable up to 1 nJ.

Figure 3: Photoelectric Laser Stimulation (PLS) set-up at the ATLAS laser facility

The laser spot is scanned over the area of interest by moving the DUT under the beam using 3D micropositioning stages with a resolution of 100 nm. A computer controls the scanning operations synchronously with laser pulse triggering, data acquisition and visualization, using dedicated software called SEEM [10]. The ATLAS facility is a research facility that also provides test services. The PLS set-up is commonly used for radiation effects testing, fault injection, and failure analysis in VLSI devices.
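As an illustration of scanning by moving the DUT under a pulsed beam, the sketch below generates the pulse positions of a raster scan in which the stage moves at constant speed while pulses are fired at a fixed repetition rate. The function name, the serpentine path (alternate lines run in opposite directions) and the constant-speed model are assumptions for illustration only; this is not the algorithm implemented in SEEM.

```python
def raster_pulse_positions(width_um, height_um, line_spacing_um, speed_um_s, rep_rate_hz):
    """Return (x, y) positions, in µm, of the pulses fired during a raster scan.

    Hypothetical model: the stage moves at constant speed along x, one pulse is
    fired every 1/rep_rate_hz seconds, and consecutive lines run in opposite
    directions (serpentine path).
    """
    if min(width_um, height_um) < 0 or min(line_spacing_um, speed_um_s, rep_rate_hz) <= 0:
        raise ValueError("invalid scan parameters")
    pitch_um = speed_um_s / rep_rate_hz                   # distance travelled between two pulses
    pulses_per_line = int(width_um / pitch_um + 1e-9) + 1  # small epsilon guards float rounding
    n_lines = int(height_um / line_spacing_um + 1e-9) + 1
    positions = []
    for line in range(n_lines):
        y = line * line_spacing_um
        xs = [i * pitch_um for i in range(pulses_per_line)]
        if line % 2 == 1:
            xs.reverse()                                  # odd lines are scanned backwards
        positions.extend((x, y) for x in xs)
    return positions
```

With the 200 µm/s scan speed and 400 Hz repetition rate reported in Section III.A.3, the pulse pitch is 200 / 400 = 0.5 µm, so about ten consecutive pulses fall within one 5 µm spot diameter. This is why two consecutive pulses may hit the same node.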

III. METHODOLOGY
A. Test details
1. Interfacing
The first step of this work was the association of the THESIC and ATLAS instruments to constitute a new testing platform with unique capabilities. The principle of the interfacing is presented in Figure 4.

Figure 4: principles of the ATLAS-THESIC interface (blocks shown: ATLAS laser beam, optical set-up, SEEM software, mechanical support, DUT board, motherboard, test software and driver DLL; actions: Initialize, Read, Restore, Close)

Mechanical interfacing consisted of mounting the THESIC tester board on the ATLAS positioning stages using a custom support that holds the board under the microscope and enables accurate adjustment of the orthogonality of the device with respect to the beam, while withstanding the mechanical constraints induced by the cabling of the motherboard. No specific software interface was developed for the first results presented in this work, but an optimal architecture has been identified and is being implemented for future work. It is based on a specific module that defines the four elementary actions needed to perform a standard laser mapping.

2. Device under test
The FPGA used in this experiment is a Xilinx Virtex-II XC2V1000 fabricated in a 0.15 µm, 8-layer-metal CMOS process. The DUT is housed in an 896-pin flip-chip fine-pitch package. Physically, it contains 720 Kbit of block RAM distributed in four columns alongside the multipliers, 432 available I/Os placed around the periphery of the chip, and 4,082,592 configuration SRAM bits spread over the whole die. Each configuration bit is placed close to the function it configures. The FPGA is configured by downloading the bitstream, i.e. the file containing all the configuration information. The main building block of the FPGA is the configurable logic block (CLB). A CLB is divided into 4 identical slices. Each slice contains both sequential (flip-flop) and combinational parts (LUT and multiplexers). Any of these elements can be bypassed as needed to implement the synthesised function. A first study of the bitstream showed that each CLB is configured by 1760 SRAM bits and each IOB by about 396 configuration bits. Because the device is packaged as a flip-chip, it has to be tested through the backside of the substrate. The die was thinned down to a residual thickness of approximately 60 µm in order to improve the optical transmission into the active layer.

3. Methodology
The optical wavelength was set to 950 nm to provide a sufficient penetration depth of the laser. Different areas of the device were scanned at a maximum speed of 200 µm/s, with a pulse repetition rate of 400 Hz for most of the runs, a scan-line spacing of 5 µm and a spot size of 5 µm. The equivalent pulse fluence (number of pulses per unit area) on the chip is therefore high, which increases the probability of generating events. The counterpart is that two consecutive laser pulses may hit the same node. This effect has not been quantified so far, but the cross section is assumed to be low enough for it to be negligible. Scans on large areas were repeated with increasing energies until the first events were observed. Several simple configurations of the device were tested in static mode, and a significant impact of the configuration on the sensitivity of the device was observed, as shown in Section IV.C. After each scan, a readback is performed and the resulting bitstream is compared to the golden one in order to detect errors. If needed, the device is then reconfigured before the next scan.
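The mapping procedure described above (scan, readback, comparison with the golden bitstream, reconfiguration when needed) can be sketched as follows. The sketch assumes a driver object exposing four elementary actions, which we assume correspond to the Initialize, Read, Restore and Close labels of Figure 4; the Python names, the `fire_scan` callback and the bit ordering are hypothetical and do not describe the actual THESIC+ driver DLL. It also assumes that the readback can be compared one-to-one with the golden bitstream; in practice, bits that are not configuration data, or that legitimately change during operation, usually have to be masked before comparison.

```python
from typing import Callable, Protocol


class DutDriver(Protocol):
    """Hypothetical driver exposing the four elementary actions of a laser mapping."""

    def initialize(self) -> None: ...  # configure the DUT with the golden bitstream
    def read(self) -> bytes: ...       # read back the configuration memory
    def restore(self) -> None: ...     # reconfigure the DUT after an upset
    def close(self) -> None: ...       # release the tester resources


def erroneous_bits(golden: bytes, readback: bytes) -> list[int]:
    """Indices of differing bits (bit 0 is the most significant bit of byte 0)."""
    if len(golden) != len(readback):
        raise ValueError("golden and readback bitstreams differ in length")
    errors = []
    for byte_index, (g, r) in enumerate(zip(golden, readback)):
        diff = g ^ r
        for bit in range(8):
            if diff & (0x80 >> bit):
                errors.append(8 * byte_index + bit)
    return errors


def run_mapping(driver: DutDriver, golden: bytes, n_scans: int,
                fire_scan: Callable[[int], None]) -> list[list[int]]:
    """Run n_scans laser scans and return the erroneous bit indices of each scan."""
    results = []
    driver.initialize()
    try:
        for scan in range(n_scans):
            fire_scan(scan)                      # laser scan driven by the positioning software
            errors = erroneous_bits(golden, driver.read())
            results.append(errors)
            if errors:
                driver.restore()                 # reconfigure only when the readback differs
    finally:
        driver.close()                           # always release the tester, even on failure
    return results
```

Reconfiguring only after an erroneous readback matches the "if needed" step of the methodology and avoids a full reconfiguration after every clean scan.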

B. Backside navigation
The test was performed through the backside without any visualization, because the power of our infrared illumination LED was insufficient. However, we were able to navigate in the device by analyzing the error signatures induced by the laser beam, which made it possible to localize and target a given structure. This mode of navigation has some limitations in terms of positioning accuracy and reproducibility, but it constitutes a simple and cheap way of testing this kind of device through the backside. The backside approach is particularly sensitive to the focusing of the beam in the active layer of the device. Indeed, small focus errors or thickness variations are multiplied by the refractive index of silicon (approximately 3.5 at 950 nm) and may lead to a non-negligible distance between the focal plane of the beam (i.e. the beam-waist plane) and the active layers of the device. To mitigate this issue, we selected a microscope objective with a moderate numerical aperture (NA = 0.40), whose larger depth of focus makes the injection less sensitive to focus errors. Although this leads to a spot size that is too large for an accurate emulation of heavy-ion strikes, it increases the efficiency of the campaign by making the generation of errors more probable. For more accurate testing, a higher-magnification objective can be used in a second phase, once sensitive areas have been localized. The laser is first focused on the backside surface of the device. Scans are then performed while moving the device closer to the microscope objective until the maximum number of errors is observed. This allows the optimum focus depth to be determined; to first order, it corresponds to a displacement of the device equal to the substrate thickness divided by the refractive index of silicon.
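The first-order focusing rule and the focus sweep can be sketched as follows, assuming the paraxial approximation (moving the device by d towards the objective shifts the beam waist by about n·d inside silicon) and ignoring spherical aberration. The function names are hypothetical; n = 3.5 is the value quoted above.

```python
def backside_focus_displacement_um(residual_thickness_um: float, n_si: float = 3.5) -> float:
    """First-order stage travel from backside-surface focus to the active layer.

    Paraxial assumption: a stage move d shifts the waist by about n_si * d inside
    silicon, so reaching a depth t requires a move of t / n_si.
    """
    if residual_thickness_um < 0 or n_si <= 0:
        raise ValueError("invalid thickness or refractive index")
    return residual_thickness_um / n_si


def waist_shift_in_silicon_um(stage_error_um: float, n_si: float = 3.5) -> float:
    """First-order shift of the beam waist inside silicon caused by a stage/focus error."""
    return n_si * stage_error_um


def best_focus_offset(offsets_um: list[float], error_counts: list[int]) -> float:
    """Stage offset (from the backside-surface focus) giving the most errors in a sweep."""
    if not offsets_um or len(offsets_um) != len(error_counts):
        raise ValueError("need one error count per offset")
    best = max(range(len(offsets_um)), key=lambda i: error_counts[i])
    return offsets_um[best]
```

For the 60 µm residual thickness, the first-order estimate is 60 / 3.5 ≈ 17.1 µm of stage travel, and a 1 µm stage error moves the waist by about 3.5 µm inside silicon. The offset returned by the experimental sweep can be checked against this estimate.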

C. Bit-stream analysis
After each erroneous bit-stream has been read back from the device, it is compared with the golden one. This step is performed with Java programs partly based on the JBits API classes from Xilinx [11]. These programs compare the reference bit-stream and the readback result bit by bit to obtain the list of erroneous bits and their function in the configuration of the device. The erroneous bits can be sorted by type of configuration information (type of tile among CLB, GCLK, BRAM, IOB, interconnection or logic, …). The exact role of the erroneous bits can also be analysed; this part of the analysis is currently restricted to CLB tiles (including the related interconnections) but will be extended to the other types of tiles, in particular the BRAMs. The reports generated by these programs for each erroneous bit-stream are then post-processed to compute statistics. The reports are first converted to CSV (Comma Separated Values) files by a Unix shell script (run under Cygwin in a Windows environment). Statistics are then computed in Microsoft Excel (or any other spreadsheet software).

IV. RESULTS
A. Error types
Once all the optical parameters had been correctly adjusted, multiple errors were repeatedly generated in the bit-stream. Depending on the laser energy and position, a single laser pulse could induce single or multiple bit-flips. Errors were observed in the CLBs, IOBs and BRAMs, and in their related interconnections. Table 1 summarizes the average error distribution among the different kinds of structures for one example campaign. The number of erroneous bits induced by a single laser shot can be very large, up to 4497 (137 on average, with a large variance, which is also illustrated in Figure 5 by the very different distributions obtained in four different campaigns). Such large values are clearly a real challenge for mitigation techniques. Errors in CLB tiles and BRAMs are the most frequent.

Table 1: average distribution of the erroneous bits
Element type   CLB     CLBE   GCLK   IOB    IOI    BRAM    BRAMI
Number         80.95   0      0      0.03   0.03   50.41   5.87
Average %      81.77   0      0      2.56   0.12   9.21    1.11

Figure 5: variance of the distribution for four example campaigns (bars for Campaigns 1 to 4, per category of erroneous configuration bits, grouped into bits in CLB tiles and bits in other tiles)

Looking more precisely inside the CLB tiles, the bits controlling the routing appear to be the main contributors, as might be expected since they represent a large percentage of the configuration bits. However, a noticeable number of erroneous bits has also been observed in the logic configuration (including LUTs). Table 2 summarizes the average distribution of the erroneous bits within a CLB tile with respect to the elements they control. The configuration data for a given CLB are organized in 22 different frames. The exact role of the last frame has not yet been completely identified; this 22nd frame is therefore treated as a particular case and recorded separately in the classification of the erroneous bits.

Table 2: average distribution of the erroneous bits within the CLB tiles
Type of bit    Total   Logic   Interconnection   22nd frame
Number         80.95   34.49   44.15             2.31
Average %      81.77   17.19   63.49             1.11
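The post-processing that produces averaged distributions such as Tables 1 and 2 can be sketched as follows. The paper does not state its exact formulas, so this is one plausible reading: "Number" as the mean count per shot and "Average %" as the mean of the per-shot percentages. This reading is consistent with Table 1, where BRAM errors are numerous on average (50.41) yet represent only 9.21 % on average. The `category_of` lookup stands in, hypothetically, for the JBits-based classification, and skipping shots without errors is an assumption.

```python
from collections import Counter
from typing import Callable, Iterable


def average_distribution(
    shots: Iterable[list[int]],
    category_of: Callable[[int], str],
    categories: list[str],
) -> tuple[dict[str, float], dict[str, float]]:
    """Return (mean count per shot, mean per-shot percentage) for each category.

    `shots` holds, for each laser shot, the indices of the erroneous bits found
    by the golden/readback comparison. `category_of` is a hypothetical lookup
    from a bit index to its tile type. Shots without errors are skipped, since
    a percentage is undefined for them.
    """
    count_sums = Counter()
    pct_sums = Counter()
    n_shots = 0
    for errors in shots:
        if not errors:
            continue
        n_shots += 1
        per_shot = Counter(category_of(bit) for bit in errors)
        for cat in categories:
            count_sums[cat] += per_shot[cat]
            pct_sums[cat] += 100.0 * per_shot[cat] / len(errors)
    if n_shots == 0:
        raise ValueError("no shot with errors to average")
    mean_counts = {cat: count_sums[cat] / n_shots for cat in categories}
    mean_pcts = {cat: pct_sums[cat] / n_shots for cat in categories}
    return mean_counts, mean_pcts
```

With a toy lookup mapping bits below 100 to "CLB" and the others to "BRAM", two shots with one CLB error each plus one shot with 20 BRAM errors and 1 CLB error give mean counts of 1 (CLB) and about 6.7 (BRAM), but mean percentages of about 68 % (CLB) and 32 % (BRAM). A category hit hard by a few large events dominates the counts, while the category hit in most shots dominates the averaged percentages.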

B. Influence of laser energy
We performed the same scan of a single line while increasing the laser pulse energy and counting the number of errors. The result is presented in Figure 6. The first errors appeared for an energy of 760 pJ incident on the backside of the DUT. From this value, an equivalent ion LET threshold could be estimated, provided the thickness and doping of the substrate are accurately known [12]. In this work, we focused on relative variations, so this energy-to-LET calibration was not required. The total number of errors increases with the energy and shows no saturation trend. This could be explained by the limited energy range explored (less than a decade), by the increasing contribution of the Gaussian wings of the wide laser spot, by the importance of diffusion in this technology, and by the contribution of different structures.
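The Gaussian-wing argument can be made concrete with a simple threshold model, which is our assumption and not a model used in this work. Take a fluence profile F(r) = F0·exp(-2r²/w²) with F0 proportional to the pulse energy E, and assume a node is upset wherever the local fluence exceeds the on-axis fluence reached at the threshold energy E_th. The upset area is then (π w²/2)·ln(E/E_th), which keeps growing with energy instead of saturating. E_th = 0.76 nJ is taken from the text; treating the 5 µm spot size as a 1/e² diameter (w = 2.5 µm) is an assumption.

```python
import math


def upset_area_um2(energy_nj: float, threshold_nj: float = 0.76, w_um: float = 2.5) -> float:
    """Area (µm²) where a Gaussian spot exceeds the on-axis threshold fluence.

    Assumed model: F(r) = F0 * exp(-2 r² / w²), F0 proportional to pulse energy,
    and an upset wherever F(r) >= F0(threshold_nj). Solving for r gives
    r² <= (w² / 2) * ln(E / E_th), hence area = pi * r².
    """
    if energy_nj <= threshold_nj:
        return 0.0
    return math.pi * w_um**2 / 2.0 * math.log(energy_nj / threshold_nj)


if __name__ == "__main__":
    # Energies taken from Tables 3 and 4.
    for e in (0.76, 1.05, 1.31, 1.54, 1.73):
        print(f"{e:.2f} nJ -> {upset_area_um2(e):.1f} µm²")
```

Under this model, each doubling of the energy above threshold adds the same area (about 6.8 µm² for w = 2.5 µm), so the affected region grows logarithmically without saturating. The model ignores carrier diffusion and the different sensitivities of the structures, which the text also lists as contributing factors.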

Figure 6: influence of laser energy on the total number of upsets induced by the scan of a single line (y-axis: average number of errors; x-axis: incident laser pulse energy, in nJ)

Table 3 illustrates the evolution of the number of errors and their distribution among the different types of elements in the FPGA. During the reported experiments, errors were limited to CLBs and BRAMs, but this of course depends on the scanned area. Table 4 shows, for the same campaign, the distribution of the errors inside the CLB tiles with respect to the functionality of the bits. These results may suggest a higher sensitivity of the bits in the 22nd configuration frame of the CLB tile, but Table 2 shows that this is probably due to the area of the chip scanned during this particular campaign.

Table 3: average distribution of the erroneous bits vs. laser energy
Energy (nJ)   CLB    CLBE   GCLK   IOB   IOI   BRAM   BRAMI
0.76          1      0      0      0     0     0      0
0.81          1      0      0      0     0     0      0
1.05          2      0      0      0     0     0      0
1.31          2      0      0      0     0     1      0
1.54          8.4    0      0      0     0     7.6    0
1.68          16.5   0      0      0     0     16.5   0
1.73          18     0      0      0     0     25.3   0

Table 4: average distribution of the erroneous bits within the CLB tiles vs. laser energy
Energy (nJ)   Total   Logic   Interconnection   22nd frame
0.76          1       0       0                 1
0.81          1       0       0                 1
1.05          2       0       0                 2
1.31          2       0       0                 2
1.54          8.4     0       0                 8.4
1.68          16.5    0       0                 16.5
1.73          18      0       0.67              17.33

C. Influence of design
We scanned the same rectangular area of 0.25 × 1 mm² with four different designs implemented in the FPGA. The first design is completely empty. Two designs instantiate all the flip-flops with their inputs and outputs set to 0 or 1, respectively. The last design implements a 200-bit adder that occupies 100 slices in one corner of the device. The scanned area was defined so as to start from the same corner. The resulting numbers of errors are presented in Table 5. The design with all flip-flops used and set to 0 exhibits the highest sensitivity, with about twice as many errors as the adder design. These results clearly indicate an asymmetry in sensitivity between the two logical states of the configuration bits. This implies that the sensitivity of an application implemented in this device strongly depends on the design details and on the way the resources are used.

Table 5: impact of the design on the number of errors
Design          Nb errors
Empty           78
All FF to 0     280
All FF to 1     240
200-bit adder   135

This is typically the kind of result that can be obtained with a laser testing platform; such results are valuable as fault-model inputs for estimating the vulnerability of a design with simulation-based fault-injection tools.
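For reference, the comparisons drawn from Table 5 can be made explicit. Here N denotes the number of errors measured for each design (notation introduced only for this worked arithmetic):

```latex
\[
\frac{N_{\mathrm{FF}=0}}{N_{\mathrm{adder}}} = \frac{280}{135} \approx 2.07, \qquad
\frac{N_{\mathrm{FF}=0}}{N_{\mathrm{FF}=1}} = \frac{280}{240} \approx 1.17, \qquad
\frac{N_{\mathrm{FF}=0}}{N_{\mathrm{empty}}} = \frac{280}{78} \approx 3.59
\]
```

The first ratio is the "about twice" quoted in the text; the second quantifies the difference between the two flip-flop states, which is smaller than the difference between any used design and the empty one.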

V. CONCLUSIONS
We have presented the development of a test platform and methodology for evaluating the sensitivity of SRAM-based FPGA designs using pulsed laser fault injection. This approach is particularly interesting for validating design hardening strategies. The platform was successfully used to test a Xilinx Virtex-II through the backside. An improved software interface, currently under development, will allow automated mapping of different types of errors. Moreover, the platform is ready for remote testing through the Internet. Future work will include the extension of this methodology to dynamic testing, as well as its implementation and adaptation on the industrial laser facility of EADS CCR, for use in the context of aeronautics and space applications of FPGAs.

ACKNOWLEDGEMENTS
The authors acknowledge Alexandre Bocquillon and Nadine Buard at EADS CCR for their contribution to this work. This collaborative work is partly supported by the French Ministry of Research, through the ACISI VENUS project. The ATLAS laser facility is supported by the Region Aquitaine.

VI. REFERENCES
[1] M. Alderighi, A. Candelori, F. Casini, S. D'Angelo, M. Mancini, A. Paccagnella, S. Pastore, G. Sechi, "SEU Sensitivity of Virtex Configuration Logic", IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2462-2467, 2005.
[2] M. Alderighi, F. Casini, S. D'Angelo, M. Mancini, A. Marmo, S. Pastore, G. R. Sechi, "A tool for injecting SEU-like faults into the configuration control mechanism of Xilinx Virtex FPGAs", 18th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems, Boston, Massachusetts, USA, November 3-5, 2003, pp. 71-78.
[3] M. Alderighi, A. Candelori, F. Casini, S. D'Angelo, M. Mancini, A. Paccagnella, S. Pastore, G. R. Sechi, "Heavy ion effects on configuration logic of Virtex FPGAs", 11th IEEE International On-Line Testing Symposium, Saint Raphaël, France, July 6-8, 2005, pp. 49-53.
[4] M. Bellato, P. Bernardi, D. Bortolato, A. Candelori, M. Ceschia, A. Paccagnella, M. Rebaudengo, M. Sonza Reorda, M. Violante, P. Zambolin, "Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA", Design, Automation and Test in Europe Conference (DATE), February 16-20, 2004, pp. 584-589.
[5] K. Morgan, M. Caffrey, P. Graham, E. Johnson, B. Pratt, M. Wirthlin, "SEU-induced persistent error propagation in FPGAs", IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2438-2445, 2005.
[6] F. L. Kastensmidt, L. Sterpone, L. Carro, M. S. Reorda, "On the optimal design of triple modular redundancy logic for SRAM-based FPGAs", Proc. of Design, Automation and Test in Europe (DATE) 2005, vol. 2, pp. 1290-1295, 2005.
[7] C. Kinzel Filho, F. Lima Kastensmidt, L. Carro, "Improving reliability of SRAM-based FPGAs by inserting redundant routing", 8th European Conference on Radiation and its Effects on Components and Systems (RADECS 2005), Cap d'Agde, France, September 19-23, 2005.
[8] F. Faure, P. Peronnard, R. Velazco, R. Ecoffet, "THESIC+, a flexible system for SEE testing", Proc. of RADECS 2002 Workshop, pp. 231-234, 2002.
[9] V. Pouget, D. Wan, P. Jaulent, A. Douin, D. Lewis, P. Fouillat, "Recent developments for SEE testing at the ATLAS laser facility", Proc. of 15th Single-Event Effects Symposium, 2006.
[10] V. Pouget, P. Fouillat, D. Lewis, "Using the SEEM software for SET testing and analysis", Radiation effects in embedded systems, to be published by Springer, 2006.
[11] C. Kinzel Filho, F. Lima Kastensmidt, L. Carro, "Mapping the Virtex customization bits with JBits classes for selective bitstream fault injection", 6th Latin-American Test Workshop (LATW), March 30-April 2, 2005, pp. 97-102.
[12] V. Pouget, H. Lapuyade, P. Fouillat, D. Lewis, S. Buchner, "Theoretical Investigation of an Equivalent Laser LET", Microelectronics Reliability, vol. 41, no. 9-10, pp. 1513-1518, 2001.
