Improving FPGA Reliability in Harsh Environments Using Triple ...
Recommend Documents
D. Wagner discussed terminology for directions of helical twist in plant parts. W. Gall, R. ..... hydroid "leaf trace" connecting the leaf and the stem strands. In ..... Long-elliptical propagula of species of Streptopogon ...... Nordic Mo. 2: 102, 1
Due to active ballasting, the LNGC also operates at a constant draught of abt. .... start of cargo transfer; mooring winches and heave compensation systems act ...
Oct 25, 2001 - 20.000 km are often affected by forces like solar winds and earth ... indicates the importance of a GPS receiver performance analysis .... the central system synchroniser as well as a navigation and signal monitoring system.
aNuclear AMRC, Advanced Manufacturing Park, The University of Sheffield, Brunel Way, Rotherham, S60 5WG. * Corresponding author: Tel.: +441142229908 ...
Approach- A Review. Aasia Quyoum ... Software reliability is the probability of the failure free operation ... factors a
Feb 18, 2012 - Cellulolytic bacteria from soils in harsh environments. Fábio Lino Soares Jr. ⢠Itamar Soares Melo â¢. Armando Cavalcante Franco Dias â¢.
type of unit to evaluate all the resources consumed to generate food or non-food .... The proportion of renewable resources used in extensive dairy systems.
bryophyte communities were systematically sampled in 19 representative juniper forests, for the ... Epiphytic bryophyte communities on this species were highly.
UMR 7204, 75005, Paris, France,. 5Mathematical Ecology Research Group,. Department of Zoology, University of Oxford,. Oxford OX1 3PS, UK, 6Mus´eum ...
engineering company that has been involved in many projects for the ... Figure 1: Illustration of the new proposed offshore LNG transfer system with the LNGC ..... The two software packages ANSYS AQWA and WAMIT have been applied ...
Design. Qualification. Performance over Harsh. Environment. Manufacturing ... exists is not beneficial to the vendor since it will not produce the business it would ... saying that âyou can get a cheap job done, but it won't be good, or you can get
In addition, the stresses placed on turbine blades on a root- fastener connection are significant. Xylan deliv- ers bene
distance learning courses is the lack of a tool to assist in monitoring students' .... of the activities of the course "Applied Computer Scienceâ. In the simulation was ...
made to two well-known interfaces: fly, including support to ... by Perlin and Fox [10] and Bederson et al. ...... [5] R. de Sousa Rocha and M. A. F. Rodrigues. An.
Aug 30, 2004 - crease vs. a standard BLE4. The ALM structure is one of a number of archi- tectural improvements giving Altera's 90nm Stratix II architecture a ...
that a 4-LUT provides the best area-delay product. .... This terminology is necessary in order to account for area later
H. Cheng, Xiaobo Li, âPartial encryption of compressed images and videosâ, ... X. Li, J. Knipe, and H. Cheng, âImage compression and encryption using tree ...
input sharing and fracturability we are able to get the advantages of larger LUT sizes ... ther improvements built on th
Faculty of Computer Science and Information Technology, University Putra ... some applications of triple play services (i.e., voice, video and data) which are ...
Oct 20, 2015 - than the unprotected gold coating with similar reflective properties. ..... Balzers GmbH, Eschborn, Germany (3) Georg Böttcher and Silvia ...
Lessons from school visits and community dialogue in Northern Jordan. Svein Erik Stave, Ã ge A. Tiltnes, Zainab ... Cont
Feb 10, 2012 - The light exiting the mask is mostly coupled into the resulting ±1 .... where the optical fiber is exposed to high pressure hydrogen gas at room.
EPROM, EEPROM, Flash, and more recent Ferroelectric non-volatile memory technology. ... EEPROM (electrically erasable programmable read-only memory).
Harris, T. B. & Rajakaruna, N. (2009) Adiantum viridimontanum, Aspidotis densa, Minuartia marcescens, and Symphyotrichum rhiannon: Additional serpentine ...
Improving FPGA Reliability in Harsh Environments Using Triple ...
Brigham Young University. Los Alamos National Laboratory. Improving ... Brian Prattâ , Michael Wirthlinâ . Michael Caffreyâ¡, Paul Grahamâ¡, Keith Morganâ¡, ...
Improving FPGA Reliability in Harsh Environments using Triple Modular Redundancy with More Frequent Voting Brian Pratt† , Michael Wirthlin† Michael Caffrey‡, Paul Graham‡, Keith Morgan‡, Heather Quinn‡ Severn Shelley † ‡ †
Brigham Young University ‡ Los Alamos National Laboratory
This work was supported by the Department of Energy at Los Alamos National Laboratory. Approved for public release under LA-UR-07-7800; distribution is unlimited.
Los Alamos National Laboratory
Brigham Young University
Outline 1. 2. 3. 4. 5. 6. 7.
Triple Modular Redundancy Reliability Models for TMR Systems TMR with More Frequent Voting Reliability Analysis of More Frequent Voting Test Designs and Methodology Simulation Results Conclusions
Los Alamos National Laboratory
Brigham Young University
FPGA Fault Tolerant Strategy •
FPGAs provide SEU mitigation through redundancy and scrubbing
•
Triple Modular Redundancy (TMR) – Triplicate module to introduce redundancy – Vote on outputs of triplicated module – Use greatest common result
•
A A
A
V
A
Configuration Scrubbing – – – –
Readback frame data Compare frame to original Correct erroneous bits in frame Writeback frame to FPGA
Los Alamos National Laboratory
Brigham Young University
Triple Modular Redundancy (TMR) • Three copies of each circuit module • Majority voters select circuit output from redundant modules • An upset affecting a single domain will not cause an error
Los Alamos National Laboratory
Brigham Young University
Reliability model for TMR with repair • Markov model of TMR with repair – State 0: All three modules functioning – State 1: One of three modules has failed – State 2: Two or more modules have failed (TMR failure)
• A second upset before scrubbing can repair may cause TMR to fail 1 − (2λ + μ )
1− 3λ
0
3λ
1
1 2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1− 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
TMR Markov Model Example
1 − (2λ + μ )
1 − 3λ
0
3λ
1
1
2λ
2
μ Los Alamos National Laboratory
Brigham Young University
Reliability of TMR with repair • Different combinations of upset rate and repair (scrubbing) rate result in distinct reliability curves • The upset rate/repair rate ratio determines the reliability of the system • Faster scrub rates needed for harsher radiation environments
Los Alamos National Laboratory
Brigham Young University
TMR with More Frequent Voting (MFV) • • • • •
Partition design into smaller modules Separate partitions with voters Each partition is isolated from others Each partition has lower probability of failure Combined probability of failure is also lower
Los Alamos National Laboratory
Brigham Young University
TMR with More Frequent Voting (MFV) • Our analysis and experiments address the effects of multiple independent upsets (MIUs) – Due to harsh radiation environments
• Failures due to multiple-bit upsets (MBUs) are not addressed here • Previous work has addressed single-bit domain crossing upsets in relation to TMR with partitions – F. Lima Kastensmidt, L. Sterpone, L. Carro, M. Sonza Reorda, "On the Optimal Design of Triple Modular Redundancy Logic for SRAMbased FPGAs,” pp. 1290-1295, 2005.
Los Alamos National Laboratory
Brigham Young University
Reliability model for MFV TMR • Markov model of TMR with two partitions – – – –
State 0: All three modules functioning State 1: One module in one partition has failed State 2: One module in each partition has failed State 3: Two or more modules in the same partition have failed (TMR failure)
• A second upset in the same partition may cause TMR λ 2 to fail 2 6
0
λ
3
2
1
λ 2
4
2
λ 2
3
μ μ
Los Alamos National Laboratory
1
Brigham Young University
Reliability of MFV TMR – Two partitions • TMR with two partitions is more reliable than standard TMR (one partition)
Los Alamos National Laboratory
Brigham Young University
Multiple Partitions • In general, more partitions will increase overall reliability
Los Alamos National Laboratory
Brigham Young University
More frequent voting study • Goals of this study – Evaluate effectiveness of TMR with more frequent voting on FPGA technology
• Use fault injection to evaluate reliability – Insert various numbers of faults before evaluating and repairing • To simulate different upset rate/repair rate ratios
• Question: How does reliability improve?
Los Alamos National Laboratory
Brigham Young University
Test Design #1 • 32-bit shift register designed to fill target device – 100 registers deep
• Tested on Xilinx Virtex 1000 device
Los Alamos National Laboratory
Brigham Young University
Test Design #2 • 32-bit shift register designed to fill target device – 250 registers deep – Shift register at each bit position affects all others due to xor operations • To prevent isolation between bit positions
• Tested on Xilinx Virtex 4 SX55 device
Los Alamos National Laboratory
Brigham Young University
Test Design with more frequent voting • Partitions separated with triplicated voters • Multiple design points with different numbers of partitions
Insert voters Los Alamos National Laboratory
Brigham Young University
Test Methodology • Two test platforms – SLAAC-1V with Xilinx Virtex 1000 – SEAKR board with Xilinx Virtex 4 SX55
• Experiment with different numbers of partitions – Several design points created from each test design
• Evaluate each design point with multiple upset rates – Simulating different upset rate to repair rate ratios
• Measure results with fault injection
Los Alamos National Laboratory
Brigham Young University
Fault Injection Results • Test Design #1 on Virtex 1000 • Reliability increased with more partitions
Los Alamos National Laboratory
Brigham Young University
Fault Injection Results • Test Design #1 on Virtex 1000
Los Alamos National Laboratory
Brigham Young University
Fault Injection Results • Test Design #2 on Virtex 4 SX55
Los Alamos National Laboratory
Brigham Young University
Summary of Results • Results for V4 design showed that reliability decreased as partitions were added at first – Reliability increased after 25 partitions for this design – Routing may have had an impact • Routing behavior changes with density of device
– More domain-crossing single-bit failures due to larger design size?
• Reliability improvements are modest • Largest gains in reliability for larger number of upsets per scrub cycle
Los Alamos National Laboratory
Brigham Young University
Natural Partitioning • Some partitioning naturally occurs with standard TMR • Feedback TMR – Voters inserted in feedback loops, creating partitions
• Parallel streams of logic – Portions of logic that do not affect each other – e.g. bits in a simple shift register
Los Alamos National Laboratory
Brigham Young University
Conclusions • TMR with more frequent voting increases reliability in the presence of multiple upsets • Reliability gains are not large for low upset/scrub rates • Future work: – – – –
Evaluate cost vs. benefit Understand architectural differences Radiation testing Evaluate effects of natural partitioning