
New Developments in Error Detection and Correction Strategies for Critical Applications



  1. New Developments in Error Detection and Correction Strategies for Critical Applications
     Melanie Berg, AS&D in support of NASA/GSFC (Melanie.D.Berg@NASA.gov)
     Ken LaBel, NASA/GSFC
     To be presented by Melanie D. Berg at the Single Event Effects (SEE) Symposium and Military and Aerospace Programmable Logic Devices (MAPLD) Workshop, La Jolla, CA, May 22-25, 2017.

  2. Acronyms
     • Application specific integrated circuit (ASIC)
     • Block random access memory (BRAM)
     • Block triple modular redundancy (BTMR)
     • Clock (CLK or CLKB)
     • Combinatorial logic (CL)
     • Computer aided design (CAD)
     • Configurable logic block (CLB)
     • Configuration cross section ($P_{configuration}$)
     • Digital signal processing block (DSP)
     • Distributed triple modular redundancy (DTMR)
     • Dual interlocked cell (DICE)
     • Dual redundancy (DR)
     • Edge-triggered flip-flops (DFFs)
     • Equivalence checking (EC)
     • Error detection and correction (EDAC)
     • Field programmable gate array (FPGA)
     • Finite state machine (FSM)
     • Flip-flop SEU cross section ($P_{DFF \to SEU}$)
     • Functional logic cross section ($P_{functionalLogic}$)
     • Gate level netlist (EDF, EDIF, GLN)
     • Global triple modular redundancy (GTMR)
     • Hardware description language (HDL)
     • Input – output (I/O)
     • Linear energy transfer (LET)
     • Local triple modular redundancy (LTMR)
     • Look up table (LUT)
     • Mean fluence to failure (MFTF)
     • Mean time to failure (MTTF)
     • NASA Electronic Parts and Packaging (NEPP)
     • Negative doped with electrons ($N^{+}$)
     • Operational frequency ($fs$)
     • Power on reset (POR)
     • Place and route (PR)
     • Positive doped with holes ($P^{+}$)
     • Radiation Effects and Analysis Group (REAG)
     • Single event functional interrupt (SEFI)
     • Single event functional interrupt cross section ($P_{SEFI}$)
     • Single event effects (SEEs)
     • Single event latch-up (SEL)
     • Single event transient (SET)
     • Single event transient cross section ($P_{SET \to SEU}$)
     • Single event upset (SEU)
     • Single event upset cross section ($\sigma_{SEU}$)
     • Static random access memory (SRAM)
     • System cross section ($P(fs)_{error}$)
     • System on a chip (SOC)
     • Time delay ($\tau_{dly}$)
     • Temporal redundancy (TR)
     • Total ionizing dose (TID)
     • Voltage connected to positive rail ($V_{DD}$)
     • Voltage connected to ground rail ($V_{SS}$)
     • Windowed shift register (WSR)

  3. Agenda
     • Single Event Upsets (SEUs) in Digital Devices.
     • Single Event Upsets and FPGA Configuration.
     • Single Event Upsets in FPGA Data Paths.
     • Fail-Safe Strategies for Critical Applications.
     • Dual Redundancy:
       – Lockstep and
       – Separate systems.
     • Cold Sparing.
     • Triple modular redundancy (TMR):
       – Block TMR (BTMR),
       – Local TMR (LTMR),
       – Distributed TMR (DTMR), and
       – Global TMR (GTMR).
     • Fail-Safe State Machines.

  4. SEUs in Digital Devices
     Although there are many sources of FPGA malfunction, this presentation focuses on SEUs as a source of failure.
     [Figure: ionization in a device produces a single event transient (SET).]
     If an SET is captured by a memory element, it becomes an SEU.
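     The capture step can be illustrated with a first-order estimate. This is only a sketch under a common simplifying assumption that is not stated on the slide: an SET of width `pulse_width_s` is latched if it overlaps a capturing clock edge, so the capture probability scales roughly with pulse width times operational frequency. The function name and the numbers are hypothetical, and setup/hold windows and logical or electrical masking are ignored.

```python
def capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """First-order chance that an SET overlaps a capturing clock edge.

    Assumption (not from the slides): capture probability is roughly
    pulse width divided by clock period, saturating at 1.0.
    """
    if pulse_width_s < 0 or clock_hz < 0:
        raise ValueError("pulse width and clock frequency must be non-negative")
    return min(1.0, pulse_width_s * clock_hz)


# Hypothetical values: a 1 ns transient at two clock rates.
slow = capture_probability(1e-9, 10e6)    # 10 MHz clock
fast = capture_probability(1e-9, 100e6)   # 100 MHz clock
print(f"10 MHz: {slow:.3f}, 100 MHz: {fast:.3f}")
```

     Under this assumption a faster clock gives an SET more chances to be latched, which is one reason the functional logic term is written as a function of $fs$.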

  5. SEUs versus Total Ionizing Dose (TID)
     • The two are commonly confused.
       – TID is accumulated dose from exposure to ionizing particles (mostly protons and electrons) over time, and it can cause device failure.
       – SETs and SEUs have nothing to do with dose over time. They result from a single particle's passage through a sensitive region of a device, which causes ionization and can cause a transistor to change its state.

  6. How SEUs Affect FPGAs
     • SEU and SET error signatures vary between FPGA devices:
       – Temporary glitch (transient),
       – Change of state (incorrect state machine transitions),
       – Global upsets: loss of clock or unexpected reset,
       – Route breakage (no signal can get through), and
       – Configuration corruption.
     • The question is how to avoid system failure, and the answer depends on the following:
       – The system's requirements and the definition of failure,
       – The susceptibility of the target device and its surrounding circuitry,
       – Implemented fail-safe strategies,
       – Reliable design practices,
       – Radiation environment, and
       – Trade space and accepted risk.

  7. FPGA SEU Categorization as Defined by NASA Goddard REAG
     Probabilities are with respect to fluence (SEU cross sections, $\sigma_{SEU}$).
     • System $\sigma_{SEU}$ is divided into:
       – Configuration $\sigma_{SEU}$,
       – Functional logic $\sigma_{SEU}$: sequential and combinatorial logic (CL) in the data path, and
       – SEFI $\sigma_{SEU}$: global routes and hidden logic.
     SEU testing is required to characterize $\sigma_{SEU}$ for each of the FPGA categories.
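     As a rough illustration of what "with respect to fluence" means, the sketch below uses the conventional definition of a cross section: observed events divided by particle fluence (particles/cm²), giving cm² per device. The event counts and fluence are made-up numbers for illustration only, not REAG test data, and the function and category names are hypothetical.

```python
def cross_section(events: int, fluence_per_cm2: float) -> float:
    """Return an SEU cross section in cm^2: observed events / fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


# Hypothetical accelerator runs: (observed events, fluence in particles/cm^2).
runs = {
    "configuration": (250, 1.0e7),
    "functional_logic": (12, 1.0e7),
    "sefi": (3, 1.0e7),
}

# One cross section per FPGA category.
sigma = {name: cross_section(n, phi) for name, (n, phi) in runs.items()}
for name, value in sigma.items():
    print(f"{name}: {value:.2e} cm^2")
```

     Keeping one cross section per category mirrors the slide's point that each category has to be characterized by its own test.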

  8. Preliminary Design Considerations for Mitigation and Trade Space
     Determine the most susceptible components:
     • Does the designer need to add mitigation?
     • Will there be compromises?
       – Performance and speed,
       – Power,
       – Schedule,
       – Mitigating the susceptible components?
       – Reliability (working and mitigating as expected)?
     The impacts on speed, power, area, reliability, and schedule are all important to evaluate.

  9. Single Event Upsets and FPGA Configuration
     $P_{configuration} + P(fs)_{functionalLogic} + P_{SEFI}$

  10. Programmable Switch Implementation and SEU Susceptibility
      [Figure: two programmable switch implementations.]
      • Antifuse (one-time programmable)
      • SRAM (reprogrammable)

  11. Configuration SEU Test Results and the REAG FPGA SEU Model
      General model: $P(fs)_{error} \propto P_{configuration} + P(fs)_{functionalLogic} + P_{SEFI}$
      REAG model by FPGA configuration type:
      • Antifuse: $P(fs)_{error} \propto P(fs)_{functionalLogic} + P_{SEFI}$
      • SRAM (non-mitigated): $P(fs)_{error} \propto P_{configuration}$
      • Flash: $P(fs)_{error} \propto P(fs)_{functionalLogic} + P_{SEFI}$
      • Hardened SRAM: $P(fs)_{error} \propto P_{configuration} + P(fs)_{functionalLogic} + P_{SEFI}$
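      The per-type models on this slide amount to choosing which terms of the general model are kept. A minimal sketch of that selection follows. The dictionary keys, function name, and numeric values are hypothetical, and because the model is a proportionality, the sum is a relative figure rather than an absolute error rate.

```python
# Which terms of the general REAG model are kept for each configuration type,
# as listed on the slide.
REAG_TERMS = {
    "antifuse": ("functional_logic", "sefi"),
    "sram_non_mitigated": ("configuration",),
    "flash": ("functional_logic", "sefi"),
    "hardened_sram": ("configuration", "functional_logic", "sefi"),
}


def relative_system_error(config_type: str, p: dict) -> float:
    """Sum the dominant terms for one configuration type (proportional only)."""
    return sum(p[term] for term in REAG_TERMS[config_type])


# Hypothetical per-category values at one operating frequency.
p_example = {"configuration": 1e-6, "functional_logic": 1e-9, "sefi": 1e-10}

for kind in REAG_TERMS:
    print(kind, relative_system_error(kind, p_example))
```

      With these made-up values, the non-mitigated SRAM result is set entirely by the configuration term, which is the behavior the slide describes for that device type.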

  12. What Does The Last Slide Mean?
      FPGA susceptibility by configuration type. Susceptibility categories: data path (combinatorial logic (CL) and flip-flops (DFFs)); global (clocks and resets); and configuration.
      • Antifuse: Configuration has been designated as hard regarding SEEs. Susceptibilities exist only in the data paths and global routes. However, global routes are hardened and have a low SEU susceptibility.
      • SRAM (non-mitigated): Configuration has been designated as the most susceptible portion of circuitry. All other upsets (except for global routes) are too statistically insignificant to take into account. For example, studying data-path transients is a waste of time, whereas clock transient studies are significant.
      • Flash: Configuration has been designated as hard (but NOT immune) regarding SEEs. Susceptibilities also exist in the data paths and global routes (e.g., clocks and resets).
      • Hardened SRAM: Configuration has been designated as hardened (but NOT hard) regarding SEEs. Susceptibilities also exist in the data paths and global routes (e.g., clocks and resets).

  13. Example: Routing Configuration Upsets in a Xilinx Virtex FPGA
      [Figure: look-up tables (LUTs) with inputs I1 to I4 and a flip-flop (D, Q, SET, CLR) connected through a routing matrix.]
      Because multiple paths can pass through the routing matrix, a routing configuration upset can be catastrophic; i.e., it can break simple mitigation.
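      To make "break simple mitigation" concrete, the sketch below assumes the simple mitigation is a triplicated signal with a 2-of-3 majority voter. That choice is an illustration, not something this slide specifies. A single upset in one copy is masked, but one routing upset that corrupts two copies in the same way outvotes the good copy. All names and values are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011   # value every copy should carry
flip = 0b0100      # bit corrupted by the upset

# Case 1: the upset affects one copy only; the voter masks it.
single_fault = majority(correct ^ flip, correct, correct)

# Case 2: one routing upset feeds two copies; the bad value wins the vote.
shared_fault = majority(correct ^ flip, correct ^ flip, correct)

print(f"single fault masked: {single_fault == correct}")
print(f"shared fault masked: {shared_fault == correct}")
```

      The second case is why an upset in a resource shared by several paths is more dangerous than an upset confined to one redundant copy.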
