

  1. A Software Approach for the Detection of Transient Errors Occurring in Processor-based Digital Architectures: Principles and Experimental Results
     Dr. Raoul Velazco, Director of Research at CNRS, Co-leader of ARIS (Architectures Robust of Integrated circuits and Systems), TIMA Labs, Grenoble, France
     Raoul Velazco – TIMA - ARIS

  2. Outline
     • Introduction
     • State of the Art
     • Methodology for error detection
     • Formal evaluation of proposed error detection technique
     • Automatic Generation of Hardened Programs
     • Experimental Results
     • Conclusion and Perspectives

  3. Context & Motivation - 1/2
     • Miniaturization due to the constant improvements achieved in microelectronics technology
       – Increased sensitivity to environmental effects (e.g. radiation, EMC, temperature)
     • Processors operating in space are subject to different radiation phenomena:
       – Permanent: dose effects
         • caused by the cumulated charges trapped in the oxide
       – Transient: SEE (Single Event Effects)
         • caused by the impact of a charged particle on a sensitive area of an integrated circuit

  4. Context & Motivation - 2/2
     • SEL (Single Event Latchup): provokes short-circuits between power supply and ground
       – destructive if not detected in time
     • SEU (Single Event Upset): provokes an unexpected modification of a memory cell's content
       – non-destructive
       – the effect depends on the nature of the perturbed cell and on the time of occurrence
     • SEU effects
       – incorrect computation
       – system crash

  5. Objective & Contributions
     Objective
       – SIFT (Software Implemented Fault Tolerance) technique
         • Efficient: high error detection capacity
         • Generic: not hardware dependent
         • Automatable: fast generation of hardened applications
         • Application domain: high-level SW specification
     Contributions
       – Improvement of an existing SIFT technique
       – Automatic flow to generate hardened applications
       – Validation for different applications on several processors
         • Fault injection experiments
         • Radiation campaigns

  6. Outline
     • Introduction
     • State of the Art
     • Methodology for error detection
     • Formal evaluation of proposed error detection technique
     • Automatic Generation of Hardened Programs
     • Experimental Results
     • Conclusion and Perspectives

  7. State of the Art
     • Hardware approaches:
       – hardware implementation of the detection mechanism
     • Hardware/Software approaches:
       – hardware and software implementation of the detection mechanism
     • Software approaches:
       – software implementation of the detection mechanism

  8. Hardware Approaches
     • Design hardening: modification of the design by suitable techniques to allow the manufacturing of reliable circuits
       – Logic gates
       – Hardened memory cells
     • Error correction codes: adding dedicated circuits for error detection and/or correction in memory cells
       – Hamming code
       – CRC code
     • Limitations:
       – need hardware modification
       – not systematic
       – expensive
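To make the error correction code bullet concrete, here is a minimal software sketch of a Hamming(7,4) code. It only illustrates the logic that a dedicated circuit would implement; the function names `hamming74_encode` and `hamming74_decode` and the bit layout are assumptions made for this example and do not come from the presentation.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a 4-bit value into a 7-bit Hamming(7,4) codeword.
   Bit (pos - 1) of the result holds codeword position pos (1..7):
   positions 1, 2 and 4 are parity bits, positions 3, 5, 6, 7 hold d0..d3. */
uint8_t hamming74_encode(uint8_t nibble)
{
    assert(nibble <= 0x0F);
    uint8_t d0 = nibble & 1u;
    uint8_t d1 = (nibble >> 1) & 1u;
    uint8_t d2 = (nibble >> 2) & 1u;
    uint8_t d3 = (nibble >> 3) & 1u;
    uint8_t p1 = d0 ^ d1 ^ d3;   /* parity over positions 1, 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;   /* parity over positions 2, 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;   /* parity over positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a codeword, correcting at most one flipped bit.
   *corrected is set to 1 when a bit had to be repaired, 0 otherwise. */
uint8_t hamming74_decode(uint8_t code, int *corrected)
{
    uint8_t b[8] = {0};          /* b[pos] = bit at codeword position pos */
    for (int pos = 1; pos <= 7; pos++)
        b[pos] = (code >> (pos - 1)) & 1u;

    uint8_t s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    uint8_t s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    uint8_t s4 = b[4] ^ b[5] ^ b[6] ^ b[7];
    uint8_t syndrome = (uint8_t)(s1 | (s2 << 1) | (s4 << 2));

    *corrected = (syndrome != 0);
    if (syndrome != 0)
        b[syndrome] ^= 1u;       /* the syndrome is the position of the flipped bit */

    return (uint8_t)(b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3));
}
```

A single upset in any of the seven stored bits is located by the syndrome and repaired; two simultaneous upsets in the same codeword are beyond what this particular code can correct.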

  9. SW/HW approaches
     • Recovery Block:
       – a primary software module
       – alternative software modules (having the same functionality as the primary module)
       – an acceptance test
     • N Version Programming:
       – N versions of the same application
         • running in parallel
       – a voter to decide the correct output
         • the majority of the outputs
     • Limitations:
       – need hardware channels
       – application dependent
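The voter of N Version Programming can be sketched as a majority function over the outputs of the N versions. This is only an illustration under stated assumptions: the name `majority_vote`, the use of `int` outputs and the "more than half must agree" policy are choices made for the example, not details given in the presentation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 and stores the majority value in *winner when more than half
   of the n outputs agree; returns 0 when there is no majority. */
int majority_vote(const int *outputs, size_t n, int *winner)
{
    assert(n > 0);

    /* Pass 1 (Boyer-Moore): find the only possible majority candidate. */
    int candidate = outputs[0];
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            candidate = outputs[i];
            count = 1;
        } else if (outputs[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }

    /* Pass 2: confirm that the candidate really holds a strict majority. */
    count = 0;
    for (size_t i = 0; i < n; i++)
        if (outputs[i] == candidate)
            count++;

    if (2 * count <= n)
        return 0;
    *winner = candidate;
    return 1;
}
```

With three versions, one corrupted output is outvoted by the two others; when all three disagree the function reports that no decision is possible.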

  10. Software Approaches
     • ABFT (Algorithm-Based Fault Tolerance): well suited for applications with regular structures
     • Assertions: insertion of logic statements at different points of the program
     • Control Flow Checking: based on signature analysis
       – the program is decomposed into branch-free blocks
       – online check of the signature against a golden (pre-computed) one
     • PdT Error Detection Technique:
       – transformation rules for error detection introducing redundancy at:
         • data segment
         • program code

  11. Outline
     • Introduction
     • State of the Art
     • Methodology for error detection
     • Formal evaluation of proposed error detection technique
     • Automatic Generation of Hardened Programs
     • Experimental Results
     • Conclusion and Perspectives

  12. Proposed Approach
     • Purely software approach based on a set of transformation rules
     • The set of rules is derived from:
       – PdT Error Detection Technique
     • The proposed set of rules allows:
       – Improvement of error detection capacity
       – Reduction of the time penalty and the memory space overhead
     • Three sets of rules are applied to the target program:
       – Data Duplication: targeting errors affecting data
       – Global Execution Flow: targeting errors affecting basic instructions
       – Branching Duplication: targeting errors affecting control instructions

  13. Basic Concepts
     • Program: set of specific operations executed by a processor
     • Characteristic elements of a program:
       – Data:
         • input
         • intermediary
         • output
       – Instructions:
         • basic: they do not change the execution flow
           – logic operations (e.g. OR, AND, XOR, NOT)
           – arithmetic operations (e.g. addition, multiplication, division)
           – data transfer (e.g. MOV Reg,Mem and vice versa)
         • control: they allow modification of the execution flow
           – conditional (e.g. test instructions)
           – unconditional (e.g. calls to and returns from procedures)

  14. PdT - Data Duplication
     • Every variable is duplicated
     • Every write operation performed on the original variable is repeated for its replica
     • After each read operation on a variable, a consistency check is introduced between the values of the two variables (original and duplicate)
     • Limitations:
       – output variables are not checked for consistency
         • errors may not be detected
       – time and memory overhead increase in direct proportion to the complexity of the operations
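The three PdT data duplication rules can be sketched in C as a pair of helpers, one for writes and one for reads. The helper names `dup_write` and `dup_read`, the `error()` handler and the `errors_detected` counter are hypothetical names introduced for this illustration; the presentation does not show how the rules are coded.

```c
#include <assert.h>

int errors_detected = 0;                 /* hypothetical detection counter */

void error(void) { errors_detected++; }  /* hypothetical error handler */

/* Write rule: every write to the original is repeated on the replica. */
void dup_write(volatile int *orig, volatile int *copy, int value)
{
    assert(orig != copy);                /* the replica must be separate storage */
    *orig = value;
    *copy = value;
}

/* Read rule: after reading the original, check it against the replica. */
int dup_read(const volatile int *orig, const volatile int *copy)
{
    int value = *orig;
    if (value != *copy)
        error();
    return value;
}
```

The `volatile` qualifier is used here so that a compiler does not merge the two copies into one; any real hardening flow would need some comparable guarantee. Note that a check is paid on every read, which is the overhead the limitation above refers to.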

  15. Improved Rules - Data Duplication
     • Identification of the relationships between the variables
     • Classification of the variables according to their role in the program
       – intermediary variables: they are used for the calculation of other variables
       – final variables: they do not take part in the calculation of any other variable
     • Every variable is duplicated
     • Every write operation performed on the original variable is repeated for its replica
     • After any write operation on a final variable, a consistency check is introduced between the values of the two variables (original and duplicate)
     Legend
     • Added rules
     • Modified rules

  16. Data Duplication - Example
     Example of program:
         a = b + 2
         c = a + b*6
     Applying the proposed rules:
         a1 = b1 + 2
         a2 = b2 + 2
         c1 = a1 + b1*6
         c2 = a2 + b2*6
         if(c1 != c2) error()
     [Figure: data-flow graph in which b and the constant 2 feed "+" to produce a, b and the constant 6 feed "*", and a and that product feed "+" to produce c]
     a and b are intermediary variables; c is a final variable.
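The example above can be turned into a small runnable C sketch. Everything beyond the five hardened statements is an assumption added for illustration: the function name `hardened_compute`, the `flip_mask` parameter that simulates an upset in the intermediary variable `a1`, and the `error()` handler with its counter.

```c
int errors_detected = 0;                 /* hypothetical detection counter */

void error(void) { errors_detected++; }  /* hypothetical error handler */

/* Computes c = (b + 2) + b*6 on two independent copies of the data.
   flip_mask simulates an SEU: its bits are flipped in a1 after a1 is written
   (pass 0 for a fault-free run). */
int hardened_compute(int b, int flip_mask)
{
    volatile int b1 = b, b2 = b;     /* every variable is duplicated */

    volatile int a1 = b1 + 2;        /* a is intermediary: duplicated, not checked */
    volatile int a2 = b2 + 2;

    a1 ^= flip_mask;                 /* fault injection, for illustration only */

    volatile int c1 = a1 + b1 * 6;   /* c is final: duplicated and checked */
    volatile int c2 = a2 + b2 * 6;
    if (c1 != c2)
        error();

    return c1;
}
```

The upset injected into the intermediary variable is not checked where it occurs, but it propagates into `c1` and is caught by the single check on the final variable, which is how the improved rules save checks compared with testing after every read.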

  17. PdT - Global Execution Flow
     • An integer value k_i is associated with every block i in the code
     • A global execution check flag variable gef is defined
     • A statement assigning to gef the value of k_i is introduced at the beginning of every block i
     • A test on the value of gef is also introduced at the end of the block
     • Limitations:
       – incorrect jumps within the same block are not detected
       – incorrect jumps to the beginning of another block are not detected
       – abnormal "resets" of the application are not detected
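A minimal C sketch of the PdT global execution flow rules follows. The two-block program, the function name `run_blocks`, the `bad_jump` parameter that simulates an erroneous jump into the middle of block 2, and the `error()` handler are all assumptions made for this illustration.

```c
int errors_detected = 0;                 /* hypothetical detection counter */

void error(void) { errors_detected++; }  /* hypothetical error handler */

volatile int gef;                        /* global execution check flag */

/* Runs two blocks with k_1 = 1 and k_2 = 2.
   bad_jump != 0 simulates a control-flow error that lands in the middle
   of block 2, skipping the statement that sets gef for that block. */
int run_blocks(int bad_jump)
{
    int i, j, n;

    gef = 1;                 /* beginning of block 1: gef = k_1 */
    i = 10;
    j = 2;
    if (gef != 1)            /* end of block 1: test gef */
        error();

    if (bad_jump)
        goto inside_block2;  /* simulated erroneous jump */

    gef = 2;                 /* beginning of block 2: gef = k_2 */
inside_block2:
    n = i + j * 6;
    if (gef != 2)            /* end of block 2: test gef */
        error();

    return n;
}
```

The jump into the middle of block 2 is caught because `gef` still holds the value of block 1. A jump that lands exactly on `gef = 2` would pass the test, which corresponds to the second limitation listed above.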

  18. Improved Rules - Global Execution Flow
     • Identification of the maximum size blocks in the program
     • Decomposition of the maximum size blocks into basic blocks, according to the number of instructions and the instructions' complexity (computation volume)
     • A global execution check flag gef is defined, in order to identify each basic block
     • A boolean variable status_block is defined
     • An integer value k_i is associated with every basic block i
     • A statement assigning to gef the value of k_i XOR status_block is introduced at the beginning of every basic block i
     • A test on the value of gef is also introduced at the end of each basic block
     Legend
     • Added rules
     • Modified rules

  19. Global Execution Flow - Example
     Example of program:
         i = 10
         j = 2
         k = 3
         n = (i + j*6)/(k + 6)
         m = i*5 - j*6
         Goto Label
         n = (i + 4)*k + j/5
         m = n*3 + i*j
         k = i + 3
         Label:
         i = 2
     Applying the proposed rules:
         gef = 1 ^(status_block ^= 1) // ki = 1
         i = 10
         j = 2
         k = 3
         if(gef != 1 && status_block ^= 1) error()
         gef = 2 ^(status_block ^= 1) // ki = 2
         n = (i + j*6)/(k + 6)
         m = i*5 - j*6
         if(gef != 2 && status_block ^= 1) error()
         Goto Label
         gef = 3 ^(status_block ^= 1) // ki = 3
         n = (i + 4)*k + j/5
         if(gef != 3 && status_block ^= 1) error()
         gef = 4 ^(status_block ^= 1) // ki = 4
         m = n*3 + i*j
         k = i + 3
         if(gef != 4 && status_block ^= 1) error()
         Label:
         gef = 5 ^(status_block ^= 1) // ki = 5
         i = 2
         if(gef != 5 && status_block ^= 1) error()
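The slide pseudocode does not pin down the exact semantics of the end-of-block test, so the following C sketch is one possible reading rather than the author's implementation. It assumes that `status_block` is 0 between basic blocks and 1 inside one, that it is toggled on entry and on exit, and that inside block i the flag `gef` must equal k_i XOR 1. The helpers `block_enter` and `block_exit`, the function `run`, the `wrong_jump` parameter and the `error()` handler are hypothetical names.

```c
int errors_detected = 0;                 /* hypothetical detection counter */

void error(void) { errors_detected++; }  /* hypothetical error handler */

volatile int gef;            /* global execution check flag */
volatile int status_block;   /* assumed: 0 between blocks, 1 inside a block */

/* Beginning of basic block k: toggle status_block, then gef = k XOR status_block. */
void block_enter(int k)
{
    status_block ^= 1;
    gef = k ^ status_block;
}

/* End of basic block k: in a correct run status_block is 1 here,
   so gef must equal k ^ 1 (assumed reading of the slide's test). */
void block_exit(int k)
{
    if (gef != (k ^ 1) || status_block != 1)
        error();
    status_block ^= 1;
}

/* wrong_jump != 0 simulates an erroneous jump from the middle of block 1
   to the beginning of block 2, skipping the end-of-block test of block 1. */
int run(int wrong_jump)
{
    int i, j, n;

    gef = 0;
    status_block = 0;

    block_enter(1);
    i = 10;
    j = 2;
    if (wrong_jump)
        goto block2;         /* simulated erroneous jump */
    block_exit(1);

block2:
    block_enter(2);
    n = i + j * 6;
    block_exit(2);

    return n;
}
```

Under this reading, the jump to the beginning of block 2 is detected: block 1 was never closed, so the entry of block 2 toggles `status_block` back to 0 and `gef` becomes 2 instead of 3. With a plain `gef = k_i` assignment the same jump would go unnoticed.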
