Eliminating Single Points of Failure in Software-Based Redundancy

  1. Eliminating Single Points of Failure in Software-Based Redundancy
  Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat
  EDCC, May 9, 2012
  SYSTEM SOFTWARE GROUP, http://www4.cs.fau.de

  2. Transient Hardware Faults – A Growing Problem
  (Figure after Shivakumar, 2002 [3])
  ■ Transient hardware faults (soft errors)
    ■ Induced by, e.g., radiation, glitches, or insufficient signal integrity
    ■ Increasingly affecting microcontroller logic
    ■ Future hardware designs: even more performance and parallelism
      → at the price of being less and less reliable
  Peter Ulbrich – ulbrich@cs.fau.de 2

  3. Countermeasures - Hardware
  (Figure: safety-critical system with sensors, a safety-critical application, and actuators)
  ■ Hardware-based countermeasures
    ■ Application-specific design or specialised hardware
    ■ For example, ECC or lock-step
  + Pragmatic approach (tackles the problem right at its source)
  − Hardware costs (e.g., redundancy, checker, …)
  − Limited selectivity (e.g., in multi-application systems)
  − Development costs (diverse safety concepts and HW, (re-)certification)

  5. Countermeasures - Software
  (Figure: safety-critical system; sensors feed three replicas of the safety-critical application (1), (2), (3), which drive the actuators; a fault (↯) causes one replica to fail (✗))
  ■ Different approaches to address transient hardware faults
    ■ Hardware vs. software measures
    ■ Applicability and costs
  ■ Software-based triple modular redundancy (TMR)
    ■ Accepted and proven (e.g., recommended for ASIL D error handling)
    ■ Selective (e.g., in multi-application systems)
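
The voting step of software-based TMR can be illustrated with a minimal Python sketch. The names `control_law`, `majority_vote`, and `run_tmr` are hypothetical, and the three replicas here are plain function calls in one process, so the temporal and spatial isolation a real system needs is not modelled. Note that the voter itself carries no error detection.

```python
from collections import Counter
from typing import Callable, Optional


def control_law(sensor: int) -> int:
    """Hypothetical application logic that every replica executes."""
    return 3 * sensor + 7


def majority_vote(r1: int, r2: int, r3: int) -> Optional[int]:
    """Plain 2-out-of-3 voter: returns the value at least two replicas agree on,
    or None if all three differ. The voter has no error detection of its own."""
    value, count = Counter((r1, r2, r3)).most_common(1)[0]
    return value if count >= 2 else None


def run_tmr(sensor: int, fault: Optional[Callable[[int], int]] = None) -> Optional[int]:
    """Runs three replicas on the same input and votes on their outputs.
    `fault`, if given, corrupts the output of replica 2 to simulate a soft error."""
    results = [control_law(sensor) for _ in range(3)]
    if fault is not None:
        results[1] = fault(results[1])
    return majority_vote(*results)


print(run_tmr(10))                                # 37: no fault
print(run_tmr(10, fault=lambda v: v ^ (1 << 4)))  # 37: bit flip in replica 2 is masked
```

A fault in any single replica is outvoted, but nothing would notice if `majority_vote` itself computed or returned a corrupted value.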

  7. Software-Based Redundancy in Detail
  (Figure: safety-critical system; sensors → interface → three replicas, each in its own isolation domain inside the sphere of replication (SOR) → majority voter → actuators; a fault (↯) hits the replicas with P = 998/1000 and the interface and the voter with P = 1/1000 each)
  ■ Software-based TMR requires:
    ■ Temporal and spatial isolation (isolation domains)
    ■ Interface and majority voter
  ■ Single points of failure
    ■ No error detection
    ■ Very small, yet hit with a certain (non-zero) probability
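
Reading the figure's labels as the share of faults that hit each component (998/1000 for the replicas, 1/1000 each for the interface and the voter), and assuming that every fault hits exactly one component, the residual risk of the two single points of failure is a simple sum, because TMR masks only faults inside a single replica:

$$
P_{\text{unmasked}} = P_{\text{interface}} + P_{\text{voter}} = \frac{1}{1000} + \frac{1}{1000} = \frac{2}{1000} = 0.2\,\%,
\qquad
P_{\text{masked}} = P_{\text{replicas}} = \frac{998}{1000}
$$

Adding more replicas does not lower the first term, since the interface and the voter remain unreplicated.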

  8. Software-Based Redundancy in Detail
  (Figure: the same TMR structure, now with unknown fault probabilities P = ? for the replicas, the interface, and the voter)
  ■ Software-based TMR requires:
    ■ Temporal and spatial isolation (isolation domains)
    ■ Interface and majority voter
  ■ Single points of failure
    ■ No error detection
    ■ Very small, yet hit with a certain (non-zero) probability
  ■ Risk analysis
    ■ Inherently complex
    ■ Are errors randomly distributed? (Nightingale, 2011)

  10. Agenda
  ■ Introduction
  ■ The Combined Redundancy Approach (CoRed)
    ■ Eliminating vulnerabilities
    ■ High-reliability voters
    ■ Example: UAV flight control
  ■ CoRed implementation
    ■ Target system: I4 Copter
  ■ Evaluation
    ■ Experimental setup
    ■ Results
  ■ Conclusion

  14. CoRed Overview – Holistic Protection Approach
  (Figure: CoRed architecture with isolation domains, encoded operations, and the sphere of replication (SOR); stages 1, 2, 3 are marked)
  ■ The Combined Redundancy Approach (CoRed)
    ■ CoRed = TMR + {data-flow encoding, high-reliability voters}
  ■ Holistic protection approach
    ■ Input-to-output protection: (1) reading inputs → (2) processing → (3) distributing outputs
    ■ Composability on both application and system level
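
The three stages can be sketched as a data flow in which values stay encoded from input to output. This is a simplified illustration, not the CoRed voter: it assumes a plain AN code ($X' = A \cdot X$), the basic code that CoRed later extends, and the constant `A = 65521` as well as all function names are hypothetical choices. Inputs are encoded once when read (1), replicas compute only on code words (2), and the voted result is checked and decoded only when the output is distributed (3).

```python
from collections import Counter
from typing import Optional

A = 65521  # illustrative odd prime code constant (assumption)


def encode(x: int) -> int:
    return A * x


def is_valid(xc: int) -> bool:
    """Every valid AN code word is a multiple of A."""
    return xc % A == 0


def decode(xc: int) -> int:
    if not is_valid(xc):
        raise ValueError("invalid code word: fault detected")
    return xc // A


def control_law_encoded(sensor_c: int) -> int:
    """Hypothetical replica: computes 3*s + 7 on encoded values only,
    since 3*(A*s) + A*7 == A*(3*s + 7)."""
    return 3 * sensor_c + encode(7)


def vote_encoded(r1: int, r2: int, r3: int) -> Optional[int]:
    """2-out-of-3 vote on code words; the result is still encoded."""
    value, count = Counter((r1, r2, r3)).most_common(1)[0]
    return value if count >= 2 else None


def pipeline(sensor: int, fault_after_vote: bool = False) -> int:
    sensor_c = encode(sensor)                                    # (1) read and encode input
    results = [control_law_encoded(sensor_c) for _ in range(3)]  # (2) replicated processing
    voted = vote_encoded(*results)
    if voted is None:
        raise ValueError("no majority")
    if fault_after_vote:
        voted ^= 1 << 3  # simulated bit flip on the voter's output path
    return decode(voted)                                         # (3) check and decode output


print(pipeline(10))  # 37
try:
    pipeline(10, fault_after_vote=True)
except ValueError as err:
    print(err)       # invalid code word: fault detected
```

A single bit flip changes a value by ±2^k, and an odd A never divides a power of two, so in this sketch every single-bit error in a code word, including one that occurs in or after the voter, is caught at the final check. Multi-bit errors slip through only if the change happens to be a multiple of A.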

  17. Eliminating Input and Output Vulnerabilities
  (Figure: values X and Y are encoded into X' and Y' for the transfer between domains across the sphere of replication (SOR) and decoded on the other side)
  ■ Inter-domain data-flow protection
    ■ Checksum vs. arithmetic code (AN code)
    ■ AN code → encoded data operations
    ■ Enabler for high-reliability voters
  ■ CoRed: extended AN code (EAN code)
    ■ Based on VCP (Forin, 1989)
    ■ $X' = A \cdot X + B_X + D$, with prime $A$, per-variable signature $B_X$, and timestamp $D$
    ■ Data integrity: prime $A$
    ■ Address integrity: per-variable signature $B_X$
    ■ Outdated data: timestamp $D$
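
The EAN code $X' = A \cdot X + B_X + D$ can be checked by subtracting the expected signature and timestamp and testing divisibility by $A$. The following Python sketch assumes `A = 65521`, signatures 17 and 29, and small integer timestamps; these values and the names `Variable`, `ean_encode`, `ean_decode`, and `detected` are hypothetical, chosen only to show how each term catches one error class.

```python
from dataclasses import dataclass

A = 65521  # illustrative prime code constant (assumption)


@dataclass(frozen=True)
class Variable:
    name: str
    signature: int  # B_X: per-variable signature, 0 < B_X < A (assumption)


def ean_encode(x: int, var: Variable, timestamp: int) -> int:
    """X' = A*X + B_X + D."""
    return A * x + var.signature + timestamp


def ean_decode(xc: int, var: Variable, timestamp: int) -> int:
    """Check and decode X' for the expected variable and the current timestamp D."""
    rest = xc - var.signature - timestamp
    if rest % A != 0:
        raise ValueError(f"{var.name}: invalid code word")
    return rest // A


def detected(xc: int, var: Variable, timestamp: int) -> bool:
    """True if decoding xc as `var` at `timestamp` fails the code check."""
    try:
        ean_decode(xc, var, timestamp)
    except ValueError:
        return True
    return False


X = Variable("X", signature=17)
Y = Variable("Y", signature=29)
xc = ean_encode(42, X, timestamp=5)

print(ean_decode(xc, X, timestamp=5))                  # 42: intact value
print(detected(xc ^ (1 << 2), X, timestamp=5))         # True: data error (bit flip)
print(detected(ean_encode(42, Y, 5), X, timestamp=5))  # True: wrong variable (signature)
print(detected(ean_encode(42, X, 4), X, timestamp=5))  # True: outdated value (timestamp)
```

The terms can cancel each other: with these values, a code word of Y written 12 time steps earlier would pass the check as X, because $B_Y - B_X = 12$. A real design therefore has to choose $A$, the signatures, and the timestamp range so that such collisions cannot occur.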
