Hypervisor-Based Fault-Tolerance Thomas C. Bressoud, Isis - PowerPoint PPT Presentation

Hypervisor-Based Fault-Tolerance
Thomas C. Bressoud, Isis Distributed Systems
Fred B. Schneider, Cornell University
SOSP, 1995

Problem
• Fault tolerance is subtle and often hard to implement on existing layers
• Hardware-level implementation
  – High design cost; designs often lag behind
• Operating-system-level implementation
  – Difficult to identify state transitions
  – Many more OSes than architectures
• Application-level implementation
  – Similar coordination must be done for every program
  – Highly demanding

Compare With Chain Replication
• Similarities:
  – Both explore a trade-off space with a new design
  – Both are based on the fail-stop model
• Differences:
  – Overall, the goals of the two papers are quite different
    • Chain Replication focuses on maintaining system performance while providing strong consistency
    • This paper is more concerned with engineering costs versus time-to-market costs
  – The kind of fault to tolerate also differs
    • Chain Replication is not particularly concerned with any specific type of fault
    • This paper focuses on processor faults

Solution
• Hypervisor between hardware and OS
  – One per instruction-set architecture
  – Supports all OSes that run on that architecture
  – Frees the application developer from handling fault tolerance
• Idea
  – The sequences of instructions executed by two virtual machines running on different physical processors are identical
• Primary and backup, based on the state machine approach
  – The primary makes decisions; the backup takes over if the primary fails
  – State: virtual processor state
  – Commands: instructions to run, plus interrupts (with data) received at the primary, to be delivered
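The primary and backup arrangement can be illustrated with a minimal sketch. It is not the paper's implementation: the `Replica` class, the arithmetic transition function, and the `fail_after` parameter (which stands in for failure detection) are all assumptions made for illustration. It only shows that when both replicas apply the same deterministic commands, the outputs seen by the environment are the same whether or not the primary fails.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica whose whole state is one integer."""
    state: int = 0

    def apply(self, command: int) -> None:
        # Deterministic transition: same state + same command -> same next state.
        self.state = (self.state * 31 + command) % 1_000_003


def run(commands, fail_after):
    """Feed the same commands to both replicas.

    The primary is assumed to fail-stop just before command index
    `fail_after`; from then on the backup's state is what the
    environment observes.
    """
    primary, backup = Replica(), Replica()
    outputs = []
    for index, command in enumerate(commands):
        primary_alive = index < fail_after
        if primary_alive:
            primary.apply(command)
        backup.apply(command)
        active = primary if primary_alive else backup
        outputs.append(active.state)
    return outputs


cmds = [3, 1, 4, 1, 5]
# A run with a failure after two commands is indistinguishable from a failure-free run.
print(run(cmds, fail_after=2) == run(cmds, fail_after=len(cmds)))  # prints True
```

The failure is masked only because the transition function is deterministic; any command whose effect depends on the outside world would need extra coordination between the replicas.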

Technical Detail
• I/O Accessibility Assumption:
  – I/O devices are shared
• Identical command stream
  – Ordinary instructions and environment instructions
    • Eliminate non-determinism by communicating when executing an environment instruction
  – Epochs (checkpoints)
    • Use a register count to periodically invoke the hypervisor
    • Synchronize the VMs by sending the backup the interrupts received by the primary replica
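The epoch mechanism can be sketched as a toy simulation. Everything here is an assumption for illustration rather than the paper's protocol: `ToyVM`, the tuple-based program encoding, the `arrivals` map of interrupts, and the per-epoch log are hypothetical, and acknowledgements and failover are omitted. The sketch shows two things: environment-instruction results are taken at the primary and replayed at the backup, and interrupts are buffered and delivered only at epoch boundaries, so both VMs see them at the same point in the instruction stream.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical deterministic VM: one accumulator plus an execution trace."""
    acc: int = 0
    trace: list = field(default_factory=list)

    def ordinary(self, n):
        # Ordinary instruction: the result depends only on VM state.
        self.acc = (self.acc * 31 + n) % 1_000_003
        self.trace.append(("ord", n))

    def environment(self, value):
        # Environment instruction: the value comes from outside (e.g. a clock read).
        self.acc = (self.acc + value) % 1_000_003
        self.trace.append(("env", value))

    def interrupt(self, payload):
        self.acc = (self.acc ^ payload) % 1_000_003
        self.trace.append(("irq", payload))


def epochs_of(program, epoch_len):
    # Split the instruction stream into fixed-length epochs.
    for start in range(0, len(program), epoch_len):
        yield start, program[start:start + epoch_len]


def run_primary(program, epoch_len, arrivals, read_clock):
    vm, log = ToyVM(), []
    for start, chunk in epochs_of(program, epoch_len):
        env_values, pending = [], []
        for offset, instr in enumerate(chunk):
            # Interrupts arriving mid-epoch are buffered, not delivered.
            pending.extend(arrivals.get(start + offset, []))
            if instr[0] == "ord":
                vm.ordinary(instr[1])
            else:
                value = read_clock()
                env_values.append(value)  # recorded so the backup can reuse it
                vm.environment(value)
        for payload in pending:           # deliver at the epoch boundary
            vm.interrupt(payload)
        log.append((env_values, pending))  # what would be sent to the backup
    return vm, log


def run_backup(program, epoch_len, log):
    vm = ToyVM()
    for (_, chunk), (env_values, irqs) in zip(epochs_of(program, epoch_len), log):
        values = iter(env_values)
        for instr in chunk:
            if instr[0] == "ord":
                vm.ordinary(instr[1])
            else:
                vm.environment(next(values))  # replay the primary's value
        for payload in irqs:
            vm.interrupt(payload)
    return vm


program = [("ord", 1), ("env",), ("ord", 2), ("ord", 3),
           ("env",), ("ord", 4), ("ord", 5)]
clock = iter(range(100, 200))
primary, log = run_primary(program, 3, {1: [7], 5: [9, 11]}, lambda: next(clock))
backup = run_backup(program, 3, log)
print(primary.acc == backup.acc, primary.trace == backup.trace)  # prints True True
```

In this sketch a longer epoch means fewer log entries exchanged per instruction, at the price of delaying interrupt delivery until the next boundary.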

Compare With Chain Replication
• Similar idea
  – Both papers use the primary and backup method
• Differentiate cost and effect of operations
  – Communication is necessary to deal with non-determinism
  – For Chain Replication, it is the update request
  – In this paper, it is the environment instruction
• Different layer and mindset
  – Chain Replication works at the application layer (top down)
  – This paper implements at the hypervisor layer, which masks hardware failures (bottom up)

Performance
• Increasing epoch length improves performance
  – With a limit of 385,000 instructions
• Epoch length has less effect on I/O-intensive workloads
• Programs run about a factor of 2 slower than on a bare machine
• Virtual memory has to be dealt with

Take away
• The key issue of replicated state machines
  – Eliminating non-determinism
• The costs of fault tolerance differ between implementation layers
  – Hypervisor-level implementation
    • Costs performance and fault coverage
    • Provides transparent fault tolerance and OS coverage
  – Application-level implementation
    • Requires a higher-level understanding of the entire system and its requirements
    • Potentially better scalability and task-specific optimization
  – How do we choose?

Distributed Systems 5. Fault Tolerant Systems Fault-Tolerance - 1 Lszl Bszrmnyi

Lecture 10: Fault Tolerance Fault Tolerant Concurrent Computing The main principles of fault

Adaptability and Fault Tolerance Adaptability and Fault Tolerance Rog rio rio de Lemos de

General Principles of Fault- Tolerance Daniel Gottesman Perimeter Institute Whats Left For

Roadmap for Section 10.1 The Notion of Fault-Tolerance Fault-Tolerance Support in NTFS Volume

Challenging Malicious Inputs with Fault Tolerance Techniques Bruno Luiz Agenda Threats

Fault Tolerance at Speed Todd L. Montgomery @toddlmontgomery About me What type of Fault

Rigorous fault-tolerance thresholds Ben Reichardt UC Berkeley N gate circuit 0/1 N gate

Fault Tolerance and Robustness in Concurrent Systems Faults, errors, failures, and fault

CSci 5105 Introduction to Distributed Systems Fault Tolerance Last Time Replication and

Fault Tolerance in Message Passing Fault Tolerance in Message Passing and in Action and in

No SQL? Image credit: http://browsertoolkit.com/fault-tolerance.png No SQL? Image credit:

Fibre bundle framework for unitary quantum fault tolerance Lucy Liuxuan Zhang University of

Towards an Efficient Fault-Tolerance Scheme for GLB Claudia Fohry, Marco Bungart and Jonas Posner

Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Class Overview Introduction

PERFORMANCE FAULT TOLERANCE AVAILABILITY FEATURE VELOCITY PERFORMANCE FAULT TOLERANCE

Discover UEFI with U-Boot 2020-02-01, Heinrich Schuchardt CC-BY-SA-4.0 About Me

Storage: Disks & File Systems Thursday, 14 February 19 Overview (Mechanical) Disks Disk

Hands-On Ethical Hacking and Network Defense Second Edition Chapter 8 Desktop and Server OS

TDDD82 Secure Mobile Systems Lecture 5: Dependability Mikael Asplund Real-tjme Systems

Computer Systems Research Kexin Rong CS197 09/26/19 Agenda - Area overview - Introductions

An Approach to Manage Reconfiguration in Fault- Tolerant Distributed System s Stefano Porcarelli

Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults Dian Yu 1/16 Comparison with

THE RELIABLE COMPUTING BASE A Paradigm for Software-Based Reliability Michael Engel (TU
