Elements of the Self-Healing System Problem Space Phil Koopman - PowerPoint PPT Presentation

Elements of the Self-Healing System Problem Space
Phil Koopman
Carnegie Mellon University, Electrical & Computer Engineering
WADS, May 2003

Overview
◆ “Self-Healing”: it’s getting attention, but what does it mean?
  • This talk is based on observations from the most recent Workshop on Self-Healing Systems (WOSS’02)
◆ Description of some general problem elements of Self-Healing research
  • Fault models – what is an “injury”?
  • System responses – what is “healing”?
  • System incompleteness – what’s unknown?
  • Design context – what injuries are beyond healing?
◆ Two challenges:
  1. Fault Tolerant Computing: broaden perspectives with SH ideas
  2. Self-Healing: don’t waste time reinventing existing FT ideas

Fault Model – “injury”
◆ First question in fault tolerant computing: “What is the fault model?”
◆ Reasons for a fault model
  • Need to know expected faults to measure fault tolerance coverage
  • Not all faults are equal in time, space, or severity
◆ Some challenges:
  • Is injury == fault?
  • Is a software defect an injury?

Self-Healing Fault Model Issues
◆ Fault duration:
  • Permanent / intermittent / transient
◆ Fault manifestation:
  • Fail-silent / Byzantine / correlated faults
  • Impaired: run time, reserve capacity, brittleness, resource consumption
◆ Fault source:
  • Wear-out / design defects / requirements defects / environment change / malicious
◆ Granularity:
  • One designer’s “system” is the next-level designer’s “component”
  • Transistor failure / … node failure … / system failure
◆ Fault profile expectations:
  • No faults / historically known faults / foreseen faults / unforeseen faults
  • Random+independent / random+correlated / expected / predicted
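The fault-duration categories listed above (permanent / intermittent / transient) can be told apart operationally by how a fault behaves when the failed operation is retried. The sketch below is a minimal illustration under that assumption only: the function names and the "retry and count failures" heuristic are hypothetical, not taken from the talk.

```python
from enum import Enum
from typing import Callable, List


class FaultDuration(Enum):
    TRANSIENT = "transient"        # gone on retry and did not recur
    INTERMITTENT = "intermittent"  # comes and goes across retries
    PERMANENT = "permanent"        # persists on every retry


def classify_from_retries(retry_ok: List[bool]) -> FaultDuration:
    """Classify a fault that was just observed, given the outcomes of
    re-running the failed operation (True means the retry succeeded).
    Hypothetical heuristic for illustration only."""
    if not retry_ok:
        raise ValueError("need at least one retry outcome")
    failures = retry_ok.count(False)
    if failures == len(retry_ok):
        return FaultDuration.PERMANENT
    if failures == 0:
        return FaultDuration.TRANSIENT
    return FaultDuration.INTERMITTENT


def probe_and_classify(probe: Callable[[], bool], retries: int = 5) -> FaultDuration:
    """Re-run a hypothetical health probe `retries` times, then classify."""
    return classify_from_retries([probe() for _ in range(retries)])


if __name__ == "__main__":
    print(classify_from_retries([True, True, True]).value)
    print(classify_from_retries([True, False, True]).value)
    print(classify_from_retries([False, False, False]).value)
```

A handful of retries cannot reliably separate a rare intermittent fault from a transient one; the retry count and observation window are design parameters that depend on the fault profile expectations above.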

System Response – “healing”
◆ After an injury, what happens?
◆ Fault tolerant system responses include:
  • Diagnosis / identification
  • Isolation / containment
  • System reconfiguration
  • System reinitialization
◆ Does “healing” mean something additional?
  • Or is it the same set of responses, viewed at a different level of abstraction?
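The four classic fault tolerant responses listed above can be sketched as one supervisor step acting on an active component and a pool of spares. This is a minimal illustration, assuming each component exposes a boolean health check and a resettable state; the `Component` and `Supervisor` names and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """A replaceable unit; all fields are illustrative assumptions."""
    name: str
    healthy: Callable[[], bool]  # hypothetical health check
    state: Dict[str, int] = field(default_factory=dict)
    in_service: bool = True

    def reinitialize(self) -> None:
        # Return to a known initial state (here simply an empty state dict).
        self.state.clear()
        self.in_service = True


class Supervisor:
    """Applies the classic FT response sequence to one active component."""

    def __init__(self, active: Component, spares: List[Component]):
        self.active = active
        self.spares = list(spares)
        self.log: List[str] = []

    def step(self) -> None:
        # 1. Diagnosis / identification: is the active component faulty?
        if self.active.healthy():
            return
        self.log.append(f"diagnosed fault in {self.active.name}")
        # 2. Isolation / containment: take it out of service.
        self.active.in_service = False
        self.log.append(f"isolated {self.active.name}")
        # 3. Reconfiguration: switch to the next spare, if any remain.
        if not self.spares:
            raise RuntimeError("no spare left: cannot heal")
        self.active = self.spares.pop(0)
        self.log.append(f"reconfigured to {self.active.name}")
        # 4. Reinitialization: start the replacement from a known state.
        self.active.reinitialize()
        self.log.append(f"reinitialized {self.active.name}")


if __name__ == "__main__":
    primary = Component("primary", healthy=lambda: False, state={"requests": 7})
    spare = Component("spare", healthy=lambda: True)
    sup = Supervisor(primary, [spare])
    sup.step()
    for entry in sup.log:
        print(entry)
```

Reinitializing the spare from an empty state models a cold spare; a hot-standby design would instead carry state across, which is exactly the “recovering component state” question raised on the next slide.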

Self-Healing System Responses
◆ Fault detection:
  • Self-test / pairwise checking / peer checking / supervisor checking
  • Self-injected faults to ensure detection is working?
◆ Degradation during & after healing:
  • Fail-operational / degraded performance / fail-fast + fail-safe
◆ Response:
  • Fault masking / failover / reconfiguration
  • Optimize for: safety / reliability / availability / …
  • Preventative (periodic reboot) / proactive (diagnosis-based) / reactive
◆ Recovery of state:
  • Hot swap / restore quiescent state / warm boot / cold boot
  • Rollback / recovery block / control gain changes / rollforward / run-while-reconfiguring
  • What about recovering component state?
◆ Time constants:
  • Most faults are transient
  • The system’s response time constant must be shorter than the mean time between injuries (healing must keep pace with the injury arrival rate)
◆ System assurance:
  • After injury / during healing / after healing
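Rollback and recovery blocks, listed above under recovery of state, combine naturally in the recovery-block pattern from the fault tolerance literature: checkpoint the state, run a primary routine, check the result with an acceptance test, and on rejection roll back and try an alternate. The sketch below assumes state fits in a deep-copyable dict; `recovery_block` and `RecoveryBlockFailure` are hypothetical names.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails the acceptance test."""


def recovery_block(state: State,
                   alternates: List[Callable[[State], Any]],
                   acceptance_test: Callable[[State, Any], bool]) -> Any:
    """Run alternates in order until one passes the acceptance test.

    Each alternate may mutate `state`; before the next one is tried,
    `state` is rolled back in place to the checkpoint taken on entry.
    """
    checkpoint = copy.deepcopy(state)  # establish the recovery point
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(state, result):
                return result  # accepted: keep the updated state
        except Exception:
            pass  # in this sketch, a crash counts as a failed acceptance test
        # Rollback: restore the checkpointed state in place.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("all alternates rejected")


def is_sorted(_state: State, result: List[int]) -> bool:
    return all(a <= b for a, b in zip(result, result[1:]))


def buggy_sort(s: State) -> List[int]:
    s["data"].append(-1)  # simulated design defect that corrupts state
    return s["data"]


def simple_sort(s: State) -> List[int]:
    s["data"] = sorted(s["data"])
    return s["data"]


if __name__ == "__main__":
    st: State = {"data": [3, 1, 2]}
    print(recovery_block(st, [buggy_sort, simple_sort], is_sorted))
```

Here the primary corrupts the list, the acceptance test rejects it, the state is rolled back to `[3, 1, 2]`, and the simpler alternate produces the accepted result. The acceptance test is the weak point of the pattern: it only catches injuries it was written to recognize.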

System Completeness – What do we know and when?
◆ System self-knowledge
  • How much self-knowledge is required for healing?
  • How should healing knowledge be abstracted?
  • How do we deal with not knowing how much the system doesn’t know?
◆ Designer knowledge
  • Not all systems are complete when design is “done”
  • Even if complete, we won’t know everything about all components
  • How do we deal with not knowing how much we don’t know?

Self-Healing System Completeness
◆ Architectural completeness:
  • Proprietary & known / open & regulated / extensible
◆ Designer knowledge:
  • Component knowledge (especially COTS components)
  • Faulty behavior characterizations
  • How do you heal after suffering a component behavior that is “unspecified”?
◆ System self-knowledge:
  • How complete is the system’s self-model? (idea of reflection)
  • Is healing an intentional or emergent behavior?
◆ System evolution:
  • Configuration changes & usage changes
  • Are outages random / predictable / schedulable?

Design Context – What are the scope limits?
◆ The real world is a messy place – what assumptions are made?
  • Homogeneous system?
  • “Perfect” components (e.g., perfect healing management software?)
  • …
◆ What is the size of the system?
  • A single software module?
  • A complex software system?
  • A person plus a computer system?
  • The North American power grid?
  • The Internet?
  • Does teaching users to press CTRL-ALT-DEL achieve “self-healing” of the user+computer “system”?

Self-Healing Design Context
◆ Abstraction level:
  • Implementation / design / architecture / …
◆ Component homogeneity:
  • Can any software component run on any node?
  • Perfect configuration homogeneity / plug-compatible / heterogeneous
◆ Predetermination of system behavior:
  • Specific design / rule-based system / service discovery / emergent behavior
◆ User involvement in healing:
  • User direction / user-provided hints / user ability to tune / invisible to user
◆ System linearity:
  • Linear+composable / monotonic / mildly discontinuous / arbitrary
  • Single operating mode / mode changes
◆ System scope:
  • Component / computer system / computer+person / enterprise / society

Conclusions
◆ “Self-Healing” potentially encompasses a lot of ground
  • Research assumptions at WOSS’02 overlapped less than expected
  • Consensus will take a while
◆ Some of this has been done before!
  • Fault models – well known in FT; don’t reinvent without good reason
  • System responses – how different are they from FT?
  • System incompleteness – FT usually assumes relative completeness
  • Design context – plenty of room for novelty in both FT & SH
  • But there is plenty of room for more good research
◆ A final thought:
  1. Fault Tolerant Computing: broaden perspectives with SH ideas
  2. Self-Healing: don’t waste time reinventing existing FT ideas
     • Even better: articulate the novelty of approaches
