Adaptability and Fault Tolerance

  1. Adaptability and Fault Tolerance
     Rogério de Lemos, University of Kent, UK
     - Context: self-* and dependability;
     - Focus: adaptability and fault tolerance;
     - State of the art;
     - Conclusions;
     ICSE 2006 SEAMS, May 2006, Rogério de Lemos

  2. Self-* and Dependability
     - Dependability:
       - the ability to deliver service that can justifiably be trusted;
     - Self-* properties of systems:
       - the support for autonomy;
       - self-adaptable, self-managing, self-optimising, self-healing, self-repairing, self-configuring, etc.
     - Adaptability:
       - the ability of a system to accommodate changes while providing its specified services;
       - run-time changes;

  3. Dependability
     Dependability: the ability to avoid service failures that are more frequent and more severe than is acceptable;
     - threats: undesired, but in principle expected, circumstances:
       - faults, errors and failures;
     - attributes: properties of the system:
       - reliability, availability, integrity, confidentiality, and safety;
     - technologies: methods and techniques for providing, and reaching confidence in, the ability to attain dependability:
       - rigorous design, validation & verification, fault tolerance, and system evaluation;

  4. Dependability - Threats
     (Yves Deswarte & David Powell)
     - Fault: adjudged or hypothesized cause of an error;
     - Error: that part of system state which may lead to a failure;
     - Failure: occurs when delivered service deviates from implementing the system function;
     [Diagram: ... causation -> Fault -> activation -> Error -> propagation -> Failure -> causation -> Fault ...]
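
The fault, error, failure chain on this slide can be illustrated with a toy service. This is only a sketch: `EventCounter`, its truncation defect, and the `expected` oracle field are hypothetical names invented for illustration and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class EventCounter:
    """Toy service. Specification: report() returns the number of events recorded."""

    count: int = 0     # state used to deliver the service
    expected: int = 0  # oracle kept only so this sketch can judge correctness

    def record(self, batch: int) -> None:
        self.expected += batch
        # FAULT: a defect that stays dormant while batches hold 10 events or fewer.
        self.count += min(batch, 10)

    def state_is_erroneous(self) -> bool:
        # ERROR: the part of the state that may lead to a failure.
        return self.count != self.expected

    def report(self) -> int:
        # Service interface: an erroneous state propagates to the user here.
        return self.count


def run_demo():
    counter = EventCounter()
    counter.record(3)                             # fault present but dormant
    dormant = counter.state_is_erroneous()
    counter.record(12)                            # activation: fault produces an error
    latent_error = counter.state_is_erroneous()   # error exists, nothing delivered yet
    delivered = counter.report()                  # propagation to the service interface
    failure = delivered != counter.expected       # failure: service deviates from the spec
    return dormant, latent_error, delivered, failure


print(run_demo())
```

The error exists in the state before any incorrect service is delivered, which is the window in which error detection and recovery can still prevent the failure.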

  5. Adaptability - Initiators
     - Changes:
       - the act, process, or result of altering or modifying;
       - internal changes: component failures, overload of resources, etc.
       - external changes: environmental, requirements, etc.
     - There is no fundamental chain of adaptability initiators;

  6. Threats and Initiators
     - Changes correspond to events (faults):
       - changes can be dormant if not activated;
     - What is the consequence of change (errors)?
       - what would be the equivalent of error-free and erroneous states?
       - these states are created when changes are activated and can remain latent until detected;
     - What is the equivalent of failure?
       - unsuccessful adaptation?
       - the system might continue to provide its services, but ignoring the change;

  7. Dependability - Technologies
     Fault avoidance: build a system with no faults:
     - rigorous design: fault prevention;
       - formal and rigorous notations, processes, adapters, etc.
     - verification & validation: fault removal;
       - model checking, fault injection, testing, simulation, etc.
     Fault acceptance: impossible to rid the system of faults:
     - fault tolerance;
     - system evaluation: fault forecasting;
       - empirical approaches, Markov models, etc.

  8. Fault Tolerance
     Fault tolerance aims at avoiding the failure of the system:
     - error detection:
       - detects the presence of errors;
     - recovery:
       - transforms a system state that contains errors or faults into a state without errors and without faults that can be re-activated;
       - error handling:
         - eliminates errors from the system state;
       - fault handling:
         - prevents faults from being activated again;
         - diagnosis, isolation and reconfiguration;
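
One way to picture how error detection, error handling and fault handling fit together is the sketch below, loosely modelled on a recovery block. The function `run_with_recovery`, the account example and the acceptance test are all hypothetical and chosen for illustration; the slide does not prescribe this scheme.

```python
import copy


def run_with_recovery(state, primary, alternate, acceptance_test):
    """Run `primary` on a copy of `state`; on a detected error, roll back and use `alternate`."""
    checkpoint = copy.deepcopy(state)  # saved before the risky update
    try:
        candidate = primary(copy.deepcopy(checkpoint))
        if acceptance_test(candidate):  # error detection
            return candidate, "primary"
    except Exception:
        pass  # an exception is treated as a detected error too
    # Error handling (rollback): the erroneous candidate is discarded and work
    # restarts from the checkpoint.
    # Fault handling (reconfiguration): the suspected primary is bypassed.
    candidate = alternate(copy.deepcopy(checkpoint))
    if not acceptance_test(candidate):
        raise RuntimeError("recovery failed: alternate also violated the acceptance test")
    return candidate, "alternate"


def buggy_withdraw(account):
    account["balance"] -= 2 * 80  # defect: the amount is applied twice
    return account


def withdraw(account):
    account["balance"] -= 80
    return account


def non_negative(account):
    return account["balance"] >= 0


result = run_with_recovery({"balance": 100}, buggy_withdraw, withdraw, non_negative)
print(result)
```

The acceptance test only detects errors that violate the chosen invariant; an incorrect but non-negative balance would pass unnoticed, which is why the choice of detection mechanism is application dependent.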

  9. Fault Tolerance
     (Yves Deswarte & David Powell)
     - Fault: adjudged or hypothesized cause of an error;
     - Error: that part of system state which may lead to a failure;
     - Failure: occurs when delivered service deviates from implementing the system function;
     [Diagram: techniques applied along the fault, error, failure chain]
     - Fault Handling: Diagnosis, Isolation, Reconfiguration, Reinitialization;
     - Error Detection;
     - Error Handling: Rollback, Rollforward, Compensation;

  10. System Structure
      Fault tolerance is about system structuring;
      - structure is what enables the system to generate the behaviour;
      - determines how effectively this structuring can be used to provide means of error confinement;
        - avoid the propagation of errors;
        - what interactions can exist and at what rate;
      - it is not restricted to system architecture;
      Structural flexibility is the basis for adaptation;

  11. Fault Assumptions
      Faults are undesirable, though expected, circumstances:
      - systems can fail in many different ways;
      In the design of fault-tolerant systems, it is essential to define assumptions:
      - nature of faults: dictates the type of redundancy that must be implemented:
        - space or time;
        - replication or diversification;
      - rate of faults: influences the amount of redundancy needed to attain a given dependability;
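
The difference between space and time redundancy can be sketched as follows. The helpers `majority_vote`, `retry` and `replicas_needed` are hypothetical names; the sketch assumes replicas that return wrong values (rather than crash) for the voting case and transient faults for the retry case.

```python
from collections import Counter


def replicas_needed(faulty: int) -> int:
    """Replicas required for majority voting to mask `faulty` wrong answers."""
    return 2 * faulty + 1


def majority_vote(replicas, x):
    """Space redundancy: run every replica and return the majority result."""
    results = [replica(x) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError("no majority among replica results")
    return value


def retry(operation, attempts: int):
    """Time redundancy: repeat the same operation to ride out transient faults."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # assumed transient in this sketch
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


# Three replicas mask one replica that computes a wrong value.
replicas = [lambda x: x * x, lambda x: x * x, lambda x: x * x + 1]
voted = majority_vote(replicas, 4)

# An operation that fails twice with a transient fault, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("transient fault")
    return "ok"


retried = retry(flaky, attempts=3)
print(voted, retried)
```

Identical replicas only help against faults that do not affect all copies in the same way; a design defect shared by every replica would need diversification, the other option listed on the slide.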

  12. Fault Assumptions
      How a component behaves when it fails:
      - crash fault being the simplest and most restrictive (or well-defined) type;
      - Byzantine being the least restrictive;
      [Diagram: nested failure modes, from most to least restrictive: crash, omission, timing, Byzantine]
      The different types of changes need to be classified;
      - behavioural assumptions;
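
The nesting of failure modes can be captured as an ordered enumeration, where a mechanism designed for a less restrictive assumption also covers the more restrictive ones. This is a minimal sketch; `FailureMode`, `covers` and `weakest_sufficient_assumption` are hypothetical names.

```python
from enum import IntEnum


class FailureMode(IntEnum):
    """Ordered from most restrictive (crash) to least restrictive (Byzantine)."""

    CRASH = 1
    OMISSION = 2
    TIMING = 3
    BYZANTINE = 4


def covers(assumed: FailureMode, observed: FailureMode) -> bool:
    """True if a mechanism designed for `assumed` also handles `observed`."""
    return observed <= assumed


def weakest_sufficient_assumption(observed_modes):
    """Smallest failure-mode class that contains every observed behaviour."""
    return max(observed_modes)


print(covers(FailureMode.OMISSION, FailureMode.CRASH))   # crash is a special case of omission
print(covers(FailureMode.CRASH, FailureMode.BYZANTINE))  # crash assumption is too strong
```

A comparable ordering for types of change is what the slide identifies as still missing.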

  13. State of the Art
      Adaptive fault tolerance
      - property that enables a system to maintain and improve fault tolerance by adapting to changes in environment and policy;
        - monitor the system;
        - reconfigure the application when its configuration is not appropriate for the dependability requirements;
      - distributed systems:
        - different layers: middleware / fault tolerance / adaptation;
        - consensus problem;
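
The monitor-and-reconfigure cycle can be sketched as a controller that adjusts the replication degree to an availability requirement. Everything here is an assumption made for illustration: the class `AdaptiveReplication`, the target value, and in particular the model that replicas fail independently, so that n replicas with individual availability a give 1 - (1 - a)^n.

```python
def replicas_for(target: float, replica_availability: float, max_replicas: int = 10) -> int:
    """Smallest n with 1 - (1 - a)**n >= target, assuming independent replica failures."""
    for n in range(1, max_replicas + 1):
        if 1 - (1 - replica_availability) ** n >= target:
            return n
    raise ValueError("target not reachable within max_replicas")


class AdaptiveReplication:
    """Monitors replica availability and reconfigures the replication degree."""

    def __init__(self, target: float):
        self.target = target
        self.replicas = 1

    def observe(self, up_samples) -> bool:
        """Feed monitoring samples (True = replica was up); return True if reconfigured."""
        availability = sum(up_samples) / len(up_samples)
        needed = replicas_for(self.target, availability)
        changed = needed != self.replicas
        self.replicas = needed  # reconfiguration step
        return changed


controller = AdaptiveReplication(target=0.995)
controller.observe([True] * 99 + [False])        # 99% measured availability
after_good_period = controller.replicas
controller.observe([True] * 90 + [False] * 10)   # environment degrades to 90%
after_bad_period = controller.replicas
print(after_good_period, after_bad_period)
```

In a distributed setting the replicas would also have to agree on when the new configuration takes effect, which is where the consensus problem mentioned on the slide comes in.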

  14. State of the Art
      - AQuA: CORBA-based operating system;
        - dynamic replication of objects;
        - Proteus: dynamic fault tolerance through adaptive reconfiguration;
        - allows the degree of dependability to be specified at the application level;

  15. State of the Art
      - Chameleon: adaptive infrastructure;
        - allows different levels of availability requirements;
        - explicit representation of adaptive policies;
        - provides dependability through the use of ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability):
          - managers for monitoring and recovering resources;
          - daemons for providing communication;
          - common ARMORs for providing application-required dependability;
        - enables multiple fault tolerance strategies to co-exist;

  16. State of the Art
      Architectural fault tolerance
      - Error detection and recovery;
        - techniques based on exception-handling;
        - application dependent;
        - iC2C and iFTE;
      - Fault handling
        - system reconfiguration;
        - replacement of components, connectors and configurations;
        - dynamic reconfiguration;

  17. State of the Art
      Bio-inspired computing and statistical methods:
      - data-oriented approaches
        - data mining large quantities of observations for identifying patterns;
        - anomaly (fault and intrusion) detection;
        - neural networks, genetic algorithms, etc.;
      - adaptive error detection using artificial immune systems:
        - problem: how to learn from rare events!
      - statistical learning techniques (SLT) applied to system recovery;
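
A very small statistical stand-in for anomaly detection is shown below. It is not an artificial immune system or any of the specific techniques the slide refers to; it only illustrates the data-oriented idea of learning a baseline from normal observations, which sidesteps the need for examples of rare faulty behaviour. The latency values and the 3-standard-deviation threshold are assumptions.

```python
import statistics


def fit_baseline(normal_samples):
    """Learn mean and sample standard deviation from fault-free observations only."""
    return statistics.mean(normal_samples), statistics.stdev(normal_samples)


def is_anomalous(sample, baseline, threshold=3.0):
    """Flag a sample lying more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return sample != mean
    return abs(sample - mean) / stdev > threshold


# Hypothetical response times (ms) recorded during normal operation.
normal_latencies = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = fit_baseline(normal_latencies)

print(is_anomalous(104, baseline))  # within normal variation
print(is_anomalous(150, baseline))  # candidate error to hand over to recovery
```

Because the baseline is fixed after fitting, it would need to be re-learned when the environment changes, otherwise legitimate changes would be reported as anomalies.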

  18. Conclusions
      - Changes are like faults, though:
        - they might be desired/undesired and expected/unexpected;
      - Classification of the types of changes:
        - otherwise becomes application dependent;
        - e.g., exception handling for the support of fault tolerance;
      - How does system structuring affect adaptability?
        - is software that flexible for supporting run-time change?
        - impact of design-time change;
        - to scope the impact of change;
        - confinement of the consequence of change;
