
Dependability Engineering of Complex Computing Systems - M. Kaâniche, J.-C. Laprie, J.-P. Blanquart



  1. Dependability Engineering of Complex Computing Systems
     M. Kaâniche (LAAS / LIS, kaaniche@laas.fr)
     J.-C. Laprie (LAAS / LIS, laprie@laas.fr)
     J.-P. Blanquart (Astrium / LIS, blanquar@laas.fr)
     6th International Conference on Engineering of Complex Computer Systems (ICECCS 2000), September 11-14 2000, Tokyo, Japan

  2. Dependability
     Property of a system such that reliance can justifiably be placed on the service it delivers (IFIP WG 10.4 - Dependable Computing and Fault Tolerance)
     • Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
     • Means: Fault Prevention, Fault Tolerance, Fault Removal, Fault Forecasting
     • Impairments: Fault, Error, Failure

  3. Motivation
     • Developing dependable systems able to deliver critical services with a justified level of confidence is not easy
       - increasing complexity, fault diversity, conflicting objectives, …
     • Traditional development models do not explicitly incorporate all activities needed for the production of dependable systems
       - Hardware (BSI 5760 Standard): incorporation of assessments; fault tolerance activities focused on physical faults only
       - Software (Waterfall, V model, spiral, incremental, process oriented, …): structuring of activities; focus on verification
       - System engineering (EIA 632, IEEE 1220, …): generic multidisciplinary framework integrating products, processes and people; dependability related issues are not detailed
     • Need for a dependability-explicit development model

  4. Basic Model Dependability processes

  5. Basic activities
     System Creation Process
     • Requirements
     • Design
     • Realization
     • Integration
     Fault Prevention Process
     • Formalisms & Languages
     • Project organization
     • Project planning & risk assessment
     Fault Tolerance Process
     • System behavior in presence of faults
     • System partitioning
     • Error & fault handling mechanisms
     Fault Removal Process
     • Verification
     • Diagnosis
     • Modification
     Fault Forecasting Process
     • Dependability objectives
     • Allocation
     • Evaluation
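
The process and activity names on this slide can be written down as a small lookup table and used to flag activities that a project plan leaves out. The following is only a minimal sketch: the function `missing_activities` and the plan format are hypothetical and do not come from the presentation.

```python
# Activities of each process, using the names given on the slide.
PROCESS_ACTIVITIES = {
    "System Creation": ["Requirements", "Design", "Realization", "Integration"],
    "Fault Prevention": [
        "Formalisms & Languages",
        "Project organization",
        "Project planning & risk assessment",
    ],
    "Fault Tolerance": [
        "System behavior in presence of faults",
        "System partitioning",
        "Error & fault handling mechanisms",
    ],
    "Fault Removal": ["Verification", "Diagnosis", "Modification"],
    "Fault Forecasting": ["Dependability objectives", "Allocation", "Evaluation"],
}


def missing_activities(plan):
    """Return {process: [activities absent from the plan]} (hypothetical helper).

    `plan` maps a process name to the list of activities the project has planned.
    """
    missing = {}
    for process, activities in PROCESS_ACTIVITIES.items():
        planned = set(plan.get(process, []))
        absent = [a for a in activities if a not in planned]
        if absent:
            missing[process] = absent
    return missing


# Illustrative plan (made up): full system creation, verification only.
example_plan = {
    "System Creation": ["Requirements", "Design", "Realization", "Integration"],
    "Fault Removal": ["Verification"],
}

for process, absent in missing_activities(example_plan).items():
    print(f"{process}: missing {', '.join(absent)}")
```

A plan that lists every activity yields an empty result, which is the situation the model is meant to encourage.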

  6. Interactions
     [Figure: interactions between the system creation activities and the activities of the four dependability processes. Legend:]
     System Creation: Requirement, Design, Realization, Integration
     Fault Prevention: For = Formalisms & languages, Org = Project Organization, Pla = Project Planning & Risk assessment
     Fault Tolerance: Beh = Behavior in the presence of faults, Par = System Partitioning, Han = Error & Fault Handling
     Fault Removal: Ver = Verification, Dia = Diagnosis, Mod = Modification
     Fault Forecasting: Obj = Objectives, All = Allocation, Eva = Evaluation

  7. Interactions: examples
     • Fault prevention process activities should be tightly coupled with system creation and dependability processes activities
     • Fault tolerance and fault forecasting
       - Definition of dependability related requirements and functions
       - Allocation of dependability requirements
       - Assessment of the efficiency of fault tolerance mechanisms (coverage)
     • Fault removal and fault tolerance
       - Verification of fault assumptions for traceability, consistency, completeness and verifiability
       - Verification of fault tolerance mechanisms by means of fault injection, formal verification or static analyses
     • Fault removal and fault forecasting
       - Validation of fault forecasting assumptions and results
       - Definition of test stopping criteria based on dependability level achieved
       - Evaluation of dependability based on test results
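
As an illustration of assessing the efficiency of fault tolerance mechanisms, coverage can be estimated from a fault injection campaign as the fraction of injected faults that were handled correctly. The sketch below is an assumption-laden example, not the authors' method: the function name `estimate_coverage` is hypothetical, and the 95% interval uses the normal approximation, which becomes unreliable when the estimate is very close to 0 or 1 or the number of injections is small.

```python
import math


def estimate_coverage(outcomes):
    """Estimate coverage from fault injection outcomes (hypothetical helper).

    `outcomes` is a list of booleans: True if the injected fault was
    correctly handled by the fault tolerance mechanisms, False otherwise.
    Returns (point_estimate, lower_bound, upper_bound) for an approximate
    95% confidence interval (normal approximation, an assumption of this sketch).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one fault injection outcome is required")
    handled = sum(1 for ok in outcomes if ok)
    coverage = handled / n
    half_width = 1.96 * math.sqrt(coverage * (1.0 - coverage) / n)
    return coverage, max(0.0, coverage - half_width), min(1.0, coverage + half_width)


# Made-up campaign: 100 injections, 95 handled correctly.
campaign = [True] * 95 + [False] * 5
point, low, high = estimate_coverage(campaign)
print(f"coverage ~ {point:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

The width of the interval gives one possible input to a test stopping criterion: injections continue until the lower bound reaches the coverage objective.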

  8. Fault Assumptions
     • Fault assumptions should be defined at each system refinement step
       - Support for the definition of fault tolerance strategies and mechanisms
       - Check for traceability, consistency, completeness and verifiability
     [Figure: Fault Tolerance Coverage comprises Error and Fault Handling Coverage and Fault Assumption Coverage; Fault Assumption Coverage comprises Failure Mode Coverage and Failure Independence Coverage]
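
One way to picture the traceability and verifiability checks on fault assumptions is a small record per assumption, linked to the assumption it refines at the previous refinement step. This is a minimal sketch under stated assumptions: the `FaultAssumption` record, its fields and the `check_assumptions` function are all hypothetical, and the sketch covers only traceability and verifiability, not consistency or completeness.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaultAssumption:
    ident: str                          # hypothetical identifier scheme
    level: int                          # refinement step, 0 = system level
    statement: str
    parent: Optional[str] = None        # assumption refined by this one
    verification: Optional[str] = None  # planned way of checking it


def check_assumptions(assumptions: List[FaultAssumption]) -> List[str]:
    """Report assumptions that cannot be traced or have no verification."""
    by_id = {a.ident: a for a in assumptions}
    problems = []
    for a in assumptions:
        if a.level > 0:
            parent = by_id.get(a.parent)
            if parent is None:
                problems.append(f"{a.ident}: no traceable parent assumption")
            elif parent.level >= a.level:
                problems.append(f"{a.ident}: parent is not at an earlier refinement step")
        if not a.verification:
            problems.append(f"{a.ident}: no verification method")
    return problems


# Made-up assumptions for illustration only.
sample = [
    FaultAssumption("FA-1", 0, "At most one faulty node at a time", None, "fault injection"),
    FaultAssumption("FA-1.1", 1, "Nodes fail by crashing", "FA-1", None),
    FaultAssumption("FA-2.1", 1, "Links fail independently", "FA-9", "review"),
]

for problem in check_assumptions(sample):
    print(problem)
```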

  9. A meta-model not a life-cycle model
     [Figure: the system development process yields the system requirements allocated to software; the software development process leading to the software product is shown for several cases: traditional Waterfall, Prototyping, reuse with adjustments, reuse without changes. Legend:]
     Rq = Requirements, De = Design, Re = Realization, In = Integration
     FP = Fault Prevention, FT = Fault Tolerance, FR = Fault Removal, FF = Fault Forecasting

  10. Checklist: Requirements
     Requirements
     ❍ Functional specification
       - functions (value, time)
       - mission phases & sequencing
       - operation / maintenance modes
     ❍ Environment description
       - boundaries and interactions
     ❍ Development and validation constraints
       - foreseeable evolutions
       - interoperability, portability
       - reusability, testability, …
     Fault Prevention
     ❍ Formalisms & languages
       - standards, rules, tools, formalisms
     ❍ Project organization
       - life cycle model
       - resource management
     ❍ Project planning & risk assess.
       - risks identification & mitigation
       - dev. stages, transition criteria
       - planning of project reviews, certification, config. management
     Fault Tolerance
     ❍ System behavior / failures
       - dependability properties
       - criticality / mission phase
       - acceptable degraded modes
       - maximum tolerable duration of service interruption
       - number of simultaneous / consecutive failures to be tolerated for each mode
       - fault tolerance means provided by the environment
     Fault Removal
     ❍ Verification planning
       - static analyses and testing strategies (criteria, input generation)
       - test-beds, environment simulators
     ❍ Verification assumptions
       - classes of functions / behavior
       - predicates
     ❍ Requirements verification
       - traceability analysis
       - functional / behavioral analyses
       - reviews & inspections
     ❍ Functional / behavioral verification scenarios
     Fault Forecasting
     ❍ Dependability objectives
     ❍ Failure modes analysis
       - classification by severity
     ❍ FF assumptions
     ❍ Function-by-function dependability allocation
       - classification of functions by criticality levels
     ❍ Fault forecasting planning
     ❍ Data collection and analysis

  11. Checklist: Design
     Design
     ❍ Architecture
       - structure
       - behavior
       - data
     ❍ Low level requirements
     ❍ Reusable components?
     ❍ Operation and maintenance procedures definition
     ❍ System integration strategy
     Fault Prevention
     ❍ Formalisms & languages
     ❍ Project organization
     ❍ Project planning & risk assess.
     Fault Tolerance
     ❍ System behavior / faults
       - fault assumptions
     ❍ System partitioning
       - fault/error containment regions
       - FT application layers
     ❍ Fault tolerance strategies
       - redundancy, design diversity, exception handling
     ❍ Error & Fault handling mechanisms
       - error detection, diagnosis, recovery
       - fault diagnosis, passivation, reconfiguration
     ❍ Single points of failure?
     Fault Removal
     ❍ Verification assumptions
     ❍ Design verification
       - behavioral analysis, reviews, inspections, prototyping
     ❍ Fault tolerance verification
       - (Formal) Verification
       - Simulation-based fault injection
     ❍ Unit / Integration testing planning
     ❍ Functional / structural verification scenarios
     ❍ Verification of FF results
     Fault Forecasting
     ❍ FF assumptions
     ❍ Failure Mode Analysis
     ❍ Allocation / component
     ❍ Preliminary dependability assessment
     ❍ Data Collection & Analysis

  12. Conclusion
     • Structuring and controlling the development process is a prerequisite for the successful integration of fault tolerance and dependability-related mechanisms in complex systems
     • The proposed model provides a generic framework for structuring fault prevention, fault tolerance, fault removal and fault forecasting activities
       - iterative process
       - tradeoffs
     • The guidelines aim to ensure that dependability related issues are not overlooked, but rather considered at each stage of the development
     • The proposed framework can be used to define and structure the evidence needed to support certification
