  1. Dynamic Reasoning for Safety Assurance
     Ibrahim Habli, Ibrahim.habli@york.ac.uk
     Based on an ICSE NIER 2015 paper with Ewen Denney and Ganesh Pai: https://ti.arc.nasa.gov/publications/21593/download

  2. Background
     - Paradigm shift in many domains:
       - Shift from a prescribed process to product-oriented assurance
       - Shift from tick-box compliance to argument-based assurance
     - Different drivers:
       - Accidents (Piper Alpha, 1988)
       - A different business model (rail privatisation, 1992)
       - Incidents and recalls (FDA, 2010)
       - Complexity (automotive, 2011)

  3. Safety Case Contents
     A safety argument typically depends on:
     1. Specification of a particular system design
     2. Description of a particular configuration and environment in which the design will operate
     3. An identified list of hazards associated with system operation
     4. A claim that the list of hazards is sufficient
     5. An assessment of the safety risk presented by each hazard, including the estimates and assumptions used for quantification
     6. Claims about the effectiveness of the chosen risk mitigation measures
     7. A claim that any mitigations not included were not reasonably practicable to implement
     [Rae 2009]
     All of the above can, and often do, change. (A toy sketch of these contents as data follows.)
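
To make "all of the above can, and often do, change" concrete, here is a minimal sketch that treats the seven contents as structured data, so that a change to any field can be tied to the claims that depend on it. The field names are hypothetical illustrations, not taken from [Rae 2009].

# Hypothetical sketch: the seven safety-case contents as structured data.
# A change to any field (new hazard, revised risk estimate, changed
# environment) signals that dependent claims need reassessment.
from dataclasses import dataclass, field

@dataclass
class SafetyCaseContents:
    system_design_spec: str
    configuration_and_environment: str
    identified_hazards: list[str] = field(default_factory=list)
    hazard_list_sufficiency_claim: str = ""
    risk_assessments: dict[str, float] = field(default_factory=dict)  # hazard -> risk estimate
    mitigation_effectiveness_claims: dict[str, str] = field(default_factory=dict)
    excluded_mitigations_justification: str = ""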

  4. Resilience or Safety 2.0
     "The intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions." (Erik Hollnagel)

  5. Safety Case Depictions vs. Safety Case Reports
     "Would the Real Safety Case Please Stand Up?", Ibrahim Habli and Tim Kelly, 2007
     [GSN argument diagram: top goal "Software (by itself) does not cause system-level hazards", argued over software hazard analysis, software developed using acceptable processes, and a mechanical system that limits the authority of the software to a safe envelope; supported by demonstrated software safety properties, supplier quality management and testing regimes, and formal proof of the relevant verification conditions]
     This is not a safety case: there is a difference between the actual safety case and the depicted one.
     The gap can lead to "a culture of 'paper safety' at the expense of real safety". [Inquiry report following the RAF Nimrod aircraft accident]

  6. Example

  7. QRH (Quick Reference Handbook) pages from a Boeing B-757

  8. Same QRH pages WITH pilot annotations

  9. Why is this important particularly now?
     - Change in the landscape of safety-critical applications
       - Increasing authority, autonomy, adaptation, and communication
     - Greater uncertainty about safe operation
       - Including for historically stable domains such as aerospace and automotive

  10. The Myth of King Midas and his Golden Touch

  11. AI and Safety Requirements 1/2
     - How do you specify "cleanliness", or "making a cup of tea", for a domestic robot?
     [Building safe artificial intelligence: specification, robustness, and assurance, DeepMind]

  12. AI and Safety Requirements 2/2
     - Ideal requirements ("wishes"): the hypothetical (but hard to articulate) description of an ideal AI system
     - System/software requirements ("blueprint"): the requirements we actually use to build the AI system, e.g. a reward function to maximise
     - Revealed requirements ("behaviour"): the description that best matches what actually happens, e.g. the reward function we can reverse-engineer from observing the system's behaviour
     How do we reduce the gaps between the above? (A toy sketch of the blueprint/behaviour gap follows.)
     [Building safe artificial intelligence: specification, robustness, and assurance, DeepMind]
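
As a toy illustration of the blueprint/behaviour gap (a hypothetical scenario with invented numbers, not from the DeepMind report): a cleaning robot is given the blueprint reward "visible dirt removed", while the ideal wish is "the room is actually clean". A policy that hides dirt scores perfectly against the blueprint reward, and only its revealed behaviour exposes the divergence.

# Toy illustration: blueprint reward vs ideal requirement (hypothetical).

def blueprint_reward(visible_dirt_before, visible_dirt_after):
    """The reward we actually implemented: visible dirt removed."""
    return visible_dirt_before - visible_dirt_after

def ideal_reward(total_dirt_after):
    """The reward we wish we could implement: no dirt anywhere."""
    return 1.0 if total_dirt_after == 0 else 0.0

visible, hidden = 8, 2  # a room with 10 units of dirt, 8 of them visible

# Policy A genuinely removes all dirt: both rewards agree.
print(blueprint_reward(visible, 0), ideal_reward(0))                 # 8 1.0

# Policy B sweeps the visible dirt under the rug: the blueprint reward is
# identical, but the ideal reward reveals the gap.
print(blueprint_reward(visible, 0), ideal_reward(visible + hidden))  # 8 0.0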

  13. Autonomy and Intelligence: Problem Domain
     [Diagram contrasting present systems, built and assured against an explicit specification, with future systems, driven by an intent]

  14. Autonomy and Intelligence: Solution Domain
     https://adeshpande3.github.io/Deep-Learning-Research-Review-Week-2-Reinforcement-Learning

  15. Contrasting Approaches to Safety [Vincent and Amalberti 2016]

  16. Dynamic Safety Cases

  17. Aim of Dynamic Safety Cases
     To continuously compute confidence in, and proactively update the reasoning about, the safety of ongoing operations.

  18. Attributes of Dynamic Safety Cases
     - Continuity: safety is an operational concept
     - Proactivity: deal with leading indicators of, and precursors to, hazardous behaviour (i.e. not just faults and failures)
     - Computability: assessment of current confidence based on operational data; is a high degree of automation and formality necessary?
     - Updatability: the argument is partially developed (because the system is evolving), but well-formed, with open tasks and continuous updates, linking anticipation and preparedness

  19. Lifecycle Overview
     [Diagram: an Identify -> Monitor -> Analyse -> Respond cycle around the dynamic safety case, fed by maintenance data, safety management data, incident reporting, and operational data, within a context of regulations and oversight, development and operations, and organizational practices and culture]
     - Consideration of diverse factors:
       - Development and operations
       - Organizational practices and safety culture
       - Regulations
     - Plug the safety case into system operations (a skeleton of the loop follows)
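
A minimal Python skeleton of one turn around this loop, under the assumption that assurance variables can be read from operational data as simple named values; every function and field name here is an invented illustration, not the paper's implementation.

# Hypothetical skeleton of the Identify -> Monitor -> Analyse -> Respond loop.

def identify(assurance_deficits):
    """Map each assurance deficit to a measurable assurance variable name."""
    return [ad["avar"] for ad in assurance_deficits]

def monitor(avars, operational_data):
    """Collect the latest observation for each assurance variable."""
    return {name: operational_data.get(name) for name in avars}

def analyse(observations, thresholds):
    """Flag the assurance variables whose thresholds are breached."""
    return [name for name, value in observations.items()
            if value is not None and value > thresholds[name]]

def respond(breached, update_rules):
    """Apply the pre-planned update rule for each breached variable."""
    for name in breached:
        update_rules[name]()

# One turn of the loop on illustrative data:
deficits = [{"deficit": "hazard analysis assumed low traffic density",
             "avar": "trafficDensity"}]
avars = identify(deficits)
obs = monitor(avars, {"trafficDensity": 42.0})
breached = analyse(obs, {"trafficDensity": 30.0})
respond(breached, {"trafficDensity":
                   lambda: print("revisit the traffic-density argument branch")})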

  20. Identify
     [Lifecycle diagram, highlighting the Identify phase. Side question: how can we decide on the most important subset of ADs?]
     - Sources of uncertainty in the safety case, i.e. assurance deficits (ADs)
     - Mapping ADs to assurance variables (AVars), e.g. environment and system variables
       - System/environment change
       - Argument change
       - AD change
     (A sketch of the AD-to-AVar mapping follows.)
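
One plausible rendering of the AD-to-AVar mapping as a data structure; the names, kinds, and example values are illustrative assumptions.

# Illustrative sketch of mapping assurance deficits (ADs) to assurance
# variables (AVars). The 'kind' values mirror the slide: system/environment
# change, argument change, and change to the deficit itself.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AssuranceVariable:
    name: str   # e.g. "trafficDensity"
    kind: str   # "environment" | "system" | "argument" | "deficit"
    unit: str   # how the variable is measured

@dataclass
class AssuranceDeficit:
    description: str
    priority: int  # supports choosing the most important subset of ADs
    avars: list[AssuranceVariable] = field(default_factory=list)

ad = AssuranceDeficit(
    description="Hazard analysis assumed low airspace traffic density",
    priority=1,
    avars=[AssuranceVariable("trafficDensity", "environment", "aircraft/km^2")],
)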

  21. Monitor
     [Lifecycle diagram, highlighting the Monitor phase]
     - Data collection
       - Corresponds to the underlying sources of uncertainty (AVars)
     - Operationalize assurance deficits
       - i.e. specify them in a measurable or assessable way
     - Relate to safety indicators
       - Leading / lagging indicators
     (A sketch of such a monitor follows.)
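
A sketch of how an assurance deficit might be operationalized as a measurable monitor, with a sampling period and an indicator type; the field names and thresholds are illustrative.

# Illustrative monitor that operationalizes an assurance deficit: the
# variable, a measurable predicate over it, a sampling period, and whether
# it is a leading or lagging indicator.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Monitor:
    avar: str                           # assurance variable being observed
    predicate: Callable[[float], bool]  # True while the AD threshold holds
    period_s: int                       # sampling period in seconds
    indicator: str                      # "leading" or "lagging"

# Leading indicator: traffic density approaching the analysed envelope,
# checked before anything hazardous has actually happened.
traffic_monitor = Monitor(
    avar="trafficDensity",
    predicate=lambda density: density < 30.0,
    period_s=60,
    indicator="leading",
)

print(traffic_monitor.predicate(42.0))  # False: threshold breached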

  22. Analyse
     [Lifecycle diagram, highlighting the Analyse phase. Side question: what can we learn from the world of AI and machine learning?]
     - Data analysis
       - Examine whether the AD thresholds are met
       - Define and update confidence in the associated claims
     - Interconnected claims
       - Necessity to aggregate confidence, e.g. Bayesian reasoning? (A toy Bayesian update follows.)
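
As a toy example of the Bayesian option (one aggregation choice among many; all probabilities here are invented), confidence in a claim is updated as monitored threshold breaches arrive.

# Toy Bayesian update of confidence in a claim, with invented numbers.
# prior: P(claim holds). Likelihoods: P(observation | claim holds/fails).
def bayes_update(prior, p_obs_given_true, p_obs_given_false):
    evidence = p_obs_given_true * prior + p_obs_given_false * (1 - prior)
    return p_obs_given_true * prior / evidence

confidence = 0.90  # initial confidence that the traffic-density claim holds
# Each breach of the monitor threshold is more likely if the claim is false:
for _ in range(3):  # three consecutive breaches observed
    confidence = bayes_update(confidence, p_obs_given_true=0.10,
                              p_obs_given_false=0.60)
    print(round(confidence, 3))
# 0.6, 0.2, 0.04: confidence degrades, eventually crossing a response threshold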

  23. Respond
     [Lifecycle diagram, highlighting the Respond phase. Side questions: do we need a new theory for argument refactoring? Rule mining?]
     - Evolution: system/environment change + DSC change, when necessary
     - Basis of update rules:
       - Impact of new data on confidence
       - Response options already planned
       - Level of automation provided
       - Urgency of response and communication
     (A dispatch sketch follows.)
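
A sketch of how a pre-planned response might be dispatched using the automation level and urgency the slide lists; the structure and names are assumptions, not the paper's machinery.

# Illustrative dispatch of pre-planned responses. Each response carries the
# options the slide lists: automation level and urgency.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Response:
    action: Callable[[], None]
    automated: bool   # can the DSC apply this without a human?
    urgency: str      # "routine" | "urgent"

def notify_engineer(urgency):
    print(f"task created for engineer ({urgency})")

def respond(confidence, threshold, response):
    if confidence >= threshold:
        return  # no change to the safety case is necessary
    if response.automated:
        response.action()  # e.g. prune an argument branch automatically
    else:
        notify_engineer(response.urgency)  # hand off with the right priority

respond(0.04, 0.5,
        Response(action=lambda: None, automated=False, urgency="urgent"))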

  24. Dynamic Safety Case Elements
     - Want to operationalise through-life safety assurance
     - Explicit argument structure + metadata
     - Confidence structure
     - Assurance variables
     - Monitors: (AVar* → EnumVal | ContinuousVal) × Period
     - Update rules; examples:
       - Remove a branch of the argument that depends on an invalidated assumption:
         not(trafficDensity < n) ⇒ forEach(y :: solves* Contextualizes | replaceWith(y, empty))
       - Create a task for an engineer to reconsider evidence when confidence in a particular branch drops below a threshold:
         confidence(NodeX) < n ⇒ forEach(E :: dependsOn(E); traceTo(NodeX) | createTask(engineer, inspect(E), urgent))
     (A runnable sketch of these two rules follows.)
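
To show how such update rules might execute, here is a hypothetical interpreter for the two example rules over a toy argument graph. The rule notation above is the slide's; the Python below is an invented illustration, not the paper's implementation.

# Hypothetical interpreter for the two example update rules, over a toy
# argument represented as node -> list of supporting child nodes.
argument = {
    "G_top": ["G_traffic", "G_other"],
    "G_traffic": ["E_density_study"],  # branch resting on the assumption
    "G_other": ["E_test_results"],
}
assumption_nodes = {"G_traffic"}       # nodes contextualised by trafficDensity < n

def replace_with_empty(node):
    """Rule 1 effect: prune the branch that depends on the assumption."""
    argument[node] = []                # branch removed pending rework

def create_task(role, target, urgency):
    """Rule 2 effect: open a task rather than change the argument."""
    print(f"{urgency} task for {role}: inspect {target}")

# Rule 1 fires when the monitored variable invalidates the assumption:
traffic_density, n = 42.0, 30.0
if not (traffic_density < n):
    for y in assumption_nodes:
        replace_with_empty(y)

# Rule 2 fires when confidence in a node drops below a threshold:
confidence = {"G_other": 0.04}
threshold = 0.5
for node_x, c in confidence.items():
    if c < threshold:
        for e in argument[node_x]:     # evidence tracing to NodeX
            create_task("engineer", e, "urgent")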

  25. Related Work
     - Formal foundation for safety cases
       - Work on automation and argument querying (Denney and Pai 2014)
     - Measurement of confidence in safety cases
       - Confidence arguments modelled in BBNs (Denney, Pai and Habli 2011/2012)
     - Model-based assurance cases
       - Bringing the benefits of model-driven engineering, such as automation, transformation and validation (Hawkins, Habli and Kelly 2015)

  26. Related Literature
     In safety:
     - Safety Management Systems
     - Resilience engineering
     - High Reliability Organisations
     - Monitoring using safety cases
     - ...
     In software engineering:
     - Models@runtime
     - Runtime certification
     - Conditional certification
     - ...

  27. One further consideration
     What about unknown unknowns, i.e. total surprises?
     Almost all theories in safety indicate that accidents are rarely total surprises. The information is out there, but it is:
     1. hard to find
     2. complicated to analyse
     3. given low priority
     4. ...
