1. Mixed Criticality: A Personal View
   Alan Burns

2. Contents
   - Some discussion on the notion of mixed criticality
   - A brief overview of the literature
   - An augmented system model
   - Open issues (as I see them)

3. What is Criticality?
   - A classification based primarily on the consequences and likelihood of failure
     - Wrong/late/missing output
     - HAZOPS
   - Dictates the procedures required in the design, verification and implementation of the code
   - Dictates the level of hardware redundancy
   - Has enormous cost implications

4. What Mixed Criticality is
   - A means of dealing with the inherent uncertainty in a complex system
   - A means of providing efficient resource usage in the context of this uncertainty
   - A means of protecting the more critical work when faults occur
     - Including where assumptions are violated (rely conditions are false)
   - Note: some tasks are fail-stop/safe, others are fail-operational, regardless of criticality

5. What Mixed Criticality isn't
   - Not a mixture of hard and soft deadlines
   - Not a mixture of critical and non-critical
   - Not (only) delivering isolation and non-interference
   - Not dropping tasks to make a system schedulable

6. WCET: a source of uncertainty
   - We know that WCET cannot be known with certainty
   - All estimates have a probability of being wrong (too low)
   - But all estimates are attempting to be safe (pessimistic)
   - In particular, C(LO) is a valid engineered estimate, made with the belief that C(LO) > WCET
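As a concrete reading of these two estimates, here is a minimal sketch of a Vestal-style task record (Python, purely illustrative; the class and field names are my own, not from any particular framework). Later sketches in this deck reuse it.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Vestal-style sporadic task with two WCET estimates."""
    name: str
    T: float      # period / minimum inter-arrival time
    D: float      # relative deadline (D <= T assumed here)
    crit: str     # 'LO' or 'HI'
    C_LO: float   # engineered estimate, believed to exceed the true WCET
    C_HI: float   # more conservative, higher-assurance bound
                  # (for a LO-crit task, simply set C_HI = C_LO)

    def __post_init__(self):
        # The higher-assurance estimate must be at least as pessimistic.
        assert 0 < self.C_LO <= self.C_HI <= self.D <= self.T
```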

7. Events
   - An event-driven system must make assumptions about the intensity of the events
   - Again, this cannot be known with certainty
   - So load parameters need to be estimated (safely)
   - In particular, T(LO) < T(real)

8. Fault Tolerance
   - Critical systems need to demonstrate survivability
   - Faults will occur, and some level must be tolerated
   - Faults are not independent
   - Faults might relate to the assumptions upon which the verification of the timing behaviour of the system was based
     - E.g. WCET, arrival rates, battery power

9. Fault Models
   - Fault models give a means of assessing/delivering survivability
     - Full functional behaviour with a certain level of faults
     - Graceful degradation for more severe faults
   - Graceful degradation is a controlled reduction in functionality, aiming to preserve safety
   - For example: if any task executes for more than C(LO), but all HI-criticality tasks execute for no more than C(HI), then it can be demonstrated that all HI-criticality tasks meet their deadlines
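The "demonstration" in that example is, for fixed-priority systems, typically a response-time analysis in the style of AMC-rtb (Baruah, Burns and Davis). A sketch follows, reusing the Task class above and assuming the task list is ordered highest priority first; treat it as an illustration of the recurrences, not a verified implementation.

```python
import math

def rta(C, interferers, limit):
    # Fixed-point iteration on R = C + sum(ceil(R/Tj) * Cj).
    # interferers: list of (Tj, Cj) pairs; stop early once R exceeds limit.
    R = C
    while True:
        R_next = C + sum(math.ceil(R / Tj) * Cj for Tj, Cj in interferers)
        if R_next == R or R_next > limit:
            return R_next
        R = R_next

def amc_rtb_schedulable(tasks):
    """tasks: list of Task, highest priority first."""
    for i, t in enumerate(tasks):
        hp = tasks[:i]
        # 1. Steady LO-criticality mode: everything runs within C(LO).
        R_lo = rta(t.C_LO, [(j.T, j.C_LO) for j in hp], t.D)
        if R_lo > t.D:
            return False
        # 2. After the criticality change (HI-crit tasks only):
        #    HI-crit interference grows to C(HI); LO-crit tasks can only
        #    have released jobs before the change, bounded by R_lo.
        if t.crit == 'HI':
            lo_part = sum(math.ceil(R_lo / j.T) * j.C_LO
                          for j in hp if j.crit == 'LO')
            hi_int = [(j.T, j.C_HI) for j in hp if j.crit == 'HI']
            R_hi = rta(t.C_HI + lo_part, hi_int, t.D)
            if R_hi > t.D:
                return False
    return True
```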

10. Graceful Degradation
   - A number of schemes have been proposed in the MCS literature as strategies for graceful degradation:
     - Drop all lower-criticality work
     - Drop some, using notions of importance etc.
     - Extend periods (the elastic task model; see the sketch below)
     - Reduce functionality within LO- and HI-criticality work
   - The strategy should also extend to cases where the C(HI) bound is itself wrong
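As an illustration of the "extend periods" option, here is a deliberately naive uniform stretch of the LO-criticality periods (the elastic task model in the literature weights tasks individually; this sketch treats all LO-crit tasks alike and reuses the Task class above):

```python
def stretch_lo_periods(tasks, U_target):
    """Uniformly extend LO-criticality periods so that total
    utilisation (computed with C(LO)) falls to U_target;
    HI-criticality tasks keep their original periods."""
    U_hi = sum(t.C_LO / t.T for t in tasks if t.crit == 'HI')
    U_lo = sum(t.C_LO / t.T for t in tasks if t.crit == 'LO')
    assert U_target > U_hi, "target must leave room for HI-crit work"
    if U_hi + U_lo <= U_target:
        return {t.name: t.T for t in tasks}   # already fits, no change
    f = U_lo / (U_target - U_hi)              # stretch factor, > 1 here
    return {t.name: t.T * f if t.crit == 'LO' else t.T for t in tasks}
```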

11. Graceful Degradation
   - If tasks are dropped/aborted then this cannot be arbitrary: the approach must be related back to the software architecture / task dependencies
     - Use of fault trees, perhaps
   - Recovery must also relate to the needs of the software (e.g. dealing with missing/stale state)
   - Normal behaviour should be called just that, normal, not "LO-criticality mode"

12. Fault Recovery
   - After a fault and degraded functionality, it should be possible for the system to return to full functionality
     - A 747 can fly with 3 engines, but it's nice to get the 4th one back!
   - This can be within the system model
   - Or outside it (cold/warm restart)
     - Typical with hardware redundancy

13. Existing Literature
   - Since Vestal's paper, at least 180 articles have been published (one every 2 weeks!)
   - I hope you are all familiar with the review from York (updated every 6 months and funded by the MCC project)
     - www-user.cs.york.ac.uk/~burns/
   - Some top-level observations follow

14. Observations
   - For uniprocessors:
     - For FPS, AMC seems to be the 'standard' approach
     - For EDF, schemes that give the HI-crit tasks a virtual deadline seem to be standard
     - Server-based schemes have been revisited
   - Not much work on the scheduling schemes actually used in safety-critical systems, e.g. cyclic executives and non-preemptive (or cooperative) FPS
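For reference, the virtual-deadline idea in its simplest form (EDF-VD with implicit deadlines) comes down to a utilisation test and a single scaling factor x applied to the HI-crit tasks' deadlines. Below is a sketch of the standard test as I recall it from Baruah et al., reusing the Task class above; check the exact conditions against the original paper.

```python
def edf_vd_factor(tasks):
    """Returns the deadline-scaling factor x for HI-crit tasks
    (virtual relative deadline = x * T), or None if the
    sufficient test fails."""
    u_lo_lo = sum(t.C_LO / t.T for t in tasks if t.crit == 'LO')
    u_hi_lo = sum(t.C_LO / t.T for t in tasks if t.crit == 'HI')
    u_hi_hi = sum(t.C_HI / t.T for t in tasks if t.crit == 'HI')
    if u_lo_lo + u_hi_hi <= 1:
        return 1.0                   # plain EDF with C(HI) budgets suffices
    if u_lo_lo >= 1:
        return None
    x = u_hi_lo / (1 - u_lo_lo)      # smallest x keeping LO mode feasible
    if x * u_lo_lo + u_hi_hi <= 1:   # sufficient condition for HI mode
        return x
    return None
```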

15. Observations
   - For multiprocessor systems there are a number of schemes (extensions from single-criticality systems)
   - Similarly for resource-sharing protocols
   - Work on communications is less well represented
   - Lots of work on graceful degradation
   - On allocation: 'to separate or integrate, that is the question'

16. Observations
   - Almost all papers stick to just two criticality levels
     - But LO-crit does not mean no-crit
   - Some pay lip service to multiple levels
   - What is the model we require for, say, 4 or 5 levels?
     - It does not seem to make sense to have five estimates of WCET

17. Observations
   - Little on linking to fault tolerance in general
   - Little work on probabilistic assessment of uncertainty
   - Some implementation work, but not enough
   - Some comparative evaluations, but not enough
   - Good coverage of formal issues, such as speed-up factors

18. Augmented Model
   - Four criticality levels (a, b, c, d) plus a non-critical level (e)
   - How many estimates of WCET?
   - I feel a sufficiently expressive model can be obtained with only two values per task, C(normal) and C(self)
     - So tasks of crit d just have C(normal)
     - Tasks of crit c have C(self) and C(normal)
     - Tasks of crit b likewise have C(self) and C(normal), and so on up the levels

19. Augmented Model
   - All guarantees are met using the C(normal) values
   - No task can execute for more than its C(self)
     - Run-time monitoring is required (see the sketch below)
   - A mode change giving more time is possible
     - If a task of crit b, say, exceeds its C(normal), then the system must remain schedulable with that task using up to C(self), crit-a tasks using C(normal), and no other tasks guaranteed
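A sketch of how the C(normal)/C(self) pair and its run-time enforcement might look. The monitor hook and the rtos.abort / rtos.raise_mode calls are hypothetical placeholders for whatever the kernel actually provides:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AugTask:
    name: str
    crit: str                       # 'a' (highest) .. 'd', or 'e' (non-critical)
    C_normal: float                 # budget that all guarantees are based on
    C_self: Optional[float] = None  # absolute cap, enforced at run time

def on_budget_check(task, exec_time, rtos):
    """Called by a (hypothetical) execution-time monitor on each check."""
    cap = task.C_self if task.C_self is not None else task.C_normal
    if exec_time > cap:
        rtos.abort(task)            # no task may exceed its C(self)
    elif exec_time > task.C_normal:
        # Mode change: this task may continue up to C(self); guarantees
        # for tasks of lower criticality are withdrawn.
        rtos.raise_mode(task.crit)
```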

20. Open Issues
   1. As well as looking at mixing criticality levels within a single scheduling scheme (e.g. different priorities within FPS), we need to look at integrating different schemes (e.g. cyclic executives for safety-critical work and FPS for mission-critical work on the same processor)
   2. More work is needed to integrate the run-time behaviour (monitoring and control) with the assumptions made during static verification

21. Open Issues
   3. We need to be more holistic in terms of ALL system resources (especially communications media)
   4. There are a number of formal aspects of scheduling still to be investigated (we should not apologise for finding the research in this area fascinating)

22. Open Issues
   5. We need to be sure that techniques scale to at least 5 levels of criticality
   6. There are still a number of open issues with regard to graceful degradation and fault recovery
   7. There is little work as yet on security as an aspect of criticality
   8. We need protocols for information sharing between criticality levels

23. Open Issues
   9. We need better WCET analysis to reduce the (safe) C(HI) and C(LO) values
   10. We should look to have an impact on the standards relevant to the application domains we hope to influence
   11. Better models are needed for system overheads and task dependencies
   12. How many criticality levels should be supported?

24. Open Issues
   13. We do not as yet have the structures (models, methods, protocols, analysis etc.) that allow trade-offs between sharing and separation to be evaluated

25. Conclusion
   - We have lots to discuss
