WADS Workshop at ICSE 2003, May 3, 2003

Slide 1: Layered Dependability Modeling of an Air Traffic Control System

Olivia Das and C. Murray Woodside
Department of Systems and Computer Engineering
Carleton University, Ottawa, Canada
odas@sce.carleton.ca, cmw@sce.carleton.ca
Slide 2: Overview
• dependability of complex systems
• dependability for systems with layered software architecture
• effect of management subsystem failures on coverage
• performability measures
Slide 3: Layered Application Model
Tasks, interactions and dependencies, and processors.
[Diagram: up to four Controller users drive a UI Interface and Display Management layer (modify display, display conflict). Below it sit FlightPlan Alert and FlightPlan Display, Conflict Resolution (detect and resolve conflicts), Flight Plan Management (modify / get flight plan), Surveillance Processing (process radar data from two Radars), Trajectory Management (update / read trajectory), and a Flight Plan Database. Arrows distinguish synchronous from asynchronous service requests.]
Slide 4: Replication Mechanisms
Primary-standby, load-balancing, active, primary-standby-active.
[Diagram: user groups UserA (N = 50) and UserB (N = 100) call application tasks AppA and AppB (two replicas each, #1 and #2, on proc1 and proc2, with procA and procB hosting the users); AppA and AppB log to a Log task and request serviceA / serviceB from a replicated Server: Server1 (entries eA1, eB1) on proc3 and Server2 (entries eA2, eB2) on proc4.]
Slide 5: Example Configuration (1)
proc3 fails, causing Server1 to fail; Server2 is used instead.
[Diagram: the replicated system of Slide 4, with proc3 and Server1 marked failed.]
Slide 6: Example Configuration (2)
proc1 fails and puts AppA out; group UserA fails. Here the failure cannot be compensated by standby servers.
[Diagram: the replicated system of Slide 4, with proc1 and AppA marked failed.]
Slide 7: Centralized Fault Management Model
[Diagram: application tasks (AT) UserA, AppA, Server1, Server2, UserB, AppB, each connected to a single central management task (MT), the Manager.]
Components: application tasks, management components.
Connectors: Alive-Watch, Notify.
Slide 8: Perfect Detection and Reconfiguration
proc3 fails, causing Server1 to fail. Full coverage: Server2 is used instead.
[Diagram: the replicated system of Slide 4.]
Slide 9: Partial Coverage for Centralized Management
proc3 fails, causing Server1 to fail. Partial coverage: the Manager has failed, so the system fails.
[Diagram: the replicated system of Slide 4, with the Manager unavailable.]
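Slides 8 and 9 contrast full and partial coverage: the standby Server2 masks a Server1 failure only while the centralized Manager is alive to detect the failure and reconfigure. A minimal sketch of this effect, with illustrative failure probabilities (not taken from the paper) and independence assumed:

```python
# Hypothetical, independent failure probabilities (illustrative only).
p_proc3 = 0.05    # proc3 failure => Server1 down
p_proc4 = 0.05    # proc4 failure => Server2 down
p_manager = 0.1   # Manager failure => no detection/reconfiguration

# The server service is up if Server1 is up, or if Server1 is down but the
# Manager survives to switch traffic to a surviving Server2.
p_service_up = (1 - p_proc3) + p_proc3 * (1 - p_manager) * (1 - p_proc4)
print(round(p_service_up, 5))  # → 0.99275
```

With a perfectly reliable Manager the same expression gives 0.9975, so the "knowledge failure" of the management subsystem visibly reduces the effective coverage of the standby.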
Slide 10: Analysis (current approach)
Level 1: determine the distinct operational configurations C_i and compute the probability Prob(C_i) of each.
Level 2: compute the reward R(C_i) of each operational configuration using layered queueing models.
Mean Reward = Σ_i Prob(C_i) · R(C_i)
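The two-level computation on this slide can be sketched as follows. The configurations, probabilities and rewards below are illustrative placeholders; in the actual method, Level 1 produces Prob(C_i) from the fault trees and Level 2 produces R(C_i) (e.g. a throughput) from a layered queueing model:

```python
# (name, Prob(Ci), R(Ci)) -- hypothetical values for illustration.
configs = [
    ("all components up", 0.60, 1.00),
    ("Server1 down, Server2 covering", 0.25, 0.70),
    ("AppA group down", 0.10, 0.40),
    ("system failed", 0.05, 0.00),
]

# Mean Reward = sum over i of Prob(Ci) * R(Ci)
mean_reward = sum(p * r for _, p, r in configs)
print(round(mean_reward, 4))  # → 0.815
```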
Slide 11: Probabilities of Operational Configurations
[Diagram: a non-coherent fault tree relating component failures and replica states (#1, #2) to each operational configuration; most gate labels were lost in extraction.]
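A non-coherent fault tree is one whose condition may require some components to be failed as well as others to be working (i.e. it contains NOT gates), which is exactly what an operational configuration such as "Server2 in use" needs. A brute-force sketch of evaluating such a condition by enumerating component states, with assumed failure probabilities:

```python
from itertools import product

# Assumed, independent failure probabilities (illustrative only).
p_fail = {"proc3": 0.05, "proc4": 0.05, "Manager": 0.1}
comps = list(p_fail)

def config_holds(up):
    # Example configuration "Server2 in use": proc3 FAILED (a NOT condition,
    # making the tree non-coherent), proc4 up, Manager up.
    return (not up["proc3"]) and up["proc4"] and up["Manager"]

# Sum the probabilities of all component-state vectors satisfying the tree.
prob = 0.0
for states in product([True, False], repeat=len(comps)):
    up = dict(zip(comps, states))
    p = 1.0
    for c in comps:
        p *= (1 - p_fail[c]) if up[c] else p_fail[c]
    if config_holds(up):
        prob += p
print(round(prob, 6))  # → 0.04275
```

Exhaustive enumeration is exponential in the number of components; it is shown here only to make the semantics of the non-coherent tree concrete.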
Slide 12: Layered Model of ATC En Route System
[Diagram: the application model of Slide 3 annotated with replication and AND/OR gates: three Controllers served by a Console subsystem (three load-balanced user replicas); a Central subsystem (replica groups including two primary-standby replicas and three primary-standby-active replicas) containing Display Management, FlightPlan Alert / Display, Conflict Resolution, Flight Plan Management, Trajectory Management and the Flight Plan Database; and a Radar subsystem with Surveillance Processing fed by two Radars.]
Slide 13: Fault Management Model of ATC En Route System
[Diagram: application tasks (AT) — UI, Display Mgmt, FlightPlan Mgmt, FlightPlan Database, Conflict Resolution, Trajectory Mgmt, Surveillance Processing — hosted on Console, Central, FlightPlan and Radar processors, connected to management tasks (MT): per-processor P2PSM and gSAM agents, a Name Server, and a Monitor and Control subsystem (three active replicas).]
Slide 14: Results
• Number of components (tasks and processors): 51
• Number of connectors in the fault management model: 118
• Failure probability of all processors: 0.05
• Failure probability of all tasks (including management tasks): 0.1
• Nodes in the graph combining the fault propagation graph and the knowledge propagation graph: 715
• Number of operational configurations: 14
• Time to generate and compute probabilities of configurations: 277 s
• Probability of the system being in a working state: 0.33
• Average throughput for the Controller task: 0.067 requests/sec
• If the failure probability of the management tasks is decreased to 0.05, the probability of the system being in a working state rises to 0.45 and the average Controller throughput to 0.093 requests/sec.
Slide 15: Conclusions
• Dependability evaluation for layered software architectures
• Scalable technique:
  • separation of performance analysis from failure/repair analysis
  • the layered architecture yields a much smaller set of configurations than the full set of failure states
• Operational configurations take into account:
  • layered dependencies
  • "knowledge failure" effects, which depend on the status of the management system and limit its reconfiguration capability
• Explosion in the number of configurations remains a limitation