TDDD82 Secure Mobile Systems, Lecture 5: Dependability. Mikael Asplund, Real-time Systems Laboratory, Department of Computer and Information Science, Linköping University. Based on slides by Simin Nadjm-Tehrani.
Dependability • The property of a computing system which allows reliance to be justifiably placed on the service it delivers. [Avizienis et al.] • The ability to avoid service failures that are more frequent or more severe than is acceptable.
Dependability taxonomy
Fault-tolerant Distributed Systems
Redundancy ● Necessary for fault-tolerance! ● Increases overall complexity ● Static – Error masking properties ● Dynamic – Error detecting properties
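A minimal sketch of the two kinds of redundancy, assuming replicas are plain functions whose outputs can be compared: a majority voter (as in triple modular redundancy) masks one faulty output out of three, while a standby scheme only detects an error and then switches to a spare. The names `majority_vote`, `run_with_standby` and the `is_valid` acceptance test are hypothetical illustrations, not taken from the lecture.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence
from typing import Optional


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Static redundancy: return the value produced by a strict majority
    of the replicas, masking a faulty minority; None if there is no majority."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def run_with_standby(primary: Callable[[int], int],
                     spare: Callable[[int], int],
                     x: int,
                     is_valid: Callable[[int], bool]) -> int:
    """Dynamic redundancy: run the primary, detect an error with an
    (assumed) acceptance test, and switch to the spare only if needed."""
    result = primary(x)
    if is_valid(result):
        return result
    return spare(x)  # error detected: reconfigure to the spare


# Three replicas, one of which produces a wrong result: the vote masks it.
print(majority_vote([42, 42, 17]))  # 42
# A faulty primary is caught by the acceptance test and the spare takes over.
print(run_with_standby(lambda x: -1, lambda x: 2 * x, 21, lambda r: r >= 0))  # 42
```

The voter needs no detection step but triples the work; the standby scheme is cheaper but only as good as its error detector.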
Dependability & Distribution • Making systems fault-tolerant typically uses redundancy – Redundancy in space leads to distribution – But distributed systems are not necessarily fault-tolerant!
Replication • Passive replication – Primary-backup – Cold/Warm/Hot • Active replication – Group membership
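A minimal in-memory sketch of passive (primary-backup) replication, assuming a perfect failure detector modelled as an `alive` flag: only the primary processes requests and pushes each state update to the live backups (roughly a hot standby), and when the primary crashes the first live backup is promoted. The classes `Replica` and `PrimaryBackup` are hypothetical; in a real system the promotion itself requires the replicas to agree on when the primary is down.

```python
class Replica:
    """One replica holding a copy of the key-value state."""
    def __init__(self, name: str):
        self.name = name
        self.state: dict[str, int] = {}
        self.alive = True  # stand-in for a (here perfect) failure detector


class PrimaryBackup:
    """Passive replication: only the primary executes requests."""
    def __init__(self, replicas: list[Replica]):
        if not replicas:
            raise ValueError("need at least one replica")
        self.replicas = replicas
        self.primary = replicas[0]

    def fail_over(self) -> None:
        # Promote the first live replica; assumes all replicas agree on this choice.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no live replica left")
        self.primary = live[0]

    def write(self, key: str, value: int) -> None:
        if not self.primary.alive:
            self.fail_over()
        self.primary.state[key] = value
        # Push the state update to every live backup before acknowledging.
        for r in self.replicas:
            if r is not self.primary and r.alive:
                r.state[key] = value

    def read(self, key: str) -> int:
        if not self.primary.alive:
            self.fail_over()
        return self.primary.state[key]


replicas = [Replica("A"), Replica("B"), Replica("C")]
service = PrimaryBackup(replicas)
service.write("x", 1)
replicas[0].alive = False  # the primary crashes
service.write("y", 2)      # B is promoted and still has x
print(service.primary.name, service.read("x"), service.read("y"))  # B 1 2
```

In a cold or warm variant, the backups would instead be started, or brought up to date from a checkpoint or log, at failover time, trading recovery time for less runtime overhead.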
Implementing replica consistency • Message ordering – Use the happened-before relation (e.g., via Lamport clocks) • Agreement – Passive replication • Controlled by the primary • Still requires agreement on when the primary is down – Active replication • Agreement on every operation
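Lamport clocks can be sketched as below: each process increments its counter on local events and sends, and on receipt sets it to one more than the maximum of its own value and the message's timestamp. If event a happened before event b, then a's timestamp is smaller than b's (the converse does not hold). Breaking ties by process id, a common convention assumed here, yields a total order in which all replicas can apply messages. Class and variable names are illustrative.

```python
class LamportClock:
    """Logical clock consistent with the happened-before relation."""
    def __init__(self, pid: int):
        self.pid = pid
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the clock."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receipt: jump past the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p1, p2 = LamportClock(1), LamportClock(2)
m1 = (p1.tick(), p1.pid, "set x=1")  # p1 sends m1 at time 1
p2.receive(m1[0])                    # p2 receives m1, clock becomes 2
m2 = (p2.tick(), p2.pid, "set x=2")  # p2 sends m2 at time 3 (m1 happened before m2)
m3 = (p1.tick(), p1.pid, "set y=5")  # concurrent with m2, timestamp 2

# Every replica sorts by (timestamp, process id) and applies in that order.
for ts, pid, op in sorted([m2, m3, m1]):
    print(ts, pid, op)
```

The sketch sorts a complete set of messages. A live replica must also know that no message with a smaller timestamp can still arrive before it applies one, which typically requires FIFO channels and having heard from every process.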
Agreement is not just for replication
The consensus problem • Processes $p_1, \ldots, p_n$ take part in a decision – Each $p_i$ proposes a value $v_i$ – All correct processes decide on a common value $v$ that is equal to one of the proposed values • Desired properties – Termination: every correct process eventually decides – Agreement: no two (correct) processes decide differently – Validity: if a process decides $v$, then $v$ was proposed by some process
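One classic way to satisfy these properties when crashes are the only faults is the FloodSet algorithm, sketched below under the assumption of a synchronous, round-based system with at most f crashes: every process repeatedly broadcasts the set of values it has seen, and after f + 1 rounds decides on the minimum. The simulation, the crash-schedule format and the function names are hypothetical; a process that crashes mid-round is modelled as reaching only some of the others.

```python
def floodset(proposals: dict[int, int], f: int,
             crashes: dict[int, tuple[int, set[int]]]) -> dict[int, int]:
    """Simulate FloodSet and return the decisions of the correct processes.

    proposals: process id -> proposed value v_i
    f:         number of crash failures tolerated (the algorithm runs f + 1 rounds)
    crashes:   process id -> (round in which it crashes,
                              ids that still receive its message in that round)
    """
    known = {p: {v} for p, v in proposals.items()}
    for rnd in range(1, f + 2):
        # Everyone sends the set it held at the start of the round.
        outgoing = {p: set(values) for p, values in known.items()}
        for sender, values in outgoing.items():
            crash = crashes.get(sender)
            if crash is not None and crash[0] < rnd:
                continue  # crashed earlier: silent from now on
            for receiver in known:
                if crash is not None and crash[0] == rnd and receiver not in crash[1]:
                    continue  # crashed mid-broadcast: this receiver misses it
                known[receiver] |= values
    correct = [p for p in proposals if p not in crashes]
    return {p: min(known[p]) for p in correct}


def satisfies_consensus(proposals: dict[int, int], decisions: dict[int, int],
                        correct: set[int]) -> bool:
    """Check termination, agreement and validity on one finished run."""
    termination = set(decisions) == correct
    agreement = len(set(decisions.values())) <= 1
    validity = set(decisions.values()) <= set(proposals.values())
    return termination and agreement and validity


proposals = {1: 5, 2: 3, 3: 8}
crashes = {2: (1, {1})}  # p2 crashes in round 1; only p1 receives its value 3
print(floodset(proposals, f=1, crashes=crashes))  # {1: 3, 3: 3}
print(floodset(proposals, f=0, crashes=crashes))  # {1: 3, 3: 5}: too few rounds
```

With only one round, p3 never learns the value 3, so the two correct processes disagree; the extra round lets any value that reached someone be relayed to everyone. The algorithm depends on knowing when a round is over, which an asynchronous system cannot provide.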
Fault model [Figure: regions labelled Normality, Tolerated faults, and Non-tolerated faults]
Recall from previous lecture ● Node/Channel failures – Crash – Omission – Timing – Byzantine/arbitrary ● System model – Synchronous – Asynchronous
Basic impossibility result [Fischer, Lynch and Paterson 1985] • There is no deterministic algorithm solving the consensus problem in an asynchronous distributed system in which even a single process may fail by crashing.
Naïve approaches ● Wait for all to agree – Blocks forever if a single node crashes ● Wait for a majority to agree – What about conflicts? ● When to move on?
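A small sketch of why these naïve rules stall, assuming votes arrive as a dictionary from node id to proposed value (a hypothetical representation): waiting for all n votes blocks after a single crash, a strict majority still works as long as fewer than half of the nodes have crashed, but a split vote yields no majority at all. The second function checks by brute force the property that makes majorities attractive: any two majorities of the same n nodes share at least one node.

```python
from collections import Counter
from itertools import combinations
from typing import Optional


def majority_value(votes: dict[int, str], n: int) -> Optional[str]:
    """Return a value proposed by a strict majority of all n nodes, else None."""
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count > n // 2 else None


def majorities_intersect(n: int) -> bool:
    """Brute-force check that every two majority quorums of n nodes overlap."""
    quorum = n // 2 + 1
    quorums = [set(c) for c in combinations(range(n), quorum)]
    return all(a & b for a in quorums for b in quorums)


# Five nodes, two have crashed: "wait for all" blocks, a majority does not.
print(majority_value({0: "a", 1: "a", 2: "a"}, n=5))                  # a
# Conflicting proposals: no value reaches a majority, so when to move on?
print(majority_value({0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}, n=5))  # None
print(majorities_intersect(5))                                        # True
```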
Assume synchrony ● If a node does not respond within time t, it will not respond at time t+d ● Partial synchrony – Bounds exist but are not known ● Powerful abstraction: – Unreliable failure detectors
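Under partial synchrony, one common way to build an unreliable failure detector is heartbeats with adaptive timeouts, sketched below under the assumption that time is passed in explicitly as a number (a simulation, not real clocks): a node is suspected when its last heartbeat is older than its timeout, and each time a suspected node turns out to be alive, its timeout is doubled. Because a bound exists, the timeout eventually exceeds it and false suspicions of correct nodes stop. The class and method names are hypothetical.

```python
class HeartbeatFailureDetector:
    """Suspects nodes whose heartbeats are overdue; backs off on mistakes."""

    def __init__(self, nodes: list[str], initial_timeout: float):
        self.timeout = {n: initial_timeout for n in nodes}
        self.last_seen = {n: 0.0 for n in nodes}  # assumes monitoring starts at time 0
        self.suspected: set[str] = set()

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat received from `node` at time `now`."""
        self.last_seen[node] = now
        if node in self.suspected:
            # False suspicion: the node was slow, not crashed. Be more patient.
            self.suspected.discard(node)
            self.timeout[node] *= 2

    def check(self, now: float) -> set[str]:
        """Suspect every node that has been silent longer than its timeout."""
        for node, seen in self.last_seen.items():
            if now - seen > self.timeout[node]:
                self.suspected.add(node)
        return set(self.suspected)


fd = HeartbeatFailureDetector(["a", "b"], initial_timeout=1.0)
fd.heartbeat("a", now=0.5)
fd.heartbeat("b", now=0.5)
print(fd.check(now=1.0))                   # set(): both heard from recently
print(sorted(fd.check(now=2.0)))           # ['a', 'b']: silent for 1.5 > 1.0
fd.heartbeat("a", now=2.5)                 # 'a' was only slow
print(fd.check(now=3.0), fd.timeout["a"])  # {'b'} 2.0
```

The detector is unreliable by design: it may wrongly suspect a slow node, and the consensus algorithm built on top of it has to tolerate such mistakes.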
For the project ● Passive replication ● Need to think carefully about your fault model!
Hints for your dependability analysis 1. Model the system – What is the logical structure? – What are your assumptions? 2. Consider what types of faults could occur (part of risk analysis) – Nodes (crash, Byzantine, ...) – Links 3. If possible, measure parameters such as the frequency of faults and the time to recovery, and combine them with historical data – https://doi.org/10.1109/TDSC.2009.4 – http://liu.diva-portal.org/smash/record.jsf?pid=diva2%3A1034002 5. Derive estimates for the dependability attributes of your system
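For the last step, a rough sketch of turning measured failure and repair times into an availability estimate, assuming steady state, that behaviour is summarised by the mean time to failure (MTTF) and mean time to repair (MTTR), and that replicas fail independently with instantaneous, always-successful failover. These assumptions are strong (common-mode failures break independence), and the measurements in the example are made up for illustration.

```python
from statistics import mean


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mttf / (mttf + mttr)


def replicated_availability(a: float, n: int) -> float:
    """Probability that at least one of n independently failing replicas is up
    (assumes failover itself never fails and takes no time)."""
    return 1 - (1 - a) ** n


# Hypothetical measurements in hours: time between failures and time to recover.
uptimes = [100.0, 300.0, 200.0]
repair_times = [1.0, 3.0, 2.0]
a = availability(mean(uptimes), mean(repair_times))  # 200 / 202
print(round(a, 4))                                   # 0.9901
print(round(replicated_availability(a, 2), 6))       # 0.999902
```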
For the future ● Dependability is important! – Take it seriously when building systems ● Fault-tolerance is non-trivial ● Create simple and easy-to-understand systems (at least their cores)
Recommendations
More recommendations