Self-Stabilization in Distributed Systems
Course: Distributed Computing
Faculty: Dr. Rajendra Prasath
Spring 2019
About this topic
This topic covers various concepts of self-stabilization in distributed systems, with a focus on the essential aspects of self-stabilization in distributed contexts.
Rajendra, IIIT Sri City
RECAP
What did you learn so far?
- Challenges in Message Passing Systems
- Distributed Sorting
- Space-Time Diagram
- Partial Ordering / Causal Ordering
- Concurrent Events
- Local Clocks and Vector Clocks
- Distributed Snapshots
- Termination Detection
- Topology Abstraction and Overlays
- Leader Election Problem in Rings
- Message Ordering / Group Communications
- Distributed Mutual Exclusion Algorithms
Topics to focus on for the End Semester
- Distributed Mutual Exclusion
- Deadlock Detection
- Checkpointing and Rollback Recovery
- Self-Stabilization
- Distributed Consensus
- Peer-to-Peer Computing and Overlays
- Authentication in Distributed Systems
Self-Stabilization in Distributed Systems
Let us explore self-stabilization algorithms in distributed systems.
Handling Failures / Recovery?
- Failure of a site/node in a distributed system causes inconsistencies in the state of the system.
- Recovery: bringing the failed node back in step with the other nodes in the system.
- Failures:
  - Process failure: deadlocks, protection violations, erroneous user input, etc.
  - System failure: failure of the processor/system. A system failure can cause full or partial amnesia. It can be a pause failure (the system restarts in the same state it was in before the crash) or a complete halt.
  - Secondary storage failure: data become inaccessible.
  - Communication failure: the network becomes inaccessible.
Consistent Checkpoints
[Figure: processes X, Y, and Z with checkpoints x1, x2, x3 on X, y1, y2 on Y, z1, z2 on Z, and a message m]
- Overcoming the domino effect and livelocks: a set of checkpoints should have no messages in transit.
- Strongly consistent checkpoints: no message is exchanged between any pair of processes in the set, or between a process in the set and a process outside it, during the interval spanned by the checkpoints.
- {x1, y1, z1} is a strongly consistent set of checkpoints.
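To make the definition concrete, here is a minimal Python sketch that classifies messages relative to a set of checkpoints, one per process. It assumes, for illustration only, that every event carries a local timestamp and that a checkpoint taken at local time t covers the events of that process with time < t; the names `Message`, `classify`, and `is_strongly_consistent` are hypothetical. A message sent before the sender's checkpoint but received after the receiver's is in transit; a message received before the receiver's checkpoint but sent after the sender's is an orphan. A strongly consistent set has neither. The times in the usage are made up, since the event times in the original figure are not recoverable.

```python
from typing import Dict, List, NamedTuple, Tuple


class Message(NamedTuple):
    sender: str
    send_time: int      # local time at the sender (assumed)
    receiver: str
    recv_time: int      # local time at the receiver (assumed)


def classify(checkpoints: Dict[str, int],
             messages: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Split messages that cross the checkpoint line into (in_transit, orphans)."""
    in_transit, orphans = [], []
    for m in messages:
        # Each timestamp is compared only with the checkpoint of its own process.
        sent_before = m.send_time < checkpoints[m.sender]
        received_before = m.recv_time < checkpoints[m.receiver]
        if sent_before and not received_before:
            in_transit.append(m)        # send recorded, receive not recorded
        elif received_before and not sent_before:
            orphans.append(m)           # receive recorded, send not recorded
    return in_transit, orphans


def is_strongly_consistent(checkpoints: Dict[str, int],
                           messages: List[Message]) -> bool:
    """True if no message crosses the checkpoint set in either direction."""
    in_transit, orphans = classify(checkpoints, messages)
    return not in_transit and not orphans


if __name__ == "__main__":
    ckpts = {"X": 5, "Y": 5, "Z": 5}              # hypothetical checkpoint times
    late = Message("X", 6, "Y", 7)                # sent and received after the line
    crossing = Message("X", 3, "Y", 7)            # sent before, received after
    print(is_strongly_consistent(ckpts, [late]))      # True
    print(is_strongly_consistent(ckpts, [crossing]))  # False (message in transit)
```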
Types of CP-RR Algorithms
- Synchronous algorithm: the two-phase algorithm proposed by Koo and Toueg
- Asynchronous algorithm: a simple algorithm proposed by Juang and Venkatesan
Overview
- Self-Stabilizing (SS) Systems
- Legitimate / Illegitimate States
- System Model
- Token Ring System: Dijkstra's Self-Stabilizing Algorithm
- Constructing Breadth-First Trees (BFT)
- Computational Cost
- Fault Tolerance / Factors Preventing SS
- Limitations of SS Systems
Introduction
- Legitimate state: the system behaves correctly, as it is expected to.
- Illegitimate state: an inactive state, or a state in which the system misbehaves (e.g., a message is lost).
- Self-stabilization: a concept of fault tolerance in distributed computing.
  - Regardless of the initial state, the system is guaranteed to converge to a legitimate state in a finite amount of time without any outside intervention.
- Problem: nodes do not have a global memory, so each node can act only on its local view of the system.
Definition
A system is self-stabilizing if and only if:
- Convergence: starting from any state, the system is guaranteed to eventually reach a correct state.
- Closure: once the system is in a correct state, it is guaranteed to stay in a correct state, provided that no fault occurs.
A system is said to be randomized self-stabilizing if and only if it is self-stabilizing and the expected number of rounds needed to reach a correct state is bounded by some constant k.
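For small, finite systems, closure and convergence can be checked mechanically. The sketch below is a brute-force model check under assumptions that are ours rather than the slides': the state space is finite and enumerable, `successors(s)` returns every state reachable in one move under any choice the scheduler (daemon) could make, and `legitimate` is a predicate. Closure holds if no move leaves the legitimate set; convergence holds if no execution can remain among illegitimate states forever, i.e. there is neither a stuck illegitimate state nor a cycle of illegitimate states. The function name and the toy system are hypothetical.

```python
def check_self_stabilization(states, successors, legitimate):
    """Return (closure, convergence) for a finite transition system.

    successors(s) must list every state reachable from s in one move,
    covering every choice the daemon could make.
    """
    states = list(states)
    # Closure: no move from a legitimate state leads to an illegitimate one.
    closure = all(legitimate(t)
                  for s in states if legitimate(s)
                  for t in successors(s))

    # Convergence: every execution must leave the illegitimate states, so there
    # may be neither a stuck illegitimate state nor a cycle among them.
    colour = {s: "new" for s in states if not legitimate(s)}

    def can_stay_illegitimate(u):
        colour[u] = "active"                 # u is on the current DFS path
        nxt = list(successors(u))
        if not nxt:
            return True                      # deadlock outside the legitimate set
        for v in nxt:
            if legitimate(v):
                continue
            if colour[v] == "active":
                return True                  # back edge: cycle of illegitimate states
            if colour[v] == "new" and can_stay_illegitimate(v):
                return True
        colour[u] = "done"
        return False

    convergence = not any(colour[s] == "new" and can_stay_illegitimate(s)
                          for s in list(colour))
    return closure, convergence


if __name__ == "__main__":
    # Hypothetical toy system: state 0 is legitimate and loops on itself.
    moves = {0: [0], 1: [0], 2: [1]}
    print(check_self_stabilization(moves, moves.get, lambda s: s == 0))  # (True, True)
    moves[1] = [2]                           # now 1 and 2 can alternate forever
    print(check_self_stabilization(moves, moves.get, lambda s: s == 0))  # (True, False)
```

The depth-first search uses the usual three-colour scheme, so each illegitimate state is expanded at most once.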
System Model
- An abstract computer model: a state machine.
- A distributed system comprises a set of n state machines, called processors, that communicate with each other; the system can be represented as a graph.
- Message-passing communication model: a queue Q_ij holds the messages sent from P_i to P_j.
- A system configuration is the set of processor states together with the contents of the message queues.
- In all cases, the topology is assumed to remain connected, i.e., there is a path between any two nodes.
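One way to picture a configuration is as a data structure: one local state per processor plus one FIFO queue per directed link. The sketch below assumes, as an illustration only, that every link is bidirectional (so each edge yields queues Q_ij and Q_ji); `Configuration` and `is_connected` are hypothetical names, and the connectivity check is a plain breadth-first search for the assumption that the topology stays connected.

```python
from collections import deque


class Configuration:
    """Processor states plus one FIFO queue Q[(i, j)] per directed link P_i -> P_j."""

    def __init__(self, initial_states, edges):
        self.states = list(initial_states)
        self.queues = {}
        for i, j in edges:                 # each undirected edge gives two directed queues
            self.queues[(i, j)] = deque()
            self.queues[(j, i)] = deque()

    def send(self, i, j, msg):
        self.queues[(i, j)].append(msg)    # append to Q_ij

    def receive(self, i, j):
        """P_j takes the oldest message from Q_ij, or None if the queue is empty."""
        q = self.queues[(i, j)]
        return q.popleft() if q else None


def is_connected(n, edges):
    """Breadth-first search from node 0: True if every node is reachable."""
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, frontier = {0}, deque([0])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen) == n


if __name__ == "__main__":
    edges = [(0, 1), (1, 2)]
    cfg = Configuration([0, 0, 0], edges)
    cfg.send(0, 1, "hello")
    print(cfg.receive(0, 1), cfg.receive(0, 1))  # hello None
    print(is_connected(3, edges))                  # True
```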
Token Rings
Dijkstra's Self-Stabilizing Token Ring System
- When a machine has a privilege, it is able to change its current state, which is referred to as a move.
- A legitimate state must satisfy the following constraints:
  - There must be at least one privilege in the system (liveness, or no deadlock).
  - Every move from a legal state must again put the system into a legal state (closure).
  - During an infinite execution, each machine should enjoy a privilege an infinite number of times (no starvation).
  - Given any two legal states, there is a series of moves that changes one legal state into the other (reachability).
- Dijkstra considered a legitimate (or legal) state to be one in which exactly one machine enjoys the privilege.
Dijkstra's Algorithm
- For any machine:
  - S: its own state
  - L: the state of its left neighbor
  - R: the state of its right neighbor on the ring
- The exceptional machine:
  - If L = S then S = (S+1) mod K;
- All other machines:
  - If L ≠ S then S = L;
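The K-state rules can be simulated in a few lines. This is a minimal sketch under stated assumptions: a central daemon lets one privileged machine move per step (chosen at random here), machine 0 is the exceptional machine, the left neighbor of machine i is machine i − 1 (machine 0's left neighbor is machine n − 1), and K ≥ n, a sufficient condition for convergence that the slide does not state. The helper names are hypothetical.

```python
import random


def privileged(states):
    """Indices of machines whose guard holds, i.e. that currently hold a privilege."""
    result = []
    for i, s in enumerate(states):
        left = states[i - 1]            # for i == 0, index -1 wraps to machine n - 1
        if i == 0 and left == s:        # exceptional machine: L = S
            result.append(i)
        elif i > 0 and left != s:       # other machines: L != S
            result.append(i)
    return result


def move(states, i, k):
    """Return the configuration after machine i makes its move."""
    new = list(states)
    if i == 0:
        new[0] = (states[0] + 1) % k    # S = (S + 1) mod K
    else:
        new[i] = states[i - 1]          # S = L
    return new


def run(states, k, steps, rng):
    """Central daemon: at each step one privileged machine, chosen at random, moves."""
    history = [list(states)]
    for _ in range(steps):
        i = rng.choice(privileged(states))  # never empty: some machine is always privileged
        states = move(states, i, k)
        history.append(states)
    return history


if __name__ == "__main__":
    k = 4                               # assumption: K >= n for n = 4 machines M0..M3
    start = [3, 1, 0, 2]                # arbitrary (possibly corrupted) initial states
    history = run(start, k, 100, random.Random(0))
    for cfg in history[:5]:
        print(cfg, "privileged:", privileged(cfg))
    print("privileges at the end:", len(privileged(history[-1])))  # 1 once stabilized
```

Starting from an arbitrary configuration such as the one above (several machines privileged at once), the ring settles into configurations where exactly one machine holds the privilege, which then circulates around the ring.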
Dijkstra's Algorithm
- A privilege of a machine is its ability to change its current state, based on a Boolean predicate over its current state and the states of its neighbors.
- When a machine has a privilege, it is able to change its current state, which is referred to as a move.
Second solution (K = 3)
- The bottom machine, machine 0:
  - If (S+1) mod 3 = R then S = (S − 1) mod 3;
- The top machine, machine n − 1:
  - If L = R and (L+1) mod 3 ≠ S then S = (L+1) mod 3;
- The other machines:
  - If (S+1) mod 3 = L then S = L;
  - If (S+1) mod 3 = R then S = R;
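The three-state rules can be exercised the same way. This sketch assumes n ≥ 3 machines on a ring, so that the right neighbor R of the top machine n − 1 is machine 0, and a random central daemon; `privileged3`, `move3`, and `stabilizes` are hypothetical names. The exhaustive run over every initial configuration of a small ring is a sanity check under these assumptions, not a proof.

```python
import itertools
import random


def privileged3(s):
    """Indices of machines whose guard holds in the three-state ring."""
    n = len(s)
    result = []
    for i in range(n):
        S = s[i]
        if i == 0:                                  # bottom machine
            if (S + 1) % 3 == s[1]:
                result.append(i)
        elif i == n - 1:                            # top machine; R is machine 0
            L, R = s[n - 2], s[0]
            if L == R and (L + 1) % 3 != S:
                result.append(i)
        elif (S + 1) % 3 in (s[i - 1], s[i + 1]):   # other machines
            result.append(i)
    return result


def move3(s, i):
    """Return the configuration after privileged machine i moves."""
    n = len(s)
    new = list(s)
    S = s[i]
    if i == 0:
        new[0] = (S - 1) % 3                        # S = (S - 1) mod 3
    elif i == n - 1:
        new[i] = (s[n - 2] + 1) % 3                 # S = (L + 1) mod 3
    else:
        L, R = s[i - 1], s[i + 1]
        new[i] = L if (S + 1) % 3 == L else R       # if both rules apply, L == R anyway
    return new


def stabilizes(start, steps=500, seed=0):
    """Run a random central daemon; report whether exactly one privilege remains."""
    rng = random.Random(seed)
    s = list(start)
    for _ in range(steps):
        s = move3(s, rng.choice(privileged3(s)))
    return len(privileged3(s)) == 1


if __name__ == "__main__":
    # Exhaustive check over every initial configuration of a 4-machine ring.
    print(all(stabilizes(c) for c in itertools.product(range(3), repeat=4)))
```

With the rules as listed, a configuration in which all machines hold the same value has exactly one privilege (the top machine's), so the ring never deadlocks there.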
An Illustration
- 4 Machines: M0, M1, M2, and M3
Fault Tolerance
A self-stabilizing system handles transient faults:
- Inconsistent initialization: different processes are initialized to local states that are inconsistent with one another.
- Mode change: a system can have different modes of execution. When the mode of operation changes, it is impossible for all processes to effect the change at the same time.
- Transmission errors: loss, corruption, or reordering of messages.
- Memory crash.
Factors Preventing Self-Stabilization
Factors that can prevent a system from stabilizing after transient faults:
- Symmetry: processes should not be identical (symmetric), because solutions generally rely on a distinguished process.
- Termination: if any unsafe global state is a final state, the system will not be able to stabilize.
- Isolation: inadequate communication among processes can lead to local states that are each consistent while the resulting global state is not safe.
- Look-alike configurations: such configurations arise when the same computation is enabled in two different states with no way to differentiate between them. The system then cannot guarantee convergence from an unsafe state.
Limitations of Self-Stabilizing Systems
- Need for an exceptional machine
- Convergence-response tradeoffs
  - Convergence span denotes the maximum number of critical transitions made before the system reaches a legal state.
  - Response span denotes the maximum number of transitions needed to get from the starting state to some goal state.
- Critical transitions: for example, a process moves into its critical section while another process is already in it.
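The need for an exceptional machine shows up as soon as every machine runs the same rule. The two uniform rings below are hypothetical variants of the K-state rules, used only for illustration: in one, every machine uses the copy rule; in the other, every machine uses the increment rule. In a configuration where all machines hold the same value, the first ring has no privilege at all (violating liveness), while the second gives every machine a privilege at once.

```python
def privileges_all_copy(states):
    """Every machine uses 'if L != S then S = L' (no exceptional machine)."""
    return [i for i, s in enumerate(states) if states[i - 1] != s]


def privileges_all_increment(states):
    """Every machine uses 'if L = S then S = (S + 1) mod K' (no exceptional machine)."""
    return [i for i, s in enumerate(states) if states[i - 1] == s]


if __name__ == "__main__":
    uniform = [2, 2, 2, 2]                         # a symmetric configuration
    print(privileges_all_copy(uniform))            # [] : no privilege, the ring deadlocks
    print(privileges_all_increment(uniform))       # [0, 1, 2, 3] : four privileges at once
```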
Limitations of Self-Stabilizing Systems (contd.)
- Pseudo-stabilization: weaker than self-stabilization, but less expensive. Every computation only needs to have some state such that the suffix of the computation beginning at that state is in the set of legal computations.
- Verification of self-stabilizing systems: verification may be difficult.
  - The stair method was developed: proving that the algorithm stabilizes at each step verifies the correctness of the entire algorithm, where interleaving assumptions are relaxed.
Costs of Self-Stabilization
- Assessment of the cost factor:
  - Convergence span: the maximum number of transitions that can be executed in a system, starting from an arbitrary state, before it reaches a safe state.
  - Response span: the maximum number of transitions that can be executed in a system to reach a specified target state, starting from some initial state. The choice of initial state and target state depends on the application.
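Both spans, as defined on this slide, are longest-path lengths in the transition graph: the longest sequence of transitions before the goal (a safe state, or the target state) first holds. The sketch below computes that with memoized depth-first search. It assumes a finite system where `successors(s)` lists every possible next state, and it raises `ValueError` if some execution can avoid the goal forever (the span would then be unbounded). The function name and toy system are hypothetical.

```python
def span(starts, successors, goal):
    """Maximum number of transitions from any state in `starts` until `goal` holds.

    Assumes every execution eventually satisfies `goal`; raises ValueError otherwise.
    """
    memo = {}
    on_path = set()

    def longest(s):
        if goal(s):
            return 0
        if s in memo:
            return memo[s]
        if s in on_path:
            raise ValueError("a cycle avoids the goal, so the span is unbounded")
        nxt = list(successors(s))
        if not nxt:
            raise ValueError(f"state {s!r} is stuck before reaching the goal")
        on_path.add(s)
        best = 1 + max(longest(t) for t in nxt)
        on_path.discard(s)
        memo[s] = best
        return best

    return max(longest(s) for s in starts)


if __name__ == "__main__":
    # Hypothetical 4-state system: 0 is the only safe state.
    moves = {0: [0], 1: [0], 2: [1, 0], 3: [2]}
    step = lambda s: moves[s]
    print(span(moves, step, lambda s: s == 0))   # convergence span: 3 (3 -> 2 -> 1 -> 0)
    print(span([3], step, lambda s: s <= 1))     # response span from 3 to {0, 1}: 2
```

Passing every state as `starts` gives the convergence span; passing a single initial state and an application-specific target predicate gives the response span.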