Global Predicate Detection and Event Ordering

Our Problem
Compute predicates over the state of a distributed application.

Model
Message passing
No failures
Two possible timing assumptions:
1. Synchronous system
2. Asynchronous system
   No upper bound on message delivery time
   No bound on relative process speeds
   No centralized clock

Asynchronous systems
Weakest possible assumptions (cf. the "finite progress axiom")
Weak assumptions ≡ fewer vulnerabilities
Asynchronous ≠ slow
"Interesting" model w.r.t. failures (ah ah ah!)

Client-Server
Processes exchange messages using Remote Procedure Call (RPC).
A client requests a service by sending the server a message. The client blocks while waiting for a response.
The server computes the response (possibly asking other servers) and returns it to the client.
[Figure: client c and server s]

Deadlock!
[Figure: processes $p_1$, $p_2$, $p_3$]

Goal
Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds.

Wait-For Graphs
Draw an arrow from $p_i$ to $p_j$ if $p_j$ has received a request from $p_i$ but has not responded yet.
Cycle in WFG $\Rightarrow$ deadlock
Deadlock $\Rightarrow \Diamond$ cycle in WFG ($\Diamond$ read as "eventually")
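
A minimal sketch of the check a process could run once it holds a wait-for graph: a depth-first search that returns one cycle if any exists. The dictionary representation and the name `find_cycle` are assumptions made for illustration, not part of the slides.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of nodes, or None.

    wfg[p] lists the processes that p is waiting for (edge p -> q).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}
    for u, targets in wfg.items():
        color[u] = WHITE
        for v in targets:
            color[v] = WHITE
    stack: List[str] = []

    def dfs(u: str) -> Optional[List[str]]:
        color[u] = GRAY
        stack.append(u)
        for v in wfg.get(u, []):
            if color[v] == GRAY:  # back edge: v is on the stack, so a cycle
                return stack[stack.index(v):] + [v]
            if color[v] == WHITE:
                cycle = dfs(v)
                if cycle:
                    return cycle
        stack.pop()
        color[u] = BLACK
        return None

    for u in list(color):
        if color[u] == WHITE:
            cycle = dfs(u)
            if cycle:
                return cycle
    return None


# p1 waits for p2, p2 for p3, p3 for p1
print(find_cycle({"p1": ["p2"], "p2": ["p3"], "p3": ["p1"]}))
```

Note that the check is only as good as the graph it is given: a cycle found in a graph assembled from inconsistent observations need not be a real deadlock.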

The protocol
$p_0$ sends a message to $p_1, \ldots, p_3$.
On receipt of $p_0$'s message, $p_i$ replies with its state and wait-for info.

An execution
[Figure: an execution involving $p_1$, $p_2$, $p_3$, built up over successive slides]
Ghost deadlock!

Houston, we have a problem...
Asynchronous system: no centralized clock, etc.
Synchrony is useful to coordinate actions and to order events.
Mmmmhhh...

Events and Histories
Processes execute sequences of events.
Events can be of 3 types: local, send, and receive.
$e_p^i$ is the $i$-th event of process $p$.
The local history $h_p$ of process $p$ is the sequence of events executed by process $p$.
$h_p^k$: prefix that contains the first $k$ events
$h_p^0$: initial, empty sequence
The history $H$ is the set $h_{p_0} \cup h_{p_1} \cup \ldots \cup h_{p_{n-1}}$.
NOTE: In $H$, local histories are interpreted as sets, rather than sequences, of events.

Ordering events
Observation 1: Events in a local history are totally ordered.
Observation 2: For every message $m$, $send(m)$ precedes $receive(m)$.
[Figure: timelines of $p_i$ and $p_j$, with message $m$ sent from $p_i$ to $p_j$]

Happened-before (Lamport [1978])
A binary relation $\rightarrow$ defined over events:
1. if $e_i^k, e_i^l \in h_i$ and $k < l$, then $e_i^k \rightarrow e_i^l$
2. if $e_i = send(m)$ and $e_j = receive(m)$, then $e_i \rightarrow e_j$
3. if $e \rightarrow e'$ and $e' \rightarrow e''$, then $e \rightarrow e''$
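
The three rules can be turned into a small computation. The sketch below builds the relation from local histories and (send, receive) pairs; the event names, the input layout, and the function name `happened_before` are assumptions chosen for illustration.

```python
def happened_before(histories, messages):
    """Return the set of pairs (a, b) such that a -> b.

    histories: dict mapping a process to its events in local order.
    messages:  iterable of (send_event, receive_event) pairs.
    """
    hb = set()
    # Rule 1: events of the same process are ordered by position.
    for events in histories.values():
        for k in range(len(events)):
            for l in range(k + 1, len(events)):
                hb.add((events[k], events[l]))
    # Rule 2: a send precedes the matching receive.
    hb.update((s, r) for s, r in messages)
    # Rule 3: transitive closure (Warshall: intermediate event outermost).
    all_events = [e for events in histories.values() for e in events]
    for mid in all_events:
        for a in all_events:
            if (a, mid) in hb:
                for b in all_events:
                    if (mid, b) in hb:
                        hb.add((a, b))
    return hb


histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"], "p3": ["c1", "c2"]}
messages = [("a1", "b1"), ("b2", "c2")]  # a1 -> b1 and b2 -> c2
hb = happened_before(histories, messages)
print(("a1", "c2") in hb)                       # ordered through p2
print(("a2", "b1") in hb, ("b1", "a2") in hb)   # neither: concurrent events
```

Two events that are related in neither direction are concurrent, which is why the relation is only a partial order.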

Space-Time diagrams
A graphic representation of a distributed execution.
[Figure: timelines of $p_1$, $p_2$, $p_3$ along a time axis, built up over successive slides]
$H$ and $\rightarrow$ impose a partial order.

Runs and Consistent Runs
A run is a total ordering of the events in $H$ that is consistent with the local histories of the processors.
Ex: $h_1, h_2, \ldots, h_n$ is a run.
A run is consistent if the total order imposed in the run is an extension of the partial order induced by $\rightarrow$.
A single distributed computation may correspond to several consistent runs!
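
A small sketch of the two definitions, assuming events have unique string names and messages are given as (send, receive) pairs; the function names are hypothetical. Because $\rightarrow$ is the transitive closure of local order and send-before-receive, a total order that respects those two rules respects all of $\rightarrow$.

```python
def is_run(run, histories):
    """A run orders every event of H and respects each local history."""
    all_events = [e for events in histories.values() for e in events]
    if sorted(run) != sorted(all_events):
        return False  # some event is missing, repeated, or unknown
    pos = {e: i for i, e in enumerate(run)}
    return all(pos[events[k]] < pos[events[k + 1]]
               for events in histories.values()
               for k in range(len(events) - 1))


def is_consistent_run(run, histories, messages):
    """A consistent run also places every send before its receive."""
    if not is_run(run, histories):
        return False
    pos = {e: i for i, e in enumerate(run)}
    return all(pos[s] < pos[r] for s, r in messages)


histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"]}
messages = [("b1", "a1")]  # a1 is the receipt of the message sent at b1

concat = ["a1", "a2", "b1", "b2"]  # h_1 followed by h_2
print(is_run(concat, histories), is_consistent_run(concat, histories, messages))
print(is_consistent_run(["b1", "a1", "a2", "b2"], histories, messages))
print(is_consistent_run(["b1", "b2", "a1", "a2"], histories, messages))
```

The concatenation of local histories is a run but, in this example, not a consistent one, while two different orderings of the same computation are both consistent.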

Cuts
A cut $C$ is a subset of the global history $H$:
$C = h_1^{c_1} \cup h_2^{c_2} \cup \ldots \cup h_n^{c_n}$
The frontier of $C$ is the set of events $e_1^{c_1}, e_2^{c_2}, \ldots, e_n^{c_n}$.
[Figure: a cut across the timelines of $p_1$, $p_2$, $p_3$]

Global states and cuts
The global state of a distributed computation is an $n$-tuple of local states $\Sigma = (\sigma_1, \ldots, \sigma_n)$.
To each cut $(c_1, \ldots, c_n)$ corresponds a global state $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$.

Consistent cuts and consistent global states
A cut $C$ is consistent if $\forall e_i, e_j : e_j \in C \wedge e_i \rightarrow e_j \Rightarrow e_i \in C$.
A consistent global state is one corresponding to a consistent cut.
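
A sketch of the consistency test, with a cut written as the number of events $c_i$ taken from each local history. Since a cut is a union of prefixes, it is already closed under local order, so it is enough to check that no receive is included without its send. The data layout and function names are assumptions for illustration.

```python
def cut_events(histories, cut):
    """Events in the cut: the first cut[p] events of each process p."""
    return {e for p, events in histories.items() for e in events[:cut[p]]}


def is_consistent_cut(histories, messages, cut):
    """True if every receive in the cut has its send in the cut too."""
    included = cut_events(histories, cut)
    return all(s in included for s, r in messages if r in included)


histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"], "p3": ["c1", "c2"]}
messages = [("a1", "b1"), ("b2", "c2")]  # (send, receive) pairs

# Contains b1, the receipt of a message whose send a1 is left out.
print(is_consistent_cut(histories, messages, {"p1": 0, "p2": 1, "p3": 0}))
# Adding a1 repairs the cut.
print(is_consistent_cut(histories, messages, {"p1": 1, "p2": 1, "p3": 0}))
```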

What $p_0$ sees
[Figure: timelines of $p_1$, $p_2$, $p_3$ with the cut observed by $p_0$]
Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by $p_3$ but not the corresponding send event.

Our task
Develop a protocol by which a processor can build a consistent global state.
Informally, we want to be able to take a snapshot of the computation.
Not obvious in an asynchronous system...

Our approach
Develop a simple synchronous protocol.
Refine the protocol as we relax assumptions.
Record: processor states, channel states
Assumptions: FIFO channels; each message $m$ timestamped with $T(send(m))$

Snapshot I
i. $p_0$ selects $t_{ss}$
ii. $p_0$ sends "take a snapshot at $t_{ss}$" to all processes
iii. when the clock of $p_i$ reads $t_{ss}$, then $p_i$:
   a. records its local state $\sigma_i$
   b. sends an empty message along its outgoing channels
   c. starts recording messages received on each of its incoming channels
   d. stops recording a channel when it receives the first message with timestamp greater than or equal to $t_{ss}$
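
A sketch of the recording rule for a single incoming FIFO channel. The tuple layout `(receive_time, send_timestamp, payload)` and the name `record_channel` are assumptions for illustration; a receipt at exactly $t_{ss}$ is treated as happening after the local state was recorded.

```python
def record_channel(deliveries, t_ss):
    """Return the payloads recorded as the state of one incoming channel.

    deliveries: list of (receive_time, send_timestamp, payload) tuples in
                FIFO order, as seen by the receiving process.
    """
    recorded = []
    for receive_time, send_timestamp, payload in deliveries:
        if receive_time < t_ss:
            continue  # received before the snapshot: part of the local state
        if send_timestamp >= t_ss:
            break     # sent at or after t_ss: stop recording this channel
        recorded.append(payload)  # sent before t_ss, received after: in flight
    return recorded


deliveries = [
    (3, 1, "a"),   # received before t_ss
    (6, 4, "b"),   # in flight at t_ss
    (8, 5, None),  # the empty message the sender emits at t_ss
    (9, 7, "c"),   # sent after the snapshot
]
print(record_channel(deliveries, t_ss=5))
```

The empty message is what guarantees that recording stops even on a channel that would otherwise stay silent after $t_{ss}$.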

Correctness
Theorem: Snapshot I produces a consistent cut.
Proof: Need to prove $e_j \in C \wedge e_i \rightarrow e_j \Rightarrow e_i \in C$
0. $e_j \in C \equiv T(e_j) < t_{ss}$ <Definition>
1. $e_j \in C$ <Assumption>
2. $e_i \rightarrow e_j$ <Assumption>
3. $T(e_j) < t_{ss}$ <0 and 1>
4. $e_i \rightarrow e_j \Rightarrow T(e_i) < T(e_j)$ <Property of real time>
5. $T(e_i) < T(e_j)$ <2 and 4>
6. $T(e_i) < t_{ss}$ <5 and 3>
7. $e_i \in C$ <Definition>

Clock Condition
4. $e_i \rightarrow e_j \Rightarrow T(e_i) < T(e_j)$ <Property of real time>
Can the Clock Condition be implemented some other way?
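
Lamport Clocks
Each process maintains a local variable $LC$.
$LC(e) \equiv$ value of $LC$ for event $e$
For consecutive events $e_p^i$, $e_p^{i+1}$ of process $p$: $LC(e_p^i) < LC(e_p^{i+1})$
For an event $e_p^i$ of $p$ linked by a message to an event $e_q^j$ of $q$: $LC(e_p^i) < LC(e_q^j)$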

Increment Rules
For consecutive events $e_p^i$, $e_p^{i+1}$ of process $p$: $LC(e_p^{i+1}) = LC(e_p^i) + 1$
For a message sent at $e_p^i$ and received at $e_q^j$: $LC(e_q^j) = \max(LC(e_q^{j-1}), LC(e_p^i)) + 1$
Timestamp $m$ with $TS(m) = LC(send(m))$
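
A minimal sketch of the increment rules as a per-process counter. The class name, the method names, and the initial value 0 are assumptions for illustration.

```python
class LamportClock:
    """Logical clock of one process, following the increment rules."""

    def __init__(self):
        self.lc = 0  # assumed initial value, before any event

    def local_event(self):
        """Internal event: LC = LC + 1."""
        self.lc += 1
        return self.lc

    def send(self):
        """Send event: LC = LC + 1; the result is the timestamp TS(m)."""
        self.lc += 1
        return self.lc

    def receive(self, ts):
        """Receive event for a message with timestamp ts: LC = max(LC, ts) + 1."""
        self.lc = max(self.lc, ts) + 1
        return self.lc


p, q = LamportClock(), LamportClock()
p.local_event()            # LC at p becomes 1
ts = p.send()              # LC at p becomes 2, TS(m) = 2
print(q.receive(ts))       # LC at q becomes max(0, 2) + 1
print(q.local_event())     # next event at q
print(p.receive(q.send())) # a reply from q advances p past q's clock
```

With these rules, $e \rightarrow e'$ implies $LC(e) < LC(e')$, which is the Clock Condition; the converse does not hold in general.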

Space-Time Diagrams and Logical Clocks
[Figure: space-time diagram of $p_1$, $p_2$, $p_3$ with events labeled by their logical clock values]

A subtle problem
"when $LC = t$ do $S$" doesn't make sense for Lamport clocks!
   there is no guarantee that $LC$ will ever be $t$
   $S$ is anyway executed after $LC = t$
Fixes:
   if $e$ is internal/send and $LC = t - 2$: execute $e$ and then $S$
   if $e = receive(m) \wedge (TS(m) \geq t) \wedge (LC \leq t - 1)$: put message back in channel; re-enable $e$; set $LC = t - 1$; execute $S$

An obvious problem
No $t_{ss}$!
Choose $\Omega$ large enough that it cannot be reached by applying the update rules of logical clocks.
mmmmhhhh...
Doing so assumes:
   an upper bound on message delivery time
   an upper bound on relative process speeds
We better relax it...

Snapshot II
Processor $p_0$ selects $\Omega$.
$p_0$ sends "take a snapshot at $\Omega$" to all processes; it waits for all of them to reply and then sets its logical clock to $\Omega$.
When the clock of $p_i$ reads $\Omega$, then $p_i$:
   records its local state $\sigma_i$
   sends an empty message along its outgoing channels
   starts recording messages received on each incoming channel
   stops recording a channel when it receives the first message with timestamp greater than or equal to $\Omega$