Global Predicate Detection and Event Ordering


  1. Global Predicate Detection and Event Ordering

  2. Our Problem: To compute predicates over the state of a distributed application.

  3. Model: Message passing. No failures. Two possible timing assumptions: (1) Synchronous System; (2) Asynchronous System: no upper bound on message delivery time, no bound on relative process speeds, no centralized clock.

  4. Asynchronous systems: Weakest possible assumptions (cf. the “finite progress axiom”). Weak assumptions ≡ fewer vulnerabilities. Asynchronous ≠ slow. “Interesting” model w.r.t. failures (ah ah ah!).

  5-6. Client-Server: Processes exchange messages using Remote Procedure Call (RPC). A client $c$ requests a service by sending the server $s$ a message. The client blocks while waiting for a response. The server computes the response (possibly asking other servers) and returns it to the client.

  7. Deadlock! (Figure: processes $p_1$, $p_2$, $p_3$.)

  8. Goal: Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds.

  9-10. Wait-For Graphs: Draw an arrow from $p_i$ to $p_j$ if $p_j$ has received a request but has not responded yet. Cycle in WFG $\Rightarrow$ deadlock. Deadlock $\Rightarrow \Diamond$ cycle in WFG (a deadlock eventually shows up as a cycle).

  11. The protocol: $p_0$ sends a message to $p_1 \ldots p_3$. On receipt of $p_0$'s message, $p_i$ replies with its state and wait-for info.
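A minimal sketch of what $p_0$ might do with the collected replies, assuming each reply is simply the set of processes whose requests the replier has received but not yet answered. The names `build_wfg` and `has_cycle` and the reply format are illustrative assumptions, not part of the slides.

```python
from typing import Dict, Set


def build_wfg(replies: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build the wait-for graph from the replies collected by p0.

    replies[pj] is the (assumed) set of processes whose requests pj has
    received but not yet answered. Each such pi yields an arrow pi -> pj.
    Every process that appears anywhere becomes a key of the result.
    """
    wfg: Dict[str, Set[str]] = {}
    for pj, pending in replies.items():
        wfg.setdefault(pj, set())
        for pi in pending:
            wfg.setdefault(pi, set()).add(pj)
    return wfg


def has_cycle(wfg: Dict[str, Set[str]]) -> bool:
    """Depth-first search for a cycle; expects a graph made by build_wfg."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in wfg}

    def visit(u: str) -> bool:
        colour[u] = GREY
        for v in wfg[u]:
            if colour[v] == GREY:  # back edge: v is on the current path
                return True
            if colour[v] == WHITE and visit(v):
                return True
        colour[u] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in wfg)


# p1 waits for p2, p2 waits for p3, p3 waits for p1.
cyclic = build_wfg({"p1": {"p3"}, "p2": {"p1"}, "p3": {"p2"}})
# p1 waits for p2, p2 waits for p3, and p3 waits for nobody.
acyclic = build_wfg({"p1": set(), "p2": {"p1"}, "p3": {"p2"}})
print(has_cycle(cyclic), has_cycle(acyclic))
```

A cycle found this way is only as trustworthy as the replies: if they describe different moments of the execution, the cycle may never have existed in the real computation.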

  12-14. An execution (figure: processes $p_1$, $p_2$, $p_3$): Ghost Deadlock!

  15. Houston, we have a problem... Asynchronous system: no centralized clock, etc. etc. Synchrony is useful to coordinate actions and to order events. Mmmmhhh...

  16. Events and Histories: Processes execute sequences of events. Events can be of 3 types: local, send, and receive. $e_p^i$ is the $i$-th event of process $p$. The local history $h_p$ of process $p$ is the sequence of events executed by process $p$. $h_p^k$: prefix that contains the first $k$ events. $h_p^0$: initial, empty sequence. The history $H$ is the set $h_{p_0} \cup h_{p_1} \cup \ldots \cup h_{p_{n-1}}$. NOTE: In $H$, local histories are interpreted as sets, rather than sequences, of events.

  17-18. Ordering events: Observation 1: Events in a local history are totally ordered. Observation 2: For every message $m$, $send(m)$ precedes $receive(m)$. (Figure: timelines of $p_i$ and $p_j$, with $m$ sent from $p_i$ to $p_j$.)

  19. Happened-before (Lamport [1978]): A binary relation $\to$ defined over events. 1. If $e_i^k, e_i^l \in h_i$ and $k < l$, then $e_i^k \to e_i^l$. 2. If $e_i = send(m)$ and $e_j = receive(m)$, then $e_i \to e_j$. 3. If $e \to e'$ and $e' \to e''$, then $e \to e''$.
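The three rules can be turned into a small computation. The sketch below assumes events are encoded as `(process, index)` pairs and messages as `(send_event, receive_event)` pairs; that encoding and the function name `happened_before` are illustrative choices, not something the slides prescribe.

```python
from itertools import product


def happened_before(histories, messages):
    """Return the set of pairs (a, b) such that a -> b.

    histories: {process: number_of_events}; the events of process p are
               (p, 1), ..., (p, n).
    messages:  iterable of (send_event, receive_event) pairs.
    """
    events = [(p, k) for p, n in histories.items() for k in range(1, n + 1)]
    hb = set()
    # Rule 1: consecutive events of the same process are ordered.
    for p, n in histories.items():
        for k in range(1, n):
            hb.add(((p, k), (p, k + 1)))
    # Rule 2: send(m) -> receive(m).
    hb.update(messages)
    # Rule 3: transitive closure, Warshall style (intermediate event outermost).
    for mid, a, b in product(events, events, events):
        if (a, mid) in hb and (mid, b) in hb:
            hb.add((a, b))
    return hb


# p1's first event sends a message that p2 receives as its second event.
hb = happened_before({"p1": 2, "p2": 2}, [(("p1", 1), ("p2", 2))])
print((("p1", 1), ("p2", 2)) in hb)  # ordered by rule 2
print((("p1", 2), ("p2", 1)) in hb)  # unordered: the events are concurrent
```

Pairs that appear in neither direction are concurrent events, which is why $\to$ is only a partial order.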

  20-27. Space-Time diagrams: A graphic representation of a distributed execution (figure: timelines of processes $p_1$, $p_2$, $p_3$ along a time axis). $H$ and $\to$ impose a partial order.

  28. Runs and Consistent Runs: A run is a total ordering of the events in $H$ that is consistent with the local histories of the processors. Ex: $h_1, h_2, \ldots, h_n$ is a run. A run is consistent if the total order imposed in the run is an extension of the partial order induced by $\to$. A single distributed computation may correspond to several consistent runs!
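To make the distinction between a run and a consistent run concrete, here is a small sketch under assumed encodings: each local history is a list of event names, and each message is a `(send, receive)` pair. The helper names are hypothetical. Checking local order plus send-before-receive is enough, because a total order that respects those pairs also respects their transitive closure.

```python
from itertools import permutations


def is_run(order, histories):
    """A run keeps the events of every local history in their local order."""
    pos = {e: i for i, e in enumerate(order)}
    return all(pos[h[k]] < pos[h[k + 1]]
               for h in histories for k in range(len(h) - 1))


def is_consistent_run(order, histories, messages):
    """A consistent run also places every send before its receive."""
    pos = {e: i for i, e in enumerate(order)}
    return is_run(order, histories) and all(pos[s] < pos[r] for s, r in messages)


def consistent_runs(histories, messages):
    """Enumerate all consistent runs by brute force (tiny examples only)."""
    events = [e for h in histories for e in h]
    return [list(o) for o in permutations(events)
            if is_consistent_run(o, histories, messages)]


# h1 = [r] receives the message that h2 = [s] sends.
h = [["r"], ["s"]]
m = [("s", "r")]
print(is_run(["r", "s"], h))                # h1, h2 is a run
print(is_consistent_run(["r", "s"], h, m))  # but it is not consistent

# One computation, several consistent runs.
runs = consistent_runs([["s", "x"], ["r"]], [("s", "r")])
print(runs)
```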

  29-30. Cuts: A cut $C$ is a subset of the global history $H$: $C = h_1^{c_1} \cup h_2^{c_2} \cup \ldots \cup h_n^{c_n}$. The frontier of $C$ is the set of events $e_1^{c_1}, e_2^{c_2}, \ldots, e_n^{c_n}$. (Figure: a cut across the timelines of $p_1$, $p_2$, $p_3$.)

  31. Global states and cuts: The global state of a distributed computation is an $n$-tuple of local states $\Sigma = (\sigma_1, \ldots, \sigma_n)$. To each cut $(c_1 \ldots c_n)$ corresponds a global state $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$.

  32. Consistent cuts and consistent global states: A cut is consistent if $\forall e_i, e_j : e_j \in C \wedge e_i \to e_j \Rightarrow e_i \in C$. A consistent global state is one corresponding to a consistent cut.
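The consistency condition can be checked mechanically. The sketch below assumes a cut is given as `{process: c}`, meaning the first `c` events of that process are in the cut, and that messages are `(send_event, receive_event)` pairs over `(process, index)` events; both encodings and the function names are illustrative. Because such a cut is a union of prefixes, local order is respected automatically, so it suffices to check that no receive is included without its send.

```python
def in_cut(cut, event):
    """Event (p, k) belongs to the cut iff k is within p's prefix length."""
    p, k = event
    return k <= cut.get(p, 0)


def is_consistent_cut(cut, messages):
    """Every received message in the cut must also have its send in the cut."""
    return all(in_cut(cut, send)
               for send, recv in messages if in_cut(cut, recv))


# p1's second event sends a message that p2 receives as its first event.
msgs = [(("p1", 2), ("p2", 1))]
print(is_consistent_cut({"p1": 1, "p2": 1}, msgs))  # receive without send
print(is_consistent_cut({"p1": 2, "p2": 1}, msgs))  # send and receive
print(is_consistent_cut({"p1": 2, "p2": 0}, msgs))  # message still in transit
```

The last cut is consistent even though the message has been sent but not received: such a message belongs to the channel state.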

  33-34. What $p_0$ sees (figure: processes $p_1$, $p_2$, $p_3$): Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by $p_3$ but not the corresponding send event.

  35. Our task: Develop a protocol by which a processor can build a consistent global state. Informally, we want to be able to take a snapshot of the computation. Not obvious in an asynchronous system...

  36. Our approach: Develop a simple synchronous protocol. Refine the protocol as we relax assumptions. Record: processor states, channel states. Assumptions: FIFO channels; each message $m$ is timestamped with $T(send(m))$.

  37-38. Snapshot I:
      i. $p_0$ selects $t_{ss}$
      ii. $p_0$ sends “take a snapshot at $t_{ss}$” to all processes
      iii. when the clock of $p_i$ reads $t_{ss}$, then $p_i$:
          a. records its local state $\sigma_i$
          b. sends an empty message along its outgoing channels
          c. starts recording messages received on each of its incoming channels
          d. stops recording a channel when it receives the first message with timestamp greater than or equal to $t_{ss}$
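The channel-recording steps of Snapshot I can be sketched for a single incoming channel. This assumes the channel is FIFO (as the deck does) and that the arrivals seen after the local state was recorded are available as a list of `(timestamp, payload)` pairs; the function name `record_channel` and the use of `None` for the empty message are illustrative assumptions.

```python
def record_channel(arrivals, t_ss):
    """Return the payloads recorded as the state of one incoming channel.

    arrivals: messages received on the channel after the local state was
              recorded, in FIFO order, as (timestamp, payload) pairs.
    Recording stops at the first message whose timestamp is >= t_ss.
    """
    recorded = []
    for timestamp, payload in arrivals:
        if timestamp >= t_ss:
            break  # sent at or after the snapshot time: not part of the state
        recorded.append(payload)
    return recorded


# Two messages sent before t_ss = 10 are still in flight; the empty message
# sent at t_ss closes the channel even if no application message follows.
arrivals = [(7, "a"), (9, "b"), (10, None), (11, "c")]
print(record_channel(arrivals, 10))
```

The empty message matters on otherwise idle channels: without it, the receiver might wait indefinitely for a message with a timestamp of at least $t_{ss}$.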

  39. Correctness: Theorem: Snapshot I produces a consistent cut. Proof: we need to prove $e_j \in C \wedge e_i \to e_j \Rightarrow e_i \in C$.
      0. $e_j \in C \equiv T(e_j) < t_{ss}$ <Definition>
      1. $e_j \in C$ <Assumption>
      2. $e_i \to e_j$ <Assumption>
      3. $T(e_j) < t_{ss}$ <0 and 1>
      4. $e_i \to e_j \Rightarrow T(e_i) < T(e_j)$ <Property of real time>
      5. $T(e_i) < T(e_j)$ <2 and 4>
      6. $T(e_i) < t_{ss}$ <5 and 3>
      7. $e_i \in C$ <Definition>

  40. Clock Condition: $e_i \to e_j \Rightarrow T(e_i) < T(e_j)$ (step 4 of the correctness proof, justified there by <Property of real time>). Can the Clock Condition be implemented some other way?

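  41. Lamport Clocks: Each process maintains a local variable $LC$. $LC(e) \equiv$ value of $LC$ for event $e$. For consecutive events $e_p^i$, $e_p^{i+1}$ of process $p$: $LC(e_p^i) < LC(e_p^{i+1})$. For a message sent at event $e_p^i$ of $p$ and received at event $e_q^j$ of $q$: $LC(e_p^i) < LC(e_q^j)$.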

  42. Increment Rules: For consecutive events $e_p^i$, $e_p^{i+1}$ of process $p$: $LC(e_p^{i+1}) = LC(e_p^i) + 1$. For a message sent at event $e_p^i$ of $p$ and received at event $e_q^j$ of $q$: $LC(e_q^j) = \max(LC(e_q^{j-1}), LC(e_p^i)) + 1$. Timestamp $m$ with $TS(m) = LC(send(m))$.
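The increment rules translate almost directly into code. The class below is a minimal sketch; the class and method names are illustrative, and starting the counter at 0 is an assumption the slides do not state.

```python
class LamportClock:
    """Logical clock of one process, following the increment rules."""

    def __init__(self):
        self.lc = 0  # assumed initial value, before any event

    def local_event(self):
        """Internal event: LC(e^{i+1}) = LC(e^i) + 1."""
        self.lc += 1
        return self.lc

    def send(self):
        """Send event: increment, then use the new value as TS(m)."""
        self.lc += 1
        return self.lc

    def receive(self, ts):
        """Receive event: LC = max(previous LC, TS(m)) + 1."""
        self.lc = max(self.lc, ts) + 1
        return self.lc


p, q = LamportClock(), LamportClock()
p.local_event()          # LC at p becomes 1
ts = p.send()            # LC at p becomes 2, so TS(m) = 2
print(q.receive(ts))     # max(0, 2) + 1 = 3
print(q.local_event())   # 4
print(p.receive(q.send()))  # q sends with TS 5; max(2, 5) + 1 = 6
```

By construction every receive gets a value larger than the timestamp of the matching send, so the Clock Condition holds without any real-time clock.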

  43. Space-Time Diagrams and Logical Clocks (figure: space-time diagram of $p_1$, $p_2$, $p_3$ with each event labelled by its logical clock value).

  44. A subtle problem: “when $LC = t$ do $S$” doesn't make sense for Lamport clocks! There is no guarantee that $LC$ will ever be $t$, and $S$ is anyway executed after $LC = t$. Fixes: (a) if $e$ is internal/send and $LC = t - 2$: execute $e$ and then $S$; (b) if $e = receive(m) \wedge (TS(m) \geq t) \wedge (LC \leq t - 1)$: put the message back in the channel; re-enable $e$; set $LC = t - 1$; execute $S$.

  45-47. An obvious problem: No $t_{ss}$! Choose $\Omega$ large enough that it cannot be reached by applying the update rules of logical clocks. Mmmmhhhh... Doing so assumes an upper bound on message delivery time and an upper bound on relative process speeds. We better relax it...

  48. Snapshot II:
      i. processor $p_0$ selects $\Omega$
      ii. $p_0$ sends “take a snapshot at $\Omega$” to all processes; it waits for all of them to reply and then sets its logical clock to $\Omega$
      iii. when the clock of $p_i$ reads $\Omega$, then $p_i$:
          a. records its local state $\sigma_i$
          b. sends an empty message along its outgoing channels
          c. starts recording messages received on each incoming channel
          d. stops recording a channel when it receives the first message with timestamp greater than or equal to $\Omega$
