CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Recall our discussion of time
• Logical clocks: represent part of the → (happens-before) relation, with small overhead
• Vector clocks: accurately represent → but are more costly
• Wall clocks: tradeoff between precision and accuracy
  ◦ Rarely precise enough for use in protocols
  ◦ Hence we often view time as an “add on”

Today: “Simultaneous” actions
• There are many situations in which we want to talk about some form of simultaneous event
  ◦ Our missile interceptor is one case
  ◦ But think about updating replicated data
  ◦ Perhaps we have multiple conflicting updates
  ◦ The need is to ensure that they will happen in the same order at all copies
  ◦ This “looks” like a kind of simultaneous action

Temporal distortions
• Things can be complicated because we can’t predict
  ◦ Message delays (they vary constantly)
  ◦ Execution speeds (often a process shares a machine with many other tasks)
  ◦ Timing of external events
• Lamport looked at this question too
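To recall why logical clocks capture only part of →, here is a minimal Python sketch of both clock types. It assumes the standard update rules: tick on every local or send event, and on receipt take the max and then tick. The class and function names are illustrative and do not come from the course. The example shows that two concurrent events can still receive ordered Lamport timestamps, whereas vector clocks reveal that the events are concurrent.

```python
from typing import List


class LamportClock:
    """Scalar logical clock: if e -> e' then L(e) < L(e'), but not conversely."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event or send event: advance by one.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # Receive event: jump past the timestamp carried by the message.
        self.time = max(self.time, msg_time) + 1
        return self.time


class VectorClock:
    """Vector clock: V(e) < V(e') exactly when e -> e'."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def tick(self) -> List[int]:
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, msg_v: List[int]) -> List[int]:
        # Componentwise max with the sender's vector, then count this event.
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1
        return list(self.v)


def happened_before(a: List[int], b: List[int]) -> bool:
    """a -> b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# p0 has one local event; p1 has two; none of them are related by ->.
l0, l1 = LamportClock(), LamportClock()
e1 = l0.tick()
l1.tick()
f2 = l1.tick()
print(e1 < f2)  # True, although e1 and f2 are concurrent

v0, v1 = VectorClock(0, 2), VectorClock(1, 2)
ve1 = v0.tick()
v1.tick()
vf2 = v1.tick()
print(happened_before(ve1, vf2), happened_before(vf2, ve1))  # False False

# Now p0 sends a message that p1 receives: the send happens before the receive.
send = v0.tick()         # [2, 0]
recv = v1.receive(send)  # [2, 3]
print(happened_before(send, recv))  # True
```

The extra cost of vector clocks is visible in the code: each timestamp carries one entry per process instead of a single integer.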
Temporal distortions
• What does “now” mean?
[Figure: space-time diagram of processes p0 to p3, with events a to f and the messages between them]

Temporal distortions
• Timelines can “stretch”…
[Figure: the same diagram]
• … caused by scheduling effects, message delays, message loss…

Temporal distortions
• Timelines can “shrink”
[Figure: the same diagram]
• E.g. something lets a machine speed up
Temporal distortions
• Idea is to identify system states that “might” have occurred in real life
  ◦ Need to avoid capturing states in which a message is received but nobody is shown as having sent it
  ◦ This is the problem with the gray cuts

Consistent cuts and snapshots
• Cuts represent instants of time
[Figure: the space-time diagram of p0 to p3 (events a to f), with black and gray cuts drawn across it]
• But not every “cut” makes sense
  ◦ Black cuts could occur but not gray ones.

Temporal distortions
• Red messages cross gray cuts “backwards”
[Figure: the same diagram, with the messages that cross gray cuts backwards shown in red]
• In a nutshell: the cut includes a message that “was never sent”
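The black/gray distinction can be expressed as a small check. This is a minimal Python sketch under an assumed encoding. A cut is a list in which cut[i] counts how many of p_i's events lie before the cut. Each message records the 1-based positions of its send and receive events. The names Message, is_consistent and in_flight are hypothetical. A cut is "gray" exactly when some message is received before the cut but sent after it.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    sender: int      # index of the sending process
    send_event: int  # 1-based position of the send on the sender's timeline
    receiver: int    # index of the receiving process
    recv_event: int  # 1-based position of the receive on the receiver's timeline


def is_consistent(cut: List[int], messages: List[Message]) -> bool:
    """cut[i] is the number of p_i's events that lie before the cut."""
    for m in messages:
        received = m.recv_event <= cut[m.receiver]
        sent = m.send_event <= cut[m.sender]
        if received and not sent:
            # The message crosses the cut "backwards": received but never sent.
            return False
    return True


def in_flight(cut: List[int], messages: List[Message]) -> List[Message]:
    """Messages sent before the cut but received after it: the channel contents."""
    return [m for m in messages
            if m.send_event <= cut[m.sender] and m.recv_event > cut[m.receiver]]


# Hypothetical scenario: p0's 2nd event sends m, and receiving m is p1's 1st event.
m = Message(sender=0, send_event=2, receiver=1, recv_event=1)
print(is_consistent([1, 1], [m]))     # False: a "gray" cut
print(is_consistent([2, 1], [m]))     # True
print(in_flight([2, 0], [m]) == [m])  # True: consistent, with m still in the channel
```

The last case matches the snapshot definition below: a consistent cut may leave messages in the channels, but it may never show a receive without its send.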
Who cares?
• Suppose, for example, that we want to do distributed deadlock detection
  ◦ System lets processes “wait” for actions by other processes
  ◦ A process can only do one thing at a time
  ◦ A deadlock occurs if there is a circular wait

Deadlock detection “algorithm”
• p worries: perhaps we have a deadlock
• p is waiting for q, so sends “what’s your state?”
• q, on receipt, is waiting for r, so sends the same question… and r for s… and s is waiting on p.

Suppose we detect this state
• We see a cycle…
[Figure: wait-for graph in which p is waiting for q, q for r, r for s, and s for p]
• … but is it a deadlock?

Phantom deadlocks!
• Suppose the system has a very high rate of locking.
• Then perhaps a lock release message “passed” a query message
  ◦ i.e. we see “q waiting for r” and “r waiting for s” but in fact, by the time we checked r, q was no longer waiting!
• In effect: we checked for deadlock on a gray cut, an inconsistent cut.
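A phantom deadlock can be reproduced in a few lines. This is a hypothetical Python sketch: the wait-for history, the query times and all function names are invented for illustration. Each process waits for at most one other. The probe asks each process "what's your state?" at a later instant than the previous one, which is exactly the gray-cut problem. It finds a cycle even though no single instant ever has one.

```python
from typing import Dict, List, Optional, Tuple

# (waiter, holder, start, end): waiter waits for holder during [start, end).
History = List[Tuple[str, str, float, float]]


def waits_for(history: History, proc: str, t: float) -> Optional[str]:
    for waiter, holder, start, end in history:
        if waiter == proc and start <= t < end:
            return holder
    return None


def snapshot_at(history: History, t: float) -> Dict[str, str]:
    """Every wait-for edge sampled at the same instant: a consistent view."""
    return {w: h for w, h, start, end in history if start <= t < end}


def probe(history: History, initiator: str, query_times: List[float]) -> Dict[str, str]:
    """Chain of "what's your state?" queries, each answered at a later time."""
    observed: Dict[str, str] = {}
    node = initiator
    for t in query_times:
        nxt = waits_for(history, node, t)
        if nxt is None:
            break
        observed[node] = nxt
        node = nxt
    return observed


def has_cycle(graph: Dict[str, str]) -> bool:
    # Each process waits for at most one other, so just follow the edges.
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False


history: History = [
    ("p", "q", 0.0, 10.0),
    ("q", "r", 0.0, 1.0),   # r's lock release reaches q at t = 1
    ("r", "s", 0.0, 10.0),
    ("s", "p", 2.0, 10.0),  # s only starts waiting for p at t = 2
]

seen_by_probe = probe(history, "p", [0.0, 0.5, 1.5, 2.5])
print(seen_by_probe)             # {'p': 'q', 'q': 'r', 'r': 's', 's': 'p'}
print(has_cycle(seen_by_probe))  # True: a phantom deadlock
print(any(has_cycle(snapshot_at(history, t / 10)) for t in range(100)))  # False
```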
Consistent cuts and snapshots
• Goal is to draw a line across the system state such that
  ◦ Every message “received” by a process is shown as having been sent by some other process
  ◦ Some pending messages might still be in communication channels
• A “cut” is the frontier of a “snapshot”

Study
• Chandy, K. M., and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Transactions on Computer Systems 3:1 (February 1985): 63-75
• Or Coulouris, Chap. 11 (Section 11.5.3)

Chandy/Lamport Algorithm
• Assume that if p_i can talk to p_j they do so using a lossless, FIFO connection
• Now think about logical clocks
  ◦ Suppose someone sets his clock way ahead and triggers a “flood” of messages
  ◦ As these reach each process, it advances its own time… eventually all do so.
• The point where time jumps forward is a consistent cut across the system

Using logical clocks to make cuts
• Message sets the time forward by a “lot”
[Figure: space-time diagram of p0 to p3 (events a to f), with one message that sets logical time far ahead]
• Algorithm requires FIFO channels: must delay e until b has been delivered!
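Why is the point where time jumps forward a consistent cut? Under Lamport's rules, a receive event always gets a larger timestamp than its send. So for any threshold T, the set "all events with logical time below T" can never contain a receive without also containing its send. This hypothetical Python sketch checks that property on random runs. The function names, the step counts and the arbitrary, non-FIFO delivery order are all assumptions. The sketch does not model the clock jump or the FIFO requirement, which matter for the marker-based algorithm rather than for this property.

```python
import random
from typing import List, Tuple

Msg = Tuple[int, int, int, int]  # (sender, send_index, receiver, recv_index), 1-based


def simulate(n_procs: int, n_steps: int, seed: int) -> Tuple[List[List[int]], List[Msg]]:
    """Random run with Lamport clocks; returns per-process timestamps and messages."""
    rng = random.Random(seed)
    clock = [0] * n_procs
    stamps: List[List[int]] = [[] for _ in range(n_procs)]
    pending: List[Tuple[int, int, int, int]] = []  # (sender, send_index, send_time, receiver)
    messages: List[Msg] = []
    for _ in range(n_steps):
        p = rng.randrange(n_procs)
        inbox = [m for m in pending if m[3] == p]
        if inbox and rng.random() < 0.5:
            m = rng.choice(inbox)                # delivery order is arbitrary here
            pending.remove(m)
            clock[p] = max(clock[p], m[2]) + 1   # receive rule
            stamps[p].append(clock[p])
            messages.append((m[0], m[1], p, len(stamps[p])))
        else:
            clock[p] += 1                        # local event or send
            stamps[p].append(clock[p])
            if rng.random() < 0.5:
                q = rng.choice([i for i in range(n_procs) if i != p])
                pending.append((p, len(stamps[p]), clock[p], q))
    return stamps, messages


def cut_at(stamps: List[List[int]], T: int) -> List[int]:
    """Per process, how many events have logical time below T (always a prefix)."""
    return [sum(1 for s in st if s < T) for st in stamps]


def is_consistent(cut: List[int], messages: List[Msg]) -> bool:
    # No message may be received before the cut yet sent after it.
    return all(s_idx <= cut[s] for s, s_idx, r, r_idx in messages if r_idx <= cut[r])


stamps, messages = simulate(n_procs=4, n_steps=60, seed=1)
horizon = max(max(st, default=0) for st in stamps) + 2
print(all(is_consistent(cut_at(stamps, T), messages) for T in range(horizon)))  # True
```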
Turn idea into an algorithm
• To start a new snapshot, p_i…
  ◦ Builds a message: “p_i is initiating snapshot k”.
  ◦ The tuple (p_i, k) uniquely identifies the snapshot
• In general, on first learning about snapshot (p_i, k), p_x
  ◦ Writes down its state: p_x’s contribution to the snapshot
  ◦ Starts “tape recorders” for all of its incoming communication channels
  ◦ Forwards the message on all outgoing channels
  ◦ Stops the “tape recorder” for a channel when a snapshot message for (p_i, k) is received on it
• Snapshot consists of all the local state contributions and all the tape-recordings for the channels

Using logical clocks to make cuts
• “Cut” occurs at point where time advanced
[Figure: space-time diagram of p0 to p3 (events a to f), with the cut drawn where logical time jumps]

Chandy/Lamport
• This algorithm, but implemented with an outgoing flood, followed by an incoming wave of snapshot contributions
• Snapshot ends up accumulating at the initiator, p_i
• Algorithm doesn’t tolerate process failures or message failures.

Chandy/Lamport
[Figure: a network of processes p, q, r, s, t, u, v, w, x, y, z]
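The steps above can be simulated in a single Python process. This is a sketch under several stated assumptions. The application is a hypothetical money-transfer system, so the snapshot is correct exactly when it accounts for all the money. There is only one snapshot, so the (p_i, k) message collapses to a single MARKER value. Every pair of processes is joined by a lossless FIFO deque. The contributions are read directly at the end rather than sent back to p_i in an incoming wave. Process, run and MARKER are illustrative names, not taken from the original.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple, Union

MARKER = "MARKER"  # stands in for the (p_i, k) snapshot message; one snapshot only


class Process:
    def __init__(self, pid: int, balance: int) -> None:
        self.pid = pid
        self.balance = balance                 # hypothetical application state
        self.recorded: Optional[int] = None    # this process's contribution
        self.tapes: Dict[int, List[int]] = {}  # incoming channel -> recorded messages
        self.stopped: Set[int] = set()         # channels whose tape has stopped


def run(n: int = 4, initial: int = 100, steps: int = 300,
        start_at: int = 100, seed: int = 0) -> Tuple[int, int]:
    rng = random.Random(seed)
    procs = [Process(i, initial) for i in range(n)]
    # One lossless FIFO channel per ordered pair (src, dst).
    chan: Dict[Tuple[int, int], Deque[Union[int, str]]] = {
        (i, j): deque() for i in range(n) for j in range(n) if i != j}

    def record_and_flood(p: Process) -> None:
        p.recorded = p.balance                                   # write down local state
        p.tapes = {src: [] for src in range(n) if src != p.pid}  # start tape recorders
        for dst in range(n):
            if dst != p.pid:
                chan[(p.pid, dst)].append(MARKER)                # forward on all outgoing

    def deliver(src: int, dst: int) -> None:
        msg = chan[(src, dst)].popleft()      # FIFO order
        p = procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                record_and_flood(p)           # first time we hear of the snapshot
            p.stopped.add(src)                # stop the tape for src -> dst
        else:
            p.balance += msg
            if p.recorded is not None and src not in p.stopped:
                p.tapes[src].append(msg)      # this message was in the channel at the cut

    for step in range(steps):
        if step == start_at:
            record_and_flood(procs[0])        # p_0 initiates the snapshot
        if rng.random() < 0.5:
            i = rng.randrange(n)
            j = rng.choice([k for k in range(n) if k != i])
            amount = rng.randint(1, 10)
            if procs[i].balance >= amount:    # a transfer: send money from i to j
                procs[i].balance -= amount
                chan[(i, j)].append(amount)
        else:
            busy = [c for c in chan if chan[c]]
            if busy:
                deliver(*rng.choice(busy))
    while any(chan.values()):                 # let every marker arrive
        deliver(*next(c for c in chan if chan[c]))

    total = sum(p.recorded for p in procs if p.recorded is not None)
    total += sum(sum(tape) for p in procs for tape in p.tapes.values())
    return total, n * initial


print(run())  # (400, 400): the snapshot accounts for every unit of money
```

The FIFO deques are what make this work. Every message that src sent before recording its state reaches dst before src's marker does. Such a message is therefore counted either in dst's recorded balance or on the src -> dst tape, never in both and never in neither.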
Chandy/Lamport
[Figures: the example network of processes p, q, r, s, t, u, v, w, x, y, z]
• p: “I want to start a snapshot”
• p records local state
• p starts monitoring incoming channels (“contents of channel p-y”)
Chandy/Lamport
[Figures: the example network]
• p floods message on outgoing channels…
• q is done
Chandy/Lamport
[Figures: the example network at further steps of the snapshot]
Chandy/Lamport
• Done!
[Figure: a snapshot of a network]

What’s in the “state”?
• In practice we only record things important to the application running the algorithm, not the “whole” state
  ◦ E.g. “locks currently held”, “lock release messages”
• Idea is that the snapshot will be
  ◦ Easy to analyze, letting us build a picture of the system state
  ◦ And will have everything that matters for our real purpose, like deadlock detection

Other algorithms?
• Many algorithms have a consistent cut mechanism hidden within
• More broadly, we’ll see that notions of time are sometimes explicit in algorithms
  ◦ But are often used as the insight that motivated the developer
  ◦ By thinking about time, he or she was able to reason about a protocol
• We’ll often use this approach
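The “What’s in the state?” idea can be sketched as follows. Each process contributes only its lock information. The analyzer builds a wait-for graph from those contributions and looks for a cycle. This is a hypothetical Python sketch. The state layout (the "held" and "waiting_for" keys) and all function names are assumptions, and each process is assumed to wait for at most one lock.

```python
from typing import Dict, Optional, Set


def contribution(full_state: dict) -> dict:
    """Keep only what deadlock detection needs, not the "whole" process state."""
    return {"held": set(full_state.get("held", ())),
            "waiting_for": full_state.get("waiting_for")}


def wait_for_graph(snapshot: Dict[str, dict]) -> Dict[str, str]:
    # Map each lock to its holder, then turn each pending request into an edge.
    owner = {lock: p for p, c in snapshot.items() for lock in c["held"]}
    graph: Dict[str, str] = {}
    for p, c in snapshot.items():
        lock: Optional[str] = c["waiting_for"]
        if lock is not None and lock in owner:
            graph[p] = owner[lock]
    return graph


def has_cycle(graph: Dict[str, str]) -> bool:
    for start in graph:
        seen: Set[str] = set()
        node = start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False


# Hypothetical full states; only "held" and "waiting_for" get recorded.
full = {
    "p": {"held": ["A"], "waiting_for": "B", "cache": {"x": 1}},
    "q": {"held": ["B"], "waiting_for": "C", "cache": {}},
    "r": {"held": ["C"], "waiting_for": "D", "open_files": 3},
    "s": {"held": ["D"], "waiting_for": "A"},
}
snap = {p: contribution(s) for p, s in full.items()}
print(has_cycle(wait_for_graph(snap)))  # True

# If r had already sent its release of C before recording, r's contribution no
# longer holds C (the release would be on a channel tape instead), so q's wait
# is about to end and the cycle disappears.
snap["r"]["held"] = set()
print(has_cycle(wait_for_graph(snap)))  # False
```

Because the contributions come from a consistent cut, a cycle found this way cannot be the kind of phantom that sequential probing produces.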