Virtualized Physical Clocks
What do we use Clocks for • When did something happen? When will it happen • This class starts at 3pm • How long does something take? • This class lasts for 1 hour 20 minutes • What happened first and happened later • The class started before it ended
Clocks in Distributed Systems • We use clocks for similar things in distributed systems • Take a backup at 5pm/Restore to the backup at 5pm • Take a backup every hour • Ensure that resource is released by process 1 before process 2 accesses it
Application of Clocks to Order Events • Consider a multi‐version database system • When a new version is created, we add it to existing versions • A transaction (or the system on behalf of the transaction) can determine which version to read • Each version has a timestamp • Suppose you have a perfectly synchronized clock and a very fast processor • Treat every transaction as if it were instantaneous • Assign a timestamp, say T, to the transaction • Each read and write of the transaction would have time T
Application of Clocks to Order Events • Example: • T1 has timestamp 100 • It reads x which has versions at time 0, 50, 75, 90 • T1 would read the version at time 90 • It creates a new version of x • It would have a timestamp of 100 • T2 has timestamp 110 • Assuming no transactions other than T1 and T2. If it reads x, it should read x written by T1 • Advantages • To know what the state of the system was at time 100 is trivial • Read only transactions are never aborted • Problems • If T2 ran concurrently with T1 and read x before T1 had written x, aborting T1 or T2 may be necessary • …
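The example above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a read at timestamp T returns the latest version stamped at or before T; the class name `MultiVersionStore` and its methods are hypothetical and not part of any real database API.

```python
import bisect


class MultiVersionStore:
    """Toy multi-version store (hypothetical): versions of each key are kept sorted by timestamp."""

    def __init__(self):
        # key -> (sorted list of timestamps, parallel list of values)
        self._versions = {}

    def write(self, key, timestamp, value):
        """Add a new version of key stamped with the writer's timestamp."""
        timestamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(timestamps, timestamp)
        timestamps.insert(i, timestamp)
        values.insert(i, value)

    def read(self, key, timestamp):
        """Return (version_timestamp, value) for the latest version at or before timestamp."""
        timestamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(timestamps, timestamp)
        if i == 0:
            return None  # no version existed yet at that time
        return timestamps[i - 1], values[i - 1]


store = MultiVersionStore()
for ts in (0, 50, 75, 90):
    store.write("x", ts, f"x@{ts}")

t1_read = store.read("x", 100)   # T1 (timestamp 100) reads the version at 90
store.write("x", 100, "x@100")   # T1 creates a new version stamped 100
t2_read = store.read("x", 110)   # T2 (timestamp 110) reads T1's version
old_read = store.read("x", 80)   # state of x as of time 80
```

Because old versions are never overwritten, a read-only transaction at any timestamp can always find the version it needs, which is why such transactions never have to abort in this scheme.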
But... • Our clocks are not perfectly synchronized • Problems caused by loosely synchronized clocks • Suppose we have transactions T3 and T4 such that • T3 wrote x • T4 read x • T3 finished before T4 started • Then, T4 must be ordered later than T3 in serialization order • i.e., T4 must read x written by T3 (or some later transaction) • Loose synchronization may, however, permit T3’s timestamp to be higher than T4’s timestamp. This would prevent T4 from reading the value of x written by T3. • To prevent this problem, Google Spanner introduces the notion of commit‐wait • Force T4 to delay, thereby guaranteeing that its timestamp is higher than that of T3
Why did this happen? • We anticipated/wanted that temporal dependency would translate into causal dependency. • T3 finished before T4 started • We wanted T3 to impact T4 • The notion of causality captures which events can (potentially) affect other events
Causality • Causality (happened before) captures the information flow • Event a happened before b iff • a and b are on the same process and a occurred before b, or • a is a send event and b is the corresponding receive event, or • there exists an event c such that • a happened before c and • c happened before b • Lamport’s logical clocks assign a timestamp to each event such that • a happened before b => l.a < l.b • Vector clocks assign a (vector) timestamp to each event such that • a happened before b <=> vc.a < vc.b
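The slide does not spell out how two vector timestamps are compared. A minimal sketch, assuming the standard componentwise order (every entry less than or equal, and at least one strictly less), is shown here; the function names `vc_less` and `concurrent` are chosen only for illustration.

```python
def vc_less(a, b):
    """True iff vector timestamp a < b under the componentwise order."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def concurrent(a, b):
    """Two distinct events are concurrent when neither vector timestamp is less than the other."""
    return a != b and not vc_less(a, b) and not vc_less(b, a)


e1 = [1, 0, 0]   # first event on process 0
e2 = [2, 1, 0]   # event on process 1 that has (transitively) heard from process 0
e3 = [0, 0, 1]   # independent event on process 2

ordered = vc_less(e1, e2)        # e1 happened before e2
unrelated = concurrent(e2, e3)   # no information flow in either direction
```

Unlike Lamport timestamps, which are always comparable as integers, this order is partial, which is what lets vector clocks identify concurrent events exactly.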
Causality (Continued) • Implementation of Logical Clocks • When process j sends message m • l.j = l.j + 1 • l.m = l.j • For receive event where message m is received • l.j = max(l.j, l.m) + 1 • Property of logical clocks • a happened before b => l.a < l.b • l.a = l.b => a is concurrent with b • Useful to take a consistent snapshot
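The send and receive rules above translate almost directly into code. The following is a minimal sketch; the class name `LamportClock` and the two-process scenario are illustrative assumptions, and message delivery is simulated by passing the timestamp as a plain argument.

```python
class LamportClock:
    """Logical clock for one process, following the send/receive rules on the slide."""

    def __init__(self):
        self.l = 0

    def send(self):
        """Send event: l.j = l.j + 1, then stamp the message with l.m = l.j."""
        self.l += 1
        return self.l

    def receive(self, l_m):
        """Receive event: l.j = max(l.j, l.m) + 1."""
        self.l = max(self.l, l_m) + 1
        return self.l


p, q = LamportClock(), LamportClock()
m1 = p.send()          # p: l = 1
m2 = p.send()          # p: l = 2
r1 = q.receive(m2)     # q: l = max(0, 2) + 1 = 3
m3 = q.send()          # q: l = 4
r2 = p.receive(m3)     # p: l = max(2, 4) + 1 = 5
```

Every send is stamped lower than its matching receive, so timestamps increase along any chain of messages, but the values carry no relation to wall-clock time.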
How would logical clocks be different? • Given the expected dependency between T3 and T4 • Assign timestamp of T4 to be higher than that of T3 • Waiting not involved since it is a logical clock
Let’s review what we wanted to do with (logical) clocks • When did something happen? When will it happen • This class starts at 3pm • NO • How long does something take? • This class takes 1 hour 20 minutes • NO • What happened first and happened later • The class started before it ended • YES/NO
What is the problem? • Logical clocks did not convey any meaning to the actual real/physical time
Goals • Problem: Given a distributed system, assign each event e a timestamp l.e, such that 1. e hb f => l.e < l.f 2. Space requirement of l.e is O(1) integers 3. l.e is represented with bounded space 4. l.e is close to pt.e i.e. |l.e – pt.e| is bounded.
Naïve Algorithm • Logical Clocks: • When process j sends message m • l.j = l.j + 1 • l.m = l.j • For receive event where message m is received • l.j = max(l.j, l.m) + 1 • Naïve Algorithm: • When process j sends message m • l.j = l.j + 1 • l.j := max(l.j, pt.j) • l.m = l.j • For receive event where message m is received • l.j = max(l.j, l.m) + 1 • l.j := max(l.j, pt.j)
Naïve Algorithm • Satisfies the first two requirements: 1. e hb f => l.e < l.f 2. Space requirement of l.e is O(1) integers • Fails these requirements (we omit the proof): 1. l.e is represented with bounded space 2. l.e is close to pt.e i.e. |l.e – pt.e| is bounded. • Unbounded drift caused by • l.j := max(l.j+1, pt.j), and • l.j := max(l.j+1, l.m+1, pt.j)
This is an example to show that drift between l and pt can increase in unbounded fashion
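One way to see the unbounded drift is a small simulation. The sketch below is not the scenario from the slide's figure; it assumes two processes that exchange one message in each direction per physical clock tick, so four events occur per tick. The names `NaiveClock` and `simulate_drift` are hypothetical.

```python
class NaiveClock:
    """Naive algorithm: Lamport increment followed by max with physical time."""

    def __init__(self, physical_time):
        self.l = 0
        self.pt = physical_time  # callable returning the current physical time

    def send(self):
        self.l += 1
        self.l = max(self.l, self.pt())
        return self.l

    def receive(self, l_m):
        self.l = max(self.l, l_m) + 1
        self.l = max(self.l, self.pt())
        return self.l


def simulate_drift(rounds):
    """Return l - pt at process p after each round of a two-process ping-pong."""
    now = [10]  # shared physical time (assumed perfectly synchronized here)
    p = NaiveClock(lambda: now[0])
    q = NaiveClock(lambda: now[0])
    drifts = []
    for _ in range(rounds):
        q.receive(p.send())   # p -> q
        p.receive(q.send())   # q -> p
        drifts.append(p.l - now[0])
        now[0] += 1           # physical time advances one tick per round
    return drifts


drifts = simulate_drift(1000)
```

In this setup l grows by four per round while pt grows by one, so the gap widens by three every round and never closes, even though the physical clocks themselves agree perfectly.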
Problem with Naïve algorithm • Drift between l.e and pt.e is not bounded • Why is this a problem? • Consider the case where the user wants a snapshot of a database at time t • Since no process knows the precise physical time, the snapshot provided will not be precisely at physical time t. • It will be at time t’ (that is hopefully close to t) • If clock skew is є, the best we can do is to let t’ be in [t - є, t + є]
Algorithm for Hybrid Logical Clocks • Naïve Algorithm: • When process j sends message m • l.j = l.j + 1 • l.j := max(l.j, pt.j) • l.m = l.j • Revised Algorithm: • When process j sends message m • l.j’ = l.j • l.j := max(l.j, pt.j) • If (l.j = l.j’) c.j = c.j + 1 • Else c.j = 0 • l.m = l.j, c.m = c.j
Algorithm for Hybrid Logical Clocks (Continued) • Naïve Algorithm: • Upon receiving m at j • l.j = max(l.j, l.m) + 1 • l.j := max(l.j, pt.j) • Revised Algorithm: • Upon receiving m at j • l.j’ := l.j; • l.j := max(l.j’, l.m, pt.j); • If (l.j = l.j’ = l.m) then c.j := max(c.j, c.m) + 1 • Elseif (l.j’ = l.j) then c.j := c.j + 1 • Elseif (l.j = l.m) then c.j := c.m + 1 • Else c.j := 0 • l.m = l.j
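The revised send and receive rules can be sketched as a small Python class. This is an illustration rather than a reference implementation: the class name `HLC`, the callable used for physical time, and the two-process scenario are assumptions, and timestamps are returned as `(l, c)` tuples so that Python's built-in tuple comparison gives the lexicographic order.

```python
class HLC:
    """Hybrid logical clock for one process: l tracks physical time, c breaks ties."""

    def __init__(self, physical_time):
        self.l = 0
        self.c = 0
        self.pt = physical_time  # callable returning this process's physical time

    def send(self):
        """Send rule: advance l to physical time if possible, otherwise bump c."""
        l_old = self.l
        self.l = max(l_old, self.pt())
        if self.l == l_old:
            self.c += 1
        else:
            self.c = 0
        return (self.l, self.c)

    def receive(self, l_m, c_m):
        """Receive rule: take the max of local l, message l, and physical time."""
        l_old = self.l
        self.l = max(l_old, l_m, self.pt())
        if self.l == l_old == l_m:
            self.c = max(self.c, c_m) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        return (self.l, self.c)


t0, t1 = [10], [1]             # process 0's clock is far ahead of process 1's
p0 = HLC(lambda: t0[0])
p1 = HLC(lambda: t1[0])

m = p0.send()                  # (10, 0)
r = p1.receive(*m)             # (10, 1): l adopted from the message, c = c.m + 1
t1[0] = 2
s = p1.send()                  # (10, 2): physical time still behind l, so c grows
t1[0] = 13
u = p1.send()                  # (13, 0): physical time caught up, c resets
```

Note that l never runs ahead of the largest physical time seen anywhere in the system, which is what keeps the drift bounded; only c grows during bursts, and it resets as soon as physical time catches up.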
HLC Algorithm • [Figure: example run of the HLC algorithm on processes 0 to 3. Events are labeled with triples pt.j, l.j, c.j. Labels shown include 10,10,0; 0,0,0; 1,10,1; 2,10,2; 2,10,3; 3,10,4; 3,10,5; 4,10,6; 13,13,0; and 14,14,0, with the annotation "Reset c" where c returns to 0. Messages carry l.m = 10 with c.m = 0, 2, and 4. Rules called out in the figure: l.j := max(l’.j, pt.j); l.j := max(l’.j, l.m, pt.j); elseif (l.j = l.m) then c.j := c.m + 1; if (l.j = l’.j) then c.j := c.j + 1.]
Properties of HLC • Logical clock property: • e hb f => (l.e, c.e) < (l.f, c.f) (lexicographical comparison) • |l.f – pt.f| <= є • pt.e <= l.e <= pt.e + є • The value c.e is bounded • c.e <= N * (number of events that can be created on a process within є) • In practice, it is very small (in single digits)
Let’s review what we wanted to do with (hybrid logical) clocks • When did something happen? When will it happen • The class started at 3pm • Yes. Choose l value to be within epsilon of 3pm. The best we can do anyway. • How long does something take? • The class took 1 hour 20 minutes • Look at the difference between the l values • What happened first and happened later • The class started before it ended • Use lexicographic ordering (guarantees consistency with causal order)
Revisiting Multiversion Database
Review the earlier example • Problems caused by loosely synchronized clocks • Suppose we have transactions T3 and T4 such that • T3 wrote x • T4 read x • T3 finished before T4 started • Then, T4 must be ordered later than T3 in serialization order • i.e., T4 must read x written by T3 (or some later transaction) • Loose synchronization may, however, permit T3’s timestamp to be higher than T4’s timestamp. This would prevent T4 from reading the value of x written by T3. • To prevent this problem, Google Spanner introduces the notion of commit‐wait • Force T4 to delay, thereby guaranteeing that its timestamp is higher than that of T3
Other Choices • Alternate choice • Increase (physical) time of the machine running T4 • Unacceptable, as it would cause problems for other applications (e.g., sleep function) as well as NTP synchronization • A better choice • Create a new HLC timestamp for T4 that is higher than that of T3 • Leave physical time unchanged • Change l value of the timestamp (and if necessary c value) • c value is still bounded
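The "better choice" can be illustrated with a tiny helper. This is a simplified sketch of the HLC receive rule, assuming the machine running T4 has no HLC state of its own that is later than T3's timestamp; the function name `hlc_after` and the numeric values are hypothetical.

```python
def hlc_after(prev, pt_now):
    """Return an HLC timestamp (l, c) strictly greater than prev without touching physical time.

    prev   -- (l, c) timestamp of the transaction that must be ordered first (T3)
    pt_now -- physical time on the machine running the later transaction (T4)
    """
    l_prev, c_prev = prev
    if pt_now > l_prev:
        return (pt_now, 0)          # local physical time is already ahead: use it
    return (l_prev, c_prev + 1)     # local clock lags: keep l, bump c instead of waiting


t3 = (105, 0)                             # T3 committed on a machine whose clock runs fast
t4_slow = hlc_after(t3, pt_now=100)       # (105, 1): no wait, physical clock unchanged
t4_fast = hlc_after(t3, pt_now=110)       # (110, 0): physical time suffices
```

In both cases T4's timestamp compares greater than T3's under lexicographic ordering, so T4 can read the version written by T3 without a commit-wait style delay.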
Other Applications of HLC • Causally consistent data store • Rollback on key‐value store • Runtime monitoring of partially synchronous distributed systems
Moral Questions?