
I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades



  1. I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades. Syed Akbar Mehdi 1, Cody Littley 1, Natacha Crooks 1, Lorenzo Alvisi 1,4, Nathan Bronson 2, Wyatt Lloyd 3. (1 UT Austin, 2 Facebook, 3 USC, 4 Cornell University)

  2. Causal Consistency: Great In Theory. On the consistency spectrum, eventual consistency offers higher performance, strong consistency offers stronger guarantees, and causal consistency sits in between. Lots of exciting research building scalable causal data-stores, e.g., COPS [SOSP 11], Eiger [NSDI 13], ChainReaction [EuroSys 13], Bolt-On [SIGMOD 13], Orbe [SOCC 13], GentleRain [SOCC 14], Cure [ICDCS 16], TARDiS [SIGMOD 16].

  3. Causal Consistency: But In Practice … the middle child of consistency models. Reality: the largest web apps use eventual consistency, e.g., TAO (Facebook), Manhattan (Twitter), Espresso (LinkedIn).

  4. Key Hurdle: Slowdown Cascades. Implicit assumption of current causal systems: it is fine to wait to enforce consistency. Reality at scale: waiting turns one slow node into a slowdown cascade.

  5. Setting: replicated and sharded storage for a social network, spread across Datacenter A and Datacenter B.

  6. Writes causally ordered as W1 → W2 → W3.

  7. Before applying a replicated write, Datacenter B asks "W1 applied?" and buffers W2, then "W2 applied?" and buffers W3. Current causal systems enforce consistency as a datastore invariant.

  8. Slowdown Cascade: W1 is delayed, so W2 stays buffered behind "W1 applied?" and W3 stays buffered behind "W2 applied?". Alice's advisor unnecessarily waits for Justin Bieber's update despite not reading it.

  9. Slowdown Cascade (continued): slowdown cascades affect all previous causal systems because they enforce consistency inside the data store. Alice's advisor unnecessarily waits for Justin Bieber's update despite not reading it.

  10. Slowdown Cascades in Eiger (NSDI '13). [Figure: buffered replicated writes vs. replicated writes received, under normal operation and with a slowdown.] Replicated write buffers grow arbitrarily because Eiger enforces consistency inside the datastore.

  11. OCCULT: Observable Causal Consistency Using Lossy Timestamps.

  12. Observable Causal Consistency. Causal consistency guarantees that each client observes a monotonically non-decreasing set of updates (including its own) in an order that respects potential causality between operations. Key idea: don't implement a causally consistent data store; let clients observe a causally consistent data store.

  13. How do clients observe a causally consistent datastore?

  14. Each shard has a master and slaves, spread across Datacenter A and Datacenter B. Writes are accepted only by master shards and then replicated asynchronously, in order, to slaves.

  15. Each shard keeps track of a shardstamp, which counts the writes it has applied (7, 4, and 8 for the three shards in the figure).

  16. Causal Timestamp: a vector of shardstamps, one entry per shard, which identifies a global state across all shards. Each client carries one (e.g., [4 3 2], [6 2 5], and [0 0 0] in the figure). See the sketch below.
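
The vector-of-shardstamps idea maps directly onto a small data structure. Below is a minimal Python sketch, not the Occult implementation; the class and method names (CausalTimestamp, advance, merge) are my own. The key operation is an element-wise-max merge.

```python
from dataclasses import dataclass, field

@dataclass
class CausalTimestamp:
    """Vector of shardstamps, keyed by shard id; identifies a global state."""
    stamps: dict[int, int] = field(default_factory=dict)

    def get(self, shard_id: int) -> int:
        # Shards the client has never observed are implicitly at 0.
        return self.stamps.get(shard_id, 0)

    def advance(self, shard_id: int, shardstamp: int) -> None:
        # Record a (possibly newer) shardstamp for one shard.
        self.stamps[shard_id] = max(self.get(shard_id), shardstamp)

    def merge(self, other: "CausalTimestamp") -> None:
        # Element-wise max: the result dominates both inputs, so it
        # encodes the union of the dependencies each one carries.
        for shard_id, shardstamp in other.stamps.items():
            self.advance(shard_id, shardstamp)
```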

  17. Write Protocol: causal timestamps are stored with objects to propagate dependencies (a write of object a stores the client's causal timestamp [4 3 2] alongside it).

  18. Write Protocol: the server's shardstamp is incremented (7 → 8) and merged into the causal timestamps of both the object and the client ([4 3 2] → [8 3 2]). A sketch follows below.
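
Continuing the hypothetical model above, a master shard's write path might look roughly like this (MasterShard and its fields are assumptions of the sketch, not names from the paper):

```python
class MasterShard:
    """Hypothetical master shard, continuing the CausalTimestamp sketch."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.shardstamp = 0          # counts writes this shard has applied
        self.store = {}              # key -> (value, CausalTimestamp)

    def write(self, key, value, client_ts: CausalTimestamp) -> CausalTimestamp:
        # Increment the shardstamp for the new write, then merge it into
        # the write's causal timestamp, so the stored object carries every
        # dependency the writing client has observed.
        self.shardstamp += 1
        deps = CausalTimestamp(dict(client_ts.stamps))
        deps.advance(self.shard_id, self.shardstamp)
        self.store[key] = (value, deps)
        return deps                  # returned so the client can update itself
```

A client would then run something like `client_ts.merge(master.write(key, value, client_ts))`, so its own timestamp reflects the write it just issued.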

  19. Read Protocol: it is always safe to read from the master.

  20. Read Protocol: the object's causal timestamp is merged into the client's causal timestamp (reading a with [8 3 2] turns a client's [6 2 5] into [8 3 5]).

  21. Read Protocol: causal-timestamp merging tracks causal ordering for writes that follow reads (a subsequent write of b carries [8 5 5], so b depends on everything its writer has read).

  22. Replication: like eventual consistency; asynchronous, unordered, and writes are applied immediately (in the figure, a's replication is delayed while b arrives first).

  23. Replication: slaves increment their shardstamps using the causal timestamp of a replicated write (applying b advances the slave's shardstamp from 4 to 5). A sketch follows below.
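
A sketch of the slave side under the same assumptions (SlaveShard is my name): replicated writes are applied on arrival, and the slave's shardstamp advances to the write's own entry for this shard.

```python
class SlaveShard:
    """Hypothetical slave shard applying asynchronous replication."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.shardstamp = 0
        self.store = {}              # key -> (value, CausalTimestamp)

    def apply_replicated(self, key, value, deps: CausalTimestamp) -> None:
        # Apply immediately, as in eventual consistency: no waiting on the
        # write's dependencies, so a delayed dependency cannot cascade.
        self.store[key] = (value, deps)
        # The write carries its own shardstamp inside its causal timestamp;
        # advancing to it keeps this counter consistent with the in-order
        # replication stream from this shard's master.
        self.shardstamp = max(self.shardstamp, deps.get(self.shard_id))
```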

  24. Read Protocol: clients do a consistency check when reading from slaves: the slave's shardstamp must be at least the client's causal-timestamp entry for that shard (here, "≥ 5?" succeeds).

  25. Read Protocol: the check consults only the shard being read, so b's dependencies are delayed, but we can read it anyway! A sketch of the client-side check follows below.
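
The client-side check itself is a single comparison. A sketch under the same assumptions (read_from_slave and StaleShardError are hypothetical names):

```python
class StaleShardError(Exception):
    """The slave has not yet applied writes the client already observed."""

def read_from_slave(client_ts: CausalTimestamp, slave: SlaveShard, key):
    # Consistency check: only this shard's entry matters, so delayed
    # dependencies on *other* shards never block the read.
    if slave.shardstamp < client_ts.get(slave.shard_id):
        raise StaleShardError(slave.shard_id)   # resolution options: slide 27
    value, deps = slave.store[key]
    client_ts.merge(deps)   # future writes now causally follow this read
    return value
```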

  26. Read Protocol: the check can fail: a client requiring "≥ 8" from a slave whose shardstamp is still 7 has hit a stale shard.

  27. Read Protocol: resolving stale reads. Options: 1. retry locally; 2. read from the master.

  28. Causal Timestamp Compression. What happens at scale when the number of shards is (say) 100,000? Must Size(causal timestamp) == 100,000 entries?

  29. Causal Timestamp Compression: Strawman. To compress down to n entries, conflate shardstamps whose shard ids are equal modulo n (e.g., [1000 89 13 209] compresses to [1000 209]). Problem: false dependencies. Partial solution: use the system clock as the next value of the shardstamp on a write, decoupling shardstamp values from the number of writes on each shard.

  30. Causal Timestamp Compression: Strawman (continued). Problem: even then, modulo arithmetic still conflates unrelated shardstamps. A sketch of the strawman follows below.
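
A sketch of the strawman, assuming the timestamp is a shard-id-to-shardstamp map as in the earlier sketches (compress_modulo is my name):

```python
def compress_modulo(stamps: dict[int, int], n: int) -> list[int]:
    """Strawman: conflate shardstamps whose shard ids are equal modulo n."""
    slots = [0] * n
    for shard_id, shardstamp in stamps.items():
        slot = shard_id % n
        # Taking the max preserves every real dependency, but it also makes
        # unrelated shards sharing a slot look like dependencies.
        slots[slot] = max(slots[slot], shardstamp)
    return slots
```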

  31. Causal Timestamp Compression. Insight: recent shardstamps are the most likely to create false dependencies, so keep high resolution for recent shardstamps and conflate the rest into a catch-all entry (in the figure, shardstamps 4000, 3989, 3880, 3873, and 3723 keep their shard ids 45, 89, 34, 402, and 123, while the catch-all "*" holds 3678). Result: 0.01% false dependencies with just 4 shardstamps and 16K logical shards. A sketch follows below.
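
A sketch of the structural scheme under the same assumptions (compress_structural is my name): keep the n-1 highest shardstamps at full resolution and bound everything else with one catch-all value.

```python
def compress_structural(stamps: dict[int, int], n: int):
    """Keep the n-1 most recent shardstamps explicit; conflate the rest."""
    ranked = sorted(stamps.items(), key=lambda kv: kv[1], reverse=True)
    explicit = dict(ranked[: n - 1])              # shard id -> shardstamp
    # One conservative catch-all bounds every remaining shard; since stale
    # shardstamps rarely induce false dependencies, the loss is small.
    catch_all = max((s for _, s in ranked[n - 1:]), default=0)
    return explicit, catch_all
```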

  32. Transactions in OCCULT: scalable, causally consistent, general-purpose transactions.

  33. Properties of Transactions: A. Atomicity. B. Reads from a causally consistent snapshot. C. No concurrent conflicting writes.

  34. Properties of Transactions (observable versions): A. Observable atomicity. B. Observably read from a causally consistent snapshot. C. No concurrent conflicting writes.

  35. Properties of Transactions (as above), plus Properties of the Protocol: 1. No centralized timestamp authority (e.g., per-datacenter); transactions are ordered using causal timestamps. 2. Transaction commit latency is independent of the number of replicas.

  36. Three-Phase Protocol: 1. Read phase: buffer writes at the client. 2. Validation phase: the client validates A, B, and C using causal timestamps. 3. Commit phase: buffered writes are committed in an observably atomic way. A sketch of the phases follows below.
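
Under the same hypothetical model as the earlier sketches, the shape of the three phases might look like the following. This is only a sketch: the validation shown checks the snapshot property (B) using causal timestamps, while the real protocol also validates atomicity (A) and conflicting writes (C), which are omitted here.

```python
class TransactionAborted(Exception):
    """Validation failed; the client retries the whole transaction."""

def run_transaction(client_ts: CausalTimestamp, shards: dict, read_keys, writes):
    # 1. Read phase: perform the reads, remember each version's causal
    #    timestamp, and buffer the writes at the client.
    snapshot, read_deps = {}, {}
    for key in read_keys:
        value, deps = shards[key].store[key]   # masters or checked slaves
        snapshot[key] = value
        read_deps[key] = deps

    # 2. Validation phase: merge every read's timestamp, then re-check
    #    each read against the merged state. If some shard has a newer
    #    entry than the version we read from it, the reads were not one
    #    causally consistent snapshot: abort and retry.
    for deps in read_deps.values():
        client_ts.merge(deps)
    for key, deps in read_deps.items():
        shard_id = shards[key].shard_id
        if deps.get(shard_id) < client_ts.get(shard_id):
            raise TransactionAborted(key)

    # 3. Commit phase: push buffered writes through the normal write
    #    path; each committed write carries the transaction's full causal
    #    timestamp, which is what makes the commit observably atomic.
    for key, value in writes.items():
        client_ts.merge(shards[key].write(key, value, client_ts))
    return snapshot
```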

  37. Example: Alice and her advisor are managing lists of students for three courses, a = [], b = [Bob], and c = [Cal], each replicated across Datacenter A and Datacenter B.

  38. Observable atomicity and causally consistent snapshot reads are enforced by a single mechanism.

  39. Transaction T1: Alice adds Abe to course a. Start T1: r(a) = []; w(a = [Abe]).
