I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades
Syed Akbar Mehdi 1, Cody Littley 1, Natacha Crooks 1, Lorenzo Alvisi 1,4, Nathan Bronson 2, Wyatt Lloyd 3
1 UT Austin, 2 Facebook, 3 USC, 4 Cornell University
Causal Consistency: Great In Theory
Eventual Consistency (higher performance) ... Causal Consistency ... Strong Consistency (stronger guarantees)
• Lots of exciting research building scalable causal data stores, e.g.,
Ø COPS [SOSP 11]
Ø Eiger [NSDI 13]
Ø Bolt-On [SIGMOD 13]
Ø Orbe [SOCC 13]
Ø ChainReaction [EuroSys 13]
Ø GentleRain [SOCC 14]
Ø Cure [ICDCS 16]
Ø TARDiS [SIGMOD 16]
Causal Consistency: But In Practice ...
The middle child of consistency models.
Reality: the largest web apps use eventual consistency, e.g., TAO, Manhattan, Espresso.
Key Hurdle: Slowdown Cascades
Current causal systems wait to enforce consistency. The implicit assumption is that waiting is rare; the reality at scale is a slowdown cascade.
Replicated and sharded storage for a social network, across Datacenter A and Datacenter B.
Writes causally ordered as W1 → W2 → W3.
Current causal systems enforce consistency as a datastore invariant: before applying W2, Datacenter B checks that W1 has been applied (buffering W2 until then), and likewise buffers W3 until W2 has been applied.
Slowdown Cascade: W1 is delayed, so W2 and W3 remain buffered. Alice's advisor unnecessarily waits for Justin Bieber's update despite not reading it.
Slowdown cascades affect all previous causal systems because they enforce consistency inside the datastore: Alice's advisor unnecessarily waits for Justin Bieber's update despite not reading it.
Slowdown Cascades in Eiger (NSDI '13)
[Plot: buffered replicated writes (y-axis, 0 to 1200) vs. replicated writes received (x-axis, 0 to 2500), under normal operation and under a slowdown.]
Replicated write buffers grow arbitrarily because Eiger enforces consistency inside the datastore.
OCCULT: Observable Causal Consistency Using Lossy Timestamps
Observable Causal Consistency
Causal consistency guarantees that each client observes a monotonically non-decreasing set of updates (including its own), in an order that respects potential causality between operations.
Key Idea: don't implement a causally consistent data store; let clients observe a causally consistent data store.
How do clients observe a causally consistent datastore?
Writes are accepted only by master shards and then replicated asynchronously, in order, to slaves.
Each shard keeps track of a shardstamp, which counts the writes it has applied.
Causal Timestamp: a vector of shardstamps, one per shard, which identifies a global state across all shards (e.g., a client's causal timestamp [4, 3, 2]).
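The slides walk through these timestamps by example; as a minimal sketch (not the paper's code), a causal timestamp is just a per-shard vector merged entry-wise by max:

```python
def merge(ts_a, ts_b):
    """Entry-wise max of two causal timestamps: the merged state
    has observed everything either input has observed."""
    return [max(a, b) for a, b in zip(ts_a, ts_b)]

client = [4, 3, 2]          # client has observed 4 writes on shard 0, etc.
obj = [6, 2, 5]             # causal timestamp stored with an object
print(merge(client, obj))   # [6, 3, 5]
```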
Write Protocol: causal timestamps are stored with objects to propagate dependencies.
Write Protocol: the server's shardstamp is incremented and merged into the causal timestamps of the written object and the writing client.
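A sketch of that write path, using plain lists for per-shard state (the numbers below follow the slides' running example; `master_write` and its signature are illustrative, not the paper's API):

```python
def master_write(shard, key, value, client_ts, store, shardstamps):
    # master advances its shardstamp past the client's entry for this shard,
    # stamps the object with the merged causal timestamp, and returns the
    # new timestamp for the client to adopt
    shardstamps[shard] = max(shardstamps[shard], client_ts[shard]) + 1
    obj_ts = list(client_ts)
    obj_ts[shard] = shardstamps[shard]
    store[key] = (value, obj_ts)
    return obj_ts

store, shardstamps = {}, [7, 4, 8]   # master shardstamps from the example
new_ts = master_write(0, "a", "photo", [4, 3, 2], store, shardstamps)
print(new_ts)                         # [8, 3, 2]: shard 0 advanced to 8
```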
Read Protocol: it is always safe to read from the master.
Read Protocol: the object's causal timestamp is merged into the client's causal timestamp.
Read Protocol: causal timestamp merging tracks causal ordering for writes that follow reads.
Replication: like eventual consistency; asynchronous and unordered, with writes applied immediately.
Replication: slaves increment their shardstamps using the causal timestamp of a replicated write.
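Sketched under the same assumptions as above: because replication is asynchronous and unordered, a slave applies a replicated write immediately and advances its shardstamp to the write's entry for its shard (shardstamps only move forward).

```python
def slave_apply(shard, key, value, obj_ts, store, shardstamps):
    # apply immediately, even if the write's dependencies are still delayed
    store[key] = (value, obj_ts)
    shardstamps[shard] = max(shardstamps[shard], obj_ts[shard])

store, shardstamps = {}, [0, 4, 8]        # slave of shard 1 is at 4
slave_apply(1, "b", ["Bob"], [8, 5, 5], store, shardstamps)
print(shardstamps[1])                      # 5, even though the dependency on
                                           # shard 0 (entry 8) is still delayed
```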
Read Protocol: clients do a consistency check when reading from slaves (is the slave's shardstamp ≥ the client's entry for that shard?).
Read Protocol: the consistency check can pass even when an object's dependencies are delayed: b's dependencies are delayed, but we can read b anyway!
Read Protocol: when the check fails (the slave's shardstamp is behind the client's entry, e.g., 7 < 8), the shard is stale.
Read Protocol: resolving stale reads. Options: 1. Retry locally. 2. Read from the master.
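The check and the merge on a successful read can be sketched together (illustrative helper, not the paper's API; values follow the slides):

```python
def check_and_merge(shard, slave_shardstamp, obj_ts, client_ts):
    # a slave read is safe iff the slave's shardstamp has caught up to the
    # client's entry for that shard; on success, merge the object's causal
    # timestamp into the client's
    if slave_shardstamp < client_ts[shard]:
        return False  # stale shard: retry locally or read from the master
    for i, s in enumerate(obj_ts):
        client_ts[i] = max(client_ts[i], s)
    return True

ts = [5, 5, 0]
print(check_and_merge(1, 5, [8, 5, 5], ts))  # True: b is readable despite
print(ts)                                    # [8, 5, 5]  its delayed deps
print(check_and_merge(0, 7, [8, 3, 2], ts))  # False: shard 0 slave is stale
```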
Causal Timestamp Compression
• What happens at scale when the number of shards is, say, 100,000? Is the size of each causal timestamp 100,000 entries?
Causal Timestamp Compression: Strawman
• To compress down to n entries, conflate shardstamps whose shard ids are equal modulo n, e.g., [1000, 89, 13, 209] compresses to [1000, 209].
• Problem: false dependencies.
• Solution: use the system clock as the next value of a shardstamp on a write. This decouples a shardstamp's value from the number of writes on its shard.
Causal Timestamp Compression: Strawman
• Problem: modulo arithmetic still conflates unrelated shardstamps.
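A sketch of the strawman makes the false-dependency problem concrete: entries congruent modulo n are conflated by taking the max, so dependencies are never lost, only over-stated.

```python
def compress_modulo(full_ts, n):
    # conflate shardstamps whose shard ids are congruent mod n, keeping the
    # max so no dependency is dropped (only over-approximated)
    small = [0] * n
    for shard_id, stamp in enumerate(full_ts):
        small[shard_id % n] = max(small[shard_id % n], stamp)
    return small

print(compress_modulo([1000, 89, 13, 209], 2))  # [1000, 209]
# Shards 0 and 2 now share an entry: a client that read only shard 2
# (shardstamp 13) appears to depend on all 1000 writes to shard 0.
```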
Causal Timestamp Compression
• Insight: recent shardstamps are more likely to create false dependencies.
• Use high resolution for recent shardstamps and conflate the rest into a catch-all entry:
Shard IDs: 45, 89, 34, 402, 123, * (catch-all)
Shardstamps: 4000, 3989, 3880, 3873, 3723, 3678
• 0.01% false dependencies with just 4 shardstamps and 16K logical shards.
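One way to sketch this scheme (assumed data layout, not the paper's exact structures): keep the n-1 largest entries, which with clock-based shardstamps are the most recent, under explicit shard ids, and conflate everything else into one catch-all entry, again taking the max so dependencies are only over-approximated.

```python
def compress_temporal(stamps_by_shard, n):
    # keep the n-1 highest shardstamps explicitly; conflate the rest
    ranked = sorted(stamps_by_shard.items(), key=lambda kv: kv[1], reverse=True)
    explicit = dict(ranked[: n - 1])
    rest = [stamp for _, stamp in ranked[n - 1:]]
    return explicit, (max(rest) if rest else 0)

def entry_for(shard_id, explicit, catch_all):
    # shards not tracked explicitly fall back to the catch-all shardstamp
    return explicit.get(shard_id, catch_all)

full = {45: 4000, 89: 3989, 34: 3880, 402: 3873, 123: 3723, 7: 3678, 9: 3600}
explicit, catch_all = compress_temporal(full, 6)
print(entry_for(123, explicit, catch_all))  # 3723 (tracked explicitly)
print(entry_for(9, explicit, catch_all))    # 3678 (conflated, over-approximated)
```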
Transactions in OCCULT
Scalable, causally consistent, general-purpose transactions.
Properties of Transactions
A. Atomicity
B. Read from a causally consistent snapshot
C. No concurrent conflicting writes
Properties of Transactions
A. Observable Atomicity
B. Observably read from a causally consistent snapshot
C. No concurrent conflicting writes
Properties of Transactions
A. Observable Atomicity
B. Observably read from a causally consistent snapshot
C. No concurrent conflicting writes
Properties of Protocol
1. No centralized timestamp authority (e.g., per-datacenter)
§ Transactions ordered using causal timestamps
2. Transaction commit latency is independent of the number of replicas
Properties of Transactions
A. Observable Atomicity
B. Observably read from a causally consistent snapshot
C. No concurrent conflicting writes
Three-Phase Protocol
1. Read Phase
§ Buffer writes at client
2. Validation Phase
§ Client validates A, B and C using causal timestamps
3. Commit Phase
§ Buffered writes committed in an observably atomic way
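A highly simplified sketch of the three phases, following only the slides' outline (client-side write buffering plus OCC-style validation using causal timestamps); the names and the validation rule below are illustrative, and the real protocol's validation and commit details are richer.

```python
class Transaction:
    def __init__(self, client_ts):
        self.ts = list(client_ts)
        self.read_set = {}    # key -> causal timestamp observed
        self.write_set = {}   # key -> buffered value

    def read(self, key, value, obj_ts):          # 1. Read phase
        self.read_set[key] = list(obj_ts)
        self.ts = [max(a, b) for a, b in zip(self.ts, obj_ts)]
        return value

    def write(self, key, value):                 # writes buffered at client
        self.write_set[key] = value

    def validate(self, current_ts_of):           # 2. Validation phase
        # the reads form a consistent snapshot only if nothing read has
        # since been overwritten (no concurrent conflicting writes)
        return all(current_ts_of(k) == ts for k, ts in self.read_set.items())

t = Transaction([0, 0, 0])
t.read("a", [], [0, 0, 0])               # read course a
t.write("a", ["Abe"])                    # buffered, not yet visible
print(t.validate(lambda k: [0, 0, 0]))   # True: snapshot unchanged
```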
Alice and her advisor are managing lists of students for three courses: a = [], b = [Bob], c = [Cal], replicated across Datacenter A and Datacenter B.
Observable atomicity and causally consistent snapshot reads are enforced by a single mechanism.
Transaction T1: Alice adding Abe to course a. Start T1: r(a) = [], w(a = [Abe]).