Avoiding Coordination with Network Ordering: NOPaxos and Eris

Ellis Michael

Server failures are the common case in data centers


  1. NOPaxos attains throughput within 2% of an unreplicated system
     [Figure: latency (µs, 0 to 500, lower is better) against throughput (ops/sec, 0 to 260,000, higher is better) for Paxos, NOPaxos, and an unreplicated system.]

  2. NOPaxos attains throughput within 2% of an unreplicated system
     [Figure: same plot. Annotation on NOPaxos: within 2% throughput and 16 µs latency of an unreplicated system.]

  3. NOPaxos attains throughput within 2% of an unreplicated system
     [Figure: same plot with an added curve, NOPaxos using an end-host sequencer.]

  4. NOPaxos attains throughput within 2% of an unreplicated system
     [Figure: same plot. Annotation on NOPaxos using an end-host sequencer: similar throughput but 36% higher latency.]

  5. Summary
     • Separate ordering from reliable delivery in state machine replication
     • A network model, OUM, that provides ordered but unreliable message delivery
     • A more efficient replication protocol, NOPaxos, that ensures reliable delivery
     • The combined system achieves performance equivalent to an unreplicated system

  6. The Eris Transaction Protocol

  7. Existing transactional systems: extensive coordination
     [Diagram: a client and Shards 1, 2, and 3.]

  8-10. Existing transactional systems: extensive coordination
     [Diagram: the client and Shards 1, 2, and 3 exchange req, prepare, ok, and commit messages.]
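The prepare, ok, and commit labels on these slides are the messages of an atomic commitment round. As a rough illustration of why this coordination is costly, here is a minimal sketch of textbook two-phase commit. It is not the exact message flow drawn on the slide; the `Shard` class, the `two_phase_commit` function, and the message count (which ignores the client request and any acknowledgements) are assumptions made for this example.

```python
class Shard:
    """Hypothetical shard that votes on a transaction, then applies the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: the shard votes. Returning True stands for the "ok" reply.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, txn, commit):
        # Phase 2: the shard applies the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(txn, shards):
    """Run one 2PC round; return (outcome, messages exchanged with shards)."""
    messages = 0
    votes = []
    for shard in shards:              # phase 1: prepare out, ok (or no) back
        votes.append(shard.prepare(txn))
        messages += 2
    commit = all(votes)
    for shard in shards:              # phase 2: commit or abort to every shard
        shard.finish(txn, commit)
        messages += 1
    return ("commit" if commit else "abort"), messages


shards = [Shard("Shard 1"), Shard("Shard 2"), Shard("Shard 3")]
print(two_phase_commit("T1", shards))
```

Even in this stripped-down form, every transaction costs two sequential rounds of messages to every participating shard before it can complete.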

  11. Eris
      • Processes independent transactions without coordination in the normal case
      • Performance within 3% of a nontransactional, unreplicated system on TPC-C
      • Strongly consistent, fault-tolerant transactions with minimal performance penalties

  12. Key Contributions
      A new architecture that divides the responsibility for transactional guarantees by leveraging the datacenter network to order messages within and across shards, and a co-designed transaction protocol with minimal coordination.

  13. Traditional Layered Approach
      [Diagram: a layered stack. Atomic Commitment (2PC) sits on top; beneath it, each of two shards runs Concurrency Control (2PL) over Replication (Paxos) over three replicas.]

  14. Traditional Layered Approach
      [Diagram: same stack. Replication (Paxos) is annotated with reliability (within shard) and ordering (within shard).]

  15. Traditional Layered Approach
      [Diagram: same stack. Concurrency Control (2PL) is additionally annotated with isolation.]

  16. Traditional Layered Approach
      [Diagram: same stack. Atomic Commitment (2PC) is additionally annotated with reliability (across shards) and ordering (across shards).]

  17. A New Way to Divide Responsibilities
      [Diagram: the Eris layers and the guarantee each provides. General Transaction Protocol: isolation. Independent Transaction Protocol: reliability (across shards and within shard). Multi-sequencing: ordering (within shard and across shards).]

  18. A New Way to Divide Responsibilities
      [Diagram: same layers, with the two transaction protocols labeled Application and multi-sequencing labeled Network.]

  19. Goal
      [Diagram: a client and a sequencer.]

  20. In-Network Concurrency Control Goals
      • Globally consistent ordering across messages delivered to multiple destination shards
      • No reliable delivery guarantee
      • Recipients can detect dropped messages
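To make the first goal concrete, the sketch below checks whether a set of per-receiver delivery orders is globally consistent, meaning that some single total order agrees with every receiver. A receiver that misses a message does not violate the property, which matches the second goal. The function name and the example orders are assumptions made for illustration; the check uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def globally_consistent(orders):
    """Return True if one total order is compatible with every receiver.

    `orders` maps a receiver name to the list of messages it delivered,
    in delivery order. Each receiver contributes "earlier before later"
    edges; the orders are consistent exactly when those edges form no cycle.
    """
    graph = TopologicalSorter()
    for delivered in orders.values():
        for msg in delivered:
            graph.add(msg)
        for earlier, later in zip(delivered, delivered[1:]):
            graph.add(later, earlier)  # `later` must come after `earlier`
    try:
        graph.prepare()
    except CycleError:
        return False
    return True


same_order = {"A": ["T1", "T2"], "B": ["T1", "T2"], "C": ["T1"]}
reordered = {"A": ["T1", "T2"], "B": ["T2", "T1"], "C": ["T1"]}
dropped = {"A": ["T2"], "B": ["T1", "T2"], "C": ["T1"]}
print(globally_consistent(same_order))  # True
print(globally_consistent(reordered))   # False
print(globally_consistent(dropped))     # True
```

Checking for a cycle rather than comparing receivers pairwise matters once three or more groups are involved, because each pair can agree while the combined order is still circular.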

  21. [Diagram: receivers A, B, and C. A: T1 (ABC), T2 (AB). B: T1 (ABC), T2 (AB). C: T1 (ABC).]

  22. [Diagram: as in slide 21, but B shows T2 (AB) before T1 (ABC).]

  23. [Diagram: as in slide 21. A: T1 (ABC), T2 (AB). B: T1 (ABC), T2 (AB). C: T1 (ABC).]

  24. [Diagram: A: T2 (AB) only. B: T1 (ABC), T2 (AB). C: T1 (ABC).]

  25. [Diagram: as in slide 24, with a DROP marker at A.]

  26. [Diagram: A: T1 (ABC), T2 (AB), with a DROP marker. B: T1 (ABC), T2 (AB). C: T1 (ABC).]

  27. [Diagram: three copies of the receiver timelines (A: T1, T2; B: T1, T2; C: T1), with a DROP marker at A.]

  28. Multi-Sequenced Groupcast
      • Groupcast: message header specifies a set of destination multicast groups
      • Multi-sequenced groupcast: messages are sequenced atomically across all recipient groups
      • Sequencer keeps a counter for each group
      • Extends OUM in NOPaxos
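The per-group counters can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the in-network implementation: the `MultiSequencer` class and the message layout are invented for this example, and atomicity here simply follows from the code being single-threaded. The three calls use the T1, T2, and T3 messages from the sequencer slides that follow.

```python
from dataclasses import dataclass, field


@dataclass
class MultiSequencer:
    """Hypothetical sequencer model: one counter per multicast group."""

    counters: dict = field(default_factory=dict)

    def stamp(self, txn, groups):
        """Advance the counter of every destination group in one step and
        return the message with its multi-stamp attached."""
        for group in groups:
            self.counters[group] = self.counters.get(group, 0) + 1
        multi_stamp = {group: self.counters[group] for group in groups}
        return {"txn": txn, "groups": tuple(groups), "stamp": multi_stamp}


sequencer = MultiSequencer({"A": 0, "B": 0, "C": 0})
m1 = sequencer.stamp("T1", ["A", "B", "C"])  # stamped A1, B1, C1
m2 = sequencer.stamp("T2", ["A", "B"])       # stamped A2, B2
m3 = sequencer.stamp("T3", ["A"])            # stamped A3
print(m1["stamp"], m2["stamp"], m3["stamp"])
print(sequencer.counters)                    # {'A': 3, 'B': 2, 'C': 1}
```

Because a message only advances the counters of its own destination groups, each receiver sees a gap-free sequence for its group unless a message is lost.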

  29. [Diagram: a sequencer holding counters A0, B0, C0, and receivers A, B, and C. No messages yet.]

  30-31. [Diagram: counters A0, B0, C0. Message T1 (ABC) arrives, not yet stamped.]

  32. [Diagram: counters advance to A1, B1, C1. T1 (ABC) is not yet stamped.]

  33. [Diagram: counters A1, B1, C1. T1 (ABC) is stamped A1, B1, C1.]

  34. [Diagram: counters A1, B1, C1. T1 (ABC), stamped A1, B1, C1, is shown at receivers A, B, and C.]

  35-36. [Diagram: counters A1, B1, C1. Message T2 (AB) arrives, not yet stamped.]

  37. [Diagram: counters advance to A2, B2, C1. T2 (AB) is not yet stamped.]

  38. [Diagram: counters A2, B2, C1. T2 (AB) is stamped A2, B2.]

  39-40. [Diagram: counters A2, B2, C1. T2 (AB), stamped A2, B2, is shown at receiver B alongside T1.]

  41-42. [Diagram: counters A2, B2, C1. Message T3 (A) arrives, not yet stamped.]

  43. [Diagram: counters advance to A3, B2, C1. T3 (A) is not yet stamped.]

  44-46. [Diagram: counters A3, B2, C1. T3 (A) is stamped A3.]

  47. [Diagram: counters A3, B2, C1. T3 (A), stamped A3, is marked DROP.]
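A receiver can notice a dropped message because the stamps for its group are consecutive, so any gap means something is missing. The sketch below shows only that detection step. The `Receiver` class is an assumption made for this example, T4 is a hypothetical later message that does not appear on the slides, and recovering the missing message is deliberately left out.

```python
class Receiver:
    """Hypothetical receiver for one group; tracks the next expected stamp."""

    def __init__(self, group):
        self.group = group
        self.next_seq = 1
        self.delivered = []

    def receive(self, msg):
        """Deliver `msg` and return the stamps detected as missing before it."""
        seq = msg["stamp"][self.group]
        if seq < self.next_seq:
            return []  # stale or duplicate message: ignore it
        missing = list(range(self.next_seq, seq))
        # Simplification: a real protocol would resolve the gap before
        # processing this message; the sketch only reports it.
        self.delivered.append(msg["txn"])
        self.next_seq = seq + 1
        return missing


receiver_a = Receiver("A")
gaps = []
for msg in [
    {"txn": "T1", "stamp": {"A": 1, "B": 1, "C": 1}},
    {"txn": "T2", "stamp": {"A": 2, "B": 2}},
    # T3, stamped A3, is dropped by the network and never arrives.
    {"txn": "T4", "stamp": {"A": 4}},  # hypothetical later message
]:
    gaps.append(receiver_a.receive(msg))
print(gaps)  # [[], [], [3]]
```

Detection is deferred until the next message for the group arrives, since a receiver cannot tell a dropped message from one that has not been sent yet.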

  48. What have we accomplished so far?
      • Consistently ordered groupcast primitive with drop detection
      • How do we go from multi-sequenced groupcast to transactions?

  49. Transaction Model
      Eris supports two types of transactions:
      • Independent transactions:
        ✤ One-shot (stored procedures)
        ✤ No cross-shard dependencies
        ✤ Proposed by H-Store [VLDB ’07] and Granola [ATC ’12]
      • Fully general transactions

  50. Independent Transaction
      The same transaction is shown three times, alongside three single-row tables:
        START TRANSACTION
        UPDATE tb t1 SET t1.Salary = t1.Salary + 100 WHERE t1.Salary < 500
        COMMIT
      Tables (Name, Salary): Alice 600; Bob 350; Charlie 400
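The salary update is independent because each shard can decide it from its own rows alone. A minimal sketch of that idea follows. It assumes each of the three rows lives on a different shard, which the slide layout suggests but does not state, and the function and variable names are invented for this example.

```python
def raise_low_salaries(rows):
    """Stored procedure run locally on one shard:
    add 100 to every salary below 500 and leave the rest unchanged."""
    return {name: pay + 100 if pay < 500 else pay for name, pay in rows.items()}


# Assumed placement: one row per shard.
shards = [{"Alice": 600}, {"Bob": 350}, {"Charlie": 400}]

# Every shard applies the same procedure to its own data. No shard reads
# another shard's rows or waits for another shard's vote.
after = [raise_low_salaries(rows) for rows in shards]
print(after)  # [{'Alice': 600}, {'Bob': 450}, {'Charlie': 500}]
```

A transfer between two accounts on different shards would not fit this model if the credit depended on whether the debit succeeded, since that is a cross-shard dependency.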
