SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines
Hanyu Zhao*, Quanlu Zhang†, Zhi Yang*, Ming Wu†, Yafei Dai*
*Peking University  †Microsoft Research
Replication for Fault Tolerance
Peking University, Microsoft Research 2
Replication in the Wide Area
[Figure: wide-area latencies of 150 ms and 20 ms between clients and replicas]
- Reducing wide-area latency for clients
Keeping the Replicated State Consistent
[Figure: one replica holds “Having fun at SoCC!” while another holds “Having fun at OSDI!”, so the replicated state is inconsistent]
State Machine Replication (SMR)
[Figure: three replicas each apply A = 1, A = 2, A = 3 in turn and all end with A = 3]
- Execute the same sequence of commands in the same order
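The SMR principle can be illustrated with a tiny sketch. It assumes a toy key-value state machine whose commands are assignments such as `A = 1`. The names `Command` and `apply_log` are hypothetical and do not come from SDPaxos. Replicas that apply the same log in the same order end in the same state. Applying the same commands in a different order can produce a different state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A hypothetical assignment command such as A = 1."""
    key: str
    value: int


def apply_log(log):
    """Deterministically apply commands, in log order, to an empty state."""
    state = {}
    for cmd in log:
        state[cmd.key] = cmd.value
    return state


# The same command sequence, applied in the same order on three replicas.
log = [Command("A", 1), Command("A", 2), Command("A", 3)]
replicas = [apply_log(log) for _ in range(3)]
print(replicas)  # [{'A': 3}, {'A': 3}, {'A': 3}]

# The same commands applied in a different order lead to a different state.
print(apply_log(list(reversed(log))))  # {'A': 1}
```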
Paxos
- A distributed agreement protocol
- Tolerates F failures given 2F+1 replicas
- Chooses a single command for each command slot using a Paxos instance
[Figure: Paxos instance 1 chooses A = 1 on all three replicas]
Paxos (continued)
[Figure: Paxos instance 2 chooses A = 2 for the next slot; every replica's log now reads A = 1, A = 2]
Paxos (continued)
[Figure: Paxos instance 3 chooses A = 3; every replica's log now reads A = 1, A = 2, A = 3]
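The 2F+1 bound follows from majority quorums. After F failures, F+1 replicas remain alive, and F+1 is still a majority. Any two majorities also share at least one replica, so two quorums cannot choose different commands for the same slot. The short check below covers this arithmetic. The helper names are illustrative only.

```python
from itertools import combinations


def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def majorities_intersect(n):
    """Exhaustively check that every two minimal majorities of n replicas overlap.

    Larger quorums contain a minimal majority, so they overlap as well.
    """
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


for f in (1, 2):
    n = 2 * f + 1
    # With f replicas failed, the n - f survivors still form a majority.
    print(n, majority(n), n - f >= majority(n), majorities_intersect(n))
# 3 2 True True
# 5 3 True True
```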
Centralized SMR
- Liveness of Paxos:
  - Multiple replicas should not propose commands in the same instance simultaneously
[Figure: three replicas propose A = 1, A = 2 and A = 3 for the same instance, causing a conflict]
Centralized SMR (continued)
- A stable leader proposes all commands (A = 1, A = 2, A = 3), so proposals never collide in the same instance
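The liveness problem can be reproduced with a simplified single-decree Paxos acceptor. This is a sketch, not SDPaxos code. For brevity, `prepare` does not return any previously accepted value. Two proposers with interleaved ballots keep pre-empting each other, and no value is ever chosen. A stable leader avoids this pattern.

```python
class Acceptor:
    """Simplified single-decree Paxos acceptor."""

    def __init__(self):
        self.promised = 0       # highest ballot promised so far
        self.accepted = None    # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1: promise to ignore ballots lower than `ballot`."""
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


acceptors = [Acceptor() for _ in range(3)]
MAJORITY = 2


def prepared(ballot):
    return sum(a.prepare(ballot) for a in acceptors) >= MAJORITY


def accepted(ballot, value):
    return sum(a.accept(ballot, value) for a in acceptors) >= MAJORITY


# Two replicas propose in the same instance with interleaved ballots.
trace = [
    prepared(1),            # R0 prepares ballot 1: succeeds
    prepared(2),            # R1 prepares ballot 2: succeeds
    accepted(1, "A = 1"),   # R0's accept is rejected (ballot 2 promised)
    prepared(3),            # R0 retries with ballot 3: succeeds
    accepted(2, "A = 2"),   # R1's accept is rejected (ballot 3 promised)
]
print(trace)  # [True, True, False, True, False]
```

This interleaving can repeat indefinitely, which is why Paxos-based SMR normally relies on a single stable leader for progress.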
Drawbacks of Centralized SMR
- Potential performance bottleneck
  - Low throughput
- High wide-area latency
[Figure: wide-area latencies of 20 ms and 200 ms]
Drawbacks of Centralized SMR (continued)
[Figure: centralized SMR offers limited performance; can decentralized SMR offer high performance?]
Decentralizing SMR
- Replicas should propose commands in different command slots
[Figure: R0, R1 and R2 each propose a different command (A = 0, A = 1, A = 2), and every command is replicated to all three replicas]
- How to order them?
Static Ordering
- The system runs at the speed of the slowest replica
[Figure: a straggler's unfilled slot blocks the execution of A = 1, A = 2, A = 3]
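Static ordering pre-assigns slots to replicas, for example round-robin (the scheme Mencius, one of the baselines below, is built on). The simplified model below uses hypothetical helpers to show why a straggler caps the whole system. Commands execute strictly slot by slot, so one undecided slot stalls every later command, even those already decided.

```python
def owner(slot, n_replicas):
    """Round-robin static ordering: slot s is pre-assigned to replica s mod n."""
    return slot % n_replicas


def executable_prefix(decided_slots):
    """Slots execute strictly in order, so stop at the first undecided slot."""
    count = 0
    while count in decided_slots:
        count += 1
    return count


N = 3
# R0 and R1 fill their slots quickly; R2 is a straggler that has decided nothing yet.
decided = {s for s in range(9) if owner(s, N) != 2}
print(sorted(decided))             # [0, 1, 3, 4, 6, 7]
print(executable_prefix(decided))  # 2: slot 2 (owned by R2) blocks everything after it
```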
Dependency-based Ordering
- Ordering overhead under contention
[Figure: replicas observe the conflicting commands A = 1, A = 2 and A = 3 in different orders]
Dependency-based Ordering (continued)
[Figure: after extra coordination, all replicas execute A = 1, A = 2, A = 3 in the same order]
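The contention overhead can be sketched with Python's standard `graphlib` module (Python 3.9+). Each command maps to the set of commands it must execute after. Without contention the dependencies form a DAG, and any topological order works. When two replicas propose conflicting commands concurrently, each command can pick up the other as a dependency. The resulting cycle has no topological order, so the protocol needs an extra tie-breaking rule (EPaxos, for example, uses per-command sequence numbers). The dependency sets here are made-up examples.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(deps):
    """Return a dependency-respecting order, or None if the deps contain a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# No contention: a simple chain of dependencies.
no_contention = {"A = 2": {"A = 1"}, "A = 3": {"A = 2"}}
print(execution_order(no_contention))  # ['A = 1', 'A = 2', 'A = 3']

# Contention: A = 2 and A = 3 were proposed concurrently and each saw the other.
contended = {"A = 2": {"A = 1", "A = 3"}, "A = 3": {"A = 1", "A = 2"}}
print(execution_order(contended))      # None: the cycle needs extra resolution
```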
Drawbacks of Decentralized SMR
- Extra coordination for ordering degrades performance
  - Lower throughput
  - Higher latency
[Figure: centralized SMR has limited performance; decentralized SMR has poor performance stability]
Drawbacks of Decentralized SMR (continued)
[Figure: semi-decentralized SMR (SDPaxos) combines high performance with strong performance stability]
SDPaxos Intuition
[Figure: R0, R1 and R2 each replicate their own commands (A = 0, A = 1, A = 2) to all replicas; a single order over the replicas (R2, R1, R0) then yields the execution sequence A = 0, A = 1, A = 2]
Centralizing Ordering
[Figure: replicas R0, R1 and R2 tell the sequencer “I want to propose a command”, and the sequencer records the proposing replicas in a single order]
- Dynamic leadership establishment (stragglers won’t block others)
- All commands are serialized (no conflicts)
- Ordering is more lightweight than replicating
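One way to picture how a sequencer's order is turned into execution is sketched below. This is an assumption about the representation, not the paper's data structures. Each replica keeps a log of the commands it proposed, and the sequencer's ordering log lists only replica IDs. The k-th occurrence of replica r refers to r's k-th command. The function `merge_by_order` is hypothetical. Execution stops at the first entry whose command has not arrived yet, so no command runs out of order.

```python
from collections import defaultdict


def merge_by_order(order_log, command_logs):
    """Merge per-replica command logs using the sequencer's ordering log."""
    next_index = defaultdict(int)  # how many of each replica's commands are consumed
    executed = []
    for replica in order_log:
        k = next_index[replica]
        if k >= len(command_logs.get(replica, [])):
            break                  # command not replicated here yet: wait for it
        executed.append(command_logs[replica][k])
        next_index[replica] = k + 1
    return executed


commands = {"R0": ["A = 2"], "R1": ["A = 1"], "R2": ["A = 0"]}
print(merge_by_order(["R2", "R1", "R0"], commands))  # ['A = 0', 'A = 1', 'A = 2']
```

Because the ordering log carries only small replica IDs while the commands travel on the replicas' own logs, the ordering work stays lightweight compared with replicating the commands themselves.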
SDPaxos: The Basic Protocol
[Figure: message flow for a client request for command A at R0, with R2 as the sequencer; commit takes 1.5 round trips]
- R0 replicates A to the others without an execution order: C-accept (A), answered by C-ACK (A)
- The sequencer R2 assigns A to the next slot: O-accept (R0), answered by O-ACK (R0)
Reducing Latency for 3 Replicas
[Figure: the basic message flow for command A: C-accept (A) and C-ACK (A) from and to R0; O-accept (R0) from the sequencer R2; O-ACK (R0)]
- Once R0 accepts the sequencer's O-accept, R0 and R2 already constitute a majority
Reducing Latency for 3 Replicas (continued)
- Because R0 and R2 already constitute a majority, R0 does not wait for a further O-ACK (R0): commit takes 1 round trip
[Figure: R0 sends C-accept (A) and collects C-ACK (A); the sequencer R2 sends O-accept (R0); R1 still replies O-ACK (R0)]
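The round-trip counts on these slides can be checked with a back-of-the-envelope model. It is a simplification under stated assumptions, not the paper's exact protocol. The model ignores processing time and assumes O-ACKs are sent to the command's proposer. It also treats the proposer and the sequencer as having accepted an assignment as soon as the proposer receives the O-accept. `commit_delay` and `uniform` are hypothetical helpers. With equal one-way delays, the model gives 1 round trip for 3 replicas and 1.5 round trips for 5 replicas, where R0 and R2 alone are no longer a majority.

```python
def commit_delay(d, leader, sequencer):
    """Estimate the commit delay of a command proposed by `leader`.

    d[i][j] is the one-way delay from replica i to replica j (d[i][i] == 0).
    C-path: leader -> j (C-accept), j -> leader (C-ACK).
    O-path: leader -> sequencer (C-accept), sequencer -> j (O-accept),
            j -> leader (O-ACK); j == leader learns on receiving the O-accept.
    The command commits once both paths have heard from a majority.
    """
    n = len(d)
    quorum = n // 2 + 1
    c_path = sorted(d[leader][j] + d[j][leader] for j in range(n))
    o_path = sorted(d[leader][sequencer] + d[sequencer][j] + d[j][leader]
                    for j in range(n))
    return max(c_path[quorum - 1], o_path[quorum - 1])


def uniform(n, one_way=1.0):
    """Delay matrix with the same one-way delay between every pair of replicas."""
    return [[0.0 if i == j else one_way for j in range(n)] for i in range(n)]


for n in (3, 5):
    # One round trip equals two one-way delays.
    print(n, commit_delay(uniform(n), leader=0, sequencer=2) / 2.0)
# 3 1.0
# 5 1.5
```

Passing a non-uniform delay matrix to the same function shows latency growing with the proposer's distance to the sequencer. The wide-area results later in the deck make the same observation.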
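Reducing Latency for 5 Replicas
- With 5 replicas a majority is 3, so R0 and R2 alone are not enough: this assignment can be lost if R0 and R2 fail
[Figure: R0 sends C-accept (A) and collects C-ACK (A); the sequencer R2 sends O-accept (R0); R1, R3 and R4 are the remaining replicas]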
Reducing Latency for 5 Replicas (continued)
- The sequencer's assignments can be seen by a majority in just one round trip
[Figure: C-accept & C-ACK and O-accept & O-ACK exchanged among R0 to R4, with R2 as the sequencer]
Handling Failures for 5 Replicas
[Figure: replicas R0 to R4, one acting as the sequencer (Seq), and the orderings the replicas hold]
More Details in the Paper
- The detailed protocol and fault-tolerance approach
- Reads bypassing Paxos
  - Leveraging the centralized ordering to perform fast and safe reads
- Performance optimizations
  - Lightening the load of ordering
  - Straggler detection
  - …
Experimental Setup
- Baselines
  - Multi-Paxos
  - Mencius
  - EPaxos
- Workload: a replicated key-value store
- Testbed: Amazon EC2 m4.large instances
- Wide-area experiments: CA, OR, OH, IRE, SEL
Performance Stability against Stragglers
[Figure: throughput (ops/sec, axis 0 to 120,000) of Multi-Paxos, Mencius, SDPaxos-N and SDPaxos-S with stragglers; bar annotations 20.0%, 28.2%, 47.7%, 67.2% and 1.6x]
Performance Stability against Contention
[Figure: throughput (ops/sec, axis 30,000 to 75,000) vs. contention rate (0%, 5%, 25%, 50%, 75%, 100%) for EPaxos-3, EPaxos-5, SDPaxos-3 and SDPaxos-5; annotation 1.35x]
Wide-area Latency
[Figure: latency (ms)]
- SDPaxos achieves the optimal number of round trips
- SDPaxos’s latency depends on the distance to the sequencer (IRE)
- SDPaxos’s latency is not affected by stragglers or contention
Conclusion
- The first semi-decentralized SMR protocol
- High performance
- Strong performance stability
- One round trip under realistic configurations tolerating one or two failures
- High throughput and low latency with stragglers, under contention, or in ideal cases
Q & A