Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
Jialin Li, Ellis Michael, Naveen Kr. Sharma, Adriana Szekeres, Dan R. K. Ports
Server failures are the common case in data centers
State Machine Replication
Every replica executes the same operations (Operation A, Operation B, Operation C) in the same order.
Paxos for state machine replication
The client sends its request to the leader replica; the leader sends prepare messages to the other replicas, collects prepare-ok responses, and then replies to the client.
The leader is a throughput bottleneck, and the extra message rounds impose a latency penalty.
Can we eliminate Paxos overhead?
The performance overhead comes from worst-case network assumptions:
• valid assumptions for the Internet
• data center networks are different
What properties should the network have to enable faster replication?
Network properties determine replication complexity
Asynchronous network (messages may be dropped, reordered, or delivered with arbitrary latency):
• Paxos protocol on every operation
• High performance cost
Reliable, ordered network (all replicas receive the same set of messages, in the same order):
• Replication is trivial
• But the network implementation has the same complexity as Paxos
These two models sit at opposite ends of a spectrum of network guarantees, from weak (an asynchronous network, requiring Paxos) to strong (a network that provides reliability and ordering).
Can we build a network model in between that:
• provides performance benefits
• can be implemented more efficiently?
This Talk
A new network model with a near-zero-cost implementation: Ordered Unreliable Multicast
+
A coordination-free replication protocol: Network-Ordered Paxos
=
Replication with less than 2% throughput overhead
Outline
1. Background on state machine replication and data center networks
2. Ordered Unreliable Multicast
3. Network-Ordered Paxos
4. Evaluation
Towards an ordered but unreliable network
Key idea: separate ordering from reliable delivery in state machine replication
• The network provides ordering
• The replication protocol handles reliability
OUM Approach
• Designate one sequencer in the network
• The sequencer maintains a counter for each OUM group
1. Senders forward OUM messages to the sequencer
2. The sequencer increments the counter and writes its value into the packet header
3. Receivers use the sequence numbers to detect reordering and message drops
(A code sketch of both the sequencer and receiver sides follows the example below.)
Ordered Unreliable Multicast (example)
The sequencer stamps successive multicasts with counter values 1, 2, 3, 4, so every receiver sees the surviving messages in the same order; a receiver that misses a message notices the gap in sequence numbers.
• Ordered multicast: no coordination is required to determine the order of messages
• Drop detection: coordination is only required when messages are dropped
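Below is a minimal Python sketch of the mechanism above; the class names (Sequencer, OumReceiver) and in-memory message objects are illustrative assumptions, since the real implementation stamps sequence numbers into packet headers in the network. The sequencer keeps one counter per OUM group, and each receiver delivers messages whose stamp matches the next expected sequence number, reporting a gap when stamps are skipped.

```python
from dataclasses import dataclass

@dataclass
class OumMessage:
    group: str
    seqnum: int       # stamped by the sequencer
    payload: bytes

class Sequencer:
    """Stamps each multicast with a per-group, monotonically increasing counter."""
    def __init__(self):
        self.counters = {}

    def stamp(self, group: str, payload: bytes) -> OumMessage:
        self.counters[group] = self.counters.get(group, 0) + 1
        return OumMessage(group, self.counters[group], payload)

class OumReceiver:
    """Delivers messages in sequence order and reports gaps (drops)."""
    def __init__(self):
        self.next_expected = 1

    def receive(self, msg: OumMessage):
        if msg.seqnum == self.next_expected:
            self.next_expected += 1
            return ("DELIVER", msg.payload)
        if msg.seqnum > self.next_expected:
            missing = list(range(self.next_expected, msg.seqnum))
            self.next_expected = msg.seqnum + 1
            return ("DROP-DETECTED", missing, msg.payload)
        return ("DUPLICATE", msg.seqnum)

# Example: three multicasts are stamped 1, 2, 3; the second is lost in transit.
seq = Sequencer()
m1, m2, m3 = (seq.stamp("group-A", p) for p in (b"op1", b"op2", b"op3"))
recv = OumReceiver()
print(recv.receive(m1))   # ('DELIVER', b'op1')
print(recv.receive(m3))   # ('DROP-DETECTED', [2], b'op3')
```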
Sequencer Implementations
• Middlebox prototype: Cavium Octeon network processor; connects to root switches; adds 8 us latency
• In-switch sequencing: next-generation programmable switches; implemented in P4; nearly zero cost
• End-host sequencing: no specialized hardware required; incurs higher latency penalties; similar throughput benefits
Outline
1. Background on state machine replication and data center networks
2. Ordered Unreliable Multicast
3. Network-Ordered Paxos
4. Evaluation
NOPaxos Overview
• Built on top of the guarantees of OUM
• Client requests are totally ordered but can be dropped
• No coordination in the common case
• Replicas run agreement only on drop detection
• A view change protocol handles leader or sequencer failure
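As a rough illustration of what this implies for replica state, here is a small sketch (field names are assumptions for this example, not the paper's exact data structures): each replica tracks its current view, the next OUM sequence number it expects, and a log in which dropped messages can later be filled with NO-OPs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class ViewId:
    leader_num: int    # which replica is leader
    session_num: int   # which OUM session (sequencer incarnation) is current

@dataclass
class LogEntry:
    seqnum: int
    request: Optional[bytes]   # None marks a slot committed as a NO-OP

@dataclass
class ReplicaState:
    replica_id: int
    num_replicas: int
    view: ViewId = ViewId(0, 0)
    next_seqnum: int = 1                       # next OUM sequence number expected
    log: List[LogEntry] = field(default_factory=list)

    def is_leader(self) -> bool:
        # A common convention: the leader rotates with the leader number.
        return self.view.leader_num % self.num_replicas == self.replica_id

r = ReplicaState(replica_id=0, num_replicas=5)
print(r.is_leader())   # True: replica 0 leads view (0, 0) under this convention
```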
Normal Operation
The client sends its request through OUM, so all replicas receive it in the same order. The leader executes the request; every replica replies to the client with no coordination among replicas. The client waits for replies from a majority of replicas, including the leader's. The whole operation takes one round trip.
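A sketch of the client-side completion check described above, with hypothetical names: the request is complete once matching replies arrive from a majority of replicas and that majority includes the leader, the only replica that executed the request and returned a result.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Reply:
    replica_id: int
    view: Tuple[int, int]   # (leader-number, session-number)
    log_slot: int
    result: bytes = b""     # only the leader's reply carries the result

def request_complete(replies: List[Reply], leader_id: int, num_replicas: int) -> bool:
    """True once a matching majority of replies, including the leader's, has arrived."""
    groups = {}
    for r in replies:
        groups.setdefault((r.view, r.log_slot), []).append(r)
    for group in groups.values():
        majority = len(group) > num_replicas // 2
        has_leader = any(r.replica_id == leader_id for r in group)
        if majority and has_leader:
            return True
    return False

# With 3 replicas (leader = 0), the leader's reply plus one follower's suffices.
replies = [Reply(0, (1, 1), 5, b"ok"), Reply(2, (1, 1), 5)]
print(request_complete(replies, leader_id=0, num_replicas=3))   # True
```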
Gap Agreement
Replicas detect message drops:
• Non-leader replicas recover the missing message from the leader
• The leader replica coordinates to commit a NO-OP in that slot (a Paxos round)
• This gives efficient recovery from network anomalies
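A toy sketch of the two cases above, under the simplifying assumption that replicas can read and write each other's logs directly (standing in for the actual messages and for the leader's agreement round):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

NO_OP = None   # a slot committed as a NO-OP carries no client request

@dataclass
class Replica:
    replica_id: int
    leader_id: int
    peers: Dict[int, "Replica"] = field(default_factory=dict)
    log: Dict[int, Optional[bytes]] = field(default_factory=dict)

    def is_leader(self) -> bool:
        return self.replica_id == self.leader_id

    def handle_drop(self, seqnum: int) -> None:
        """Called when OUM reports that the message with this sequence number was dropped here."""
        if not self.is_leader():
            # Non-leader: recover the missing entry (or a NO-OP) from the leader.
            leader = self.peers[self.leader_id]
            self.log[seqnum] = leader.log.get(seqnum, NO_OP)
        else:
            # Leader: it cannot recover the request itself, so it must get a NO-OP
            # committed in this slot at all replicas (a Paxos-style round, elided here).
            self.log[seqnum] = NO_OP
            for peer in self.peers.values():
                peer.log[seqnum] = NO_OP

# Toy run: the leader received request 2, but the follower's copy was dropped.
leader, follower = Replica(0, 0), Replica(1, 0)
leader.peers, follower.peers = {1: follower}, {0: leader}
leader.log[2] = b"op2"
follower.handle_drop(2)
print(follower.log[2])   # b'op2', recovered from the leader
```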
View Change
• Handles leader or sequencer failure
• Ensures that all replicas are in a consistent state
• Runs a view change protocol similar to Viewstamped Replication (VR)
• The view-number is a tuple <leader-number, session-number>
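A minimal sketch of how the two components of the view number might advance (function names are assumptions, not the paper's): a suspected leader failure bumps the leader number, a sequencer failover (new OUM session) bumps the session number, and either event triggers the VR-style view change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewId:
    leader_num: int    # incremented when the leader is suspected to have failed
    session_num: int   # incremented when the OUM session (sequencer) changes

def next_view_on_leader_failure(v: ViewId) -> ViewId:
    return ViewId(v.leader_num + 1, v.session_num)

def next_view_on_new_session(v: ViewId, new_session: int) -> ViewId:
    return ViewId(v.leader_num, new_session)

# Either transition makes a replica stop processing requests in the old view
# and start a view change for the new one.
v = ViewId(leader_num=3, session_num=7)
print(next_view_on_leader_failure(v))   # ViewId(leader_num=4, session_num=7)
print(next_view_on_new_session(v, 8))   # ViewId(leader_num=3, session_num=8)
```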
Outline
1. Background on state machine replication and data center networks
2. Ordered Unreliable Multicast
3. Network-Ordered Paxos
4. Evaluation
Evaluation Setup
• 3-level fat-tree network testbed
• 5 replicas with 2.5 GHz Intel Xeon E5-2680 CPUs
• Middlebox sequencer
NOPaxos achieves better throughput and latency
[Graph: latency (us, 0-1000) vs. throughput (ops/sec, up to 260,000) for Paxos, Paxos + Batching, Fast Paxos, and NOPaxos; lower latency and higher throughput are better.]
Compared to Paxos, NOPaxos achieves 4.7x the throughput with more than a 40% reduction in latency.