Paxos Consensus, Abstracted and Deconstructed
Álvaro García Pérez, Alexey Gotsman, Yuri Meshman, and Ilya Sergey
April 19th 2008
Consensus
• Several nodes, which can crash
• Each node proposes a value
• All non-crashed nodes agree on a single value
[Figure: three nodes propose v1, v2, v3; one node crashes (✘) and the other two agree on v2]
Deterministic state machine
• Clients submit commands c1, c2, c3
• Machine totally orders commands (c1, c2, c3) and computes the sequence of results (r1, r2, r3)
[Figure: clients send c1, c2, c3 to a single machine; in the last build the machine is marked as crashed (✘)]
State machine replication
• Clients send commands to all replicas
• Replicas may receive commands in different orders (c3, c2, c1; c1, c2, c3; c2, c1, c3)
• Totally order commands via a sequence of consensus instances (every replica gets c2, c1, c3)
• Replicas compute the same sequence of results (r2, r1, r3), also when one replica crashes (✘)
• Correctness: the replicated implementation is linearizable with respect to the single-server one, so replication is transparent to clients
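The effect of ordering commands before applying them can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `AppendLog` is a hypothetical toy state machine, and `agreed_order` is hard-coded to stand in for the output of the consensus instances, which are not implemented here.

```python
class AppendLog:
    """Hypothetical toy deterministic state machine: appends each command to a string."""

    def __init__(self):
        self.state = ""

    def apply(self, command):
        # The result of a command is the whole state after applying it,
        # so results depend on the order of commands.
        self.state += command
        return self.state


def run(commands):
    """Apply commands in the given order on a fresh machine; return the results."""
    machine = AppendLog()
    return [machine.apply(c) for c in commands]


# Orders in which the three replicas receive the commands (as on the slide).
arrival_orders = [["c3", "c2", "c1"], ["c1", "c2", "c3"], ["c2", "c1", "c3"]]

# Assumption: the sequence of consensus instances decided this order.
agreed_order = ["c2", "c1", "c3"]

# Applying commands in arrival order makes the replicas diverge.
naive_results = [run(order) for order in arrival_orders]

# Applying the agreed order makes every replica compute the same results.
replicated_results = [run(agreed_order) for _ in arrival_orders]
```

Because the machine is deterministic, agreeing on the order is enough: no replica needs to see another replica's results.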
The zoo of consensus protocols
Complex protocols: constant fight for better performance
• Viewstamped replication (1988)
• Paxos (1998)
• Disk Paxos (2003)
• Cheap Paxos (2004)
• Generalized Paxos (2004)
• Paxos Commit (2004)
• Fast Paxos (2006)
• Stoppable Paxos (2008)
• Mencius (2008)
• Vertical Paxos (2009)
• ZAB (2009)
• Ring Paxos (2010)
• Egalitarian Paxos (2013)
• Raft (2014)
• M2Paxos (2016)
• Flexible Paxos (2016)
• Caesar (2017)
Broken [Michael + 2016]
Goals
• Develop methods for proving protocols correct, including realistic deployments
• Get insights into their structure
• Focus on single-decree Paxos and Multi-Paxos
Approach
• Modular reasoning: verify parts of the protocol separately instead of the whole thing
• Linearizability implies refinement [Filipovic + 2009]
• Transformations of the network semantics, à la Verified System Transformers of the Verdi framework [Wilcox + 2015]
• Prove one variant of the protocol without unpacking the proof of a simpler variant
[Figure: a stack of protocol layers P1, P2, P3 is replaced step by step by atomic specifications of the form atomic { ... }: first P1 ⊑ S1, then P2(S1) ⊑ S2, then P3(S2) ⊑ S3]
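The layering of P1, P2, P3 can be read as a short chain of refinements. The following is a sketch only, assuming that ⊑ is transitive and is preserved when a component is placed inside the contexts P2(·) and P3(·); the slides do not state these assumptions explicitly.

```latex
\begin{align*}
P_1 \sqsubseteq S_1
  &\implies P_2(P_1) \sqsubseteq P_2(S_1) \sqsubseteq S_2 \\
  &\implies P_3(P_2(P_1)) \sqsubseteq P_3(S_2) \sqsubseteq S_3
\end{align*}
```

Each step uses only the specification of the layer below, which is why no layer's proof has to be unpacked when verifying the next one.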
[Figure: nodes 1, 2, 3 with values v1, v2, v3, labelled as acceptors and proposers]
• Acceptors = members of parliament: can vote to accept a value; majority wins
• Proposer = parliament speaker: proposes its value to vote on
• Phase 1: a proposer chooses a round r and convinces a majority of acceptors to switch to r
• An acceptor switches only if its current round is less than r
[Figure, three builds: acceptors 1, 2, 3 start with Round#: 0 and Accepted: ?; after message r, acceptor 2 has Round#: r; after reply ok, acceptors 1 and 2 have Round#: r while acceptor 3 still has Round#: 0]
• Phase 2: the proposer sends its value tagged with the round number
• An acceptor only accepts a value tagged with the round it is in
[Figure, two builds: after message (r, v2), acceptor 2 has Accepted: v2; after reply ok, acceptors 1 and 2 have Round#: r and Accepted: v2 (✔), acceptor 3 still has Round#: 0 and Accepted: ?; v2 is replied to the client]
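The two acceptor rules above can be sketched as follows. This is a minimal single-process illustration, not the authors' formalisation: the class `Acceptor`, the handler names `on_phase1` and `on_phase2`, and the helper `is_majority` are hypothetical, messages are replaced by direct method calls, and the acceptor does not yet record the round in which it accepted a value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    round: int = 0                  # "Round#" on the slides
    accepted: Optional[str] = None  # "Accepted" on the slides; None stands for "?"

    def on_phase1(self, r: int) -> bool:
        """Switch to round r only if the current round is less than r."""
        if self.round < r:
            self.round = r
            return True
        return False

    def on_phase2(self, r: int, v: str) -> bool:
        """Accept v only if it is tagged with the round the acceptor is in."""
        if self.round == r:
            self.accepted = v
            return True
        return False


def is_majority(count: int, total: int) -> bool:
    return count > total // 2


# Scenario from the slides: the proposer reaches acceptors 1 and 2 only.
acceptors = [Acceptor(), Acceptor(), Acceptor()]
reachable = acceptors[:2]
r, v = 1, "v2"  # hypothetical concrete round number and value

chosen = None
phase1_oks = sum(a.on_phase1(r) for a in reachable)
if is_majority(phase1_oks, len(acceptors)):
    phase2_oks = sum(a.on_phase2(r, v) for a in reachable)
    if is_majority(phase2_oks, len(acceptors)):
        chosen = v  # a majority accepted v, so it can be replied to the client
```

After this run, acceptors 1 and 2 are in round `r` with `v2` accepted, and acceptor 3 is untouched, matching the final state on the slide.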
• Phase 1: a proposer chooses a round r′ and convinces a majority of acceptors to switch to r′
• An acceptor sends to the proposer the value it has accepted, if any, together with the round in which it accepted it
• If some acceptor has accepted a value, the proposer proposes the value with the highest round number
• This ensures that the chosen value v2 will not be changed
[Figure, builds: acceptor 3 switches to Round#: r′; acceptor 1 switches to r′ and replies (ok, r, v2); acceptor 3 then has Accepted: v2; final state Round#: r′, r, r′ and Accepted: v2, v2, v2, with v2 already replied to the client]
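The proposer's choice of value at the end of Phase 1 can be sketched as a small pure function. The name `choose_value` and the reply encoding are assumptions made for this illustration: each reply is a pair of the round in which a value was accepted and that value, or `(None, None)` if the acceptor has accepted nothing.

```python
from typing import List, Optional, Tuple

# (round in which the value was accepted, accepted value), or (None, None)
Reply = Tuple[Optional[int], Optional[str]]


def choose_value(replies: List[Reply], own_value: str) -> str:
    """Pick the value to propose in Phase 2 from a majority of Phase 1 replies."""
    accepted = [(rnd, val) for rnd, val in replies if rnd is not None]
    if not accepted:
        # No acceptor in the majority has accepted anything: free choice.
        return own_value
    # Otherwise adopt the value accepted in the highest round.
    _, value = max(accepted, key=lambda reply: reply[0])
    return value


# Scenario from the slides: acceptor 1 accepted v2 in round r (here 1),
# acceptor 3 has accepted nothing; the new proposer would prefer v3.
replies = [(1, "v2"), (None, None)]
proposal = choose_value(replies, own_value="v3")
```

Since any two majorities intersect, at least one reply comes from an acceptor that took part in choosing `v2`, which is why the new proposer ends up proposing `v2` instead of its own value.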
Modular structure in single-decree Paxos
• Steal abstractions from an existing analysis of Paxos [Boichat + 2003]
• Show their linearizability ➜ modular proof of Paxos
Round Based Register [Boichat + 2003]
• Data type encapsulating the state of acceptors
• read(int k): Phase 1 of Paxos
• write(int k, val v): Phase 2 of Paxos
[Figure: layered stack Paxos / RB Consensus / RB Register]
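A register with this `read`/`write` interface can be sketched over in-memory acceptors. This is an illustration under several assumptions: all acceptors are reachable and called synchronously, the boolean results stand in for a commit/abort outcome, and `propose` is a hypothetical helper showing how consensus might be layered on top. It is not the definition from [Boichat + 2003].

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    round: int = 0
    accepted_round: Optional[int] = None
    accepted_value: Optional[str] = None


class RoundBasedRegister:
    """Encapsulates the state of the acceptors behind read and write."""

    def __init__(self, acceptors: List[Acceptor]):
        self.acceptors = acceptors

    def _majority(self, n: int) -> bool:
        return n > len(self.acceptors) // 2

    def read(self, k: int) -> Tuple[bool, Optional[str]]:
        """Phase 1: move acceptors to round k and collect accepted values."""
        replies = []
        for a in self.acceptors:
            if a.round < k:
                a.round = k
                replies.append((a.accepted_round, a.accepted_value))
        if not self._majority(len(replies)):
            return False, None
        accepted = [rep for rep in replies if rep[0] is not None]
        value = max(accepted, key=lambda rep: rep[0])[1] if accepted else None
        return True, value

    def write(self, k: int, v: str) -> bool:
        """Phase 2: ask acceptors that are in round k to accept v."""
        acks = 0
        for a in self.acceptors:
            if a.round == k:
                a.accepted_round, a.accepted_value = k, v
                acks += 1
        return self._majority(acks)


def propose(register: RoundBasedRegister, k: int, own_value: str) -> Optional[str]:
    """Hypothetical consensus attempt in round k: read, then write."""
    ok, seen = register.read(k)
    if not ok:
        return None
    value = seen if seen is not None else own_value
    return value if register.write(k, value) else None


register = RoundBasedRegister([Acceptor(), Acceptor(), Acceptor()])
first = propose(register, 1, "v2")   # first proposer decides its own value
second = propose(register, 2, "v3")  # later proposer adopts the chosen value
stale = propose(register, 1, "v1")   # round is too old, so the attempt fails
```

Hiding the acceptors behind `read` and `write` means the layer above only needs the register's specification, not the message-level details of the two phases.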