Paxos Consensus, Abstracted and Deconstructed


  1. Paxos Consensus, Abstracted and Deconstructed • Álvaro García Pérez, Alexey Gotsman, Yuri Meshman, and Ilya Sergey • April 19th, 2018

  2. Consensus • Several nodes, which can crash

  3. Consensus • Several nodes, which can crash • Each node proposes a value (v1, v2, v3)

  4. Consensus • Several nodes, which can crash • Each node proposes a value (v1, v2, v3) • All non-crashed nodes agree on a single value [Diagram: one node crashes (✘); the other two decide v2]

  5. Deterministic state machine • Clients submit commands (c1, c2, c3)

  6. Deterministic state machine • Machine totally orders commands (c1, c2, c3) and computes the sequence of results (r1, r2, r3)

  7. Deterministic state machine • Machine totally orders commands (c1, c2, c3) and computes the sequence of results [Diagram: the single machine crashes (✘)]

  8. State machine replication • Clients send commands (c1, c2, c3) to all replicas • Replicas may receive commands in different orders [Diagram: three replicas receive c3, c2, c1; c1, c2, c3; and c2, c1, c3]

  9. State machine replication • Totally order commands via a sequence of consensus instances [Diagram: replicas receive c3, c2, c1; c1, c2, c3; and c2, c1, c3, and all agree on the order c2, c1, c3]

  10. State machine replication • Replicas compute the same sequence of results [Diagram: all three replicas apply c2, c1, c3 and return r2, r1, r3]

  11. State machine replication • Replicas compute the same sequence of results [Diagram: one replica crashes (✘); the remaining two apply c2, c1, c3 and return r2, r1, r3]

  12. State machine replication • Correctness: the replicated implementation is linearizable with respect to the single-server one, so replication is transparent to clients • Replicas compute the same sequence of results [Diagram: one replica crashes (✘); the remaining two apply c2, c1, c3 and return r2, r1, r3]
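The replication argument on the slides above can be illustrated with a small sketch. The `Counter` machine and its `add`/`mul` commands are hypothetical stand-ins chosen only because they do not commute; the slides do not name a concrete state machine.

```python
class Counter:
    """A toy deterministic state machine (hypothetical example)."""

    def __init__(self):
        self.state = 1

    def apply(self, cmd):
        # Apply one command and return its result (the new state).
        op, n = cmd
        if op == "add":
            self.state += n
        elif op == "mul":
            self.state *= n
        else:
            raise ValueError(f"unknown command: {op}")
        return self.state


def run(commands):
    """Run a fresh replica over `commands` and return the sequence of results."""
    machine = Counter()
    return [machine.apply(c) for c in commands]


c1, c2, c3 = ("add", 2), ("mul", 3), ("add", 5)

# Each replica applies commands in its own arrival order: results diverge.
arrival_orders = [[c3, c2, c1], [c1, c2, c3], [c2, c1, c3]]
divergent = [run(order) for order in arrival_orders]

# Each replica applies the single order agreed via consensus: results coincide.
agreed_order = [c2, c1, c3]
replicated = [run(agreed_order) for _ in range(3)]
```

Here the agreed order is simply hard-coded; in a real deployment it would come from a sequence of consensus instances, one per position in the log.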

  13. The zoo of consensus protocols • Viewstamped replication (1988) • Paxos (1998) • Disk Paxos (2003) • Cheap Paxos (2004) • Generalized Paxos (2004) • Paxos Commit (2004) • Fast Paxos (2006) • Stoppable Paxos (2008) • Mencius (2008) • Vertical Paxos (2009) • ZAB (2009) • Ring Paxos (2010) • Egalitarian Paxos (2013) • Raft (2014) • M2Paxos (2016) • Flexible Paxos (2016) • Caesar (2017)

  14. The zoo of consensus protocols • Complex protocols: constant fight for better performance • Viewstamped replication (1988) • Paxos (1998) • Disk Paxos (2003) • Cheap Paxos (2004) • Generalized Paxos (2004) • Paxos Commit (2004) • Fast Paxos (2006) • Stoppable Paxos (2008) • Mencius (2008) • Vertical Paxos (2009) • ZAB (2009) • Ring Paxos (2010) • Egalitarian Paxos (2013) • Raft (2014) • M2Paxos (2016) • Flexible Paxos (2016) • Caesar (2017)

  16. Broken [Michael + 2016]

  17. Goals • Develop methods for proving protocols correct, including realistic deployments • Get insights into their structure

  18. Goals • Develop methods for proving protocols correct, including realistic deployments • Get insights into their structure • Focus on single-decree Paxos and Multi-Paxos

  19. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing

  20. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: protocol layers P3, P2, P1]

  21. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: protocol layers P3, P2, P1; P1 ⊑ S1]

  23. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: layers P3, P2, S1, where S1 is an atomic { ... } block; P1 ⊑ S1]

  24. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: layers P3, P2, S1, where S1 is an atomic { ... } block; P1 ⊑ S1; P2(S1) ⊑ S2]

  25. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: layers P3, S2, where S2 is an atomic { ... ... } block; P1 ⊑ S1; P2(S1) ⊑ S2]

  26. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: layers P3, S2, where S2 is an atomic { ... ... } block; P1 ⊑ S1; P2(S1) ⊑ S2; P3(S2) ⊑ S3]

  27. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] [Diagram: a single layer S3, an atomic { ... ... ... } block; P1 ⊑ S1; P2(S1) ⊑ S2; P3(S2) ⊑ S3]
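The layer-by-layer argument in the diagrams can be written as a chain of refinements. The final step below is a sketch that assumes each context P_i(·) is monotone with respect to ⊑ and that ⊑ is transitive; the slides do not state these side conditions explicitly.

```latex
\begin{align*}
P_1 &\sqsubseteq S_1, \\
P_2(S_1) &\sqsubseteq S_2, \\
P_3(S_2) &\sqsubseteq S_3 \\
\Longrightarrow\quad
P_3(P_2(P_1)) &\sqsubseteq P_3(P_2(S_1)) \sqsubseteq P_3(S_2) \sqsubseteq S_3 .
\end{align*}
```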

  28. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] • Transformations of the network semantics, à la Verified System Transformers of the Verdi framework [Wilcox + 2015]

  29. Approach • Modular reasoning: verify parts of the protocol separately instead of the whole thing • Linearizability implies refinement [Filipovic + 2009] • Transformations of the network semantics, à la Verified System Transformers of the Verdi framework [Wilcox + 2015] • Prove one variant of the protocol without unpacking the proof of a simpler variant

  30. • Acceptors = members of parliament: can vote to accept a value, majority wins • Proposer = parliament speaker: proposes its value to vote on [Diagram: proposers holding values v1, v2, v3 and acceptors 1, 2, 3]

  31. • Phase 1: a proposer chooses a round r and convinces a majority of acceptors to switch to r • An acceptor switches only if its current round is lower than r [Diagram: acceptors 1, 2, 3; Round#: 0, 0, 0; Accepted: ?, ?, ?]

  32. • Phase 1: a proposer chooses a round r and convinces a majority of acceptors to switch to r • An acceptor switches only if its current round is lower than r [Diagram: the proposer sends r; acceptors 1, 2, 3; Round#: 0, r, 0; Accepted: ?, ?, ?]

  33. • Phase 1: a proposer chooses a round r and convinces a majority of acceptors to switch to r • An acceptor switches only if its current round is lower than r [Diagram: an ok reply goes back to the proposer; acceptors 1, 2, 3; Round#: r, r, 0; Accepted: ?, ?, ?]

  34. • Phase 2: the proposer sends its value tagged with the round number • An acceptor only accepts a value tagged with the round it is in [Diagram: the proposer sends (r, v2); acceptors 1, 2, 3; Round#: r, r, 0; Accepted: ?, v2, ?]

  35. • Phase 2: the proposer sends its value tagged with the round number • An acceptor only accepts a value tagged with the round it is in [Diagram: an ok reply goes back to the proposer; acceptors 1, 2, 3; Round#: r, r, 0; Accepted: v2, v2, ?; v2 is chosen (✔); reply v2 to client]

  36. • Phase 1: a proposer chooses a round rʹ and convinces a majority of acceptors to switch to rʹ [Diagram: a proposer sends rʹ; acceptors 1, 2, 3; Round#: r, r, rʹ; Accepted: v2, v2, ?; v2 is chosen (✔); reply v2 to client]

  37. • Phase 1: a proposer chooses a round rʹ and convinces a majority of acceptors to switch to rʹ • An acceptor sends the proposer its round number and value [Diagram: the reply (ok, r, v2) goes back to the proposer; acceptors 1, 2, 3; Round#: rʹ, r, rʹ; Accepted: v2, v2, ?; v2 is chosen (✔); reply v2 to client]

  38. • Phase 1: a proposer chooses a round rʹ and convinces a majority of acceptors to switch to rʹ • An acceptor sends the proposer its round number and value • If some acceptor has accepted a value, the proposer proposes the value with the highest round number [Diagram: the reply (ok, r, v2) goes back to the proposer; acceptors 1, 2, 3; Round#: rʹ, r, rʹ; Accepted: v2, v2, v2; v2 is chosen (✔); reply v2 to client]

  39. • Phase 1: a proposer chooses a round rʹ and convinces a majority of acceptors to switch to rʹ • An acceptor sends the proposer its round number and value • If some acceptor has accepted a value, the proposer proposes the value with the highest round number • This ensures that the chosen value v2 will not be changed [Diagram: the reply (ok, r, v2) goes back to the proposer; acceptors 1, 2, 3; Round#: rʹ, r, rʹ; Accepted: v2, v2, v2; v2 is chosen (✔); reply v2 to client]
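The two phases walked through on the preceding slides can be sketched as follows. This is a minimal in-memory illustration, not the authors' formalisation: there is no real network, and the names `Acceptor`, `propose`, `reachable` and `n` are assumptions introduced here. Rounds are assumed to be positive integers, with 0 meaning "no round yet".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    current_round: int = 0            # Round#: the round this acceptor is in
    accepted_round: int = 0           # round in which `accepted` was accepted
    accepted: Optional[str] = None    # Accepted: the value, or None for "?"

    def phase1(self, r: int) -> Optional[Tuple[int, Optional[str]]]:
        # Switch to round r only if the current round is lower than r,
        # and report the last accepted round and value.
        if r > self.current_round:
            self.current_round = r
            return (self.accepted_round, self.accepted)
        return None  # refuse

    def phase2(self, r: int, v: str) -> bool:
        # Accept only a value tagged with the round this acceptor is in.
        if r == self.current_round:
            self.accepted_round, self.accepted = r, v
            return True
        return False


def propose(reachable: List[Acceptor], r: int, my_value: str, n: int) -> Optional[str]:
    """Run both phases against the reachable acceptors out of n in total.

    Returns the chosen value, or None if no majority was gathered.
    """
    majority = n // 2 + 1
    replies = [a.phase1(r) for a in reachable]
    replies = [rep for rep in replies if rep is not None]
    if len(replies) < majority:
        return None
    # If some acceptor has accepted a value, adopt the one with the highest round.
    _, previous = max(replies, key=lambda rep: rep[0])
    value = previous if previous is not None else my_value
    acks = sum(a.phase2(r, value) for a in reachable)
    return value if acks >= majority else None


# Replay of the slides: round r = 1 reaches acceptors 1 and 2,
# then round r' = 2 reaches acceptors 1 and 3.
a1, a2, a3 = Acceptor(), Acceptor(), Acceptor()
first = propose([a1, a2], r=1, my_value="v2", n=3)
second = propose([a1, a3], r=2, my_value="v3", n=3)
```

In this replay the second proposer wants `v3` but learns from acceptor 1 that `v2` was accepted in round 1, so it proposes `v2` instead, which is the behaviour the last slide describes.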

  40. Modular structure in single-decree Paxos • Steal abstractions from an existing analysis of Paxos [Boichat + 2003] • Show their linearizability ➜ modular proof of Paxos

  41. Round-Based Register [Boichat + 2003] • Data type encapsulating the state of acceptors • read(int k): Phase 1 of Paxos • write(int k, val v): Phase 2 of Paxos [Diagram: layers Paxos, RB Consensus, RB Register]
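A round-based register with the `read(k)` and `write(k, v)` operations named on the slide might look like the sketch below. The return shapes (`(ok, value)` for `read`, a boolean for `write`), the dictionary-based acceptor state, and the `propose` helper layered on top are assumptions made for illustration; all acceptors are simulated in one process and never fail.

```python
class RoundBasedRegister:
    """In-memory sketch of a register that encapsulates the acceptors' state."""

    def __init__(self, n_acceptors):
        self.acceptors = [
            {"round": 0, "vround": 0, "value": None} for _ in range(n_acceptors)
        ]
        self.majority = n_acceptors // 2 + 1

    def read(self, k):
        """Phase 1 of Paxos: returns (ok, value accepted in the highest round)."""
        replies = []
        for a in self.acceptors:
            if k > a["round"]:          # switch only if the current round is lower
                a["round"] = k
                replies.append((a["vround"], a["value"]))
        if len(replies) < self.majority:
            return (False, None)
        return (True, max(replies, key=lambda rep: rep[0])[1])

    def write(self, k, v):
        """Phase 2 of Paxos: returns True if a majority accepted v in round k."""
        acks = 0
        for a in self.acceptors:
            if a["round"] == k:         # accept only in the round the acceptor is in
                a["vround"], a["value"] = k, v
                acks += 1
        return acks >= self.majority


def propose(register, k, my_value):
    """Hypothetical consensus layer on top of the register: read, then write."""
    ok, value = register.read(k)
    if not ok:
        return None
    value = my_value if value is None else value
    return value if register.write(k, value) else None


reg = RoundBasedRegister(3)
decided_first = propose(reg, 1, "v2")
decided_second = propose(reg, 2, "v3")
stale = propose(reg, 1, "v1")
```

Splitting the protocol this way keeps the round and acceptance bookkeeping inside the register, so the layer above only has to pick round numbers and decide which value to write.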
