Implementing Distributed Consensus
Dan Lüdtke
danrl@google.com

Disclaimer: This work is not affiliated with any company (including Google). This talk is the result of a personal education project!
What?
● My hobby project of learning about Distributed Consensus
○ I implemented a Paxos variant in Go and learned a lot about reaching consensus
○ A fine selection of some of the mistakes I made

Why?
● I wanted to understand Distributed Consensus
○ Everyone seemed to understand it. Except me.
● I am a hands-on person.
○ Doing $stuff > Reading about $stuff

Why talk about it?
● Sharing is caring!
Distributed Consensus
Protocols
● Paxos
○ Multi-Paxos
○ Cheap Paxos
● Raft
● ZooKeeper Atomic Broadcast
● Proof-of-Work Systems
○ Bitcoin
● Lockstep Anti-Cheating
○ Age of Empires

Implementations
● Chubby
○ coarse-grained lock service
● etcd
○ a distributed key-value store
● Apache ZooKeeper
○ a centralized service for maintaining configuration information, naming, and providing distributed synchronization

Raft Logo: Attribution 3.0 Unported (CC BY 3.0) Source: https://raft.github.io/#implementations
Etcd Logo: Apache 2 Source: https://github.com/etcd-io/etcd/blob/master/LICENSE
Zookeeper Logo: Apache 2 Source: https://zookeeper.apache.org/
Paxos
Paxos Roles
● Client
○ Issues request to a proposer
○ Waits for response from a learner
■ Consensus on value X
■ No consensus on value X
● Proposer
● Acceptor
● Learner
● Leader
(Diagram: a client asks a proposer P: "Consensus on X?")
Paxos Roles
● Client
● Proposer (P)
○ Advocates a client request
○ Asks acceptors to agree on the proposed value
○ Moves the protocol forward when there is conflict
● Acceptor
● Learner
● Leader
(Diagram: the proposer announces "Proposing X..." to the acceptors A.)
Paxos Roles
● Client
● Proposer (P)
● Acceptor (A)
○ Also called "voter"
○ The fault-tolerant "memory" of the system
○ Groups of acceptors form a quorum
● Learner
● Leader
(Diagram: the acceptors answer the proposal with "Yea".)
Paxos Roles
● Client
● Proposer (P)
● Acceptor (A)
● Learner (L)
○ Adds replication to the protocol
○ Takes action on learned (agreed-on) values
○ E.g. respond to the client
● Leader
(Diagram: a learner L observes the acceptors' "Yea" votes.)
Paxos Roles
● Client
● Proposer (P)
● Acceptor (A)
● Learner (L)
● Leader (LD)
○ Distinguished proposer
○ The only proposer that can make progress
○ Multiple proposers may believe themselves to be the leader
○ Acceptors decide which one gets a majority
(Diagram: client 1 and client 2 talk to two proposers; only the leader LD makes progress.)
Coalesced Roles
● A single processor can have multiple roles (P+)
○ Proposer
○ Acceptor
○ Learner
● Client talks to any processor
○ Nearest one?
○ Leader?
(Diagram: five coalesced P+ processors and a client.)
Coalesced Roles at Scale
● A P+ system is a complete digraph
○ a directed graph in which every pair of distinct vertices is connected by a pair of unique edges
○ Everyone talks to everyone
● Let n be the number of processors
○ a.k.a. quorum size
● Connections = n * (n - 1)
○ Potential network (TCP) connections
Coalesced Roles with Leader
● A P+ system with a leader is a directed graph
○ Leader talks to everyone else
● Let n be the number of processors
○ a.k.a. quorum size
● Connections = n - 1
○ Network (TCP) connections
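The two connection-count formulas above can be checked with a tiny sketch (the function names are mine, not from the talk):

```go
package main

import "fmt"

// fullMeshConnections: in a complete digraph every processor talks to
// every other processor, which takes n * (n - 1) directed connections.
func fullMeshConnections(n int) int {
	return n * (n - 1)
}

// leaderConnections: with a leader, only the leader talks to everyone
// else, which takes just n - 1 connections.
func leaderConnections(n int) int {
	return n - 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("n=%d: full mesh=%d, with leader=%d\n",
			n, fullMeshConnections(n), leaderConnections(n))
	}
	// n=5 (the Skinny quorum size used later): full mesh=20, with leader=4
}
```

The quadratic growth of the full mesh is why large quorums are rare in practice.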
Coalesced Roles at Scale
● Maximum quorum size seen in "real life"
Limitations
● Single consensus
○ Once consensus has been reached, no more progress can be made
○ But: applications can start new Paxos runs
● Multiple proposers may believe themselves to be the leader
○ Dueling proposers
○ Theoretically, an infinite duel
○ Practically, retry limits and jitter help
● Standard Paxos is not resilient against Byzantine failures
○ Byzantine: lying or compromised processors
○ Solution: Byzantine Paxos protocol
Image: Creative Commons Attribution-Share Alike 4.0 International by Aswin Krishna Poyil
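The retry-limit-and-jitter mitigation for dueling proposers can be sketched like this (the helper names, the fake propose function, and the backoff constants are illustrative, not Skinny's actual code):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

var errRejected = errors.New("proposal rejected")

// propose stands in for one Paxos proposal round; here the first two
// attempts fail, as if a dueling proposer kept outbidding us.
func propose(attempt int) error {
	if attempt < 2 {
		return errRejected
	}
	return nil
}

// proposeWithRetry bounds the duel with a retry limit and sleeps a
// random, exponentially growing backoff between attempts so that two
// duelers desynchronize instead of outbidding each other forever.
func proposeWithRetry(maxRetries int) error {
	base := 10 * time.Millisecond
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err := propose(attempt); err == nil {
			return nil
		}
		// Full jitter: sleep a random duration in [0, base<<attempt).
		time.Sleep(time.Duration(rand.Int63n(int64(base << attempt))))
	}
	return errors.New("gave up after retries")
}

func main() {
	fmt.Println(proposeWithRetry(5)) // succeeds on the third attempt
}
```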
Introducing Skinny
● Paxos-based
● Minimalistic
● Educational
● Lock Service
The "Giraffe", "Beaver", "Alien", and "Frame" graphics on the following slides have been released under Creative Commons Zero 1.0 Public Domain License
Skinny "Features"
● Designed to be easy to understand
● Relatively easy to observe
● Coalesced Roles
● Single Lock
○ Locks are always advisory!
○ A lock service does not enforce obedience to locks.
● Go
● Protocol Buffers
● gRPC
● Do not use in production!
Assuming a wide quorum
● Instances
○ Oregon (North America)
○ São Paulo (South America)
○ London (Europe)
○ Taiwan (Asia)
○ Sydney (Australia)
● Unusual in practice
○ "Terrible latency"
● Perfect for observation and learning
○ Timeouts, Deadlines, Latency
How Skinny reaches consensus
SKINNY QUORUM
(Diagram: five Skinny instances, numbered 1-5; a client asks instance 1: "Lock please?")
PHASE 1A: PROPOSE
(Diagram: all five instances start with ID 0, Promised 0, and no holder. Instance 1 receives the client's "Lock please?", raises its own Promised to 1, and sends "Proposal ID 1" to the other instances.)
PHASE 1B: PROMISE
(Diagram: every instance records Promised 1 and replies "Promise ID 1" to instance 1.)
PHASE 2A: COMMIT
(Diagram: instance 1 commits locally, setting ID 1 and Holder Beaver, and sends "Commit ID 1, Holder Beaver" to the other instances.)
PHASE 2B: COMMITTED
(Diagram: every instance now stores ID 1, Promised 1, Holder Beaver and replies "Committed"; instance 1 answers the client: "Lock acquired! Holder is Beaver.")
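The acceptor side of Phase 1 above can be sketched as: promise a proposal only if its ID is higher than anything promised before. The field names follow the slides; the struct and method are my sketch, not Skinny's actual code:

```go
package main

import "fmt"

// instance holds the per-instance state shown on the slides: the ID of
// the last committed proposal, the highest proposal ID promised so far,
// and the current lock holder.
type instance struct {
	id       uint64
	promised uint64
	holder   string
}

// promise implements Phase 1B: the instance promises a proposal only if
// its ID is higher than anything promised before. Either way it returns
// its current state, so a rejected proposer can learn committed values.
func (in *instance) promise(proposalID uint64) (ok bool, learnedID uint64, learnedHolder string) {
	if proposalID > in.promised {
		in.promised = proposalID
		return true, in.id, in.holder
	}
	return false, in.id, in.holder
}

func main() {
	in := &instance{}
	ok, _, _ := in.promise(1)
	fmt.Println("proposal ID 1 promised:", ok) // true
	ok, _, _ = in.promise(1)
	fmt.Println("same ID again promised:", ok) // false: 1 is not > 1
}
```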
How Skinny deals with Instance Failure
SCENARIO
(Diagram: all five instances agree on ID 9, Promised 9, Holder Beaver.)
TWO INSTANCES FAIL
(Diagram: instances 3 and 5 fail; the remaining instances keep ID 9, Promised 9, Holder Beaver.)
INSTANCES ARE BACK BUT STATE IS LOST
(Diagram: instances 3 and 5 restart with empty state: ID 0, Promised 0, no holder. A client asks instance 3: "Lock please?")
INSTANCES ARE BACK BUT STATE IS LOST
(Diagram: instance 3 raises its Promised to 3 and sends "Proposal ID 3" to all other instances.)
PROPOSAL REJECTED
(Diagram: instance 5, which also lost its state, answers "Promise ID 3". Instances 1, 2, and 4 reject with "NOT Promised, ID 9, Holder Beaver", revealing the value they have already committed.)
START NEW PROPOSAL WITH LEARNED VALUES
(Diagram: instance 3 adopts the learned state, ID 9 and Holder Beaver, and sends a new "Proposal ID 12" to all other instances.)
PROPOSAL ACCEPTED
(Diagram: every instance records Promised 12 and replies "Promise ID 12".)
COMMIT LEARNED VALUE
(Diagram: instance 3 sends "Commit ID 12, Holder Beaver" to all other instances.)
COMMIT ACCEPTED, LOCK NOT GRANTED
(Diagram: every instance stores ID 12, Promised 12, Holder Beaver and replies "Committed"; instance 3 answers the client: "Lock NOT acquired! Holder is Beaver.")
Skinny APIs
Skinny APIs
● Lock API
○ Used by clients to acquire or release a lock
● Consensus API
○ Used by Skinny instances to reach consensus
● Control API
○ Used by us (the admin) to observe what's happening
Lock API

message AcquireRequest {
  string Holder = 1;
}

message AcquireResponse {
  bool Acquired = 1;
  string Holder = 2;
}

message ReleaseRequest {}

message ReleaseResponse {
  bool Released = 1;
}

service Lock {
  rpc Acquire(AcquireRequest) returns (AcquireResponse);
  rpc Release(ReleaseRequest) returns (ReleaseResponse);
}
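Stripped of gRPC and the consensus layer, the semantics behind this Lock API can be sketched as a single advisory lock. This is my reading of the proto, not Skinny's implementation; in particular, whether a repeated Acquire by the current holder succeeds is my assumption:

```go
package main

import "fmt"

// lockService models the Lock API semantics: one advisory lock that is
// granted only when free. The consensus layer is deliberately omitted.
type lockService struct {
	holder string
}

// Acquire grants the lock if it is free (or, by assumption, already
// held by the same holder) and always reports the current holder,
// mirroring AcquireResponse{Acquired, Holder}.
func (s *lockService) Acquire(holder string) (acquired bool, current string) {
	if s.holder == "" || s.holder == holder {
		s.holder = holder
		return true, s.holder
	}
	return false, s.holder
}

// Release frees the lock, mirroring ReleaseResponse{Released}.
func (s *lockService) Release() (released bool) {
	s.holder = ""
	return true
}

func main() {
	var s lockService
	fmt.Println(s.Acquire("Beaver")) // true Beaver
	fmt.Println(s.Acquire("Alien"))  // false Beaver: lock is advisory but taken
	fmt.Println(s.Release())         // true
	fmt.Println(s.Acquire("Alien"))  // true Alien
}
```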