Consensus, impossibility results and Paxos
Ken Birman

Consensus… a classic problem
• Consensus abstraction underlies many distributed systems and protocols
• N processes
• They start execution with inputs ∈ {0,1}
• Asynchronous, reliable network
• At most 1 process fails by halting (crash)
• Goal: protocol whereby all “decide” same value v, and v was an input

Distributed Consensus
[Cartoon: “Jenkins, if I want another yes-man, I’ll build one!” — Lee Lorenz, Brent Sheppard]

Asynchronous networks
• No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”)
• No way to know how long a message will take to get from A to B
• Messages are never lost in the network

Quick comparison…
  Asynchronous model                           | Real world
  Reliable message passing, unbounded delays   | Just resend until acknowledged; often have a delay model
  No partitioning faults (“wait until over”)   | May have to operate “during” partitioning
  No clocks of any kind                        | Clocks, but limited sync
  Crash failures, can’t detect reliably        | Usually detect failures with timeout

Fault-tolerant protocol
• Collect votes from all N processes
• At most one is faulty, so if one doesn’t respond, count that vote as 0
• Compute majority
• Tell everyone the outcome
• They “decide” (they accept the outcome)
• … but this has a problem! Why?
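The voting protocol above can be sketched in a few lines. This is a minimal simulation (function and parameter names are hypothetical), in which a process that times out has its vote counted as 0, exactly as the slide describes:

```python
def majority_vote(votes, suspected=None):
    """Simulate the naive fault-tolerant vote.

    votes: dict mapping process id -> input bit (0 or 1).
    suspected: a process whose reply timed out; its vote counts as 0.
    Returns the value everyone "decides".
    """
    n = len(votes)
    counted = {p: (0 if p == suspected else v) for p, v in votes.items()}
    ones = sum(counted.values())
    # A strict majority of the N expected votes decides 1, otherwise 0.
    return 1 if ones > n // 2 else 0
```

Note that with inputs {1, 1, 0} the outcome flips depending on whether a 1-voter happens to be suspected, which is exactly the “problem” the last bullet hints at.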
What makes consensus hard?
• Fundamentally, the issue revolves around membership
• In an asynchronous environment, we can’t detect failures reliably
• A faulty process stops sending messages, but a “slow” message might confuse us
• Yet when the vote is nearly a tie, this confusing situation really matters

Fischer, Lynch and Paterson
• A surprising result
• Impossibility of Asynchronous Distributed Consensus with a Single Faulty Process
• They prove that no asynchronous algorithm for agreeing on a one-bit value can guarantee that it will terminate in the presence of crash faults
• And this is true even if no crash actually occurs!
• Proof constructs infinite non-terminating runs

Core of FLP result
• They start by looking at a system with inputs that are all the same
  • All 0s must decide 0; all 1s must decide 1
• Now they explore mixtures of inputs and find some initial set of inputs with an uncertain (“bivalent”) outcome
• They focus on this bivalent state

Self-Quiz questions
• When is a state “univalent” as opposed to “bivalent”?
• Can the system be in a univalent state if no process has actually decided?
• What “causes” a system to enter a univalent state?

Self-Quiz questions
• Suppose that event e moves us into a univalent state, and e happens at p
  • Might p decide immediately?
• Now sever communications from p to the rest of the system. Both event e and p’s decision are “hidden”
  • Does this matter in the FLP model?
  • Might it matter in real life?

Bivalent state
[Diagram: S* denotes a bivalent state, S0 a decision-0 state, S1 a decision-1 state. The system starts in S*. Events can take it to state S0 (sooner or later all executions decide 0) or to state S1 (sooner or later all executions decide 1).]
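The bivalent/univalent distinction can be illustrated concretely on the naive voting protocol: for some inputs the decision is already forced no matter which process is falsely suspected (univalent), while for others it still depends on scheduling (bivalent). A small sketch, with hypothetical helper names, assuming a suspected process’s vote counts as 0:

```python
from itertools import chain

def possible_outcomes(votes):
    """Set of decisions reachable under every choice of at most one
    falsely suspected process (whose vote is counted as 0)."""
    n = len(votes)
    def decide(suspected):
        ones = sum(0 if p == suspected else v for p, v in votes.items())
        return 1 if ones > n // 2 else 0
    return {decide(s) for s in chain([None], votes)}

def valency(votes):
    """Classify an initial configuration, in the spirit of FLP."""
    out = possible_outcomes(votes)
    return "bivalent" if len(out) == 2 else f"{out.pop()}-valent"
```

All-0 and all-1 inputs are univalent, as the slide says, while a near-tie such as {1, 1, 0} is bivalent: the outcome depends on which process (if any) gets suspected.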
Bivalent state
[Diagram: the system starts in S*; e is a critical event that takes us from a bivalent to a univalent state: eventually we’ll “decide” 0.]

Bivalent state
[Diagram: they delay e and show that there is a situation in which the system will return to a bivalent state S'*.]

Bivalent state
[Diagram: in this new state they show that we can deliver e, and that now the new state S''* will still be bivalent!]

Bivalent state
[Diagram: notice that we made the system do some work and yet it ended up back in an “uncertain” state. We can do this again and again.]

Core of FLP result in words
• In an initially bivalent state, they look at some execution that would lead to a decision state, say “0”
• At some step this run switches from bivalent to univalent, when some process receives some message m
• They now explore executions in which m is delayed

Core of FLP result
• Initially in a bivalent state
• Delivery of m would make us univalent, but we delay m
• They show that if the protocol is fault-tolerant, there must be a run that leads to the other univalent state
• And they show that you can deliver m in this run without a decision being made
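The “delay m, wander back to bivalence, then deliver m” loop can be written down schematically. This is only a schematic of the adversarial scheduler in the proof, not a protocol; the state labels stand in for whole configurations:

```python
# Schematic of the FLP "pumping" argument (labels only, not a protocol).
def pump(rounds):
    """Adversarial scheduler: each round, delay the critical message m,
    schedule other events until the system is bivalent again, and only
    then deliver m, which by the FLP lemma leaves it bivalent."""
    state = "bivalent"               # initial configuration S*
    decided = None
    for _ in range(rounds):
        # Delivering m *now* would make the state univalent, so: delay m,
        # run other events back to bivalence, then deliver m safely.
        state = "bivalent"
        if state != "bivalent":      # decisions only happen once univalent
            decided = 0
    return decided                   # None: no decision after any finite run
```

However many rounds the adversary runs, the system does real work yet never leaves bivalence, so it never decides.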
Core of FLP result
• This proves the result: a bivalent system can be forced to do some work and yet remain in a bivalent state
• We can “pump” this to generate indefinite runs that never decide
• Interesting insight: no failures actually occur (just delays). FLP attacks a fault-tolerant protocol using fault-free runs!

Intuition behind this result?
• Think of a real system trying to agree on something in which process p plays a key role
• But the system is fault-tolerant: if p crashes, it adapts and moves on
• Their proof “tricks” the system into treating p as if it had failed, but then lets p resume execution and “rejoin”
• This takes time… and no real progress occurs

But what did “impossibility” mean?
• In formal proofs, an algorithm is totally correct if
  • It computes the right thing
  • And it always terminates
• When we say something is possible, we mean “there is a totally correct algorithm” solving the problem

But what did “impossibility” mean?
• FLP proves that any fault-tolerant algorithm solving consensus has runs that never terminate
• These runs are extremely unlikely (“probability zero”)
• Yet they imply that we can’t find a totally correct solution
• “Consensus is impossible” thus means “consensus is not always possible”

Solving consensus
• Systems that “solve” consensus often use a membership service
• This GMS functions as an oracle, a trusted status-reporting function
• Then the consensus protocol involves a kind of 2-phase protocol that runs over the output of the GMS
• It is known precisely when such a solution will be able to make progress

GMS in a large system
[Diagram: global events are inputs to the GMS. Its output is the official record of events that mattered to the system.]
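The GMS idea can be sketched as follows. This is a hypothetical simplification, not a real membership service: votes are counted over the current view reported by the GMS oracle, so instead of guessing about timeouts, the protocol either has a majority of the view and decides, or it visibly cannot make progress yet:

```python
def gms_consensus(votes, view):
    """Decide by majority over the membership view reported by a GMS
    oracle; processes outside the view are ignored outright rather
    than guessed at via timeouts.

    votes: dict process -> input bit; view: set of members per the GMS.
    Returns the decided bit, or None if too few view members voted.
    """
    live = {p: v for p, v in votes.items() if p in view}
    if len(live) <= len(view) // 2:
        return None  # no progress until a majority of the view has voted
    ones = sum(live.values())
    return 1 if ones > len(view) // 2 else 0
```

The point of the last bullet shows up in the return value: it is known precisely when this makes progress (whenever a majority of the GMS view has voted), and when it must wait.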
Paxos Algorithm
• Distributed consensus algorithm
• Doesn’t use a GMS… at least in the basic version… but isn’t very efficient either
• Guarantees safety, but not liveness
• Key assumptions:
  • Set of processes that run Paxos is known a priori
  • Processes suffer crash failures
  • All processes have Greek names (but translate as “Fred”, “Cynthia”, “Nancy”…)

Paxos “proposal”
• Node proposes to append some information to a replicated history
• Proposal could be a decision value, hence can solve consensus
• Or could be some other information, such as “Frank’s new salary” or “Position of Air France flight 21”

Paxos Algorithm
• 3 roles
  • proposer
  • acceptor
  • learner
• 2 phases
  • Phase 1: prepare request → response
  • Phase 2: accept request → response

Paxos Algorithm
• Proposals are associated with a version number
• Processes vote on each proposal; a proposal approved by a majority will get passed
  • Size of a majority is “well known” because the potential membership of the system was known a priori
• A process considering two proposals approves the one with the larger version number

Phase 1: (prepare request)
(1) A proposer chooses a new proposal version number n and sends a prepare request (“prepare”, n) to a majority of acceptors, asking:
  (a) Can I make a proposal with number n?
  (b) If yes, do you suggest some value for my proposal?
(2) If an acceptor receives a prepare request (“prepare”, n) with n greater than that of any prepare request it has already responded to, it sends out (“ack”, n, n’, v’) or (“ack”, n, ⊥, ⊥):
  (a) It responds with a promise not to accept any more proposals numbered less than n
  (b) It suggests the value v’ of the highest-numbered proposal that it has accepted, if any, else ⊥
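Phase 1 above can be sketched in code. This is an in-memory sketch with hypothetical class and message shapes (real Paxos needs real messaging and learners); a minimal phase 2 is included to show why the ack carries (n’, v’):

```python
BOTTOM = None  # stands in for the slide's ⊥

class Acceptor:
    def __init__(self):
        self.promised = -1        # highest prepare number responded to
        self.accepted_n = BOTTOM  # number of highest accepted proposal
        self.accepted_v = BOTTOM  # value of highest accepted proposal

    def on_prepare(self, n):
        """Phase 1b: ack iff n exceeds every prepare already answered."""
        if n > self.promised:
            self.promised = n     # promise: no proposals numbered < n
            return ("ack", n, self.accepted_n, self.accepted_v)
        return None               # ignore (or nack) smaller numbers

    def on_accept(self, n, v):
        """Phase 2b: accept unless it would break a promise."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return ("accepted", n)
        return None

def run_round(acceptors, n, my_value):
    """Proposer: phase 1 to all acceptors, then phase 2 if a majority
    promised, using the value forced by the highest-numbered ack."""
    acks = [a.on_prepare(n) for a in acceptors]
    acks = [x for x in acks if x is not None]
    if len(acks) <= len(acceptors) // 2:
        return None                       # no majority promised
    prior = [(an, av) for _, _, an, av in acks if an is not BOTTOM]
    value = max(prior)[1] if prior else my_value
    oks = [a for a in acceptors if a.on_accept(n, value)]
    return value if len(oks) > len(acceptors) // 2 else None
```

A later proposer with a higher number n learns any already-accepted value from the acks and must re-propose it, which is what makes the protocol safe even though it may livelock (safety without liveness, as the slide says).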