Group Communication Shan-Hung Wu and DataLab CS, NTHU
Outline
• Group Communication
• Basic Abstraction
  – Perfect Point to Point Link
  – Perfect Failure Detection
• Reliable Broadcast
  – Best Effort Broadcast
  – Reliable Broadcast
  – Uniform Reliable Broadcast
• Consensus
  – Regular Consensus
  – Total Order Broadcast
• Paxos
  – Basic Paxos
  – Zab
  – Other Variants: Multi-Paxos, Fast Paxos, and Generalized Paxos
Group Communication
• Group communication provides multipoint-to-multipoint communication among a set of processes
  – It guarantees certain properties (for example, reliability or ordering of message delivery)
Difficulties in Group Communication
• Challenges
  – Message delay or loss
  – Out-of-order messages
  – Node failure
  – Link failure
• In practice, it is difficult to tell whether a node or a link has failed
Perfect Point to Point Link
• How to cope with message loss?
  – Retransmit messages and eliminate duplicates
[Figure: p_1 sends a message to p_2; the message is lost in transit]
Perfect Point to Point Link
• Properties
  – Reliable delivery: if neither the sender nor the receiver crashes, then the receiver eventually delivers every message sent by the sender
    • Keep retransmitting the message until an ACK is received
  – No duplication: a receiver may receive a message many times, but delivers it only once
    • Use sequence numbers to filter duplicates
  – No creation: if a message is delivered, it must have been sent by some process
    • Use checksums
Perfect Point to Point Link
• A simplified implementation without ACKs
  – Retransmit all messages periodically
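The ACK-free variant above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the class names, the `on_timer` hook, and the `make_lossy_channel` helper (which deterministically drops the first few transmissions) are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

Message = Tuple[str, int, str]  # (sender, sequence number, payload)


class PerfectLinkSender:
    """Keeps every sent message and retransmits all of them on each timer tick."""

    def __init__(self, name: str, channel: Callable[[Message], None]) -> None:
        self.name = name
        self.channel = channel  # hypothetical lossy transport
        self.sent: List[Message] = []
        self.next_seq = 0

    def send(self, payload: str) -> None:
        msg = (self.name, self.next_seq, payload)
        self.next_seq += 1
        self.sent.append(msg)
        self.channel(msg)

    def on_timer(self) -> None:
        # No ACKs: blindly retransmit everything sent so far.
        for msg in self.sent:
            self.channel(msg)


class PerfectLinkReceiver:
    """Delivers each (sender, sequence number) pair at most once."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, int]] = set()
        self.delivered: List[str] = []

    def on_receive(self, msg: Message) -> None:
        sender, seq, payload = msg
        if (sender, seq) in self.seen:  # no duplication
            return
        self.seen.add((sender, seq))
        self.delivered.append(payload)


def make_lossy_channel(receiver: PerfectLinkReceiver, drop_first: int):
    """Assumed channel model: drops the first `drop_first` transmissions."""
    state = {"count": 0}

    def channel(msg: Message) -> None:
        state["count"] += 1
        if state["count"] > drop_first:
            receiver.on_receive(msg)

    return channel


receiver = PerfectLinkReceiver()
sender = PerfectLinkSender("p1", make_lossy_channel(receiver, drop_first=2))
sender.send("a")   # transmission 1: lost
sender.send("b")   # transmission 2: lost
sender.on_timer()  # both messages are retransmitted and delivered
sender.on_timer()  # retransmitted again; duplicates are filtered
print(receiver.delivered)
```

Retransmitting everything forever is wasteful; the ACK-based version lets the sender stop retransmitting a message once it is acknowledged.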
Perfect Failure Detection
• How to detect a node failure?
  – Use timeouts on heartbeats
  – If no heartbeat is received from a process p within the timeout, deem that p has crashed
Perfect Failure Detection
• Uses:
  – PerfectPointToPointLink
• Properties
  – Strong completeness: every process that crashes is eventually detected by every correct process
    • Achieved by broadcasting which nodes have failed, or by letting every process detect failures on its own
  – Strong accuracy: if a process p is detected by any process, then p has crashed
    • Together: a process is detected as failed iff it has crashed
Perfect Failure Detection
• Implementation: periodically send heartbeat messages to all processes
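A heartbeat-based detector can be sketched as follows. The class and method names are hypothetical, and time is passed in explicitly so that the example is deterministic. Strong accuracy only holds under the assumption that the timeout exceeds the worst-case heartbeat delay; the sketch does not check that.

```python
from typing import Dict, Iterable, Set


class HeartbeatFailureDetector:
    """Deems a peer crashed when no heartbeat arrives within `timeout`."""

    def __init__(self, peers: Iterable[str], timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {p: now for p in peers}
        self.crashed: Set[str] = set()

    def on_heartbeat(self, peer: str, now: float) -> None:
        # A detection is permanent: heartbeats from a detected peer are ignored.
        if peer not in self.crashed:
            self.last_seen[peer] = now

    def check(self, now: float) -> Set[str]:
        """Returns the peers newly detected as crashed at time `now`."""
        newly = {
            p
            for p, t in self.last_seen.items()
            if p not in self.crashed and now - t > self.timeout
        }
        self.crashed |= newly
        return newly


fd = HeartbeatFailureDetector(["p2", "p3"], timeout=3.0)
fd.on_heartbeat("p2", now=1.0)
fd.on_heartbeat("p3", now=1.0)
fd.on_heartbeat("p2", now=4.0)   # p3 stays silent after time 1.0
print(fd.check(now=5.0))         # p3 has been silent for 4.0 > 3.0
```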
Broadcast
• A broadcast abstraction enables a process to send a message to all processes in a system, including itself
• A naïve approach: try to deliver the message to as many nodes as possible
[Figure: best-effort broadcast among processes p_1, p_2, p_3, and p_4]
Best Effort Broadcast
• Uses:
  – PerfectPointToPointLink
  – PerfectFailureDetection
• Properties
  – Best-effort validity
    • For any two processes p_i and p_j, if both are correct, then every message broadcast by p_i is eventually delivered by p_j
  – No duplication
  – No creation
Best Effort Broadcast
• How to achieve best-effort broadcast?
  – For the first property, the sender uses PerfectPointToPointLink to send the message to all receivers that have not been detected as failed by PerfectFailureDetection
  – The other two properties are covered by PerfectPointToPointLink
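A minimal sketch of this recipe, assuming an in-memory stand-in for the perfect link (a direct method call to any process that is still alive) and a set of names supplied by the failure detector. `Process` and `best_effort_broadcast` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Set


class Process:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.delivered: List[str] = []

    def deliver(self, sender: str, message: str) -> None:
        self.delivered.append(f"{sender}:{message}")


def best_effort_broadcast(
    sender: str,
    message: str,
    processes: Dict[str, Process],
    detected_failed: Set[str],
) -> None:
    """Sends `message` to every process not detected as failed, sender included."""
    for name, proc in processes.items():
        if name in detected_failed:
            continue  # skip processes reported by the failure detector
        if proc.alive:
            # Stand-in for the perfect link: a live receiver always gets it.
            proc.deliver(sender, message)


processes = {name: Process(name) for name in ("p1", "p2", "p3", "p4")}
processes["p4"].alive = False  # p4 has crashed and has been detected
best_effort_broadcast("p1", "m", processes, detected_failed={"p4"})
print({name: p.delivered for name, p in processes.items()})
```

Nothing here protects against the sender crashing halfway through the loop, which is the gap that reliable broadcast addresses.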
Is This Reliable?
• Is best-effort broadcast enough to have every correct process receive the message?
  – No. If the sender fails during the broadcast, some of the remaining correct processes may never deliver the message
Reliable Broadcast
• Reliable broadcast ensures that all correct processes deliver the same messages even if the sender fails
• How?
  – If the sender is detected to have crashed, the other processes relay its messages to all
[Figure: reliable broadcast among p_1, p_2, p_3, and p_4, with the labels "Crash", "Detected", and "Relay"]
Reliable Broadcast
• Uses:
  – BestEffortBroadcast
  – PerfectFailureDetection
• Properties
  – Validity
    • If a correct process p_i broadcasts a message m, then p_i eventually delivers m
  – No duplication
  – No creation
  – Agreement
    • If a message m is delivered by some correct process p_i, then m is eventually delivered by every correct process p_j
Reliable Broadcast
• Implementation notes
  – Log every broadcast message that is delivered
  – Relay all broadcast messages coming from a process once it is detected as failed
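The two notes above (log, then relay on failure) can be sketched as a small in-memory simulation. It is an illustration under assumptions: `RBProcess`, the shared `network` dictionary, and the `only` parameter (used to cut a broadcast short, mimicking a sender that crashes midway) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Msg = Tuple[str, int, str]  # (origin, sequence number, payload)


class RBProcess:
    def __init__(self, name: str, network: Dict[str, "RBProcess"]) -> None:
        self.name = name
        self.network = network
        self.alive = True
        self.failed: Set[str] = set()        # processes detected as crashed
        self.log: Dict[str, List[Msg]] = {}  # delivered messages, per origin
        self.seen: Set[Msg] = set()
        self.delivered: List[str] = []
        self.next_seq = 0
        network[name] = self

    def beb_send(self, msg: Msg, only: Tuple[str, ...] = ()) -> None:
        # Best-effort broadcast; a non-empty `only` restricts the receivers.
        for name, proc in list(self.network.items()):
            if (not only or name in only) and proc.alive:
                proc.on_beb_deliver(msg)

    def broadcast(self, payload: str, only: Tuple[str, ...] = ()) -> None:
        msg = (self.name, self.next_seq, payload)
        self.next_seq += 1
        self.beb_send(msg, only)

    def on_beb_deliver(self, msg: Msg) -> None:
        if msg in self.seen:  # no duplication
            return
        self.seen.add(msg)
        self.delivered.append(msg[2])
        self.log.setdefault(msg[0], []).append(msg)  # log the message
        if msg[0] in self.failed:  # origin already known to have crashed
            self.beb_send(msg)

    def on_crash_detected(self, crashed: str) -> None:
        self.failed.add(crashed)
        for msg in self.log.get(crashed, []):  # relay everything from it
            self.beb_send(msg)


network: Dict[str, RBProcess] = {}
p1, p2, p3, p4 = (RBProcess(n, network) for n in ("p1", "p2", "p3", "p4"))
p1.broadcast("m", only=("p1", "p2"))  # p1 reaches only itself and p2 ...
p1.alive = False                      # ... and then crashes
for p in (p2, p3, p4):
    p.on_crash_detected("p1")         # p2 relays m from its log
print(p2.delivered, p3.delivered, p4.delivered)
```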
Reliable Broadcast Meets Database
• Can it be used for GC-based (group-communication-based) eager replication?
  – To broadcast the effects of committed transactions (txs)
• Problems:
  – A process may deliver a message too early, before any other process has received it
  – If this process then crashes, the other processes may never see the message
• Fails to ensure durability in the DB world
  – Some committed txs are not propagated
Uniform Reliable Broadcast
• Ensures that a failed node has not delivered any message that the correct nodes never deliver
• A process may deliver a message only when it knows that all the other correct processes have received the message and returned an ACK
[Figure: uniform reliable broadcast among processes p_1, p_2, p_3, and p_4]
Uniform Reliable Broadcast
• Uses:
  – BestEffortBroadcast
  – PerfectFailureDetection
• Properties
  – Validity
  – No duplication
  – No creation
  – Uniform agreement
    • If a message m is delivered by some process p_i (whether correct or faulty), then m is also eventually delivered by every correct process p_j
Uniform Reliable Broadcast
• Deliver a message only after receiving ACKs for it from all correct processes
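The delivery condition can be sketched as a small piece of per-process state. The names (`URBState`, `on_ack`, `on_crash_detected`) are hypothetical, and the transport is abstracted away: the sketch assumes that some lower layer calls `on_ack` whenever a process acknowledges a message.

```python
from typing import Dict, Iterable, List, Set


class URBState:
    """Tracks ACKs per message and delivers once all correct processes acked."""

    def __init__(self, name: str, all_processes: Iterable[str]) -> None:
        self.name = name
        self.correct: Set[str] = set(all_processes)  # not detected as failed
        self.acks: Dict[str, Set[str]] = {}          # message id -> ackers
        self.delivered: List[str] = []

    def on_ack(self, msg_id: str, from_process: str) -> None:
        self.acks.setdefault(msg_id, set()).add(from_process)
        self._try_deliver(msg_id)

    def on_crash_detected(self, crashed: str) -> None:
        # A crashed process no longer has to ACK, so pending messages
        # may become deliverable.
        self.correct.discard(crashed)
        for msg_id in list(self.acks):
            self._try_deliver(msg_id)

    def _try_deliver(self, msg_id: str) -> None:
        if msg_id not in self.delivered and self.correct <= self.acks[msg_id]:
            self.delivered.append(msg_id)


state = URBState("p1", ["p1", "p2", "p3", "p4"])
for acker in ("p1", "p2", "p3"):
    state.on_ack("m1", acker)
before = list(state.delivered)   # p4 has not acked yet, so nothing is delivered
state.on_crash_detected("p4")    # p4 is no longer required
after = list(state.delivered)
print(before, after)
```

Waiting for every correct process is what makes delivery safe even if the delivering process crashes right afterwards: by then, all the others already hold the message.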
Consensus
• Consensus: all participants agree on (decide) a single value
• Specified in terms of two primitives: propose and decide
  – Each process has an initial value that it proposes for the agreement through the primitive propose
Consensus
• Uses:
  – BestEffortBroadcast
  – PerfectFailureDetection
• Properties
  – Termination
    • Every correct process eventually decides some value
  – Validity
    • If a process decides v, then v was proposed by some process
  – Integrity
    • No process decides twice
  – Agreement
    • No two correct processes decide differently
How?
Flooding Consensus
• A consensus instance requires two rounds:
  – Round 1
    • Every process proposes a value and broadcasts it to the others
    • A process can reach a decision once it knows it has seen all proposed values that correct processes will consider for the decision
    • The decision is computed by a deterministic function (for example, the minimum of the proposals)
    • It is fine for many processes to make the decision, since all of them compute the same result
  – Round 2
    • Each process that made the decision broadcasts it to all
[Figure: flooding consensus. p_1, p_2, p_3, and p_4 propose 2, 3, 5, and 7. A process can decide upon arrival of all proposals of the processes in the current view, as in Decide(2 = min(2, 3, 5, 7)). After a crash is detected, a process that holds only (3, 5, 7) cannot decide and starts another round. The other processes are shown with Decide(2) as well.]
Flooding Consensus
• Decide upon arrival of all proposals of the processes in the current view
• Relay the decision
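The decide-then-relay behaviour can be sketched as a lock-step simulation. This is a simplified model under stated assumptions rather than a full protocol: rounds are synchronous, crashes happen only in round 1, a crashing process reaches just the receivers listed in the hypothetical `crash_after_sending_to` argument, and the decision function is `min`. A process decides when it hears from exactly the same processes as in the previous round (initially, from everyone); otherwise it starts another round.

```python
from typing import Dict, Optional, Set


def flooding_consensus(
    proposals: Dict[str, int],
    crash_after_sending_to: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, int]:
    """Returns the decision of every surviving process."""
    crashes = crash_after_sending_to or {}
    everyone = set(proposals)
    alive = set(everyone)
    known = {p: {v} for p, v in proposals.items()}    # proposals seen so far
    prev_from = {p: set(everyone) for p in everyone}  # senders heard last round
    decision: Dict[str, int] = {}
    round_no = 1
    while any(p not in decision for p in alive):
        received_from: Dict[str, Set[str]] = {p: set() for p in alive}
        incoming: Dict[str, Set[int]] = {p: set() for p in alive}
        relayed: Dict[str, int] = {}
        for sender in sorted(alive):
            crashing = round_no == 1 and sender in crashes
            targets = (crashes[sender] & alive) if crashing else alive
            for target in targets:
                if sender in decision:
                    relayed[target] = decision[sender]  # relay the decision
                else:
                    received_from[target].add(sender)
                    incoming[target] |= known[sender]   # flood known proposals
        if round_no == 1:
            alive -= set(crashes)  # crashed processes take no further steps
        for p in sorted(alive):
            if p in decision:
                continue
            known[p] |= incoming[p]
            if p in relayed:
                decision[p] = relayed[p]
            elif received_from[p] == prev_from[p]:
                decision[p] = min(known[p])  # deterministic decision function
            else:
                prev_from[p] = received_from[p]  # cannot decide: another round
        round_no += 1
    return decision


values = {"p1": 2, "p2": 3, "p3": 5, "p4": 7}
# p1 crashes after its proposal has reached only p2.
result = flooding_consensus(values, crash_after_sending_to={"p1": {"p2"}})
print(result)
```

In this run p2 hears from everyone and decides 2 in the first round, while p3 and p4 hold only (3, 5, 7), cannot decide, and adopt the relayed decision in the next round.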
Any Alternative?
• Processes could fail during Round 1 and Round 2
• Why not use reliable broadcast?
  – All correct processes would receive all the proposals
  – Every process decides (deterministically) the same value
  – No need for Round 2 any more
• However, if any process fails, the rest need to relay the proposals
• Why not just relay the decision?
  – This is exactly the purpose of Round 2
Performance of Flooding Consensus
• Regular: 2 steps
• Each failure causes the start of a new round
• Best case (no failures)
  – A single communication step in Round 1
• Worst case (a failure in every step)
  – At most N steps, where N is the number of processes
• Each step requires O(N^2) messages to be exchanged
Is This Enough for a Deterministic Database System?
Total Order Broadcast
• Total order broadcast is a reliable broadcast abstraction that ensures all processes deliver messages in the same order