Byzantine Fault Tolerance
CS 425: Distributed Systems, Fall 2011
Material derived from slides by I. Gupta and N. Vaidya
Reading List
• L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM TOPLAS, 1982.
• M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999.
Byzantine Generals Problem
A sender wants to send a message to n-1 other peers.
• All fault-free nodes must agree on the same value
• If the sender is fault-free, they must agree on the sender's message
• Up to f nodes may fail
Byzantine Generals Algorithm (fault-free sender, one faulty peer)
[Figure: sender S holds value v and sends v to peers 1, 2 and 3, one of which is faulty. Each peer then forwards the value it received to the other two peers; the faulty peer forwards arbitrary values, shown as "?". Each good peer ends up holding [v, v, ?].]
Majority vote results in the correct result, v, at the good peers.
Byzantine Generals Algorithm (faulty source)
[Figure: faulty sender S sends different values v, w and x to the three peers. The peers forward what they received to each other, so every peer ends up holding the same values [v, w, x].]
Vote result is identical at the good peers.
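The two cases above (faulty peer, faulty source) can be replayed in a few lines. This is a minimal sketch for one sender and three peers only, not the general recursive algorithm from the Lamport et al. paper. The names `majority`, `one_round_agreement` and the `DEFAULT` fallback value are assumptions made for illustration, as is the choice of peer 2 as the faulty one.

```python
from collections import Counter

DEFAULT = "RETREAT"  # hypothetical fallback used when no value has a majority


def majority(values):
    """Return the value held by a strict majority, else DEFAULT."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def one_round_agreement(sent, relay):
    """Simulate one sender round plus one relay round.

    sent[p]     -- value the sender gave to peer p
    relay(p, q) -- value peer p forwards to peer q (a faulty peer may lie)
    Returns a dict mapping each peer to the value it decides.
    """
    peers = sorted(sent)
    decisions = {}
    for q in peers:
        received = [sent[q]] + [relay(p, q) for p in peers if p != q]
        decisions[q] = majority(received)
    return decisions


# Case 1: fault-free sender; peer 2 (an arbitrary choice) is faulty.
honest_sent = {1: "v", 2: "v", 3: "v"}
case1 = one_round_agreement(
    honest_sent,
    lambda p, q: "?" if p == 2 else honest_sent[p],
)

# Case 2: faulty sender gives each peer a different value; peers are honest.
faulty_sent = {1: "v", 2: "w", 3: "x"}
case2 = one_round_agreement(faulty_sent, lambda p, q: faulty_sent[p])
```

In case 1 the good peers 1 and 3 each see two copies of v and decide v. In case 2 no value has a majority, so every peer falls back to the same default, which is what "identical at good peers" requires.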
Known Results
• Need 3f + 1 nodes to tolerate f failures
• Need Ω(n²) messages in general
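The 3f + 1 bound is easy to turn around: given n nodes, how many failures can be tolerated? A small sketch of that arithmetic follows; the function names are hypothetical.

```python
def min_nodes(f):
    """Smallest number of nodes that tolerates f failures under the 3f + 1 bound."""
    return 3 * f + 1


def max_faults(n):
    """Largest f such that 3f + 1 <= n."""
    return (n - 1) // 3


# Four nodes tolerate one failure; going from 4 to 6 nodes buys nothing extra.
sizes = {n: max_faults(n) for n in (3, 4, 6, 7)}
```

Here `sizes` maps 3 → 0, 4 → 1, 6 → 1 and 7 → 2, so three nodes cannot tolerate even one failure under this bound.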
Ω(n²) Message Complexity
• Each message is at least 1 bit
• Ω(n²) bits of “communication complexity” to agree on just a 1-bit value
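As a sanity check on the quadratic growth, the exchange pattern from the earlier example (one sender round, then every peer relays to every other peer) can be counted directly. This only counts messages for that one-round pattern; it is not a proof of the lower bound, and the function name is an assumption.

```python
def one_round_message_count(n):
    """Messages used when a sender informs n-1 peers and each peer relays once."""
    sender_msgs = n - 1               # sender -> each peer
    relay_msgs = (n - 1) * (n - 2)    # each peer -> every other peer
    return sender_msgs + relay_msgs   # equals (n - 1) ** 2


counts = [one_round_message_count(n) for n in (4, 7, 10)]
```

For n = 4 this gives 3 + 6 = 9 messages, and the total is always (n - 1)², which grows quadratically in n.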
Practical Byzantine Fault Tolerance
• Computer systems provide crucial services
• Computer systems fail
  – Crash-stop failure
  – Crash-recovery failure
  – Byzantine failure
• Examples: natural disaster, malicious attack, hardware failure, software bug, etc.
• Need highly available service
Replicate to increase availability
Challenges
[Figure: two clients issue Request A and Request B.]
Requirements
• All replicas must handle the same requests despite failures.
• Replicas must handle requests in identical order despite failures.
Challenges
[Figure: the two clients' requests are numbered, 1: Request A and 2: Request B.]
State Machine Replication
[Figure: two clients; each of four replicas holds the same ordered list, 1: Request A, 2: Request B.]
How to assign sequence numbers to requests?
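Why sequence numbers matter can be shown with a toy deterministic state machine. In this sketch (the class `KeyValueReplica` and the key/value request format are assumptions, not part of the slides), each replica buffers requests and applies them strictly in sequence order, so replicas that receive the same numbered requests in different arrival orders still end in the same state.

```python
class KeyValueReplica:
    """Toy replica: applies (key, value) writes strictly in sequence order."""

    def __init__(self):
        self.state = {}
        self.pending = {}   # seq -> request that arrived but is not yet applied
        self.next_seq = 1   # next sequence number to execute

    def deliver(self, seq, request):
        self.pending[seq] = request
        # Execute every request that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.state[key] = value
            self.next_seq += 1


request_a = ("x", "A")
request_b = ("x", "B")

r1, r2 = KeyValueReplica(), KeyValueReplica()
r1.deliver(1, request_a)
r1.deliver(2, request_b)
r2.deliver(2, request_b)   # arrives early, is buffered
r2.deliver(1, request_a)   # unblocks both requests
```

Both replicas end with `{"x": "B"}`. Without the agreed numbering, `r2` would have applied B before A and diverged.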
Primary Backup Mechanism
[Figure: two clients send requests to the primary, which orders them as 1: Request A, 2: Request B in View 0.]
What if the primary is faulty?
• Agreeing on sequence number
• Agreeing on changing the primary (view change)
Normal Case Operation
• Three-phase algorithm:
  – PRE-PREPARE picks order of requests
  – PREPARE ensures order within views
  – COMMIT ensures order across views
• Replicas remember messages in a log
• Messages are authenticated
  – {·}σk denotes an authenticated message sent by replica k
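The notation {·}σk can be made concrete with a message authentication code. The sketch below is an assumption-laden stand-in: it uses HMAC-SHA256 with made-up per-replica keys in a `KEYS` table, and JSON as the wire format. The slides do not specify any of these; the helper names `digest`, `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical per-replica secret keys; a real deployment would distribute keys.
KEYS = {0: b"key-replica-0", 1: b"key-replica-1"}


def digest(request: bytes) -> str:
    """D(m): a collision-resistant digest of the request."""
    return hashlib.sha256(request).hexdigest()


def sign(sender, fields):
    """Build {fields}σ_sender as a dict carrying an HMAC tag."""
    payload = json.dumps(fields, sort_keys=True).encode()
    tag = hmac.new(KEYS[sender], payload, hashlib.sha256).hexdigest()
    return {"fields": fields, "sender": sender, "tag": tag}


def verify(message):
    """Recompute the tag and compare in constant time."""
    payload = json.dumps(message["fields"], sort_keys=True).encode()
    expected = hmac.new(KEYS[message["sender"]], payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


prepare = sign(1, {"type": "PREPARE", "v": 0, "n": 1, "d": digest(b"m"), "i": 1})
tampered = dict(prepare, fields=dict(prepare["fields"], n=2))
```

`verify(prepare)` succeeds, while `verify(tampered)` fails because the sequence number was changed after the tag was computed.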
Pre-prepare Phase
[Figure: a client sends request m to the primary (Replica 0), which multicasts {PRE-PREPARE, v, n, m}σ0 to Replicas 1, 2 and 3. Replica 3 has failed.]
Prepare Phase
[Figure: Replicas 1 and 2 have accepted the PRE-PREPARE for request m from the primary (Replica 0); Replica 3 has failed. A replica that accepted it multicasts a PREPARE, for example {PREPARE, v, n, D(m), 1}σ1 from Replica 1.]
Collect PRE-PREPARE + 2f matching PREPARE
Commit Phase
[Figure: after the PRE-PREPARE and PREPARE phases for request m, replicas multicast COMMIT messages, for example {COMMIT, v, n, D(m)}σ2 from Replica 2. Replica 3 has failed.]
Collect 2f+1 matching COMMIT: execute and reply
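The two collection rules (PRE-PREPARE plus 2f matching PREPARE, then 2f+1 matching COMMIT) amount to two predicates over a replica's log. The sketch below follows the thresholds as stated on the slides; the class `ReplicaLog` and its method names are assumptions, and message authentication, timeouts and view changes are left out.

```python
class ReplicaLog:
    """Per-replica message log with the two quorum checks (sketch only)."""

    def __init__(self, f):
        self.f = f
        self.pre_prepares = {}  # (view, seq) -> digest accepted from the primary
        self.prepares = {}      # (view, seq, digest) -> set of sender ids
        self.commits = {}       # (view, seq, digest) -> set of sender ids

    def add_pre_prepare(self, view, seq, digest):
        # Keep only the first PRE-PREPARE seen for a given (view, seq).
        self.pre_prepares.setdefault((view, seq), digest)

    def add_prepare(self, view, seq, digest, sender):
        self.prepares.setdefault((view, seq, digest), set()).add(sender)

    def add_commit(self, view, seq, digest, sender):
        self.commits.setdefault((view, seq, digest), set()).add(sender)

    def prepared(self, view, seq, digest):
        # PRE-PREPARE plus 2f matching PREPAREs from distinct senders.
        matching = self.prepares.get((view, seq, digest), set())
        return (self.pre_prepares.get((view, seq)) == digest
                and len(matching) >= 2 * self.f)

    def committed(self, view, seq, digest):
        # Prepared, plus 2f + 1 matching COMMITs from distinct senders.
        matching = self.commits.get((view, seq, digest), set())
        return (self.prepared(view, seq, digest)
                and len(matching) >= 2 * self.f + 1)


log = ReplicaLog(f=1)
log.add_pre_prepare(0, 1, "D(m)")
log.add_prepare(0, 1, "D(m)", sender=1)
prepared_after_one = log.prepared(0, 1, "D(m)")
log.add_prepare(0, 1, "D(m)", sender=2)
prepared_after_two = log.prepared(0, 1, "D(m)")
for sender in (0, 1, 2):
    log.add_commit(0, 1, "D(m)", sender)
```

With f = 1 and Replica 3 silent, one PREPARE is not enough, two are, and the three COMMITs from Replicas 0, 1 and 2 let the request execute.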
View Change
• Provide liveness when primary fails
  – Timeouts trigger view changes
  – Select new primary (= view number mod 3f+1)
• Brief protocol
  – Replicas send VIEW-CHANGE message along with the requests they prepared so far
  – New primary collects 2f+1 VIEW-CHANGE messages
  – Constructs information about committed requests in previous views
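The primary-selection rule and the 2f+1 threshold from this slide are simple enough to write down directly. A minimal sketch, assuming replicas are numbered 0 to 3f; both function names are hypothetical.

```python
def primary_for_view(view, f):
    """Primary of a view: view number mod 3f + 1, as on the slide."""
    return view % (3 * f + 1)


def has_view_change_quorum(senders, f):
    """True once VIEW-CHANGE messages from 2f + 1 distinct replicas arrived."""
    return len(set(senders)) >= 2 * f + 1


# With f = 1 the primary rotates 0, 1, 2, 3, 0, ...
rotation = [primary_for_view(v, f=1) for v in range(6)]
```

Duplicate messages from the same replica do not count twice, which is why the senders are collected into a set before comparing against the threshold.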
View Change Safety
• Goal: no two different committed requests with the same sequence number across views
[Figure: the quorum behind a Committed Certificate (m, v, n) overlaps the View Change Quorum. At least one correct replica in the overlap has the Prepared Certificate (m, v, n).]
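The overlap claim can be checked by brute force for small f. The sketch below assumes 3f + 1 replicas and quorums of size 2f + 1 (the sizes used elsewhere in these slides), enumerates every pair of quorums, and reports the smallest intersection; the function name is hypothetical.

```python
from itertools import combinations


def min_quorum_overlap(f):
    """Smallest intersection between any two quorums of size 2f+1 out of 3f+1."""
    n = 3 * f + 1
    q = 2 * f + 1
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


overlaps = {f: min_quorum_overlap(f) for f in (1, 2)}
```

The smallest overlap comes out as f + 1 in both cases. Since at most f replicas are faulty, any two quorums share at least one correct replica, which is the replica that carries the prepared certificate into the new view.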
Related Works
Fault Tolerance
• Fail Stop Fault Tolerance
  – Paxos, 1989 (TR)
  – VS Replication, PODC 1988
• Byzantine Fault Tolerance
  – Byzantine Agreement: Rampart (TPDS 1995), SecureRing (HICSS 1998), PBFT (OSDI '99), BASE (TOCS '03)
  – Byzantine Quorums: Malkhi-Reiter (JDC 1998), Phalanx (SRDS 1998), Fleet (ToKDI '00), Q/U (SOSP '05)
  – Hybrid Quorum: HQ Replication (OSDI '06)