

  1. Byzantine Fault Tolerance. CS 425: Distributed Systems, Fall 2011. Material derived from slides by I. Gupta and N. Vaidya.

  2. Reading List • L. Lamport, R. Shostak, M. Pease, "The Byzantine Generals Problem," ACM TOPLAS, 1982. • M. Castro and B. Liskov, "Practical Byzantine Fault Tolerance," OSDI 1999.

  3. Byzantine Generals Problem • A sender wants to send its message to the n−1 other peers • Fault-free nodes must agree • Sender fault-free ⇒ all agree on its message • Up to f failures

  4. Byzantine Generals Problem • A sender wants to send its message to the n−1 other peers • Fault-free nodes must agree • Sender fault-free ⇒ all agree on its message • Up to f failures

  5. Byzantine Generals Algorithm [figure: the source S sends value v to peers 1, 2, and 3; peer 3 is faulty]

  6. Byzantine Generals Algorithm [figure: peers 1 and 2 relay the value v they received to the other peers]

  7. Byzantine Generals Algorithm [figure: faulty peer 3 relays arbitrary values ("?") to peers 1 and 2]

  8. Byzantine Generals Algorithm [figure: peers 1 and 2 have now each received v directly from S, v from each other, and "?" from peer 3]

  9. Byzantine Generals Algorithm [figure: peers 1 and 2 each hold the vector [v, v, ?]]

  10. Byzantine Generals Algorithm [figure: a majority vote over [v, v, ?] yields the correct result v at the good peers]

  11. Byzantine Generals Algorithm [figure: the source S is faulty and sends different values v, x, w to peers 1, 2, and 3]

  12. Byzantine Generals Algorithm [figure: peer 3 relays the value w it received to peers 1 and 2]

  13. Byzantine Generals Algorithm [figure: peers 1 and 2 likewise relay their values v and x to the other peers]

  14. Byzantine Generals Algorithm [figure: every peer ends up holding the same vector [v, w, x]]

  15. Byzantine Generals Algorithm [figure: voting over [v, w, x] gives an identical result at all good peers]
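
To make the two scenarios concrete, here is a minimal Python sketch of the single relay round drawn in slides 5-15, i.e. the f = 1 case with n = 4 nodes. The names om_round and majority and the lie-function fault model are illustrative, not taken from the Lamport/Shostak/Pease paper (whose full algorithm recurses for larger f).

```python
from collections import Counter

def majority(values):
    """Return the most common value; ties are broken by first appearance."""
    return Counter(values).most_common(1)[0][0]

def om_round(sender_values, peer_faults):
    """One relay round among n peers.

    sender_values[i]: the value the source sent to peer i
                      (all equal when the source is fault-free).
    peer_faults[i]:   None if peer i is correct, otherwise a function
                      recipient -> whatever lie peer i tells that recipient.
    Returns {peer: decided value} for every correct peer.
    """
    n = len(sender_values)
    decisions = {}
    for j in range(n):
        if peer_faults[j] is not None:
            continue  # a faulty peer may decide anything; skip it
        vector = []
        for i in range(n):
            if i == j:
                vector.append(sender_values[j])   # heard from S directly
            elif peer_faults[i] is None:
                vector.append(sender_values[i])   # honest relay
            else:
                vector.append(peer_faults[i](j))  # arbitrary value
        decisions[j] = majority(vector)
    return decisions

# Slides 5-10: fault-free source, peer 3 faulty -> good peers decide v.
print(om_round(["v", "v", "v"], [None, None, lambda j: "?"]))

# Slides 11-15: faulty source sends v, x, w; all peers honest.
# Every good peer votes on the identical vector, so decisions match.
print(om_round(["v", "x", "w"], [None, None, None]))
```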

  16. Known Results • Need 3f + 1 nodes to tolerate f failures (the four-node examples above are the f = 1 case) • Need Ω(n²) messages in general

  17. Ω(n²) Message Complexity • Each message carries at least 1 bit • Ω(n²) bits of "communication complexity" just to agree on a 1-bit value
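
A back-of-the-envelope count, assuming the one-round broadcast-then-relay pattern drawn in the earlier figures (message_count is an illustrative helper, not from the papers):

```python
def message_count(n):
    direct = n - 1                 # source sends to each of the n-1 peers
    relayed = (n - 1) * (n - 2)    # each peer relays to the other n-2 peers
    return direct + relayed        # Theta(n^2), matching the Omega(n^2) bound

print(message_count(4))   # 3 + 6 = 9 messages for the 4-node example
```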

  18. Practical Byzantine Fault Tolerance • Computer systems provide crucial services • Computer systems fail – Crash-stop failure – Crash-recovery failure – Byzantine failure • Examples: natural disaster, malicious attack, hardware failure, software bug, etc. • Need highly available service ⇒ replicate to increase availability

  19. Challenges [figure: two clients concurrently send Request A and Request B to the replicated service]

  20. Requirements • All replicas must handle the same requests despite failures. • All replicas must handle requests in identical order despite failures.

  21. Challenges [figure: the two client requests must be put into one agreed order, e.g. 1: Request A, 2: Request B]

  22. State Machine Replication [figure: every replica holds the same ordered log: 1: Request A, 2: Request B] • How to assign sequence numbers to requests?

  23. Primary Backup Mechanism [figure: in view 0, the primary orders the requests as 1: Request A, 2: Request B and forwards them to the backups] • What if the primary is faulty? – Agreeing on sequence numbers – Agreeing on changing the primary (view change)
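
A minimal sketch of the primary-backup ordering idea within a single view, before any view change; the Primary class and its field names are illustrative:

```python
class Primary:
    """Within one view, a single primary imposes a total order by
    stamping each client request with the next sequence number."""

    def __init__(self, view=0):
        self.view = view
        self.next_seq = 1

    def order(self, request):
        """Return the (view, sequence number, request) entry that the
        primary would multicast to the backups."""
        entry = (self.view, self.next_seq, request)
        self.next_seq += 1
        return entry

p = Primary(view=0)
print(p.order("Request A"))   # (0, 1, 'Request A')
print(p.order("Request B"))   # (0, 2, 'Request B')
```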

  24. Normal Case Operation • Three-phase algorithm: – PRE-PREPARE picks the order of requests – PREPARE ensures the order within views – COMMIT ensures the order across views • Replicas remember messages in a log • Messages are authenticated – {...}σk denotes a message signed by replica k
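
The {...}σk notation can be read as "message authenticated by sender k". Below is a minimal sketch of that idea; real PBFT uses public-key signatures or per-pair MAC vectors, and the shared HMAC keys here are purely illustrative:

```python
import hashlib
import hmac
import json

# Illustrative per-replica keys; in practice these would be negotiated.
KEYS = {k: f"key-{k}".encode() for k in range(4)}

def sign(k, msg):
    """Return (msg, sigma_k): an HMAC tag authenticating msg as sent by k."""
    tag = hmac.new(KEYS[k], json.dumps(msg).encode(), hashlib.sha256).hexdigest()
    return (msg, tag)

def verify(k, signed):
    """Check that the tag on a received message really came from k."""
    msg, tag = signed
    expected = hmac.new(KEYS[k], json.dumps(msg).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

# {PRE-PREPARE, v, n, m}sigma_0 from the primary (replica 0):
pre_prepare = sign(0, ["PRE-PREPARE", 0, 1, "m"])
assert verify(0, pre_prepare)
```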

  25. Pre-prepare Phase [figure: for request m, the primary (replica 0) multicasts {PRE-PREPARE, v, n, m}σ0 to replicas 1-3; replica 3 has failed]

  26. Prepare Phase [figure: replicas 1 and 2 accept the PRE-PREPARE for request m; replica 3 has failed]

  27. Prepare Phase [figure: having accepted the PRE-PREPARE, replica 1 multicasts {PREPARE, v, n, D(m), 1}σ1 to the other replicas]

  28. Prepare Phase [figure: each replica collects the PRE-PREPARE plus 2f matching PREPARE messages for request m]

  29. Commit Phase [figure: after the PRE-PREPARE and PREPARE phases, replica 2 multicasts {COMMIT, v, n, D(m)}σ2]

  30. Commit Phase (2) [figure: each replica collects 2f+1 matching COMMIT messages for request m, then executes it and replies to the client]
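
Putting the two collection rules together, here is a sketch of the predicates a replica might evaluate over its message log, assuming f = 1 (so n = 4) and simple tuple-shaped messages; note that real PBFT pre-prepares carry the full request m rather than its digest:

```python
F = 1  # number of tolerated faults in this sketch

def prepared(log, v, n, digest):
    """PRE-PREPARE accepted plus 2f matching PREPAREs (slide 28)."""
    pre = any(m == ("PRE-PREPARE", v, n, digest) for (m, _sender) in log)
    prepare_senders = {k for (m, k) in log if m == ("PREPARE", v, n, digest)}
    return pre and len(prepare_senders) >= 2 * F

def committed_local(log, v, n, digest):
    """Prepared plus 2f+1 matching COMMITs: execute and reply (slide 30)."""
    commit_senders = {k for (m, k) in log if m == ("COMMIT", v, n, digest)}
    return prepared(log, v, n, digest) and len(commit_senders) >= 2 * F + 1

# Replica 1's log after the message flow drawn above, with D(m) = "d":
log = [(("PRE-PREPARE", 0, 1, "d"), 0),
       (("PREPARE", 0, 1, "d"), 1), (("PREPARE", 0, 1, "d"), 2),
       (("COMMIT", 0, 1, "d"), 0), (("COMMIT", 0, 1, "d"), 1),
       (("COMMIT", 0, 1, "d"), 2)]
assert prepared(log, 0, 1, "d") and committed_local(log, 0, 1, "d")
```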

  31. View Change • Provides liveness when the primary fails – Timeouts trigger view changes – Select the new primary (= view number mod 3f+1) • Brief protocol – Replicas send VIEW-CHANGE messages along with the requests they have prepared so far – The new primary collects 2f+1 VIEW-CHANGE messages – It constructs information about requests committed in previous views
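
A sketch of the rotating-primary rule from the slide (new primary = view number mod 3f+1); primary_of is an illustrative name:

```python
def primary_of(view, f=1):
    """Primary for a view rotates round-robin over the n = 3f+1 replicas."""
    n = 3 * f + 1
    return view % n

assert primary_of(0) == 0   # view 0: replica 0 is primary
assert primary_of(1) == 1   # after a timeout, view 1: replica 1 takes over
assert primary_of(5) == 1   # views wrap around the replica set
```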

  32. View Change Safety • Goal: no two different committed requests with the same sequence number across views [figure: the quorum behind a committed certificate (m, v, n) and any view-change quorum intersect, so at least one correct replica in the intersection holds the prepared certificate (m, v, n)]
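
The safety argument rests on quorum intersection: any two quorums of size 2f+1 drawn from n = 3f+1 replicas overlap in at least f+1 replicas, and since at most f are faulty, at least one correct replica in the overlap carries the prepared certificate into the new view. A quick check of the arithmetic:

```python
for f in range(1, 10):
    n, q = 3 * f + 1, 2 * f + 1
    min_overlap = 2 * q - n        # inclusion-exclusion: = f + 1
    assert min_overlap >= f + 1    # overlap exceeds the number of faults
```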

  33. Related Work [figure: timeline of fault-tolerance systems]
  • Fail-stop fault tolerance: VS Replication (PODC 1988), Paxos (TR, 1989)
  • Byzantine fault tolerance:
    – Byzantine agreement: Rampart (TPDS 1995), SecureRing (HICSS 1998), PBFT (OSDI 1999), BASE (TOCS 2003)
    – Byzantine quorums: Malkhi-Reiter (JDC 1998), Phalanx (SRDS 1998), Fleet (ToKDI 2000)
    – Hybrid quorum: Q/U (SOSP 2005), HQ Replication (OSDI 2006)
