
Planning for Change in a Formal Verification of the Raft Consensus Protocol



  1. Planning for Change in a Formal Verification of the Raft Consensus Protocol. Doug Woos, James Wilcox, Steve Anton, Zach Tatlock, Mike Ernst, Tom Anderson

  2. Contributions. First formal proof of Raft’s safety: first verified implementation! Large-scale Verdi case study: stress test; reverification inevitable. Proof engineering lessons: affinity lemmas, etc.

  3. Distributed Systems

  4. Reliably deliver procrastination

  5. Also serious infrastructure

  6. One day last summer...

  7. One day last summer...

  8. One day last summer...

  9. How distributed systems fail

  10. Related Work. EventML [LADA12, AVoCS15]: language for verified distributed systems. IronFleet [SOSP15]: liveness, log compaction, serialization. Verdi [PLDI15]: network semantics, transformers, higher-order

  11. Verdi background. Network semantics: operational semantics define network behavior. Verified system transformers: prove property transfer to adversarial network. [Diagram: Apps wrapped by a verified system transformer (VST)]

  12. Big Picture. Past: Verdi Framework (compositional fault tolerance). Present: Verified Raft (critical piece of infrastructure). Future: dynamically upgrading systems (program logic)

  13. Outline. Verification Challenge: state machine replication. Raft Algorithm: implemented in Verdi. Proof Overview: and lessons learned

  14. Replication for fault tolerance: critical components must not fail

  15. Replication for fault tolerance: replicas must be consistent with each other ⇒ available if a majority (more than n/2) of the nodes are up
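As a small worked example of the availability condition on this slide, the sketch below computes the quorum size for a cluster of n replicas and how many crashed nodes it can tolerate. It assumes that "n/2 nodes" is read as a strict majority, which is the usual Raft requirement; the function names are illustrative and do not come from the talk.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that may be down while a majority is still up."""
    return n - majority(n)


def is_available(n: int, up: int) -> bool:
    """The cluster can make progress only if a majority of nodes is up."""
    return up >= majority(n)
```

Under this reading, a 5-node cluster needs 3 live nodes and tolerates 2 failures, while a 4-node cluster still tolerates only 1.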

  16. Replication for fault tolerance ⇒

  17. Replication correctness ⇒

  18. Replication correctness ≈ linearizability: the cluster presents a consistent order of operations to clients

  19. Internal Correctness. ≈ linearizability follows from internal correctness: state machine safety

  20. Goal: Verify Raft. Prove State Machine Safety. Reduce linearizability to State Machine Safety ⇒ [PLDI15]

  21. Goal: Verify Raft. [Diagram: State Machine Safety (SMS) ⇒ linearizability (Lin.); labels: 45k, 5k LOC]

  22. Outline. Verification Challenge: state machine replication. Raft Algorithm: implemented in Verdi. Proof Overview: and lessons learned

  23. Formalizing the network. State of the world: packets in flight, history of I/O, data @ nodes

  24. Formalizing the network

  25. Formalizing the network

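  26. Defining network semantics

  \[
  \frac{H_{\mathrm{net}}(dst,\ \Sigma[dst],\ src,\ m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}
       {(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}
  \ \text{Deliver}
  \]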

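  27. Defining network semantics

  \[
  \frac{H_{\mathrm{net}}(dst,\ \Sigma[dst],\ src,\ m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}
       {(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}
  \ \text{Deliver}
  \]

  \[
  \frac{p \in P}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus \{p\},\ \Sigma,\ T)}\ \text{Duplicate}
  \qquad
  \frac{}{(\{p\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P,\ \Sigma,\ T)}\ \text{Drop}
  \]

  \[
  \frac{H_{\mathrm{tmt}}(n,\ \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}
       {(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle \mathrm{tmt}, o \rangle)}
  \ \text{Timeout}
  \]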

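  28. Defining network semantics: systems defined by handlers

  \[
  \frac{H_{\mathrm{net}}(dst,\ \Sigma[dst],\ src,\ m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}
       {(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}
  \ \text{Deliver}
  \]

  \[
  \frac{p \in P}{(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus \{p\},\ \Sigma,\ T)}\ \text{Duplicate}
  \qquad
  \frac{}{(\{p\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P,\ \Sigma,\ T)}\ \text{Drop}
  \]

  \[
  \frac{H_{\mathrm{tmt}}(n,\ \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}
       {(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle \mathrm{tmt}, o \rangle)}
  \ \text{Timeout}
  \]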
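To make the four rules concrete, here is a minimal Python sketch of the network semantics, assuming the world is the triple (P, Σ, T) of packets in flight, per-node state, and I/O trace. The `World` class, the step function names, and the toy handlers `h_net` and `h_tmt` are hypothetical illustrations; Verdi itself defines these semantics in Coq.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Packet = Tuple[str, str, Any]  # (src, dst, msg); msg must be hashable


@dataclass
class World:
    packets: Counter = field(default_factory=Counter)    # P: multiset of packets in flight
    state: Dict[str, Any] = field(default_factory=dict)  # Sigma: data at each node
    trace: List[Any] = field(default_factory=list)       # T: history of I/O


def _require_in_flight(w: World, p: Packet) -> None:
    if w.packets[p] <= 0:
        raise ValueError(f"packet {p!r} is not in flight")


def deliver(w: World, p: Packet, h_net) -> World:
    """Deliver: remove p, run the destination's handler, add what it sends."""
    _require_in_flight(w, p)
    src, dst, m = p
    sigma2, out, sent = h_net(dst, w.state[dst], src, m)
    packets = w.packets - Counter([p]) + Counter(sent)
    return World(packets, {**w.state, dst: sigma2}, w.trace + [out])


def duplicate(w: World, p: Packet) -> World:
    """Duplicate: a packet already in flight gains one more copy."""
    _require_in_flight(w, p)
    return World(w.packets + Counter([p]), dict(w.state), list(w.trace))


def drop(w: World, p: Packet) -> World:
    """Drop: one copy of p disappears; state and trace are unchanged."""
    _require_in_flight(w, p)
    return World(w.packets - Counter([p]), dict(w.state), list(w.trace))


def timeout(w: World, n: str, h_tmt) -> World:
    """Timeout: node n's timeout handler runs and may send packets."""
    sigma2, out, sent = h_tmt(n, w.state[n])
    return World(w.packets + Counter(sent), {**w.state, n: sigma2},
                 w.trace + [("tmt", out)])


# Hypothetical toy handlers: each node holds a counter.
def h_net(dst, sigma, src, m):
    new_sigma = sigma + 1 if m == "inc" else sigma
    return new_sigma, new_sigma, []  # (new state, output, packets sent)


def h_tmt(n, sigma):
    return sigma, None, [(n, "b", "inc")]  # on timeout, ask node "b" to increment
```

Each step returns a fresh `World` rather than mutating its argument, which mirrors the way the rules relate one world to the next. The system under test only supplies the two handlers; the adversarial behavior (duplication, drops) lives entirely in the step functions.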

  29. Implementing Raft. [Timeline: Term 1, Term 2, Term 3, ...; each term consists of an election followed by replication]

  30. Implementing Raft: Leader Election. Candidate sends ReqVote to Followers; Followers reply with Vote. [Timeline: Term 1, Term 2, Term 3, ...]

  31. Implementing Raft. [Timeline: Term 1, Term 2, Term 3, ...]

  32. Implementing Raft: Log Replication. Leader sends Append to Followers; Followers reply with AppendAck. Leader commits entry when receiving n/2 acks. [Timeline: Term 1, Term 2, Term 3, ...]
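The commit rule on the log replication slide can be sketched as follows. This is a simplified illustration, not the verified implementation: it assumes the leader already holds each entry itself, so n/2 follower acks (rounded down) plus the leader form a strict majority, and it ignores terms, retries, and log matching. The `Leader` class and its method names are hypothetical.

```python
class Leader:
    def __init__(self, cluster_size: int):
        self.n = cluster_size
        self.log = []            # entries appended by this leader
        self.acks = {}           # log index -> set of follower ids that acked it
        self.commit_index = -1   # highest committed index, -1 if none

    def append(self, entry) -> int:
        """Append a client entry locally and return its index."""
        self.log.append(entry)
        index = len(self.log) - 1
        self.acks[index] = set()
        return index

    def on_append_ack(self, follower: str, index: int) -> None:
        """Record an AppendAck; commit once n/2 distinct followers have acked."""
        self.acks[index].add(follower)  # a set, so duplicated acks count once
        if len(self.acks[index]) >= self.n // 2 and index > self.commit_index:
            self.commit_index = index
```

Storing acks in a set matters under the network semantics used in the talk, where packets may be duplicated: counting the same follower twice would let the leader commit without a real majority.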

  33. Outline. Verification Challenge: state machine replication. Raft Algorithm: implemented in Verdi. Proof Overview: and lessons learned

  34. Verifying Raft: Show linearizability ≈

  35. Verifying Raft: Approach ⇒

  36. State Machine Safety: nodes agree about committed entries. This suffices since only committed entries are executed. Proof by induction on an execution
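As an illustration of what "nodes agree about committed entries" means, the sketch below checks a single snapshot of a cluster: any two nodes must hold the same entry at every index that both have committed. This is only a runtime check over one state under an assumed representation (a dict from node name to its committed prefix); the talk proves the property for all reachable states by induction.

```python
from itertools import combinations
from typing import Dict, List


def state_machine_safety(committed: Dict[str, List[str]]) -> bool:
    """True if every pair of nodes agrees on each index both have committed."""
    for a, b in combinations(committed.values(), 2):
        # zip stops at the shorter list, so only shared indices are compared
        if any(x != y for x, y in zip(a, b)):
            return False
    return True
```

A node that has committed fewer entries than another does not violate the property; only a disagreement at the same index does.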

  37. State Machine Safety: Proof I ⇒ I not inductive!

  38. State Machine Safety: Proof. 90 invariants in total (Lemma, Lemma, Lemma, …). For each: I true initially; I preserved (I ⇒ I)

  39. The burden of proof. Each lemma: P true initially; P preserved (P ⇒ P); P with ghost state. Re-verification is the primary challenge:
      - invariants are not inductive
      - not-yet-verified code is wrong
      - need additional invariants
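The phrase "invariants are not inductive ... need additional invariants" can be illustrated on a toy transition system, unrelated to Raft and chosen only for this sketch: a counter that starts at 0 and steps by 2 modulo 8. The property "the counter is never 3" holds in every reachable state, yet it is not preserved from every state that satisfies it, so the inductive proof needs a stronger invariant.

```python
STATES = range(8)


def init(s: int) -> bool:
    return s == 0


def step(s: int) -> list:
    return [(s + 2) % 8]


def is_inductive(inv) -> bool:
    """inv is inductive if it is true initially and preserved by every step."""
    true_initially = all(inv(s) for s in STATES if init(s))
    preserved = all(inv(t) for s in STATES if inv(s) for t in step(s))
    return true_initially and preserved


def safe(s: int) -> bool:          # the property we actually care about
    return s != 3


def even(s: int) -> bool:          # the additional invariant
    return s % 2 == 0


def strengthened(s: int) -> bool:  # safe, strengthened until it is inductive
    return safe(s) and even(s)
```

Here `is_inductive(safe)` fails because the unreachable state 1 satisfies `safe` but steps to 3, whereas `is_inductive(strengthened)` succeeds and implies `safe`. Changing `step` can silently break `strengthened`, which is the small-scale version of the re-verification burden the slide describes.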

  40. The burden of proof. Each lemma: P true initially; P preserved (P ⇒ P); P with ghost state. Re-verification is the primary challenge. Proof engineering techniques help:
      - affinity lemmas
      - intermediate reachability
      - structural tactics
      - information hiding

  41. Ghost State: Example. Capture all entries received by a node. Leader: Log (real) A,B,C; allEntries (ghost) {A,B,C}. Leader sends Append [A],B,C. Follower before: Log A,D; allEntries {A,D}. Follower after: Log A,B,C; allEntries {A,B,C,D}
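One reading of the ghost state example on this slide is sketched below: the real log can lose entries when a leader overwrites a conflicting suffix, while the ghost variable `all_entries` only ever grows. The `Node` class and the `handle_append` signature are hypothetical; in a verified development ghost state exists only for the proof and is erased from the running code, whereas here it is an ordinary field.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    log: List[str] = field(default_factory=list)         # real state
    all_entries: Set[str] = field(default_factory=set)   # ghost state

    def handle_append(self, prev_len: int, entries: List[str]) -> None:
        """Keep the first prev_len log entries and replace the rest with entries."""
        self.log = self.log[:prev_len] + list(entries)
        self.all_entries |= set(entries)  # ghost update: never forgets an entry
```

Starting from a follower with log A,D and applying an append of B,C after the shared entry A yields log A,B,C, while `all_entries` becomes {A,B,C,D}: the overwritten entry D survives only in the ghost state.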

  42. Affinity Lemmas: Example. Affinity lemma: e ∈ log ⇒ e ∈ allEntries. Hence every invariant of entries in allEntries is an invariant of entries in logs: (∀ e ∈ allEntries, e.term > 0) ⇒ (∀ e ∈ log, e.term > 0)

  43. Affinity Lemmas: Example. Affinity lemma: e ∈ log ⇒ e ∈ allEntries. Hence every invariant of entries in allEntries is an invariant of entries in logs: (∀ e ∈ allEntries, P e) ⇒ (∀ e ∈ log, P e)
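The transfer step behind an affinity lemma is short enough to state and prove in full. The Lean 4 sketch below assumes the affinity lemma has the form "every entry in the log is in allEntries", which matches the ghost state example where allEntries is a superset of the log; the names `inLog`, `inAllEntries`, and `affinity_transfer` are illustrative and do not come from the Verdi development, which is written in Coq.

```lean
-- Membership in the log and in allEntries is modeled as predicates on entries.
theorem affinity_transfer {Entry : Type}
    (inLog inAllEntries P : Entry → Prop)
    -- the affinity lemma: every log entry was received at some point
    (affinity : ∀ e, inLog e → inAllEntries e)
    -- an invariant already proved for the ghost variable
    (inv : ∀ e, inAllEntries e → P e) :
    -- the same invariant, now for the real log
    ∀ e, inLog e → P e :=
  fun e h => inv e (affinity e h)
```

Because `P` is a parameter, the lemma is proved once and then reused for every invariant, which is the "transfer properties once and for all" idea.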

  44. Affinity Lemmas. Ex 1: Relate ghost state to real state (transfer properties once and for all). Ex 2: Relate current messages to past (response ⇒ past request)

  45. Structured Handlers: Example. handler = update_state ; respond. [Diagram: handler takes net to net’; it factors as update_state from net to net_i, then respond from net_i to net’]

  46. Structured Handlers: Example. handler = update_state ; respond. [Diagram as on the previous slide, with invariant I holding at net and net’]

  47. Structured Handlers: Example. handler = update_state ; respond. [Diagram as on the previous slide, with invariant I holding at net, net_i, and net’]
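The decomposition handler = update_state ; respond can be sketched in Python as plain function composition, with the invariant checked at the intermediate point as well as at both ends. Everything concrete here is an assumption made for illustration: the network is modeled as a pair (node state, packets in flight), the node state is a counter, and the invariant I says that no packet carries a value larger than the counter.

```python
from typing import List, Tuple

Net = Tuple[int, Tuple[int, ...]]  # (node state, packets in flight)


def update_state(net: Net, msg: int) -> Net:
    """First half of the handler: change local state only."""
    state, packets = net
    return (state + msg, packets)


def respond(net: Net) -> Net:
    """Second half of the handler: send packets only."""
    state, packets = net
    return (state, packets + (state,))


def handler(net: Net, msg: int) -> Net:
    return respond(update_state(net, msg))


def invariant(net: Net) -> bool:
    state, packets = net
    return all(p <= state for p in packets)


def check_stepwise(net: Net, msg: int) -> List[bool]:
    """Evaluate the invariant at net, at the intermediate net_i, and at net'."""
    net_i = update_state(net, msg)
    net_after = respond(net_i)
    return [invariant(net), invariant(net_i), invariant(net_after)]
```

Splitting the handler this way lets each half be reasoned about against a simpler kind of change (state only, or packets only), with the intermediate network serving as the meeting point for the two proofs.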

  48. The burden of proof. Each lemma: P true initially; P preserved (P ⇒ P); P with ghost state. Re-verification is the primary challenge. Proof engineering techniques help:
      - affinity lemmas
      - intermediate reachability
      - structural tactics
      - information hiding

  49. Contributions. First formal proof of Raft’s safety: first verified implementation! Large-scale Verdi case study: stress test; reverification inevitable. Proof engineering lessons: affinity lemmas, etc.

  50. Planning for Change in a Formal Verification of the Raft Consensus Protocol. Doug Woos, James Wilcox, Steve Anton, Zach Tatlock, Michael Ernst, Tom Anderson
