Planning for Change in a Formal Verification of the Raft Consensus Protocol
Doug Woos, James Wilcox, Steve Anton, Zach Tatlock, Mike Ernst, Tom Anderson
Contributions
- First formal proof of Raft’s safety: first verified implementation!
- Large-scale Verdi case study: stress test; reverification inevitable
- Proof engineering lessons: affinity lemmas, etc.
Distributed Systems
Reliably deliver procrastination
Also serious infrastructure
One day last summer...
How distributed systems fail
Related Work
- EventML [LADA12, AVoCS15]: language for verified distributed systems
- IronFleet [SOSP15]: liveness, log compaction, serialization
- Verdi [PLDI15]: network semantics, transformers, higher-order
Verdi background
- Network semantics: operational semantics define network behavior
- Verified system transformers (VST): prove property transfer to adversarial network
[Figure: App, VST, Apps]
Big Picture
- Past: Verdi Framework (compositional fault tolerance)
- Present: Verified Raft (critical piece of infrastructure)
- Future: dynamically upgrading systems (program logic)
Outline
- Verification Challenge: state machine replication
- Raft Algorithm: implemented in Verdi
- Proof Overview: and lessons learned
Replication for fault tolerance: critical components must not fail
Replication for fault tolerance: replicas must be consistent with each other ⇒ available if a majority (more than n/2) of the n nodes are up
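The availability condition is a majority-quorum argument. The sketch below spells out the arithmetic, reading the slide's "n/2 nodes" as a strict majority, which is what Raft needs to make progress. The function names are illustrative and do not come from the talk.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def is_available(n: int, up: int) -> bool:
    """A majority-replicated cluster can make progress only if a majority is up."""
    return up >= quorum_size(n)


def max_tolerated_failures(n: int) -> int:
    """Number of nodes that may fail while a majority remains."""
    return n - quorum_size(n)
```

Under this reading a 5-node cluster tolerates 2 failures, while a 4-node cluster tolerates only 1, which is why odd cluster sizes are the usual choice.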
Replication correctness ≈ linearizability: the cluster presents a consistent order of operations to clients
Internal correctness: linearizability follows from an internal correctness property, state machine safety
Goal: Verify Raft
- Prove State Machine Safety
- Reduce linearizability to State Machine Safety [PLDI15]
Goal: Verify Raft [Figure: proof sizes of 45k and 5k LOC; labels Lin. (linearizability) and SMS (State Machine Safety)]
Outline
- Verification Challenge: state machine replication
- Raft Algorithm: implemented in Verdi
- Proof Overview: and lessons learned
Formalizing the network: the state of the world consists of packets in flight, data at nodes, and the history of I/O
Defining network semantics (systems are defined by their handlers $H_{net}$ and $H_{tmt}$)

\[
\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}
     {(\{(src, dst, m)\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}
\ \textsc{Deliver}
\]

\[
\frac{p \in P}
     {(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus \{p\},\ \Sigma,\ T)}
\ \textsc{Duplicate}
\qquad
\frac{}
     {(\{p\} \uplus P,\ \Sigma,\ T) \rightsquigarrow (P,\ \Sigma,\ T)}
\ \textsc{Drop}
\]

\[
\frac{H_{tmt}(n, \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}
     {(P,\ \Sigma,\ T) \rightsquigarrow (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle tmt, o \rangle)}
\ \textsc{Timeout}
\]
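The four rules can be read as operations on a world made of a packet multiset P, per-node state Σ, and a trace T. The following is a minimal executable sketch of that reading, not Verdi's Coq definitions: the `World` class, the mutable style, and the toy handlers `h_net` and `h_tmt` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class World:
    packets: Counter = field(default_factory=Counter)    # P: multiset of (src, dst, msg)
    state: Dict[str, Any] = field(default_factory=dict)  # Sigma: data at each node
    trace: List[Any] = field(default_factory=list)       # T: history of I/O


def deliver(w: World, p, h_net) -> None:
    """Deliver: take p out of P, run the handler at dst, add its packets and output."""
    assert w.packets[p] > 0, "packet must be in flight"
    src, dst, m = p
    sigma2, out, sent = h_net(dst, w.state[dst], src, m)
    w.packets -= Counter([p])
    w.packets.update(sent)
    w.state[dst] = sigma2
    w.trace.append(out)


def duplicate(w: World, p) -> None:
    """Duplicate: a packet in flight may appear a second time."""
    assert w.packets[p] > 0, "packet must be in flight"
    w.packets[p] += 1


def drop(w: World, p) -> None:
    """Drop: a packet in flight may disappear."""
    assert w.packets[p] > 0, "packet must be in flight"
    w.packets -= Counter([p])


def timeout(w: World, n: str, h_tmt) -> None:
    """Timeout: node n runs its timeout handler; the trace records tmt and the output."""
    sigma2, out, sent = h_tmt(n, w.state[n])
    w.packets.update(sent)
    w.state[n] = sigma2
    w.trace.extend(["tmt", out])


# Hypothetical toy system: each node holds a counter.
def h_net(dst, sigma, src, m):
    if m == "inc":
        return sigma + 1, ("count", sigma + 1), [(dst, src, "ack")]
    return sigma, ("ignored", m), []


def h_tmt(n, sigma):
    return sigma, ("tick", sigma), []
```

With this toy handler, duplicating an `inc` packet and delivering both copies increments the counter twice, which is the kind of behavior the adversarial network forces a proof to account for.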
Implementing Raft [Figure: timeline of Term 1, Term 2, Term 3, ...; labels: election, replication]
Implementing Raft: Leader Election. A Candidate sends ReqVote to the Followers; Followers reply with Vote. [Figure: timeline of Term 1, Term 2, Term 3, ...]
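The ReqVote/Vote exchange can be sketched as follows. This is a deliberately simplified model, not the verified implementation: it omits Raft's check that the candidate's log is up to date, and the names `Voter`, `handle_request_vote`, and `wins_election` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_request_vote(self, term: int, candidate: str) -> bool:
        """Grant at most one vote per term (log up-to-date check omitted)."""
        if term < self.current_term:
            return False  # stale request from an old term
        if term > self.current_term:
            # A newer term starts with a fresh vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def wins_election(votes_granted: int, cluster_size: int) -> bool:
    """A candidate becomes leader once a strict majority has voted for it."""
    return votes_granted > cluster_size // 2
```

Because each node votes at most once per term and two majorities always overlap, at most one candidate can win a given term.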
Implementing Raft: Log Replication. The Leader sends Append to the Followers; Followers reply with AppendAck. The Leader commits an entry when receiving n/2 acks. [Figure: timeline of Term 1, Term 2, Term 3, ...]
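One way to read the commit rule: an entry is committed once it is stored on a majority of nodes, where the leader's own copy counts alongside the follower acks. The sketch below computes the highest such log index. The function name and the `match_index` map are assumptions for illustration, and the sketch ignores Raft's additional requirement that the entry come from the leader's current term.

```python
from typing import Dict


def commit_index(match_index: Dict[str, int], leader_last_index: int) -> int:
    """Highest log index stored on a strict majority of the cluster.

    match_index maps each follower to the highest index it has acknowledged;
    the leader's own log counts as one more copy.
    """
    cluster_size = len(match_index) + 1
    replicated = sorted(list(match_index.values()) + [leader_last_index], reverse=True)
    # The value at position cluster_size // 2 is held by at least
    # cluster_size // 2 + 1 nodes, i.e. by a strict majority.
    return replicated[cluster_size // 2]
```

In a 5-node cluster with follower acknowledgements 5, 5, 3, 2 and a leader log ending at index 7, the result is 5: two follower acks plus the leader's copy form the majority.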
Outline
- Verification Challenge: state machine replication
- Raft Algorithm: implemented in Verdi
- Proof Overview: and lessons learned
Verifying Raft: Show linearizability
Verifying Raft: Approach
State Machine Safety: nodes agree about committed entries. This suffices since only committed entries are executed. Proof by induction on an execution.
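As an illustration of what "agree about committed entries" means, the sketch below checks a snapshot of per-node committed prefixes: any two of them must agree wherever both are defined. The function name and the list-of-strings representation are assumptions made for this example; the actual property is stated and proved over Verdi's network semantics.

```python
from typing import List


def state_machine_safety(committed: List[List[str]]) -> bool:
    """True if every pair of committed prefixes agrees on their common length."""
    for a in committed:
        for b in committed:
            n = min(len(a), len(b))
            if a[:n] != b[:n]:
                return False
    return True
```

Nodes may lag behind each other (shorter prefixes are fine), but they may never disagree about the entry at a committed position.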
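State Machine Safety: Proof. The property I is not inductive: I holding before a step does not imply I holding after it (I ⇒ I fails).
State Machine Safety: Proof. 90 invariants in total; each lemma shows that an invariant I is true initially and that I is preserved by every step (I ⇒ I).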
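A toy example, unrelated to Raft itself, shows how a property can hold in every reachable state and still fail to be inductive, and how a stronger invariant repairs the proof. The transition system (a counter that starts at 0 and steps by 2) and all names here are hypothetical.

```python
def step(x: int) -> int:
    """The only transition of the toy system: add 2."""
    return x + 2


def prop(x: int) -> bool:
    """Desired property: the counter is never 1."""
    return x != 1


def strengthened(x: int) -> bool:
    """Stronger invariant: the counter is even."""
    return x % 2 == 0


def inductive_counterexamples(inv, universe):
    """States where inv holds before a step but fails after it."""
    return [s for s in universe if inv(s) and not inv(step(s))]


INITIAL = 0
UNIVERSE = range(-10, 10)
```

Here `prop` is true of every reachable state (0, 2, 4, ...), yet the unreachable state -1 satisfies `prop` and steps to 1, so induction on `prop` alone gets stuck. The stronger invariant holds initially, is preserved by `step`, and implies `prop`; finding such strengthenings is where the additional invariants come from.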
The burden of proof
Each lemma shows: P true initially; P preserved (P ⇒ P); P with ghost state.
Re-verification is the primary challenge:
- invariants are not inductive
- not-yet-verified code is wrong
- need additional invariants
The burden of proof
Each lemma shows: P true initially; P preserved (P ⇒ P); P with ghost state.
Re-verification is the primary challenge. Proof engineering techniques help:
- affinity lemmas
- intermediate reachability
- structural tactics
- information hiding
Ghost State: Example. Capture all entries received by a node.
Node                              Log (real)   allEntries (ghost)
Leader                            A,B,C        {A,B,C}
Follower, before Append           A,D          {A,D}
Follower, after Append [A],B,C    A,B,C        {A,B,C,D}
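The example can be replayed with a small model in which the real log may be overwritten while the ghost set only grows. This is a sketch under simplifying assumptions: entries are plain strings, the previous entry is matched by value rather than by index and term, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Follower:
    log: List[str] = field(default_factory=list)          # real state
    all_entries: Set[str] = field(default_factory=set)    # ghost state, never shrinks

    def handle_append(self, prev: str, entries: List[str]) -> bool:
        """Append entries after prev, discarding any conflicting suffix of the log."""
        if prev not in self.log:
            return False
        keep = self.log.index(prev) + 1
        self.log = self.log[:keep] + entries
        # Ghost update: remember every entry ever received.
        self.all_entries.update(entries)
        return True
```

Starting from log A,D and receiving Append [A],B,C, the log becomes A,B,C while allEntries becomes {A,B,C,D}: entry D is gone from the real state but is still visible to the proof.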
Affinity Lemmas: Example. Affinity lemma: e ∈ log ⇒ e ∈ allEntries. Hence every invariant of entries in allEntries is an invariant of entries in logs. For instance, if e.term > 0 for every e ∈ allEntries, then e.term > 0 for every e ∈ log.
Affinity Lemmas: Example. In general, for any property P: if P e holds for every e ∈ allEntries, then P e holds for every e ∈ log.
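The transfer step is a one-line argument once the affinity lemma is available. The Lean 4 sketch below states it abstractly; modeling `log` and `allEntries` as predicates on entries is an assumption of this sketch and not how the Coq development represents them.

```lean
-- `log e` and `allEntries e` read as "e is in the log" and "e is in allEntries".
theorem transfer_invariant {Entry : Type}
    (log allEntries : Entry → Prop) (P : Entry → Prop)
    (affinity : ∀ e, log e → allEntries e)   -- the affinity lemma
    (inv : ∀ e, allEntries e → P e) :        -- an invariant proved for allEntries
    ∀ e, log e → P e :=
  fun e h => inv e (affinity e h)
```

The affinity lemma is proved once; after that, each invariant about allEntries yields the matching fact about logs without a new induction.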
Affinity Lemmas
- Ex 1: Relate ghost state to real state (transfer properties once and for all)
- Ex 2: Relate current messages to the past (response ⇒ past request)
Structured Handlers: Example. handler = update_state ; respond. A handler step from net to net’ factors through an intermediate state net_i: update_state takes net to net_i, and respond takes net_i to net’. Given the invariant I at net, it is shown to hold at net_i and then at net’.
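A small model of this decomposition: the handler is the composition of two phases, and the invariant is checked at the intermediate state as well as at the end. The network representation (a node's term plus the terms carried by packets in flight) and the invariant are hypothetical choices made to keep the sketch short.

```python
from typing import List, Tuple

# (node's current term, terms carried by packets in flight); an illustrative model only.
Net = Tuple[int, List[int]]


def invariant(net: Net) -> bool:
    """I: no packet in flight carries a term above the node's current term."""
    term, packets = net
    return all(t <= term for t in packets)


def update_state(net: Net, msg_term: int) -> Net:
    """Phase 1: adopt the message's term if it is newer; send nothing."""
    term, packets = net
    return (max(term, msg_term), packets)


def respond(net: Net) -> Net:
    """Phase 2: send a reply stamped with the current term; state is unchanged."""
    term, packets = net
    return (term, packets + [term])


def handler(net: Net, msg_term: int) -> Tuple[Net, Net]:
    """handler = update_state ; respond. Returns (net_i, net') for inspection."""
    net_i = update_state(net, msg_term)
    assert invariant(net_i), "I must hold at the intermediate state"
    net_final = respond(net_i)
    assert invariant(net_final), "I must hold after the full handler"
    return net_i, net_final
```

Splitting the proof obligation this way means each phase is verified separately, so a change to `respond` does not force re-verification of `update_state`.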