Verdi: A Framework for Implementing and Formally Verifying Distributed Systems
James R. Wilcox, Doug Woos, Pavel Panchekha, Zach Tatlock, Xi Wang, Michael D. Ernst, Thomas Anderson
Challenges: Distributed systems run in unreliable environments. Many types of failure can occur. Fault-tolerance mechanisms are challenging to implement correctly.
Challenges and Contributions: (1) Challenge: distributed systems run in unreliable environments. Contribution: formalize the network as operational semantics. (2) Challenge: many types of failure can occur. Contribution: build semantics for a variety of fault models. (3) Challenge: fault-tolerance mechanisms are challenging to implement correctly. Contribution: verify fault tolerance as a transformation between semantics.
Verdi Workflow: Build and verify the system in a simple semantics. Apply a verified system transformer (VST). Get end-to-end correctness by composition. [Figure: a client exchanging I/O with a single key-value store; after the transformer, the client exchanges I/O with three KV nodes, each layered on consensus.]
Contributions: Formalize the network as operational semantics. Build semantics for a variety of fault models. Verify fault tolerance as a transformation between semantics. General Approach: Find environments in your problem domain. Formalize these environments as operational semantics. Verify layers as transformations between semantics.
Verdi Successes: Applications: key-value store, lock service. Fault-tolerance mechanisms: sequence numbering, retransmission, primary-backup replication, consensus-based replication (linearizability).
Replicated KV store: Important data is replicated for availability. [Figure: three replicated KV store nodes.]
Replicated KV store: The environment is unreliable: crash, reorder, drop, duplicate, partition, ... [Figure: three replicated KV store nodes.]
Replicated KV store: Failures include crash, reorder, drop, duplicate, partition, ... Decades of research; still difficult to implement correctly. Implementations often have bugs.
Bug-free Implementations: Several inspiring successes in formal verification: CompCert, seL4, Jitk, Bedrock, IronClad, Frenetic, Quark. Goal: formally verify distributed system implementations.
Formally Verify Distributed Implementations: Separate independent system components.
Formally Verify Distributed Implementations: Separate independent system components. Verify application logic independently from fault tolerance. [Figure: three nodes, each an App layered on fault tolerance.]
Formally Verify Distributed Implementations: Separate independent system components. Verify application logic (the key-value store) independently from fault tolerance (consensus). 1. Verify application logic. 2. Verify fault tolerance mechanism. 3. Run the system! [Figure: three nodes, each a KV store layered on consensus.]
1. Verify Application Logic: In a simple model, prove that the key-value store is a "good map". [Figure: a client exchanging I/O with the key-value store.]
2. Verify Fault Tolerance Mechanism: In a simple model, prove "good map" for the key-value store. Apply a verified system transformer (VST) and prove "properties preserved". Get end-to-end correctness by composition. [Figure: a client exchanging I/O with a single key-value store; after the transformer, the client exchanges I/O with three KV nodes, each layered on consensus.]
3. Run the System!: Extract to OCaml and link an unverified shim. Run on real networks. [Figure: three nodes, each a KV store layered on consensus.]
Verifying Application Logic
Simple One-node Model: The key-value store receives the input Set "k" "v" and returns the output Resp "k" "v". State before: {}. State after: {"k": "v"}. Trace: [Set "k" "v", Resp "k" "v"].
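Simple One-node Model: A system in state $\sigma$ takes an input $i$, moves to state $\sigma'$, and produces an output $o$. The trace records $[i, o]$. Input rule:
$$\frac{H_{\mathrm{inp}}(\sigma, i) = (\sigma', o)}{(\sigma, T) \rightsquigarrow_{s} (\sigma', T \mathbin{+\!+} \langle i, o \rangle)}\ \text{Input}$$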
Simple One-node Model: Spec: operations have expected behavior (good map), for example Set followed by Get, and Del followed by Get. Verify the system against the semantics by induction. This is a safety property.
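The one-node model above can be sketched in a few lines of Python. This is only an illustration, not Verdi's Coq development: the names `kv_handler`, `step_input`, `run`, and `good_map` are hypothetical, the response format for Get and Del is an assumption (the slides only show the response to Set), and the `good_map` function is a runtime check on one trace, whereas Verdi proves the property for all traces by induction.

```python
def kv_handler(state, inp):
    """Plays the role of H_inp: maps (state, input) to (new_state, output)."""
    op, key = inp[0], inp[1]
    if op == "Set":
        value = inp[2]
        return {**state, key: value}, ("Resp", key, value)
    if op == "Get":
        return state, ("Resp", key, state.get(key))
    if op == "Del":
        remaining = {k: v for k, v in state.items() if k != key}
        return remaining, ("Resp", key, None)
    raise ValueError(f"unknown operation: {op!r}")


def step_input(handler, state, trace, inp):
    """Input rule: (sigma, T) steps to (sigma', T ++ [i, o])."""
    new_state, out = handler(state, inp)
    return new_state, trace + [inp, out]


def run(handler, inputs):
    """Start from the empty state and empty trace, then take one step per input."""
    state, trace = {}, []
    for inp in inputs:
        state, trace = step_input(handler, state, trace, inp)
    return state, trace


def good_map(trace):
    """Replay the trace against a reference dict and check every Get response."""
    model = {}
    for inp, out in zip(trace[0::2], trace[1::2]):
        op, key = inp[0], inp[1]
        if op == "Set":
            model[key] = inp[2]
        elif op == "Del":
            model.pop(key, None)
        elif op == "Get" and out != ("Resp", key, model.get(key)):
            return False
    return True


state, trace = run(kv_handler, [("Set", "k", "v")])
print(state)  # {'k': 'v'}
print(trace)  # [('Set', 'k', 'v'), ('Resp', 'k', 'v')]
```

The single-input run reproduces the state and trace shown on the earlier slide for Set "k" "v".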
Verifying Fault Tolerance
The Raft Transformer: Consensus provides a replicated state machine. Each node keeps a log of operations. The same inputs are applied on each node. Raft calls into the original system. [Figure: three Raft nodes, each wrapping the original system.]
The Raft Transformer: When an input is received: add it to the log and send it to the other nodes. When an op is replicated: apply it to the state machine and send the output.
The Raft Transformer: For the KV store, the ops are Get, Set, and Del, and the state is a dictionary.
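The shape of this wrapping (a log of operations in front of the original handler) can be illustrated with a toy sketch. It is deliberately not Raft: there is no leader election, no terms, no majority quorum, and no failure handling, and replication is modeled as a direct append to every node's log. The class names `ReplicatedLogNode` and `ToyCluster` and the handler's response format are assumptions made for illustration.

```python
def kv_handler(state, inp):
    """Original system: ops are Get, Set, Del; state is a dictionary."""
    op, key = inp[0], inp[1]
    if op == "Set":
        return {**state, key: inp[2]}, ("Resp", key, inp[2])
    if op == "Get":
        return state, ("Resp", key, state.get(key))
    if op == "Del":
        return {k: v for k, v in state.items() if k != key}, ("Resp", key, None)
    raise ValueError(f"unknown operation: {op!r}")


class ReplicatedLogNode:
    """One node: a log of operations plus a copy of the original system's state."""

    def __init__(self, handler):
        self.handler = handler
        self.state = {}
        self.log = []
        self.applied = 0  # number of log entries already applied

    def apply_up_to(self, commit_index):
        """Apply newly replicated entries by calling into the original system."""
        outputs = []
        while self.applied < commit_index:
            self.state, out = self.handler(self.state, self.log[self.applied])
            outputs.append(out)
            self.applied += 1
        return outputs


class ToyCluster:
    """Failure-free stand-in for consensus: every node sees the same log order."""

    def __init__(self, handler, n=3):
        self.nodes = [ReplicatedLogNode(handler) for _ in range(n)]

    def client_input(self, op):
        # When an input is received: add it to the log on every node.
        for node in self.nodes:
            node.log.append(op)
        commit_index = len(self.nodes[0].log)
        # When the op is replicated: apply it to each state machine, send output.
        outputs = [node.apply_up_to(commit_index) for node in self.nodes]
        return outputs[0][-1]


cluster = ToyCluster(kv_handler)
print(cluster.client_input(("Set", "k", "v")))  # ('Resp', 'k', 'v')
print(cluster.client_input(("Get", "k")))       # ('Resp', 'k', 'v')
```

Because every node applies the same log in the same order through the same deterministic handler, all copies of the dictionary stay equal. The hard part that Raft solves, and that this sketch skips, is agreeing on that log when messages and machines fail.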
Raft Correctness: Raft is a verified system transformer (VST). It correctly transforms systems. It preserves traces. The guarantee is linearizability. [Figure: three Raft nodes.]
Fault Model: Model global state. Model internal communication. Model failure.
Fault Model: Global State: Machines have names. $\Sigma$ maps a name to that machine's state. [Figure: nodes 1, 2, and 3 with states $\Sigma[1]$, $\Sigma[2]$, and $\Sigma[3]$.]
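Fault Model: Messages: Node 1 sends "Vote?" to nodes 2 and 3, so the network holds the packets <1,2,"Vote?"> and <1,3,"Vote?">. Delivering a packet to node 2 updates its state ($\Sigma'[2] = \sigma'$), produces an output $o$, and adds the reply <2,1,"+1"> to the network. Delivery rule:
$$\frac{H_{\mathrm{net}}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P, \Sigma, T) \rightsquigarrow_{r} (P \uplus P', \Sigma', T \mathbin{+\!+} \langle o \rangle)}$$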
Fault Model: Failures: Message drop. Message duplication. Machine crash. [Figure: the network holds <1,2,"Vote?"> and two copies of <1,3,"Vote?">; nodes 1, 2, and 3 have states $\Sigma[1]$, $\Sigma[2]$, and $\Sigma[3]$.]
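Fault Model: Drop: The network holds <1,2,"hi"> and <1,3,"hi">, and either packet may be dropped. Drop rule:
$$\frac{}{(\{p\} \uplus P, \Sigma, T) \rightsquigarrow_{\mathrm{drop}} (P, \Sigma, T)}\ \text{Drop}$$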
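The delivery and drop rules above can be read as functions on a world `(P, Σ, T)`. The following sketch models the packet multiset `P` as a Python list, `Σ` as a dict from node name to local state, and `T` as a list. The names `deliver`, `drop`, and `vote_handler` are hypothetical, and representing the output `o` as a list of zero or more values is an assumption made so that handlers can stay silent.

```python
def deliver(handler_net, packets, sigma, trace, index):
    """Delivery rule: remove one packet, run H_net at its destination,
    update that node's state, add the packets it sends, extend the trace."""
    src, dst, msg = packets[index]
    rest = packets[:index] + packets[index + 1:]
    new_local, outputs, sent = handler_net(dst, sigma[dst], src, msg)
    new_sigma = {**sigma, dst: new_local}
    return rest + sent, new_sigma, trace + outputs


def drop(packets, sigma, trace, index):
    """Drop rule: remove one packet; state and trace are unchanged."""
    return packets[:index] + packets[index + 1:], sigma, trace


def vote_handler(name, local, src, msg):
    """Illustrative H_net: answer "Vote?" with "+1"; count each "+1" received."""
    if msg == "Vote?":
        return local, [], [(name, src, "+1")]
    if msg == "+1":
        votes = local + 1
        return votes, [f"votes={votes}"], []
    return local, [], []


# Node 1 has asked nodes 2 and 3 for their votes.
packets = [(1, 2, "Vote?"), (1, 3, "Vote?")]
sigma = {1: 0, 2: 0, 3: 0}
trace = []

packets, sigma, trace = deliver(vote_handler, packets, sigma, trace, 0)  # node 2 replies
packets, sigma, trace = drop(packets, sigma, trace, 0)                   # request to 3 is lost
packets, sigma, trace = deliver(vote_handler, packets, sigma, trace, 0)  # node 1 counts the vote

print(packets)  # []
print(sigma)    # {1: 1, 2: 0, 3: 0}
print(trace)    # ['votes=1']
```

Each fault model adds or removes step functions like `drop`; the handler code stays the same, which is what lets the same system be analyzed under different network semantics.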
Toward Verifying Raft: General theory of linearizability. 1k lines of implementation, 5k lines for linearizability. State machine safety: 30k lines. Most state invariants are proved; some are left to do.
Verified System Transformers: Functions on systems. Transform systems between semantics. Maintain equivalent traces. Get correctness of the transformed system for free.
Verified System Transformers: Raft consensus. Primary-backup. Sequence numbering and retransmission. Ghost variables. [Figure: transformers layered over an App.]
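To make "functions on systems" concrete, here is a sketch of a sequence-numbering wrapper in the spirit of the one listed above. It is an illustrative analogue, not Verdi's Coq transformer: the name `seq_num_transformer`, the layout of the wrapped state, and the `counter_handler` example are assumptions, and the proof that traces are preserved is not reproduced. The wrapper takes a network handler and returns a new handler that tags outgoing messages with sequence numbers and ignores any `(src, seq)` pair it has already seen, so duplicated packets have no effect.

```python
def seq_num_transformer(handler_net):
    """Take a handler (name, local, src, msg) -> (local', outputs, packets)
    and return (init, wrapped), where wrapped tolerates duplicate delivery."""

    def init(local):
        # Wrapped state: next sequence number to send, pairs seen, inner state.
        return {"next_seq": 0, "seen": frozenset(), "inner": local}

    def wrapped(name, state, src, tagged):
        seq, msg = tagged
        if (src, seq) in state["seen"]:
            return state, [], []  # duplicate: no state change, no output
        inner, outputs, sent = handler_net(name, state["inner"], src, msg)
        next_seq = state["next_seq"]
        tagged_sent = []
        for s, d, m in sent:
            tagged_sent.append((s, d, (next_seq, m)))
            next_seq += 1
        new_state = {
            "next_seq": next_seq,
            "seen": state["seen"] | {(src, seq)},
            "inner": inner,
        }
        return new_state, outputs, tagged_sent

    return init, wrapped


def counter_handler(name, local, src, msg):
    """Illustrative original system: each "inc" message bumps a counter."""
    if msg == "inc":
        return local + 1, [local + 1], []
    return local, [], []


init, wrapped = seq_num_transformer(counter_handler)
state = init(0)
state, outs_first, _ = wrapped(2, state, 1, (0, "inc"))   # first delivery
state, outs_dup, _ = wrapped(2, state, 1, (0, "inc"))     # duplicated packet
print(outs_first, outs_dup, state["inner"])  # [1] [] 1
```

Without the wrapper, delivering the same "inc" packet twice would move the counter to 2. With it, the duplicate is invisible to the original handler, which is the sense in which a system verified without duplication keeps its behavior in a semantics that allows duplication.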
Running Verdi Programs
Running Verdi Programs: Coq extraction to OCaml. Thin, unverified shim. Trusted computing base: shim, Coq, OCaml, OS.
Performance Evaluation: Compare with etcd, a similar open-source store. 10% performance overhead. Mostly disk/network bound. etcd has had linearizability bugs.
Previous Approaches: EventML [Schiper 2014]: verified Paxos using the NuPRL proof assistant. MACE [Killian 2007]: model checking distributed systems in C++. TLA+ [Lamport 2002]: specification language and logic.
Contributions: Formalize the network as operational semantics. Build semantics for a variety of fault models. Verify fault tolerance as a transformation between semantics. Thanks! http://verdi.uwplse.org