
Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. James R. Wilcox, Doug Woos, Pavel Panchekha, Zach Tatlock, Xi Wang, Michael D. Ernst, Thomas Anderson


  1. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. James R. Wilcox, Doug Woos, Pavel Panchekha, Zach Tatlock, Xi Wang, Michael D. Ernst, Thomas Anderson. [Diagram: a key-value store wrapped by a verified system transformer (VST).]

  2. Challenges: Distributed systems run in unreliable environments. Many types of failure can occur. Fault-tolerance mechanisms are challenging to implement correctly.

  3. Challenges: Distributed systems run in unreliable environments. Many types of failure can occur. Fault-tolerance mechanisms are challenging to implement correctly. Contributions: Formalize network as operational semantics. Build semantics for a variety of fault models. Verify fault-tolerance as transformation between semantics.

  4. Verdi Workflow: Build and verify the system in simple semantics. Apply a verified system transformer (VST). Get end-to-end correctness by composition. [Diagram: a client doing I/O against a key-value store; the VST maps it to three KV nodes, each layered on consensus.]

  5. Contributions: Formalize network as operational semantics. Build semantics for a variety of fault models. Verify fault-tolerance as transformation between semantics. General Approach: Find environments in your problem domain. Formalize these environments as operational semantics. Verify layers as transformations between semantics.

  6. Verdi Successes: Applications: key-value store, lock service. Fault-tolerance mechanisms: sequence numbering, retransmission, primary-backup replication, consensus-based replication (linearizability).

  7. Replicated KV store: Important data. Replicated for availability. [Diagram: three replicated KV store nodes.]

  8. Environment is unreliable: Crash, Reorder, Drop, Duplicate, Partition, ... [Diagram: three replicated KV store nodes.]

  9. Environment is unreliable: Crash, Reorder, Drop, Duplicate, Partition, ... Decades of research; still difficult to implement correctly. Implementations often have bugs.

  10. Bug-free Implementations: Several inspiring successes in formal verification: CompCert, seL4, Jitk, Bedrock, IronClad, Frenetic, Quark. Goal: formally verify distributed system implementations.

  11. Formally Verify Distributed Implementations: Separate independent system components.

  12. Formally Verify Distributed Implementations: Separate independent system components. Verify application logic independently from fault tolerance. [Diagram: three nodes, each running an App layer over a Fault tolerance layer.]

  13. Formally Verify Distributed Implementations: Separate independent system components. Verify application logic (the key-value store) independently from consensus. Steps: 1. Verify application logic. 2. Verify fault tolerance mechanism. 3. Run the system! [Diagram: three nodes, each running KV over Consensus.]

  14. 1. Verify Application Logic: Simple model, prove "good map". [Diagram: a client doing I/O against a key-value store.]

  15. 2. Verify Fault Tolerance Mechanism: Simple model, prove "good map". Apply verified system transformer (VST), prove "properties preserved". End-to-end correctness by composition. [Diagram: a client doing I/O against a key-value store; the VST maps it to three KV nodes, each layered on consensus.]

  16. 3. Run the System! Extract to OCaml, link unverified shim. Run on real networks. [Diagram: three nodes, each running KV over Consensus.]

  17. Verifying application logic

  18. Simple One-node Model: The key-value store receives the input Set "k" "v" and returns the output Resp "k" "v". State before: {}. State after: {"k": "v"}. Trace: [Set "k" "v", Resp "k" "v"]

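  19. Simple One-node Model: The system takes an input $i$ in state $\sigma$, moves to state $\sigma'$, and emits an output $o$. Trace: $[i, o]$. Input rule: $$\frac{H_{inp}(\sigma, i) = (\sigma', o)}{(\sigma, T) \rightsquigarrow_s (\sigma', T \mathbin{+\!\!+} \langle i, o \rangle)}\ \textsc{Input}$$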
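The Input rule on slide 19 can be sketched in Python as a single step function. This is a minimal illustration, not Verdi's Coq definitions: `kv_input_handler` and `step_input` are hypothetical names, and the response format for Get and Del is an assumption (the slides only show the response to Set).

```python
def kv_input_handler(state, inp):
    """Toy H_inp for a key-value store: returns (new_state, output)."""
    op = inp[0]
    if op == "Set":
        _, key, value = inp
        return {**state, key: value}, ("Resp", key, value)
    if op == "Get":
        _, key = inp
        return state, ("Resp", key, state.get(key))
    if op == "Del":
        _, key = inp
        remaining = {k: v for k, v in state.items() if k != key}
        return remaining, ("Resp", key, None)
    raise ValueError(f"unknown operation: {op!r}")


def step_input(handler, state, trace, inp):
    """One Input step: (sigma, T) steps to (sigma', T ++ <i, o>)."""
    new_state, out = handler(state, inp)
    return new_state, trace + [inp, out]


# Reproduces the example from slide 18.
state, trace = step_input(kv_input_handler, {}, [], ("Set", "k", "v"))
```

After this step, `state` is `{"k": "v"}` and `trace` holds the Set input followed by its Resp output, matching the trace shown on slide 18.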

  20. Simple One-node Model: Spec: operations have expected behavior (good map), for example Set followed by Get, or Del followed by Get. This is a safety property. Verify the system against the semantics by induction.
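One way to read the "good map" spec is as a predicate over traces: every response must agree with the most recent Set or Del for that key. The sketch below checks a single trace at run time, which is much weaker than the inductive proof over all reachable traces that the slides describe. The function name and the event encoding (tuples, alternating input and output) are assumptions made for illustration.

```python
def satisfies_good_map(trace):
    """Return True if each response matches a reference dictionary.

    Assumes the trace alternates input, output, with inputs
    ("Set", k, v), ("Get", k), ("Del", k) and outputs ("Resp", k, value).
    """
    model = {}
    for inp, out in zip(trace[0::2], trace[1::2]):
        op, key = inp[0], inp[1]
        if op == "Set":
            model[key] = inp[2]
        elif op == "Del":
            model.pop(key, None)
        # After updating the model, the response must report its view of key.
        if out != ("Resp", key, model.get(key)):
            return False
    return True


good_trace = [
    ("Set", "k", "v"), ("Resp", "k", "v"),
    ("Get", "k"), ("Resp", "k", "v"),
    ("Del", "k"), ("Resp", "k", None),
    ("Get", "k"), ("Resp", "k", None),
]
bad_trace = [
    ("Set", "k", "v"), ("Resp", "k", "v"),
    ("Get", "k"), ("Resp", "k", "stale"),
]
```

Here `satisfies_good_map(good_trace)` holds and `satisfies_good_map(bad_trace)` does not, because the second Get response disagrees with the preceding Set.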

  21. Verifying Fault Tolerance

  22. The Raft Transformer: Consensus provides a replicated state machine. Same inputs on each node. Calls into original system. [Diagram: three Raft nodes, each holding a log of operations and a copy of the original system.]

  23. The Raft Transformer: When input received: add to log, send to other nodes. When op replicated: apply to state machine, send output.

  24. The Raft Transformer: For KV store: ops are Get, Set, Del. State is dictionary.
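The replicated-state-machine idea behind slides 22 to 24 can be sketched as follows: each node keeps a log of operations and applies the replicated prefix, in order, to its own copy of the original system. This is deliberately not Raft. Leader election, terms, and log repair are omitted, and `Replica`, `kv_apply`, and `apply_committed` are hypothetical names.

```python
def kv_apply(state, op):
    """Original system: apply one Get/Set/Del to a dictionary state."""
    kind, key = op[0], op[1]
    if kind == "Set":
        return {**state, key: op[2]}, ("Resp", key, op[2])
    if kind == "Del":
        return {k: v for k, v in state.items() if k != key}, ("Resp", key, None)
    return state, ("Resp", key, state.get(key))  # Get


class Replica:
    """One node: a log of operations plus the wrapped state machine."""

    def __init__(self):
        self.log = []
        self.applied = 0  # number of log entries already applied
        self.state = {}

    def append(self, op):
        self.log.append(op)

    def apply_committed(self, commit_index):
        """Apply log entries up to commit_index; return their outputs."""
        outputs = []
        limit = min(commit_index, len(self.log))
        while self.applied < limit:
            self.state, out = kv_apply(self.state, self.log[self.applied])
            self.applied += 1
            outputs.append(out)
        return outputs


ops = [("Set", "a", "1"), ("Set", "b", "x"), ("Set", "a", "2"), ("Del", "b")]
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    for op in ops:
        replica.append(op)  # same inputs, same order, on each node
outputs = [replica.apply_committed(len(ops)) for replica in replicas]
```

Because every replica applies the same operations in the same order, all three end in the same state and produce the same outputs.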

  25. Raft Correctness: Raft, as a verified system transformer (VST), correctly transforms systems. Preserves traces. Linearizability.

  26. Fault Model: Model global state. Model internal communication. Model failure.

  27. Fault Model: Global State: Machines have names. $\Sigma$ maps name to state. [Diagram: machines 1, 2, 3 with states $\Sigma[1]$, $\Sigma[2]$, $\Sigma[3]$.]

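  28. Fault Model: Messages: [Diagram: machine 1 sends <1,2,"Vote?"> and <1,3,"Vote?"> into the network; machine 2 handles its message, updates its state so that $\Sigma'[2] = \sigma'$, emits output $o$, and sends <2,1,"+1">.] Delivery rule: $$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P, \Sigma, T) \rightsquigarrow_r (P \uplus P', \Sigma', T \mathbin{+\!\!+} \langle o \rangle)}$$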
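The delivery rule on slide 28 can be sketched as a function over the triple (packets in flight, global state, trace). The in-flight multiset is modeled as a Python list and the caller picks which packet to deliver. `vote_handler` is a hypothetical H_net loosely based on the "Vote?" and "+1" messages in the diagram; its local state (a vote count) and its outputs are assumptions.

```python
def vote_handler(me, local, src, msg):
    """Toy H_net: returns (new_local_state, output, packets_to_send)."""
    if msg == "Vote?":
        return local, ("voted", me, src), [(me, src, "+1")]
    if msg == "+1":
        return local + 1, ("tally", me, local + 1), []
    return local, ("ignored", me, msg), []


def step_deliver(handler, packets, sigma, trace, index):
    """Deliver packets[index]:
    ({(src, dst, m)} + P, Sigma, T) steps to (P + P', Sigma', T ++ <o>)."""
    src, dst, msg = packets[index]
    rest = packets[:index] + packets[index + 1:]
    new_local, out, sent = handler(dst, sigma[dst], src, msg)
    new_sigma = {**sigma, dst: new_local}  # Sigma' = Sigma[dst -> sigma']
    return rest + sent, new_sigma, trace + [out]


packets = [(1, 2, "Vote?"), (1, 3, "Vote?")]
sigma = {1: 0, 2: 0, 3: 0}
# Machine 2 receives the vote request and replies to machine 1.
packets, sigma, trace = step_deliver(vote_handler, packets, sigma, [], 0)
# Machine 1 receives the "+1" reply, now at index 1.
packets, sigma, trace = step_deliver(vote_handler, packets, sigma, trace, 1)
```

After the two steps, only the request to machine 3 is still in flight, machine 1 has counted one vote, and the trace holds the two outputs in delivery order.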

  29. Fault Model: Failures: Message drop. Message duplication. Machine crash. [Diagram: the network holds <1,2,"Vote?"> and two copies of <1,3,"Vote?">; machines 1, 2, 3 have states $\Sigma[1]$, $\Sigma[2]$, $\Sigma[3]$.]

  30. Fault Model: Drop: [Diagram: the network holds <1,2,"hi"> and <1,3,"hi">.] Drop rule: $$\frac{}{(\{p\} \uplus P, \Sigma, T) \rightsquigarrow_{drop} (P, \Sigma, T)}\ \textsc{Drop}$$
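The Drop rule removes one in-flight packet and leaves the global state and the trace untouched. A minimal sketch, again modeling the packet multiset as a list: `step_drop` follows the rule on slide 30, while `step_duplicate` is an assumed counterpart for the "message duplication" failure listed on slide 29, whose rule is not shown in the slides.

```python
def step_drop(packets, sigma, trace, index):
    """Drop packets[index]: ({p} + P, Sigma, T) steps to (P, Sigma, T)."""
    return packets[:index] + packets[index + 1:], sigma, trace


def step_duplicate(packets, sigma, trace, index):
    """Assumed duplication step: add a second copy of packets[index]."""
    return packets + [packets[index]], sigma, trace


packets = [(1, 2, "hi"), (1, 3, "hi")]
sigma = {1: "s1", 2: "s2", 3: "s3"}
after_drop = step_drop(packets, sigma, [], 0)
after_dup = step_duplicate(packets, sigma, [], 1)
```

Neither step calls a handler, so no machine observes the failure directly; its effect only shows up in which deliveries remain possible afterwards.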

  31. Toward Verifying Raft: General theory of linearizability. 1k lines of implementation, 5k lines for linearizability. State machine safety: 30k lines. Most state invariants proved, some left to do.

  32. Verified System Transformers: Functions on systems. Transform systems between semantics. Maintain equivalent traces. Get correctness of transformed system for free.
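"Functions on systems" can be illustrated with a small sequence-numbering wrapper: it takes a message handler written for a network without duplication and returns one that tolerates duplicates by discarding any (sender, sequence number) pair it has already seen. This is a sketch of the idea only, not Verdi's transformer, and nothing here is verified. All names are hypothetical, and handlers return a list of outputs so that a discarded duplicate contributes nothing to the trace.

```python
def counter_handler(me, count, src, msg):
    """Original system: returns (new_state, outputs, packets_to_send)."""
    if msg == "inc":
        return count + 1, [("count", count + 1)], []
    return count, [], []


def seq_num_transformer(handler):
    """Wrap handler so duplicate (src, seq) messages are ignored.

    Wrapped state is (seen, next_seq, inner_state); wrapped messages
    are (seq, payload).
    """
    def wrapped(me, local, src, tagged):
        seen, next_seq, inner = local
        seq, msg = tagged
        if (src, seq) in seen:
            return local, [], []  # duplicate: no state change, no output
        new_inner, outs, sent = handler(me, inner, src, msg)
        # Tag outgoing packets with fresh sequence numbers.
        tagged_out = [(s, d, (next_seq + i, m)) for i, (s, d, m) in enumerate(sent)]
        new_local = (seen | {(src, seq)}, next_seq + len(sent), new_inner)
        return new_local, outs, tagged_out
    return wrapped


def run(handler, me, local, src, msgs):
    """Deliver msgs in order from src and collect the output trace."""
    trace = []
    for msg in msgs:
        local, outs, _sent = handler(me, local, src, msg)
        trace.extend(outs)
    return local, trace


# Original system on a network that delivers each message once.
_, base_trace = run(counter_handler, 2, 0, 1, ["inc", "inc"])
# Transformed system on a network that duplicates the first message.
wrapped = seq_num_transformer(counter_handler)
_, dup_trace = run(wrapped, 2, (frozenset(), 0, 0), 1,
                   [(7, "inc"), (7, "inc"), (8, "inc")])
```

On this one run the two traces are equal, which is the kind of trace equivalence the slide refers to. Verdi establishes it by proof for all executions rather than by testing a single one.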

  33. Verified System Transformers: Raft Consensus. Primary Backup. Seq # and Retrans. Ghost Variables. [Diagram: each transformer is layered over an App.]

  34. Running Verdi Programs

  35. Running Verdi Programs: Coq extraction to OCaml. Thin, unverified shim. Trusted computing base: shim, Coq, OCaml, OS.

  36. Performance Evaluation: Compare with etcd, a similar open-source store. 10% performance overhead. Mostly disk/network bound. etcd has had linearizability bugs.

  37. Previous Approaches: EventML [Schiper 2014]: verified Paxos using the NuPRL proof assistant. MACE [Killian 2007]: model checking distributed systems in C++. TLA+ [Lamport 2002]: specification language and logic.

  38. Contributions: Formalize network as operational semantics. Build semantics for a variety of fault models. Verify fault-tolerance as transformation between semantics. Thanks! http://verdi.uwplse.org
