Verifying Distributed Programs via Canonical Sequentialization
Klaus von Gleissenthall
Joint work with Alexander Bakst, Ranjit Jhala and Rami Gökhan Kıcı
Writing distributed programs: A bug appears… Issue: "random hangs / deadlock in mono"
Writing distributed programs: … haunts you … "occurs in about 10% of our runs"
Writing distributed programs: … then you write some more code… "moved to version 4.8.0.483."
Writing distributed programs: … and the bug disappears… "yet to reproduce the issue in version 4.8.0.483"
Writing distributed programs: … leaving you hoping it stays gone. "should be more confident in a few weeks"
A better world: Can we catch all deadlocks during the compile-edit cycle?
A better world: let's fix it

  coord :: Transaction -> Int -> SymSet ProcessId -> Process ()
  coord transaction n nodes = do
    fold query () nodes
    n_ <- fold countVotes 0 nodes
    if n == n_ then
      forEach nodes commit ()
    else
      forEach nodes abort ()
    forEach nodes expect :: Ack
    where
      -- sent wrong response address: pid instead of me
      query () pid = do { me <- myPid; send pid (pid, transaction) }
      countVotes x _ = do
        msg <- expect :: Vote
        case msg of
          Accept _ -> return (x + 1)
          Reject   -> return x

  acceptor :: Process ()
  acceptor = do
    me <- myPid
    (who, transaction) <- expect :: (ProcessId, Transaction)
    vote <- chooseVote transaction
    send who vote

Reported on this code: sent wrong response address, unmatched receive, unmatched send.
A better world: proof. No deadlocks can occur!

  coord :: Transaction -> Int -> SymSet ProcessId -> Process ()
  coord transaction n nodes = do
    fold query () nodes
    n_ <- fold countVotes 0 nodes
    if n == n_ then
      forEach nodes commit ()
    else
      forEach nodes abort ()
    forEach nodes expect :: Ack
    where
      query () pid = do { me <- myPid; send pid (me, transaction) }
      countVotes x _ = do
        msg <- expect :: Vote
        case msg of
          Accept _ -> return (x + 1)
          Reject   -> return x

  acceptor :: Process ()
  acceptor = do
    me <- myPid
    (who, transaction) <- expect :: (ProcessId, Transaction)
    vote <- chooseVote transaction
    send who vote
This talk: Brisk
- Proves absence of deadlocks
- Provides counterexamples
- Fast enough for interactive use
- Restricted computation model
Restricted computation model, but expressive enough to implement:
- Work Stealing
- Map Reduce
- Distributed File System
Outline
- The Problems
- The Key Idea
- The Implementation
- The Evaluation
The Problems
Example: Two phase commit (2PC). Goal: commit a transaction to all nodes. (Diagram: a coordinator and a set of nodes.)
Example: Two phase commit (2PC), Phase 1: the coordinator sends the data; each node, depending on the value, votes to commit or abort.
Example: Two phase commit (2PC), Phase 1: each node, depending on the value, votes to commit or abort (in the diagram, every node votes commit). The coordinator commits if no one voted to abort, and aborts otherwise.
Example: Two phase commit (2PC), Phase 2: the coordinator sends its decision to commit (or abort); each node commits the transaction.
Example: Two phase commit (2PC), Phase 2: each node sends an acknowledgement (ACK). Done.
How to verify 2PC? Do sends match receives? Does the implementation deadlock?
How to verify 2PC? Problem: Asynchrony
- Messages may travel at different speeds
- Processes execute at different speeds
- Races trigger different behaviors
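To get a feel for why asynchrony is a problem, one can count the execution orders a tool would face if it enumerated them. The following is a minimal sketch, assuming each process is just a list of atomic steps; `interleave2` and `interleavings` are hypothetical helpers for illustration and not part of Brisk.

```haskell
-- All ways to interleave the steps of two processes while keeping each
-- process's own steps in program order.
interleave2 :: [a] -> [a] -> [[a]]
interleave2 [] ys = [ys]
interleave2 xs [] = [xs]
interleave2 (x:xs) (y:ys) =
  map (x:) (interleave2 xs (y:ys)) ++ map (y:) (interleave2 (x:xs) ys)

-- Interleavings of any number of processes: merge one process at a time
-- into every schedule built so far.
interleavings :: [[a]] -> [[a]]
interleavings = foldr (\p acc -> concatMap (interleave2 p) acc) [[]]
```

Two processes with two steps each already give `length (interleavings ["ab", "cd"])` = 6 schedules, and three such processes give 90; the count grows quickly as processes are added.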
How to verify 2PC? Problem: Unbounded processes. We don't know how many nodes there are at runtime.
How to verify 2PC?
- Testing? No guarantees
- Proofs? High user burden
- Model checking…? Infinite number of states
The Key Idea: Canonical Sequentialization
Canonical Sequentialization: Don't enumerate execution orders… Reason about a single representative execution.
Canonical Sequentialization, Example: 2PC
1. Sends transaction it wants to commit
2. Send votes
3. Relay decision
4. Send acknowledgments
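Read as a sequential program, the four steps become four loops over the node set, run one after another. The following is a minimal sketch in plain Haskell rather than Brisk's library; the types and the names `NodeState` and `sequential2PC` are hypothetical, and the transaction is assumed to be a single `Int`.

```haskell
data Vote = Accept | Reject deriving (Eq, Show)
data Decision = Commit | Abort deriving (Eq, Show)

-- Per-node state in the sequential program (hypothetical, for illustration).
data NodeState = NodeState
  { received :: Maybe Int       -- transaction value sent by the coordinator
  , decided  :: Maybe Decision  -- decision relayed in phase 2
  } deriving (Eq, Show)

-- 2PC as four loops run back to back. Returns the coordinator's decision,
-- the final node states, and the number of acknowledgments collected.
sequential2PC :: (Int -> Vote) -> Int -> Int -> (Decision, [NodeState], Int)
sequential2PC chooseVote txn n = (decision, nodes2, acks)
  where
    nodes0   = replicate n (NodeState Nothing Nothing)
    -- 1. the coordinator's sends become plain assignments at each node
    nodes1   = [ s { received = Just txn } | s <- nodes0 ]
    -- 2. each node votes on the value it received
    votes    = [ chooseVote v | NodeState (Just v) _ <- nodes1 ]
    decision = if all (== Accept) votes then Commit else Abort
    -- 3. the decision is relayed to every node
    nodes2   = [ s { decided = Just decision } | s <- nodes1 ]
    -- 4. every node acknowledges
    acks     = length nodes2
```

There is no message buffer and no scheduler left in this version, which is what makes it a convenient target for further reasoning.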
Canonical Sequentialization, a trickier example: Work stealing queue
Work stealing queue: workers (1, 2, 3) perform tasks; the queue assigns work; the coordinator collects results.
Work stealing queue: idle workers ask for work; the queue assigns an item; workers compute results and send each result to the coordinator.
Sequentialized: for each item, the queue picks an arbitrary worker from the set and assigns the task to it; the worker computes the result and sends it to the master, who writes the result to the result set.
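The sequentialized queue can be sketched as a single loop over the work items. This is an illustrative sketch under assumptions, not Brisk's output: `sequentialQueue` is a hypothetical name, and the caller-supplied `pick` function stands in for the arbitrary choice of a worker.

```haskell
-- Sequentialized work-stealing queue: one loop over the work items.
sequentialQueue :: (task -> [worker] -> worker)  -- arbitrary worker choice
                -> (worker -> task -> result)    -- what a worker computes
                -> [worker] -> [task] -> [result]
sequentialQueue pick compute workers tasks =
  foldl step [] tasks
  where
    step results task =
      let w = pick task workers   -- the queue assigns the item to some worker
          r = compute w task      -- that worker computes the result
      in results ++ [r]           -- the result is written to the result set
```

When the workers are symmetric, meaning `compute` ignores which worker runs the task, any `pick` gives the same results, which is why a single representative choice is enough.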
How can sequentialization help verify programs?
How can sequentialization help verify programs?
- Take the program and compute its canonical sequentialization
- No sequentialization means the program is likely wrong
- A sequentialization implies deadlock freedom
- Same halting states: use the simpler, sequential program to prove additional properties
The Implementation
The Implementation
1. Restrict Computation Model
2. Sequentialize by Rewriting
1. Restrict Computation Model: Symmetric Nondeterminism. Races yield equivalent outcomes.
Symmetric Nondeterminism, Example: Phase 1 of 2PC. The coordinator sends the transaction (data): no race.
Symmetric Nondeterminism, Example: Phase 1 of 2PC. The nodes send their votes (commit): a race. Same outcome? The processes are symmetric.
Symmetry: symmetry means invariance under transformation. (Pictures: one object is invariant under rotation when you look at it from above; another is marked "not this one".)
Symmetry in Distributed Systems [Norris and Dill 1996]: permuting process identifiers yields equivalent halting states.
Symmetry, Example: Phase 1 of 2PC. Name the processes n1, n2, n3. Permuting n1 and n2 yields equivalent halting states.
Symmetric Nondeterminism, Example: Phase 1 of 2PC. At (msg,id) <- recv; the coordinator chooses between picking n1 and n2. Pick n1, so (msg,id) = (commit,n1). Did we lose any states?
Symmetric Nondeterminism, Example: Phase 1 of 2PC. No! If we pick n2, so (msg,id) = (commit,n2), we can permute ids to end up in the same state as for (commit,n1), so the states have the same behavior.
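The permutation argument can be checked on a toy state. This is a minimal sketch, assuming a state is simply the list of (message, sender) pairs in the order the coordinator received them; `swapIds`, `permute`, `pickN1First` and `pickN2First` are hypothetical names for illustration.

```haskell
type Pid = String

-- Toy state: the (message, sender) pairs in the order they were received.
type State = [(String, Pid)]

-- Swap two process identifiers and leave all others unchanged.
swapIds :: Pid -> Pid -> Pid -> Pid
swapIds a b p
  | p == a    = b
  | p == b    = a
  | otherwise = p

-- Apply a renaming of process identifiers to a whole state.
permute :: (Pid -> Pid) -> State -> State
permute f = map (\(msg, p) -> (msg, f p))

-- The two outcomes of the race between n1 and n2.
pickN1First, pickN2First :: State
pickN1First = [("commit", "n1"), ("commit", "n2"), ("commit", "n3")]
pickN2First = [("commit", "n2"), ("commit", "n1"), ("commit", "n3")]
```

Here `permute (swapIds "n1" "n2") pickN2First == pickN1First` evaluates to `True`: the state reached by picking n2 is the state reached by picking n1 with the two ids swapped.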
How can we use symmetry to sequentialize?
Symmetric Nondeterminism, Example: Phase 1 of 2PC. The coordinator sends the transaction (data): no race, so each node can receive directly after the send [Lipton75].
Symmetric Nondeterminism, Example: Phase 1 of 2PC. The nodes send their votes (commit): a race. What now? The processes are symmetric, so the outcomes are equivalent: pick any!
2. Sequentialize by Rewriting (by example)
2. Sequentialize by Rewriting, Example 1 (p and q are in parallel):
  [send q ping; w <- recv q]_p  ||  [v <- recv p; send p pong]_q
rewrites to
  [v <- ping]_q ; ( [w <- recv q]_p  ||  [send p pong]_q )
2. Sequentialize by Rewriting, Example 1, sequentialization (p and q are in parallel):
  [v <- ping]_q ; ( [w <- recv q]_p  ||  [send p pong]_q )
rewrites to
  [v <- ping]_q ; [w <- pong]_p
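The rewrite in Example 1 can be sketched for two straight-line processes. This is a toy illustration under assumptions, not Brisk's rewriting engine: the mini-language (`Action`, `Assign`) and the function `sequentialize` are hypothetical, and only the single rule "a send facing its matching receive becomes an assignment" is implemented.

```haskell
type Pid = String
type Var = String

-- Actions of a straight-line process (hypothetical mini-language).
data Action
  = Send Pid String   -- send <to> <message>
  | Recv Var Pid      -- <var> <- recv <from>
  deriving (Eq, Show)

-- Statement of the sequential program: at <pid>, <var> gets <message>.
data Assign = Assign Pid Var String deriving (Eq, Show)

-- Repeatedly match a send at the head of one process with the matching
-- receive at the head of the other and replace the pair by an assignment.
-- Nothing means the rule does not apply, so no sequentialization is found.
sequentialize :: (Pid, [Action]) -> (Pid, [Action]) -> Maybe [Assign]
sequentialize (_, []) (_, []) = Just []
sequentialize (p, Send to m : ps) (q, Recv v from : qs)
  | to == q && from == p = (Assign q v m :) <$> sequentialize (p, ps) (q, qs)
sequentialize (p, Recv v from : ps) (q, Send to m : qs)
  | to == p && from == q = (Assign p v m :) <$> sequentialize (p, ps) (q, qs)
sequentialize _ _ = Nothing
```

For the ping-pong pair, `sequentialize ("p", [Send "q" "ping", Recv "w" "q"]) ("q", [Recv "v" "p", Send "p" "pong"])` gives `Just [Assign "q" "v" "ping", Assign "p" "w" "pong"]`. If both processes start with a receive, the result is `Nothing`, which corresponds to the case where a missing sequentialization points at a likely bug.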
2. Sequentialize by Rewriting, Example 2: p loops over the set qs; ∏ denotes a set of symmetric processes (p and qs = {q1…qn} are in parallel):
  [for q in qs do send q ping; w <- recv q end]_p  ||  ∏_{q ∈ qs} [v <- recv p; send p pong]_q
2. Sequentialize by Rewriting, Example 2: sequentialize an arbitrary iteration, then generalize (p and qs = {q1…qn} are in parallel):
  [for q in qs do send q ping; w <- recv q end]_p  ||  ∏_{q ∈ qs} [v <- recv p; send p pong]_q
rewrites to
  for q in qs do [v <- ping]_q ; [w <- pong]_p end
2. Sequentialize by Rewriting, Example 3: two loops
  [for q in qs do send q ping end; for q in qs do w <- recv qs end]_p  ||  ∏_{q ∈ qs} [v <- recv p; send p pong]_q
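2. Sequentialize by Rewriting, Example 3: the first loop is matched against the receives of the qs
  [for q in qs do send q ping end]_p  ||  ∏_{q ∈ qs} [v <- recv p]_q
rewrites to
  for q in qs do [v <- ping]_q end
leaving
  [for q in qs do w <- recv qs end]_p  ||  ∏_{q ∈ qs} [send p pong]_q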
2. Sequentialize by Rewriting, Example 3: partially sequentialized
  for q in qs do [v <- ping]_q end ; ( [for q in qs do w <- recv qs end]_p  ||  ∏_{q ∈ qs} [send p pong]_q )
The remaining parallel part is symmetric (checked), so it rewrites to
  for q in qs do [v <- ping]_q end ; for q in qs do [w <- pong]_p end
The Evaluation
The Evaluation: Brisk is implemented in a Haskell library with communication primitives like send / receive / foreach. It computes the canonical sequentialization, or provides a counterexample to sequentialization.