Thread-Modular Reasoning for Lock-Free Data Structures Roland Meyer based on joint work with Lukáš Holík, Tomáš Vojnar, and Sebastian Wolff.
Lock-Free Data Structures Key Take Aways: • efficient but complex • correctness = linearizability • checking linearizability reduces to reachability http://www.braunschweig-fotograf.de/mein-braunschweig/
Concept • avoid locks ➡ critical section cannot exist • single commands are atomic ➡ compare-and-swap (CAS)
    CAS(src, cmp, dst) := atomic {
        if (src != cmp) return false;
        src = dst;
        return true;
    }
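The CAS definition above can be tried out with a small emulation. This is a minimal sketch, assuming a hypothetical `AtomicRef` class; the internal lock only stands in for the atomicity that hardware CAS provides and is not part of any lock-free algorithm.

```python
import threading


class AtomicRef:
    """Hypothetical shared cell; the lock merely emulates an atomic CAS instruction."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def get(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, cmp, dst):
        # Mirrors CAS(src, cmp, dst): fail if the cell changed, otherwise update it.
        with self._guard:
            if self._value != cmp:
                return False
            self._value = dst
            return True


cell = AtomicRef(1)
first = cell.compare_and_swap(1, 2)   # cell still holds 1, so the swap happens
second = cell.compare_and_swap(1, 3)  # cell now holds 2, so the swap is refused
```

A failed CAS leaves the cell untouched, which is what allows the caller to re-read the shared value and retry.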
Example: Treiber’s Stack
    push(val):
        node = new Node(val);
        while (true) {
            top = ToS;
            node.next = top;
            if (CAS(ToS, top, node))
                return;
        }

    pop():
        while (true) {
            top = ToS;
            if (top == NULL)
                return EMPTY;
            next = top.next;
            if (CAS(ToS, top, next))
                return top.data;
        }
[Figure: heap with ToS, top, and node; marker 1 shows the position of thread 1 in the code.]
Example: Treiber’s Stack (continued) [Animation: markers 1 and 2 show the positions of two threads as they advance through push and pop; the heap figure shows ToS and the threads' local pointers top, next, and node.]
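For readers who want to run the example, the following is a sketch of Treiber's stack in Python, not the authors' code. The helper `_cas_tos`, the `worker` function, and the thread and value counts are assumptions made for illustration; the lock inside `_cas_tos` only emulates an atomic CAS. Because Python is garbage collected, a node cannot be reused while another thread still holds a reference to it.

```python
import threading


class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class TreiberStack:
    EMPTY = object()  # sentinel returned by pop on an empty stack

    def __init__(self):
        self._tos = None
        self._guard = threading.Lock()  # emulates the atomicity of CAS only

    def _cas_tos(self, cmp, dst):
        with self._guard:
            if self._tos is not cmp:
                return False
            self._tos = dst
            return True

    def push(self, val):
        node = Node(val)
        while True:
            top = self._tos              # 1. copy shared data
            node.next = top              # 2. change locally
            if self._cas_tos(top, node): # 3. publish if the copy is up to date
                return

    def pop(self):
        while True:
            top = self._tos
            if top is None:
                return TreiberStack.EMPTY
            nxt = top.next
            if self._cas_tos(top, nxt):
                return top.data


def worker(stack, values):
    for v in values:
        stack.push(v)


stack = TreiberStack()
threads = [
    threading.Thread(target=worker, args=(stack, range(i * 100, (i + 1) * 100)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

popped = []
while (item := stack.pop()) is not TreiberStack.EMPTY:
    popped.append(item)
```

Every pushed value is popped exactly once, whatever interleaving the scheduler picks; the order of `popped` may differ between runs.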
Correctness and Concurrency • pre/post conditions meaningless ➡ other correctness criteria required • linearizability ➡ every concurrent run must coincide with a sequential run ➡ most common for lock-free data structures ➡ illusion of sequentiality [Filipović et al. ESOP’09]: linearizable ⟺ sequential and concurrent implementation are observationally equivalent
Checking Linearizability • check sequentiality illusion ➡ sufficient: sequence of linearization points is valid [Abdulla et al. TACAS’13] (intuitively: linearization point = the moment a change of the data structure takes effect)
    concurrent(DS) ⊨ sequential(DS)
      ⇐  linp(DS) ⊆ sequential(DS)
      ⇔  linp(DS) ∩ complement(sequential(DS)) = ∅
      ⇔  linp(DS) ∩ observer(DS) = ∅
➡ checking linearizability is a reachability problem
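To make the check "the sequence of linearization points is valid" concrete, here is a minimal sketch for a stack. The event encoding as `(operation, value)` pairs and the name `valid_sequential` are assumptions for illustration. The function plays the role of an observer: returning `False` corresponds to reaching a bad state.

```python
def valid_sequential(history):
    """Replay linearization-point events on a sequential stack.

    history is a list of (op, value) pairs; for "pop", value is the
    returned element or the string "EMPTY".
    """
    stack = []
    for op, value in history:
        if op == "push":
            stack.append(value)
        elif op == "pop":
            expected = stack.pop() if stack else "EMPTY"
            if value != expected:
                return False  # the observer reaches its bad state
        else:
            raise ValueError(f"unknown operation: {op}")
    return True


good = [("push", 1), ("push", 2), ("pop", 2), ("pop", 1), ("pop", "EMPTY")]
bad = [("push", 1), ("push", 2), ("pop", 1)]  # pops 1 although 2 is on top
```

An analysis would run this check along every reachable sequence of linearization points instead of on two hand-written histories.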
Overview 1. thread-modular reasoning 2. ownership 3. summaries
Thread-Modular Reasoning [Qadeer, Flanagan SPIN’03] Key Take Aways: • compute reachability • interference is key to scalability
Concept • view abstraction ➡ split each state into a set of views ➡ each view captures the perception of one thread (abstracting from correlations between threads) • state exploration ➡ fixed-point computation: X = X ∪ sequential(X) ∪ interference(X)
Example: View Abstraction X = X ∪ sequential(X) ∪ interference(X) [Figure: a state in which thread 1 and thread 2 are both at CAS(ToS, top, next) is split into two views, one over ToS, top1, next1 and one over ToS, top2, next2.] Note: both views are equal.
Example: Sequential Step X = X ∪ sequential(X) ∪ interference(X) [Figure: the view of thread 1 at CAS(ToS, top, next), over ToS, top1, next1.] No concurrent behavior.
Example: Interference Step X = X ∪ sequential(X) ∪ interference(X) 1. combine 2. step 3. project [Figure: the views of thread 1 (ToS, top1, next1) and thread 2 (ToS, top2, next2), both at CAS(ToS, top, next), are combined into one state.]
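The fixed-point computation with its combine, step, and project phases can be sketched on a toy program. Everything here is an assumption for illustration: a view is a pair `(shared counter, program counter)`, and each thread increments the counter once if it is below 2. Real views contain heap shapes, which is where matching and correlation become expensive.

```python
def step(shared, pc):
    """Toy thread: at pc 0, increment the shared counter if it is below 2, then finish."""
    if pc == 0 and shared < 2:
        yield (shared + 1, 1)


def sequential(views):
    # A thread takes a step in its own view.
    return {succ for (g, pc) in views for succ in step(g, pc)}


def interference(views):
    result = set()
    for (g1, pc1) in views:        # view that observes the interference
        for (g2, pc2) in views:    # view of the interfering thread
            if g1 != g2:           # matching: the shared parts must agree
                continue
            # 1. combine the two views, 2. let the interfering thread step,
            # 3. project the result back onto the observing thread.
            for (g_new, _pc2_new) in step(g2, pc2):
                result.add((g_new, pc1))
    return result


def fixed_point(initial):
    X = set(initial)
    while True:
        nxt = X | sequential(X) | interference(X)
        if nxt == X:
            return X
        X = nxt


views = fixed_point({(0, 0)})
```

The double loop in `interference` is the source of the quadratic cost in the number of views. Pairing a view with itself is intended: two different threads may have equal views.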
Challenges with Interference • number of possible combinations is enormous ➡ not all combinations are reasonable • need pruning to make the approach practical ➡ precision ➡ performance • pruning must be sound
Pruning Interferences Two types: • matching ➡ Is it possible to combine at all? Skip if not. • correlation ➡ Which nodes should coincide?
Matching: Complication • matching gets harder due to finite abstraction • we use reachability predicates (shape analysis): • 0-step: = • 1-step: ↦ // node ↦ ToS • n-step: ⤏ // ToS ⤏ NULL • unreach: ⋈ [Figure: heap with ToS and node.]
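The reachability predicates can be illustrated on a concrete heap. This is a sketch under assumptions: cells are integers, `heap` maps a cell to its successor, `env` maps pointer variables to cells, `None` models NULL, and the symbols `=`, `↦`, `⤏`, `⋈` are used for 0-step, 1-step, n-step, and unreachable. The function names are hypothetical.

```python
def steps(heap, src, dst):
    """Return "=", "↦" or "⤏" if dst is reached from src in 0, 1 or more steps, else None."""
    if src == dst:
        return "="
    if src is None:
        return None
    if heap[src] == dst:
        return "↦"
    seen = {src}
    cur = heap[src]
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = heap[cur]
        if cur == dst:
            return "⤏"
    return None


def classify(env, heap, x, y):
    """Describe the relation between pointer variables x and y as a predicate."""
    forward = steps(heap, env[x], env[y])
    if forward:
        return f"{x} {forward} {y}"
    backward = steps(heap, env[y], env[x])
    if backward:
        return f"{y} {backward} {x}"
    return f"{x} ⋈ {y}"


# Heap during push, right after node.next = top: 3 -> 2 -> 1 -> NULL; cell 4 is unrelated.
heap = {1: None, 2: 1, 3: 2, 4: None}
env = {"ToS": 2, "top": 2, "node": 3, "fresh": 4, "NULL": None}

relations = [
    classify(env, heap, "top", "ToS"),
    classify(env, heap, "node", "ToS"),
    classify(env, heap, "ToS", "NULL"),
    classify(env, heap, "fresh", "ToS"),
]
```

The abstraction keeps only such predicates between variables, so heaps with different numbers of cells can collapse to the same abstract view.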
Matching: Example Subgraph isomorphism: NP-complete! [Figure: two views of the same logical stack content, one over ToS, top, next and one over ToS, node.]
Correlation: Example Exponentially many! [Figure: a view over ToS, top, next and a view over ToS, node are combined; it is open (??) which of top2, next2, and node1 coincide.]
Practicality is about Interference • interference ➡ quadratic in size of state space • matching ➡ subgraph isomorphism (NP) • correlation ➡ exponential Annotations: poor scalability; fight imprecision (false-positives)
Ownership Key Take Aways: • ownership saves the day • even under explicit memory management
Concept partition allocated heap into • owned ➡ exclusive access for a single thread ➡ granted upon allocation • shared ➡ accessible by every thread ➡ by publishing (e.g. making accessible via shared variables)
Ownership in Thread-Modular Reasoning [Gotsman et al. PLDI’07] • track ownership ➡ small overhead • matching ➡ owned cells not contained • correlation ➡ owned cells not merged with other nodes
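The effect of ownership on correlation can be estimated by counting candidate correlations. This is a rough sketch, assuming that a correlation is a partial injective map stating which nodes of one view coincide with which nodes of the other; it ignores pointer consistency, which a real analysis would also check. The function name and the node names are hypothetical.

```python
from itertools import combinations, permutations


def correlations(nodes_a, nodes_b, owned=frozenset()):
    """Enumerate which nodes of view B may coincide with which nodes of view A.

    Owned nodes are excluded: they are never merged with another thread's nodes.
    """
    a = [n for n in nodes_a if n not in owned]
    b = [n for n in nodes_b if n not in owned]
    result = []
    for k in range(min(len(a), len(b)) + 1):
        for chosen_b in combinations(b, k):
            for chosen_a in permutations(a, k):
                result.append(dict(zip(chosen_b, chosen_a)))
    return result


view_1 = ["top1", "next1", "node1"]
view_2 = ["top2", "next2", "node2"]

without_ownership = len(correlations(view_1, view_2))
with_ownership = len(correlations(view_1, view_2, owned={"node1", "node2"}))
```

With three candidate nodes per view this gives 34 correlations; excluding one owned node on each side leaves 7.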
Ownership and Correlation [Figure: the correlation example (ToS, top, next, top2, next2, node1, with the open question ??), now with the node marked as owned (own).]
Ownership in Thread-Modular Reasoning • helps a lot with ➡ matching ➡ correlation • makes thread-modular reasoning practical ➡ prunes false-positives Only for garbage collection (GC)! What about explicit memory management (MM)?
Problem with MM Ownership does not exist under explicit memory management. — folklore • almost true • indeed no exclusivity ➡ dangling pointers • we introduced weak ownership in VMCAI’16
Weak Ownership [VMCAI’16] • write exclusivity ➡ only owners may write • no read exclusivity ➡ dangling readers allowed ➡ dangling reads unsafe ➡ only owner may rely on memory contents [Figure: a cell referenced by an owned pointer and by dangling pointers.]
Weak Ownership in Thread-Modular Reasoning [VMCAI’16] • track dangling pointers ➡ small overhead • matching: like normal ownership • correlation ➡ cells owned by thread 1 are referenced by thread 2 only via dangling pointers • dangling write accesses may be unsafe ➡ report as bug
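The following toy memory model illustrates the rules of weak ownership; it is an illustrative assumption, not the analysis from the cited work. Pointers are modeled as `(address, version)` pairs so that a pointer becomes dangling once its cell is freed, and the names `Memory` and `UnsafeAccess` are hypothetical.

```python
class UnsafeAccess(Exception):
    """Raised for accesses that would be reported as bugs."""


class Memory:
    def __init__(self):
        self.version = {}    # address -> counter, bumped on every malloc and free
        self.owner = {}      # address -> owning thread, None if shared or free
        self.value = {}      # address -> stored value
        self.free_list = []  # addresses available for reuse

    def malloc(self, thread):
        addr = self.free_list.pop() if self.free_list else len(self.version)
        self.version[addr] = self.version.get(addr, 0) + 1
        self.owner[addr] = thread        # ownership is granted upon allocation
        self.value[addr] = None
        return (addr, self.version[addr])

    def publish(self, ptr):
        self.owner[ptr[0]] = None        # the cell becomes shared

    def free(self, ptr):
        addr = ptr[0]
        self.version[addr] += 1          # all existing pointers now dangle
        self.owner[addr] = None
        self.free_list.append(addr)

    def is_dangling(self, ptr):
        return self.version[ptr[0]] != ptr[1]

    def read(self, ptr):
        # Dangling reads are allowed, but the value must not be relied upon.
        return self.value[ptr[0]], not self.is_dangling(ptr)

    def write(self, thread, ptr, value):
        owner = self.owner[ptr[0]]
        if self.is_dangling(ptr) or (owner is not None and owner != thread):
            raise UnsafeAccess(f"{thread} writes through {ptr}")
        self.value[ptr[0]] = value


mem = Memory()
p = mem.malloc("t1")
mem.publish(p)
q = p                      # thread t2 keeps a copy of the pointer
mem.free(p)
r = mem.malloc("t1")       # the address is reused and owned by t1 again
mem.write("t1", r, "b")    # the owner may write
value, reliable = mem.read(q)
try:
    mem.write("t2", q, "c")
    reported = False
except UnsafeAccess:
    reported = True
```

Thread t2 can still read through its dangling pointer `q`, but the result is flagged as unreliable, and its write is reported.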
Performance Impact [VMCAI’16] (columns: MM with ownership | MM without ownership) • Treiber’s stack: 944s (#116776) and 25.5s (#3175), annotated :37 and :36 • Michael&Scott’s queue: 11700s (#19742); impractical (> #69000); false positive
Accomplishments • ownership helps with matching and correlation • low overhead tracking additional info • deeming unsafe accesses as bugs reflects programming practice • performance improvements for analysis • but: not practical yet ➡ interference still computationally complex
Summaries Key Take Aways: • copy-and-check blocks • statelessness • efficient interference
Observation • lock-freedom relies on copy-and-check blocks
    push(val):
        node = new Node(val);
        while (true) {
            top = ToS;                    // 1
            node.next = top;              // 2
            if (CAS(ToS, top, node))      // 3
                return;
        }
1. create local copy of shared data 2. make changes locally 3. publish changes if copy up-to-date, or retry otherwise ➡ updates appear atomically
Insight Threads cannot observe the local behavior of other threads. — SAS’17 So why do interference for all intermediate steps? ➡ instead: apply updates in one shot ➡ potentially unsound: stay tuned
Example: Summary for pop
1. make atomic
    atomic {
        while (true) {
            top = ToS;
            if (top == NULL) return EMPTY;
            next = top.next;
            if (CAS(ToS, top, next)) return top.data;
        }
    }
2. remove noise (the return values EMPTY and top.data are dropped)
    atomic {
        while (true) {
            top = ToS;
            if (top == NULL) return;
            next = top.next;
            if (CAS(ToS, top, next)) return;
        }
    }
3. copy propagation (top becomes ToS, next becomes ToS.next) 4. remove noise (the local assignments are no longer needed)
    atomic {
        while (true) {
            if (ToS == NULL) return;
            if (CAS(ToS, ToS, ToS.next)) return;
        }
    }
5. rewrite CAS (inside the atomic block the CAS compares ToS with itself, so it always succeeds and the loop disappears)
    atomic {
        if (ToS == NULL) return;
        ToS = ToS.next;
        return;
    }
6. rewrite guard
    atomic {
        assume(ToS != NULL);
        ToS = ToS.next;
    }
• easy to compute ➡ similar for push • compact form beneficial for analysis (and understandability)
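The final summary can be compared against the original pop on the shared state. This is a minimal sketch, assuming pop runs without interference; the names `pop_full` and `pop_summary` are hypothetical, and a failed `assume` is modeled as an empty list of successor states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    next: Optional["Node"] = None


EMPTY = object()


def pop_full(tos):
    """Treiber's pop executed in isolation; returns (new ToS, return value)."""
    while True:
        top = tos
        if top is None:
            return tos, EMPTY
        nxt = top.next
        if tos is top:        # CAS(ToS, top, next) without interference
            tos = nxt
            return tos, top.data


def pop_summary(tos):
    """atomic { assume(ToS != NULL); ToS = ToS.next; } as a list of successor ToS values."""
    if tos is None:           # assume fails: the summary is not enabled
        return []
    return [tos.next]


bottom = Node(1)
stack = Node(2, bottom)

full_tos, returned = pop_full(stack)
summary_tos = pop_summary(stack)
```

On a non-empty stack both versions move ToS to the same node. On an empty stack the full pop leaves ToS unchanged and the summary contributes no update, so other threads observe the same shared state either way.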