Dynamic Synthesis for Relaxed Memory Models
Feng Liu*, Nayden Nedev*, Nedyalko Prisadnikov†, Martin Vechev§, Eran Yahav‡
*Princeton University, †Sofia University, §ETH Zurich, ‡Technion
06/13/2012, PLDI 2012, Beijing
Relaxed memory models
• No sequential consistency (SC) in chips today
• Chip designers implement “relaxed memory” on different architectures:
  – total store order (TSO): Intel’s and AMD’s x86; SPARC
  – partial store order (PSO): SPARC
  – PPC model: IBM’s PowerPC; ARM
  – …
Modeling TSO & PSO Programs
• Store Buffering
  – FIFO queues (buffers) associated with threads
  – A store goes to a local buffer, not memory
  – Stores in buffers are flushed at non-deterministic times
• Store Forwarding
  – Satisfy loads from local buffer if possible
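The store-buffer model on this slide can be sketched as a small simulation. This is only an illustrative toy, not Fender's implementation: the class name `StoreBufferMemory` and its methods are hypothetical, and it assumes the usual distinction that TSO keeps one FIFO buffer per thread while PSO keeps one FIFO buffer per thread and variable.

```python
from collections import deque


class StoreBufferMemory:
    """Toy TSO/PSO memory: stores are buffered, loads forward from the buffer."""

    def __init__(self, model="TSO"):
        if model not in ("TSO", "PSO"):
            raise ValueError("model must be 'TSO' or 'PSO'")
        self.model = model
        self.memory = {}   # main memory: variable -> value (default 0)
        self.buffers = {}  # buffer key -> deque of (variable, value)

    def _key(self, thread, var):
        # Assumption: TSO has one FIFO per thread, PSO one per (thread, variable).
        return (thread, None) if self.model == "TSO" else (thread, var)

    def store(self, thread, var, value):
        # A store goes to the local buffer, not to main memory.
        self.buffers.setdefault(self._key(thread, var), deque()).append((var, value))

    def load(self, thread, var):
        # Store forwarding: the newest buffered store of this thread wins.
        for buffered_var, value in reversed(self.buffers.get(self._key(thread, var), ())):
            if buffered_var == var:
                return value
        return self.memory.get(var, 0)

    def flush_one(self, thread, var=None):
        """Move the oldest entry of one buffer to main memory.

        Under PSO the variable selects the buffer; under TSO it is ignored.
        Returns True if an entry was flushed.
        """
        buf = self.buffers.get(self._key(thread, var))
        if not buf:
            return False
        flushed_var, value = buf.popleft()
        self.memory[flushed_var] = value
        return True
```

In this sketch the caller decides when `flush_one` runs, which stands in for the non-deterministic flush times on the slide. Under PSO the caller may flush `Done` before `H`; under TSO the single per-thread FIFO forces program order.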
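PSO Example
Initially: H=0, Done=0

thread1          thread2
H=1;             while (!Done) { }
Done=1;          assert(H==1);

Fails on PSO: thread1's store Done=1 is flushed to main memory while its store H=1 is still in the store buffer, so thread2 leaves the loop and loads H=0.
[Figure: store buffers of t1 and t2 and main memory, with the store, flush and load steps marked]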
Memory Fences
Initially: H=0, Done=0

thread1          thread2
H=1;             while (!Done) { }
Fence;           assert(H==1);
Done=1;

• A memory fence is expensive (10-100 cycles)
• Use fences only where necessary
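The effect of the fence can be checked by exhaustively exploring thread1's executions. The sketch below is a hedged illustration rather than the tool's algorithm: it assumes PSO-style per-variable buffers and models a fence as an instruction that can only execute once all of the thread's buffers are empty. The function name and the program encoding are hypothetical.

```python
def reachable_memories(program, variables=("H", "Done")):
    """Explore every PSO execution of one thread.

    program: list of ("store", var, value) or ("fence",) instructions.
    Returns the set of reachable main-memory states, as tuples ordered like `variables`.
    """
    start = (0, (0,) * len(variables), ((),) * len(variables))
    seen, stack = {start}, [start]
    while stack:
        pc, mem, bufs = stack.pop()
        successors = []
        # Any non-empty per-variable buffer may flush its oldest entry.
        for i, buf in enumerate(bufs):
            if buf:
                successors.append((pc,
                                   mem[:i] + (buf[0],) + mem[i + 1:],
                                   bufs[:i] + (buf[1:],) + bufs[i + 1:]))
        if pc < len(program):
            op = program[pc]
            if op[0] == "store":
                i = variables.index(op[1])
                successors.append((pc + 1, mem,
                                   bufs[:i] + (bufs[i] + (op[2],),) + bufs[i + 1:]))
            elif op[0] == "fence" and not any(bufs):
                # Assumed fence semantics: blocked until all buffers are drained.
                successors.append((pc + 1, mem, bufs))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return {mem for _, mem, _ in seen}


UNFENCED = [("store", "H", 1), ("store", "Done", 1)]
FENCED = [("store", "H", 1), ("fence",), ("store", "Done", 1)]
BAD = (0, 1)  # H=0 and Done=1 visible in main memory: thread2's assert can fail

print(BAD in reachable_memories(UNFENCED))  # True
print(BAD in reachable_memories(FENCED))    # False
```

Because thread2 has no stores of its own, its loads read main memory directly, so the assertion can fail exactly when the state H=0, Done=1 is reachable.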
Our Approach
• Inputs: C/C++ program P, specification S, memory model M
• FENDER: dynamic analysis & static fixing
• Output: program P’ with fences, such that P’ satisfies S under M
Challenge: Handling real-world concurrent programs
• A lock-free memory allocator [1]
  – 771 lines of C code
  – 2699 lines of IR code
[1] M. Michael, “Scalable lock-free dynamic memory allocation,” PLDI’04.
Real-World Programs?
• Exposing violations under relaxed memory models
  – Violations occur rarely
• Many possible fence placements
  – Large programs
• Written in C/C++
  – Rather than program models
Contributions
• Demonic scheduler to expose violations
  – Delay flushes of values from store buffer to main memory
• Avoiding bad executions by adding fences
  – Extracting ordering constraints from bad executions
  – Enforcing ordering constraints using fences
• Parametric synthesis framework
  – Different memory models
• Evaluating fences required under different memory models and correctness criteria
  – Found redundant and missing fences
  – Linearizability on relaxed memory models
  – Handled real C/C++ programs
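Fender Framework – Support for concurrency and RMM
[Figure: C/C++ code and a concurrent client are compiled by LLVM-GCC to bitcode (.bc) and run on the LLVM interpreter, together with threading, a demonic scheduler and a memory model; the legend distinguishes our extension from existing work]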
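Our work – Dynamic analysis
[Figure: C/C++ code and a concurrent client are compiled by LLVM-GCC to bitcode (.bc) and run on the LLVM interpreter with threading, a demonic scheduler, a memory model and a specification; the resulting trace goes to trace analysis, which produces an order formula for the SAT solver, which returns a SAT assignment; the legend distinguishes our extension from existing work]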
Our work – Implement memory fences
[Figure: the dynamic-analysis pipeline (LLVM-GCC, .bc, LLVM interpreter with threading, demonic scheduler, memory model and specification, trace analysis, order formula, SAT solver) followed by fence enforcement, which uses the SAT assignment to produce a modified .bc; outputs are the fixed bytecode and a fence location report; the legend distinguishes our extension from existing work]
Example
Initially: H=0, Done=0

thread1          thread2
L1: H=1;         L3: while (!Done) { }
L2: Done=1;      L4: assert(H==1);

[Figure: store buffers of t1 and t2 with slots for H and Done, and main memory]
Interpretation on PSO
Interpreted instructions: L3: Load Done; L1: Store H=1; L2: Store Done=1; L4: Load H
[Figure: the example program next to the trace; t1's store buffer holds H=1 (from L1) and Done=1 (from L2); legend: c, load, store, flush]
Interpretation on PSO
Trace entries shown: L3, L1, L2, L3
[Figure: the example program next to the trace, store buffers of t1 and t2, and main memory; legend: c, load, store, flush]
Flush with a probability
Trace entries shown: L3, L1, L2, L3
[Figure: the example program next to the trace; t1's store buffer still holds H=1 (from L1); legend: c, load, store, flush]
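Flushing with a probability can be sketched as a randomized scheduler for the H/Done example. This is a toy under stated assumptions, not Fender's scheduler: the function names, the 50/50 thread choice and the per-step flush rule are all hypothetical. A low flush probability keeps stores in the buffer longer, which makes the PSO violation easier to expose.

```python
import random


def run_once(rng, flush_prob, max_steps=1000):
    """Run one random PSO execution; return True if assert(H==1) fails."""
    memory = {"H": 0, "Done": 0}
    buffers = {"H": [], "Done": []}       # thread1's per-variable FIFO buffers
    t1_program = [("H", 1), ("Done", 1)]  # L1, L2
    t1_pc, t2_pc = 0, 0
    for _ in range(max_steps):
        if t1_pc < len(t1_program) and rng.random() < 0.5:
            var, value = t1_program[t1_pc]  # thread1: buffer the next store
            buffers[var].append(value)
            t1_pc += 1
        elif t2_pc == 0:
            if memory["Done"] == 1:         # L3: leave the loop once Done is visible
                t2_pc = 1
        else:
            return memory["H"] != 1         # L4: the assertion
        # Each pending store reaches main memory only with probability flush_prob.
        for var, buf in buffers.items():
            if buf and rng.random() < flush_prob:
                memory[var] = buf.pop(0)
    return False  # thread2 never reached the assertion within the step budget


def count_violations(runs, flush_prob, seed=0):
    rng = random.Random(seed)
    return sum(run_once(rng, flush_prob) for _ in range(runs))
```

With `flush_prob=1.0` every store is flushed right after it is issued, so the execution behaves sequentially and no violation can occur in this model. Lower values let `Done` overtake `H`.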
Execution trace
Trace entries shown: L3, L1, L2, L3, L4
[Figure: the example program next to the complete trace; t1's store buffer still holds H=1 (from L1); legend: c, load, store, flush]
Checking Specification
[Figure: traces of different executions; legend: c, load, store, flush]
Repair one trace
• Order predicate: [L1, L2]
• Order formula for a single execution: [L1, L2] ∨ [C, D] ∨ …, written x1 ∨ x2 ∨ …
[Figure: the trace with t1's store buffer holding H=1 (from L1) and Done=1 (from L2), store buffer of t2, and main memory]
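One way to derive order predicates from a trace is sketched below. The rule used here is an assumption made for illustration and may differ from the tool's exact definition: a predicate [l, k] is recorded whenever instruction k executes while an earlier store l of the same thread, to a different variable, is still buffered. Under TSO, store-store pairs are skipped because a single FIFO buffer already keeps them in order. The event encoding is hypothetical.

```python
def order_predicates(trace, model="PSO"):
    """Collect order predicates (l, k) from one execution trace.

    trace: list of (kind, thread, label, var), kind in {"store", "load", "flush"}.
    """
    pending = {}      # thread -> buffered stores as (label, var), oldest first
    predicates = []
    for kind, thread, label, var in trace:
        buffered = pending.setdefault(thread, [])
        if kind == "flush":
            buffered.remove((label, var))  # the store has reached main memory
            continue
        for earlier_label, earlier_var in buffered:
            if earlier_var == var:
                continue  # same-variable order is kept by FIFO and forwarding
            if kind == "store" and model == "TSO":
                continue  # TSO never reorders two stores of one thread
            if (earlier_label, label) not in predicates:
                predicates.append((earlier_label, label))
        if kind == "store":
            buffered.append((label, var))
    return predicates


# The violating execution of the H/Done example: Done is flushed before H.
bad_trace = [
    ("store", "t1", "L1", "H"),
    ("store", "t1", "L2", "Done"),
    ("flush", "t1", "L2", "Done"),
    ("load", "t2", "L3", "Done"),
    ("load", "t2", "L4", "H"),
    ("flush", "t1", "L1", "H"),
]
print(order_predicates(bad_trace))  # [('L1', 'L2')]
```

The disjunction of the predicates collected from one violating trace then plays the role of the order formula for that execution: enforcing any one of them rules the execution out.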
Repair all incorrect traces
• Global formula to SAT solver: (x1 ∨ x2 ∨ …) ∧ (x1 ∨ x3 ∨ …) ∧ …, with one clause per incorrect trace (here trace1 and trace3)
[Figure: traces of different executions (trace1, trace2, trace3); a callout reads “One memory fence should be placed here”]
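The global formula is a conjunction of disjunctions over order predicates, so a smallest repair is a smallest set of predicates that satisfies every clause. The slides use a SAT solver for this step; the sketch below swaps in a brute-force search over subsets, which is only practical for tiny examples. The function name `minimal_fences` is hypothetical.

```python
from itertools import combinations


def minimal_fences(clauses):
    """Return a smallest set of predicates that satisfies every clause.

    clauses: one collection of predicates per incorrect trace (a disjunction).
    Returns None if some clause is empty and therefore cannot be satisfied.
    """
    clauses = [frozenset(clause) for clause in clauses]
    universe = sorted(set().union(*clauses)) if clauses else []
    # Try candidate sets in order of increasing size, so the first hit is minimal.
    for size in range(len(universe) + 1):
        for choice in combinations(universe, size):
            chosen = set(choice)
            if all(chosen & clause for clause in clauses):
                return chosen
    return None


# (x1 ∨ x2) ∧ (x1 ∨ x3): enforcing x1 alone repairs both traces.
print(minimal_fences([{"x1", "x2"}, {"x1", "x3"}]))  # {'x1'}
```

A shared predicate such as x1 is why a single fence can repair several incorrect traces at once.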
Fix the program
Initially: H=0, Done=0

thread1          thread2
L1: H=1;         L3: while (!Done) { }
Fence;           L4: assert(H==1);
L2: Done=1;
Evaluation - Benchmarks

                                   Memory safety   Operational sequential consistency   Linearizability
Program                            TSO   PSO       TSO   PSO                            TSO   PSO
Work stealing queues
  Chase-Lev WSQ                    0     0         1     2                              2     3
  Cilk’s THE WSQ                   0     0         1     3                              -     -
  FIFO WSQ                         0     0         0     2                              1     2
  LIFO WSQ                         0     0         0     1                              0     1
  Anchor WSQ                       0     0         0     1                              0     1
Idempotent work stealing queues
  FIFO iWSQ                        0     3         -     -                              -     -
  LIFO iWSQ                        0     2         -     -                              -     -
  Anchor iWSQ                      0     2         -     -                              -     -
Concurrent data structures
  MS2 Queue                        0     0         0     0                              0     0
  MSN Queue                        0     0         0     1                              0     1
  LazyList Set                     0     0         0     0                              0     0
  Harris’s Set                     0     0         0     1                              0     1
Memory allocator
  Lock-free memory allocator       0     3         0     4                              0     4

The same table appears on three further slides:
• Evaluation - Specifications
• Evaluation - Memory models
• Evaluation - number of memory fences
Conclusion
• Demonic scheduler to expose violations
• Avoiding bad executions by adding fences
• Parametric synthesis framework
• Evaluating fences required under different memory models and correctness criteria
Thanks! Q & A