
Debugging and improving the C/C++11 memory model, Viktor Vafeiadis



  1. Debugging and improving the C/C++11 memory model
     Viktor Vafeiadis, Max Planck Institute for Software Systems (MPI-SWS), January 2016

  2. The C11 memory model
     Defines the semantics of concurrent memory accesses in C/C++. Standardised by ISO C/C++ 2011. Used:
     - by several POPL/PLDI/OOPSLA papers
     - internally by LLVM IR
     - indirectly by every program

  3. The C11 memory model: Atomics
     Two types of locations:
     - Ordinary (non-atomic): races are errors.
     - Atomic: "Welcome to the expert mode".

  4. The C11 memory model: a spectrum of accesses
     - Seq. consistent: full memory fence
     - Release write: no fence (x86); lwsync (PPC)
     - Acquire read: no fence (x86); isync (PPC)
     - Relaxed: no fence
     - Non-atomic: no fence, races are errors
     Plus explicit primitives for fences.

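  5. An execution in C11: actions and relations (and axioms)
     Initially a = x = 0.
     - Thread 1: a = 5; x.store(1, release);
     - Thread 2: while (x.load(acq) == 0); print(a);
     [Figure: execution graph. Initialisation writes W_na(a, 0) and W_na(x, 0) are po-before both threads. Thread 1: W_na(a, 5) po W_rel(x, 1). Thread 2: R_acq(x, 0) po R_acq(x, 1) po R_na(a, 5). rf edges: W_na(x, 0) to R_acq(x, 0), W_rel(x, 1) to R_acq(x, 1), W_na(a, 5) to R_na(a, 5). sw edge: W_rel(x, 1) to R_acq(x, 1).]
     Happens-before: hb ≜ (po ∪ sw)⁺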
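The message-passing program on this slide can be written against the C++11 `<atomic>` API roughly as follows. This is a minimal sketch: the function name `run_message_passing` and the `printed` variable are illustrative assumptions, and the value is returned rather than printed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the slides): runs the two-thread program
// once and returns the value that thread 2 would print.
int run_message_passing() {
    int a = 0;                // non-atomic location
    std::atomic<int> x{0};    // atomic location
    int printed = -1;

    std::thread writer([&] {
        a = 5;                                   // W_na(a, 5)
        x.store(1, std::memory_order_release);   // W_rel(x, 1)
    });
    std::thread reader([&] {
        // Spin until the acquire load reads 1, i.e. R_acq(x, 1).
        while (x.load(std::memory_order_acquire) == 0) { }
        printed = a;                             // R_na(a, ...), after the sw edge
    });

    writer.join();
    reader.join();
    // The release/acquire pair puts the write of a before the read of a in hb,
    // so the read is race-free and must see 5.
    assert(printed == 5);
    return printed;
}
```

Calling `run_message_passing()` from `main` should always yield 5; replacing both memory orders with `std::memory_order_relaxed` would remove the sw edge and make the non-atomic accesses to `a` race.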

  6. Relaxed behaviour: store buffering
     Initially x = y = 0.
     - Thread 1: x.store(1, rlx); t1 = y.load(rlx);
     - Thread 2: y.store(1, rlx); t2 = x.load(rlx);
     This can return t1 = t2 = 0.
     Justification: starting from [x = y = 0], thread 1 performs W_rlx(x, 1) then R_rlx(y, 0), and thread 2 performs W_rlx(y, 1) then R_rlx(x, 0).
     Behaviour observed on x86/Power/ARM.
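A minimal C++ sketch of the store-buffering test, assuming the hypothetical helper name `run_store_buffering_relaxed`. Whether the weak outcome t1 = t2 = 0 actually shows up in a given run depends on the hardware, the compiler, and timing, so the code only records the results.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper (not from the slides): runs the store-buffering
// program once and returns (t1, t2).
std::pair<int, int> run_store_buffering_relaxed() {
    std::atomic<int> x{0}, y{0};
    int t1 = -1, t2 = -1;

    std::thread th1([&] {
        x.store(1, std::memory_order_relaxed);
        t1 = y.load(std::memory_order_relaxed);
    });
    std::thread th2([&] {
        y.store(1, std::memory_order_relaxed);
        t2 = x.load(std::memory_order_relaxed);
    });

    th1.join();
    th2.join();
    // Each load can only observe the initial 0 or the stored 1.
    assert((t1 == 0 || t1 == 1) && (t2 == 0 || t2 == 1));
    return {t1, t2};
}
```

All four combinations of 0 and 1 are allowed by the model, including (0, 0); a single run proves nothing about which ones a given machine exhibits.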

  7. Coherence
     Programs with a single shared variable behave as under SC.
     - Thread 1: x.store(1, rlx); x.store(2, rlx);
     - Thread 2: a = x.load(rlx); b = x.load(rlx);
     The outcome a = 2 ∧ b = 1 is forbidden.
     [Figure: thread 1 performs W_rlx(x, 1) then W_rlx(x, 2); thread 2 performs R_rlx(x, 2) then R_rlx(x, 1).]

  8. Coherence
     Programs with a single shared variable behave as under SC.
     - Thread 1: x.store(1, rlx); x.store(2, rlx);
     - Thread 2: a = x.load(rlx); b = x.load(rlx);
     The outcome a = 2 ∧ b = 1 is forbidden.
     [Figure: thread 1 performs W_rlx(x, 1) then W_rlx(x, 2); thread 2 performs R_rlx(x, 2) then R_rlx(x, 1). mo_x edge: W_rlx(x, 1) to W_rlx(x, 2). rb_x edge: R_rlx(x, 1) to W_rlx(x, 2).]
     - Modification order, mo_x: total order of writes to x.
     - Reads-before: rb_x ≜ (rf⁻¹ ; mo_x) ∩ (≠)
     - Coherence: hb ∪ rf_x ∪ mo_x ∪ rb_x is acyclic for all x.
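The coherence example can be exercised with a small C++ sketch. The helper names `forbidden_outcome_seen` and `count_forbidden_outcomes` are assumptions made for illustration. Testing cannot prove the guarantee, but a conforming implementation should never report the outcome a = 2 ∧ b = 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two-thread coherence test once and reports
// whether the forbidden outcome a == 2 && b == 1 was observed.
bool forbidden_outcome_seen() {
    std::atomic<int> x{0};
    int a = -1, b = -1;

    std::thread writer([&] {
        x.store(1, std::memory_order_relaxed);
        x.store(2, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        a = x.load(std::memory_order_relaxed);
        b = x.load(std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    // The loads can only see the initial 0 or one of the two stored values.
    assert(a >= 0 && a <= 2 && b >= 0 && b <= 2);
    return a == 2 && b == 1;
}

// Hypothetical helper: repeats the test and counts forbidden outcomes.
int count_forbidden_outcomes(int runs) {
    int count = 0;
    for (int i = 0; i < runs; ++i) {
        if (forbidden_outcome_seen()) ++count;
    }
    return count;
}
```

Even with relaxed accesses, the two loads in the reader are ordered by po, so the second load may not read a write that is mo-earlier than the one read by the first.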

  9. Causality cycles with relaxed accesses
     Initially x = y = 0.
     - Thread 1: if (x.load(rlx) == 1) y.store(1, rlx);
     - Thread 2: if (y.load(rlx) == 1) x.store(1, rlx);
     C11 allows the outcome x = y = 1.
     Justification: thread 1 performs R_rlx(x, 1) then W_rlx(y, 1); thread 2 performs R_rlx(y, 1) then W_rlx(x, 1). Relaxed accesses don’t synchronize.

  10. No causality cycles with non-atomics
      Initially x = y = 0.
      - Thread 1: if (x == 1) y = 1;
      - Thread 2: if (y == 1) x = 1;
      C11 forbids the outcome x = y = 1.
      Justification: the non-atomic read axiom, rf ∩ (_ × NA) ⊆ hb.

  11. Is the C11 memory model definition...
      1. Mathematically sane?
         - For example, it is monotone.
      2. Not too weak?
         - Provides useful reasoning principles.
      3. Not too strong?
         - Can be implemented efficiently.
      4. Actually useful?
         - Admits the intended program optimisations.

  12. Is the C11 memory model definition...
      1. Mathematically sane?
         - For example, it is monotone.
      2. Not too weak?
         - Provides useful reasoning principles.
      3. Not too strong?
         - ✓ Compilation to x86/Power/ARM.
      4. Actually useful?
         - Admits the intended program optimisations.

  13. Is the C11 memory model definition...
      1. Mathematically sane?
         - For example, it is monotone.
      2. Not too weak?
         - ≈ Reasoning principles for C11 subsets.
      3. Not too strong?
         - ✓ Compilation to x86/Power/ARM.
      4. Actually useful?
         - Admits the intended program optimisations.

  14. Is the C11 memory model definition...
      1. Mathematically sane?
         - ✗ No, it is not monotone.
      2. Not too weak?
         - ≈ Reasoning principles for C11 subsets.
      3. Not too strong?
         - ✓ Compilation to x86/Power/ARM.
      4. Actually useful?
         - Admits the intended program optimisations.

  15. Is the C11 memory model definition...
      1. Mathematically sane?
         - ✗ No, it is not monotone.
      2. Not too weak?
         - ≈ Reasoning principles for C11 subsets.
      3. Not too strong?
         - ✓ Compilation to x86/Power/ARM.
      4. Actually useful?
         - ✗ No, it disallows intended program transformations.

  16. Is the C11 memory model definition...
      1. Mathematically sane?
         - ✗ No, it is not monotone.
      2. Not too weak?
         - ≈ Reasoning principles for C11 subsets.
      3. Not too strong?
         - ≈ Compilation to x86/Power/ARM.
      4. Actually useful?
         - ✗ No, it disallows intended program transformations.

  17. Non-atomic reads of atomic variables are unsound!
      Initially, x = 0.
      - Thread 1: x.store(1, rlx);
      - Thread 2: if (x.load(rlx) == 1) t = (int) x;
      The program can get stuck!
      [Figure: events W_na(x, 0), W_rlx(x, 1), R_rlx(x, 1), and R_na(x, ?).]
      - Reading 0 contradicts coherence.
      - Reading 1 contradicts the non-atomic read axiom.

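  18. Sequentialisation is invalid
      Initially, a = x = y = 0.
      - Thread 1: a = 1;
      - Thread 2: if (x.load(rlx) == 1) if (a == 1) y.store(1, rlx);
      - Thread 3: if (y.load(rlx) == 1) x.store(1, rlx);
      The only possible output is: a = 1, x = y = 0.
      Recall the non-atomic read axiom: rf ∩ (_ × NA) ⊆ hb. In the parallel program, the write a = 1 does not happen before the read of a, so that read cannot return 1. Sequentialising thread 1 with thread 2 puts the write hb-before the read, which makes the additional outcome x = y = 1 possible.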

  19. Tentative fixes
      Remove non-atomic read axiom.
      - gives extremely weak guarantees, if any
      In addition, forbid (hb ∪ rf)-cycles.
      - rules out causal loops
      - forbids some reorderings
      - more costly on ARM/Power
      Or alternatively forbid (hb ∪ rf)-cycles with NA accesses.
      - allows more racy behaviours
      - forbids some reorderings

  20. Tentative fixes (open problem)
      Remove non-atomic read axiom.
      - gives extremely weak guarantees, if any
      In addition, forbid (hb ∪ rf)-cycles.
      - rules out causal loops
      - forbids some reorderings
      - more costly on ARM/Power
      Or alternatively forbid (hb ∪ rf)-cycles with NA accesses.
      - allows more racy behaviours
      - forbids some reorderings

  21. Monotonicity
      “Adding synchronisation should not introduce new behaviours”
      Examples:
      - Reducing parallelism: C1 ‖ C2 ⇝ C1; C2
      - Expression evaluation linearisation: x = a + b; ⇝ t1 = a; t2 = b; x = t1 + t2;
      - Adding a memory fence
      - Strengthening the access mode of an operation
      - (Roach motel reorderings)

  22. Other problems fixed (POPL’15, POPL’16)
      The axiom of SC reads is too weak.
      - Makes strengthening unsound.
      The axioms of SC fences are too weak.
      - They do not guarantee sequential consistency.
      The definition of release sequences is too strong.
      - Removing (po ∪ rf)-final events is unsound.

  23. Transformation correctness

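  24. Valid instruction reorderings a; b ⇝ b; a (POPL’15)
      Rows give the first instruction a, columns the second instruction b.

      | a \ b | R_≠sc | R_sc | W_na | W_rlx | W_⊒rel | C_rlx\|acq | C_⊒rel | F_acq | F_rel |
      |---|---|---|---|---|---|---|---|---|---|
      | R_na | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ |
      | R_rlx | ✓ | ✓ | ✓ | (✓) | ✗ | (✓) | ✗ | ✗ | ✗ |
      | R_⊒acq | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
      | W_≠sc | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ |
      | W_sc | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ |
      | C_rlx\|rel | ✓ | ✓ | ✓ | (✓) | ✗ | (✓) | ✗ | ✗ | ✗ |
      | C_⊒acq | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
      | F_acq | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | = | ✗ |
      | F_rel | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | = |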

  25. Redundant instruction eliminations (POPL’15)
      - Overwritten write: x.store(v, M); C; x.store(v′, M) ⇝ C; x.store(v′, M), provided C has no rel and no x accesses.
      - Read after write: x.store(v, M); C; t = x.load(M′) ⇝ x.store(v, M); C; t = v, provided C has no acq and no x accesses.
      - Read after read: t = x.load(M); C; t′ = x.load(M) ⇝ t = x.load(M); C; t′ = t, provided C has no acq and no x accesses.
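As a concrete illustration of two of these eliminations, here is a sketch in C++ with an empty context C and relaxed accesses for M and M′. All function names are hypothetical, and the single-threaded comparison only shows that the rewritten code computes the same result in isolation; it does not demonstrate soundness under concurrency.

```cpp
#include <atomic>
#include <cassert>

// Read after write, original form: x.store(v, M); C; t = x.load(M').
int read_after_write_original(int v) {
    std::atomic<int> x{0};
    x.store(v, std::memory_order_relaxed);
    // C is empty here: no acquire operations and no accesses to x.
    int t = x.load(std::memory_order_relaxed);
    return t;
}

// Read after write, after elimination: the load is replaced by t = v.
int read_after_write_eliminated(int v) {
    std::atomic<int> x{0};
    x.store(v, std::memory_order_relaxed);
    int t = v;
    return t;
}

// Overwritten write, original form: x.store(v, M); C; x.store(v2, M).
int overwritten_write_original(int v, int v2) {
    std::atomic<int> x{0};
    x.store(v, std::memory_order_relaxed);
    // C is empty here: no release operations and no accesses to x.
    x.store(v2, std::memory_order_relaxed);
    return x.load(std::memory_order_relaxed);
}

// Overwritten write, after elimination: the first store is dropped.
int overwritten_write_eliminated(int v, int v2) {
    (void)v;  // the eliminated store was the only use of v
    std::atomic<int> x{0};
    x.store(v2, std::memory_order_relaxed);
    return x.load(std::memory_order_relaxed);
}

// Checks that original and eliminated forms agree for the given inputs.
void check_eliminations(int v, int v2) {
    assert(read_after_write_original(v) == read_after_write_eliminated(v));
    assert(overwritten_write_original(v, v2) == overwritten_write_eliminated(v, v2));
}
```

The side conditions on C matter once other threads are involved: an acquire (or release) operation in C could create synchronisation that lets another thread's write to x become visible between the two accesses.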

  26. Is DRF semantics really what we want?

  27. Should these transformations be allowed?
      1. CSE over a lock acquire:
         t1 = X; lock(); t2 = X; ⇝ t1 = X; lock(); t2 = t1;
         If X changes in between, the program is racy.
      2. Load hoisting:
         if (c) r = X; ⇝ t = X; r = c ? t : r;
         This may introduce a race, but the racy value is not used.

  28. Allowing both is clearly wrong!
      Consider the transformation sequence:
      - Original: if (c) r1 = X; lock(); r2 = X;
      - After load hoisting: t = X; r1 = c ? t : r1; lock(); r2 = X;
      - After CSE over lock(): t = X; r1 = c ? t : r1; lock(); r2 = t;
      When c is false, the read of X is moved out of the critical region! So we have to forbid one transformation.
      - C11 forbids load hoisting, allows CSE over lock().
      - LLVM allows load hoisting, forbids CSE over lock().
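The effect of composing the two transformations can be simulated sequentially. In this sketch, the helper `acquire_after_remote_update`, the `Regs` struct, and the initial register values are all assumptions: the helper stands in for another thread that updates X inside its own critical section before this thread acquires the lock.

```cpp
#include <cassert>
#include <mutex>

struct Regs { int r1; int r2; };

// Hypothetical stand-in for concurrency: another thread writes X while
// holding the lock and releases it, after which this thread acquires it.
void acquire_after_remote_update(int& X, std::mutex& m) {
    X = 1;      // the other thread's write, done under the lock
    m.lock();   // our acquisition, ordered after that write
}

// Original program: if (c) r1 = X; lock(); r2 = X;
Regs original(bool c) {
    int X = 0;
    std::mutex m;
    Regs r{0, 0};  // initial register values are an assumption
    if (c) r.r1 = X;
    acquire_after_remote_update(X, m);
    r.r2 = X;      // read inside the critical region
    m.unlock();
    return r;
}

// After load hoisting followed by CSE over lock().
Regs after_hoisting_and_cse(bool c) {
    int X = 0;
    std::mutex m;
    Regs r{0, 0};
    int t = X;               // hoisted load, now unconditional
    r.r1 = c ? t : r.r1;
    acquire_after_remote_update(X, m);
    r.r2 = t;                // reuses the value read before lock()
    m.unlock();
    return r;
}

// With c == false the original never reads X outside the critical region,
// yet the transformed program returns the stale pre-lock value.
void show_difference() {
    assert(original(false).r2 == 1);
    assert(after_hoisting_and_cse(false).r2 == 0);
}
```

With c false, the original program is race-free and must observe the protected update, while the transformed program reports the stale value, which is why the two transformations cannot both be allowed.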

  29. Taming the release-acquire fragment

  30. Recall the spectrum of C11 access types
      - Seq. consistent: full memory fence
      - Release write: no fence (x86); lwsync (PPC)
      - Acquire read: no fence (x86); isync (PPC)
      - Relaxed: no fence
      - Non-atomic: no fence, races are errors

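  31. C11’s release-acquire memory model
      The C11 model where all reads are acquire, all writes are release, and all atomic updates are acquire/release.
      Store buffering, initially x = y = 0:
      - Thread 1: x := 1; print y
      - Thread 2: y := 1; print x
      Both threads may print 0.
      [Figure: initialisation x = y = 0 is mo_x-before W x,1 and mo_y-before W y,1; R y,0 and R x,0 both read from the initialisation (rf).]
      Message passing, initially x = m = 0:
      - Thread 1: m := 42; x := 1
      - Thread 2: while x = 0 skip; print m
      Only 42 may be printed.
      [Figure: W m,42 then W x,1 in thread 1; R x,1 then R m,0 in thread 2, with rf from W x,1 to R x,1 and rf from the initialisation to R m,0. The hb edge from W m,42 to R m,0 rules this execution out.]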

  32. Good news
      - Verified compilation schemes:
        - x86-TSO (trivial compilation) [Batty et al. ’11]
        - Power [Batty et al. ’12] [Sarkar et al. ’12]
      - RA supports intended optimizations:
        - In particular, write-read reordering (unlike SC): W x → R y ⇝ R y → W x
      - DRF theorem:
        - No data races under SC ensures no weak behaviors
      - Monotonicity:
        - Adding synchronization does not introduce new behaviors
      - Program logics:
        - RSL [Vafeiadis and Narayan ’13]
        - GPS [Turon et al. ’14]
        - OGRA [Lahav and Vafeiadis ’15]
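The contrast between release-acquire and SC on write-read reordering can be observed with the store-buffering test. The following sketch uses a hypothetical helper `count_both_zero` that takes the memory orders as parameters. Under `std::memory_order_seq_cst` the outcome where both threads read 0 is forbidden; under release/acquire it is allowed, though a particular machine may or may not exhibit it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs store buffering `runs` times with the given
// memory orders and counts how often both threads read 0.
// store_mo must be valid for stores (relaxed, release, seq_cst) and
// load_mo valid for loads (relaxed, acquire, seq_cst).
int count_both_zero(int runs, std::memory_order store_mo, std::memory_order load_mo) {
    assert(runs > 0);
    int both_zero = 0;
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0}, y{0};
        int t1 = -1, t2 = -1;
        std::thread th1([&] {
            x.store(1, store_mo);   // W x
            t1 = y.load(load_mo);   // R y
        });
        std::thread th2([&] {
            y.store(1, store_mo);   // W y
            t2 = x.load(load_mo);   // R x
        });
        th1.join();
        th2.join();
        if (t1 == 0 && t2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(n, std::memory_order_seq_cst, std::memory_order_seq_cst)` should return 0 for any n, because the single total order over SC operations places one of the stores before the other thread's load. With `std::memory_order_release` and `std::memory_order_acquire` the count may be nonzero.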
