Multicore Programming: C++0x
Mark Batty, University of Cambridge
in collaboration with Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber
November 2010
– p. 1
C++0x: the next C++
- Specified by the C++ Standards Committee
- Defined in The Standard, a 1300 page prose document
- The design is a detailed compromise:
    - performance, optimisations and hardware
    - usability
    - compatibility with the next C, C1X
    - legacy code
– p. 2
C++0x: the next C++
Our mathematical model is faithful to the intent of, and has influenced, The Standard.
The model:
- syntactically separates out expert features
- has a weak memory
- defines a happens-before relation
- requires non-atomic reads and writes to be data-race free (DRF)
- provides atomic reads and writes for racy programs
– p. 3
The syntactic divide
An example of the syntax:

    // for regular programmers:
    atomic_int x = 0;
    x.store(1);
    y = x.load();

    // for experts:
    x.store(2, memory_order);
    y = x.load(memory_order);
    atomic_thread_fence(memory_order);

With a choice of memory order:
    mo_seq_cst, mo_release, mo_acquire, mo_acq_rel, mo_consume, mo_relaxed
– p. 4
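As a rough illustration of the two styles, the sketch below spells the same operations with the `std::atomic` library as it was eventually standardised. It assumes the slide's `mo_...` names abbreviate `std::memory_order_...`. The function name `syntactic_divide_demo` and the particular orders chosen are hypothetical, picked only for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: exercises the default form and the explicit form.
int syntactic_divide_demo() {
    std::atomic<int> x{0};

    // For regular programmers: the order defaults to memory_order_seq_cst.
    x.store(1);
    int y = x.load();
    assert(y == 1);  // single-threaded, so the load sees the store

    // For experts: the memory order is spelled out on each operation.
    x.store(2, std::memory_order_release);
    y += x.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_acq_rel);

    return y;  // 1 + 2 in this single-threaded run
}
```

Calling `syntactic_divide_demo()` from a single thread returns 3. The explicit orders only start to matter once other threads access `x`.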
A model of two parts
An operational semantics:
- Processes programs, identifying memory actions
- Constructs candidate executions, E_opsem
An axiomatic memory model:
- Judges E_opsem paired with a memory ordering, X_witness
- Searches the consistent executions for races and unconstrained reads
– p. 5
Judgement of the axiomatic model

    cpp_memory_model opsem (p : program) =
      let pre_executions =
        { (E_opsem, X_witness).
            opsem p E_opsem ∧
            consistent_execution (E_opsem, X_witness) } in
      if ∃X ∈ pre_executions.
           (indeterminate_reads X ≠ {}) ∨
           (unsequenced_races X ≠ {}) ∨
           (data_races X ≠ {})
      then NONE
      else SOME pre_executions
– p. 6
The relations of a pre-execution
An E_opsem part containing:
- sb: sequenced before, program order
- asw: additional synchronizes with, inter-thread ordering
- dd: data-dependence
An X_witness part containing:
- rf: relates a write to any reads that take its value
- sc: a total order over mo_seq_cst and mutex actions
- mo: modification order, per location total order of writes
– p. 7
A single-threaded program

    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }

Execution graph (../examples/t1.c):
    a:W na x=2  --sb-->  b:W na y=0
    b:W na y=0  --sb-->  c:R na x=2,  d:R na x=2
    a:W na x=2  --rf-->  c:R na x=2,  d:R na x=2
    c:R na x=2, d:R na x=2  --sb-->  e:W na y=1
– p. 8
Memory actions

    action ::= a:R na x=v          non-atomic read
             | a:W na x=v          non-atomic write
             | a:R mo x=v          atomic read
             | a:W mo x=v          atomic write
             | a:RMW mo x=v1/v2    atomic read-modify-write
             | a:L x               lock
             | a:U x               unlock
             | a:F mo              fence
– p. 9
Memory orders
Memory orders are shown as follows:

    mo ::= memory_order_seq_cst    SC
         | memory_order_relaxed    RLX
         | memory_order_release    REL
         | memory_order_acquire    ACQ
         | memory_order_consume    CON
         | memory_order_acq_rel    A/R
– p. 10
Location kinds

    location_kind = MUTEX | NON_ATOMIC | ATOMIC

    actions_respect_location_kinds =
      ∀a. case location a of
            SOME l → (case location-kind l of
                        MUTEX      → is_lock_or_unlock a
                      | NON_ATOMIC → is_load_or_store a
                      | ATOMIC     → is_load_or_store a ∨ is_atomic_action a)
          | NONE → T
– p. 11
That single-threaded program again

    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }

Execution graph (../examples/t1.c):
    a:W na x=2  --sb-->  b:W na y=0
    b:W na y=0  --sb-->  c:R na x=2,  d:R na x=2
    a:W na x=2  --rf-->  c:R na x=2,  d:R na x=2
    c:R na x=2, d:R na x=2  --sb-->  e:W na y=1
– p. 12
Unsequenced race

    unsequenced_races = { (a, b).
      is_load_or_store a ∧ is_load_or_store b ∧
      (a ≠ b) ∧ same_location a b ∧ (is_write a ∨ is_write b) ∧
      same_thread a b ∧
      ¬ (a --sequenced-before--> b ∨ b --sequenced-before--> a) }
– p. 13
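An unsequenced race

    int main() {
      int x = 2;
      int y = 0;
      y = (x == (x=3));
      return 0;
    }

Execution graph:
    a:W na x=2  --sb-->  b:W na y=0
    b:W na y=0  --sb-->  d:R na x=2,  c:W na x=3
    a:W na x=2  --rf-->  d:R na x=2
    d:R na x=2  <--ur-->  c:W na x=3
    d:R na x=2, c:W na x=3  --sb-->  e:W na y=0
– p. 14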
A multi-threaded program

    void foo(int* p) {*p=3;}
    int main() {
      int x = 2;
      int y;
      thread t1(foo, &x);
      y = 3;
      t1.join();
      return 0;
    }

becomes:

    int main() {
      int x = 2;
      int y;
      {{{ x = 3; ||| y = 3; }}}
      return 0;
    }

Execution graph (../examples/t3-parallel.c):
    a:W na x=2  --asw-->  b:W na x=3
    a:W na x=2  --asw-->  c:W na y=3
– p. 15
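The first form of the program can be tried directly with `std::thread`. The sketch below is a minimal version under that assumption. The wrapper name `run_parallel_writes` and its return value are hypothetical, added so the result can be inspected.

```cpp
#include <cassert>
#include <thread>

void foo(int* p) { *p = 3; }

// Hypothetical wrapper around the slide's main(); returns x + y.
int run_parallel_writes() {
    int x = 2;
    int y;
    std::thread t1(foo, &x);  // thread creation orders x = 2 before the child's write
    y = 3;                    // concurrent with *p = 3, but on a different location
    t1.join();                // join orders the child's write before the read below
    assert(x == 3);
    return x + y;
}
```

The two writes touch different locations, so they do not race even though they run concurrently. `run_parallel_writes()` returns 6.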
Synchronizes-with and happens-before
The parent thread has synchronization edges, labeled asw, to its child threads. There are other ways to synchronize.
We will define the happens-before relation later. It contains the transitive closure of all synchronization edges and all sequenced-before edges (amongst other things).
– p. 16
Data race

    data_races = { (a, b).
      (a ≠ b) ∧ same_location a b ∧ (is_write a ∨ is_write b) ∧
      ¬ same_thread a b ∧
      ¬ (is_atomic_action a ∧ is_atomic_action b) ∧
      ¬ (a --happens-before--> b ∨ b --happens-before--> a) }
– p. 17
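To make the definition concrete, here is a small sketch that evaluates it over a hand-written execution. Everything in it is an assumption for illustration: the `Action` record, the string identifiers and the explicit happens-before set are hypothetical simplifications, not the representation used by the model. The example graph is modelled on a parent write to x followed by two child threads, one writing x and one reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified action record.
struct Action {
    std::string id;
    int thread;
    std::string location;
    bool is_write;
    bool is_atomic;
};

using Edge = std::pair<std::string, std::string>;

// Returns each racy pair once (the definition above contains both orientations).
// hb is assumed to be transitively closed already.
std::vector<Edge> data_races(const std::vector<Action>& actions,
                             const std::set<Edge>& hb) {
    std::vector<Edge> races;
    for (std::size_t i = 0; i < actions.size(); ++i) {
        for (std::size_t j = i + 1; j < actions.size(); ++j) {
            const Action& a = actions[i];
            const Action& b = actions[j];
            assert(a.id != b.id);  // a ≠ b; identifiers are assumed unique
            bool conflict = a.location == b.location && (a.is_write || b.is_write);
            bool both_atomic = a.is_atomic && b.is_atomic;
            bool ordered = hb.count(Edge(a.id, b.id)) || hb.count(Edge(b.id, a.id));
            if (conflict && a.thread != b.thread && !both_atomic && !ordered)
                races.push_back(Edge(a.id, b.id));
        }
    }
    return races;
}

// Parent writes x (a); child 1 writes x (b); child 2 reads x (c), then writes y (d).
std::vector<Edge> example_races() {
    std::vector<Action> actions = {
        {"a", 0, "x", true, false},
        {"b", 1, "x", true, false},
        {"c", 2, "x", false, false},
        {"d", 2, "y", true, false},
    };
    std::set<Edge> hb = {{"a", "b"}, {"a", "c"}, {"c", "d"}, {"a", "d"}};
    return data_races(actions, hb);
}
```

In this execution only b and c conflict without being ordered by happens-before, so `example_races()` reports that single pair.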
A data race

    int main() {
      int x = 2;
      int y;
      {{{ x=3; ||| y=(x==3); }}};
      return 0;
    }

Execution graph:
    a:W na x=2  --asw-->     b:W na x=3
    a:W na x=2  --asw,rf-->  c:R na x=2
    b:W na x=3  <--dr-->     c:R na x=2
    c:R na x=2  --sb-->      d:W na y=0
– p. 18
Modification order
A total order of the writes at each atomic location, similar to coherence order on Power.

    int main() {
      atomic_int x = 0;
      int y = 0;
      {{{ { x.store(1);
            x.store(2); }
      ||| { y = 1; }
      }}}
      return 0;
    }

Execution graph (../examples/t70-na-mo.c):
    a:W na x=0  --sb-->     b:W na y=0
    a:W na x=0  --mo-->     c:W SC x=1
    b:W na y=0  --asw-->    c:W SC x=1,  e:W na y=1
    c:W SC x=1  --sb,mo-->  d:W SC x=2
– p. 19
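One observable consequence of a per-location modification order is that a single reader cannot see the writes to one atomic location go backwards. The sketch below probes this with `std::atomic` and `std::thread`. The helper name `observed_mo_violation` and the relaxed reader thread are hypothetical additions, not part of the slide's example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus-style helper: returns true if the reader saw x move
// against modification order, which the model forbids.
bool observed_mo_violation() {
    std::atomic<int> x{0};
    bool violation = false;

    std::thread writer([&] {
        x.store(1);  // c: W SC x=1
        x.store(2);  // d: W SC x=2, sequenced after c, so later in mo
    });
    std::thread reader([&] {
        int first = x.load(std::memory_order_relaxed);
        int second = x.load(std::memory_order_relaxed);
        // The values 0, 1, 2 are written in increasing mo order, so a smaller
        // second value would mean reading an mo-earlier write.
        if (second < first) violation = true;
    });

    writer.join();
    reader.join();
    assert(x.load() == 2);  // the mo-last write is the final value
    return violation;
}
```

A run of this kind can only fail to find a violation; it does not prove the property, which follows from the coherence requirements of the model.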
SC order
There is a total order over all sequentially consistent atomic actions.
SC atomics read the last prior write in SC order (or a non-SC write).

    consistent_sc_order =
      let sc_happens_before = (--happens-before-->)|all_sc_actions in
      let sc_mod_order = (--modification-order-->)|all_sc_actions in
      strict_total_order_over all_sc_actions (--sc-->) ∧
      (--sc_happens_before-->) ⊆ (--sc-->) ∧
      (--sc_mod_order-->) ⊆ (--sc-->)
– p. 20
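A standard way to see what the total SC order buys is the store-buffering litmus test, which is not in the slides and is added here only as an illustration. With every access `memory_order_seq_cst`, one of the two stores must come first in the SC order, so the two loads cannot both return 0. The function names below are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test: returns true if both threads read 0 in one run.
bool store_buffering_both_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    assert(r1 != -1 && r2 != -1);  // both loads ran
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts runs with the forbidden outcome.
int count_forbidden_outcomes(int runs) {
    int forbidden = 0;
    for (int i = 0; i < runs; ++i)
        if (store_buffering_both_zero()) ++forbidden;
    return forbidden;
}
```

With weaker orders such as `memory_order_relaxed` on the same four accesses, the both-zero outcome is allowed by the model, so the count could be non-zero on hardware that exhibits it.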
Atomic actions do not race

    int main() {
      atomic_int x;
      x.store(2, mo_seq_cst);
      int y = 0;
      {{{ x.store(3);
      ||| y = ((x.load()) == 3);
      }}};
      return 0;
    }

Execution graph:
    a:W SC x=2  --sb-->     b:W na y=0
    a:W SC x=2  --rf,sc-->  d:R SC x=2
    b:W na y=0  --asw-->    c:W SC x=3,  d:R SC x=2
    d:R SC x=2  --sc-->     c:W SC x=3
    d:R SC x=2  --sb-->     e:W na y=0
– p. 21
The release-acquire idiom

    // sender          // receiver
    x = ...            while (0 == y);
    y = 1;             r = x;

Execution graph (../examples/t15.c):
    a:W na x=1  --sb-->  b:W REL y=1
    b:W REL y=1  --sw-->  c:R ACQ y=1
    c:R ACQ y=1  --sb-->  d:R na x=1
– p. 22
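The idiom can be written out with `std::atomic` roughly as follows. This is a sketch, assuming y is an atomic flag written with `memory_order_release` and read with `memory_order_acquire`, as the REL and ACQ labels in the execution suggest. The wrapper name and the payload value are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical wrapper around the sender/receiver pair; returns the value received.
int release_acquire_message_pass() {
    int x = 0;              // non-atomic payload
    std::atomic<int> y{0};  // flag
    int r = -1;

    std::thread sender([&] {
        x = 1;                                  // a: W na x=1
        y.store(1, std::memory_order_release);  // b: W REL y=1
    });
    std::thread receiver([&] {
        while (0 == y.load(std::memory_order_acquire)) {}  // c: R ACQ y=1
        r = x;  // d: R na x=1, ordered after a through sb, sw, sb
    });
    sender.join();
    receiver.join();
    assert(y.load() == 1);
    return r;
}
```

Because the acquire load that leaves the loop reads from the release store, the non-atomic read of x is ordered after the write and `release_acquire_message_pass()` returns 1 without a data race.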
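Release-acquire synchronization

Execution graph (../examples/t8a.c):
    a:W na x=1   --sb-->        b:W REL y=1
    b:W REL y=1  --sb,mo,rs-->  c:W RLX y=2
    c:W RLX y=2  --rf-->        d:R ACQ y=2
    b:W REL y=1  --sw-->        d:R ACQ y=2
    d:R ACQ y=2  --sb-->        e:R na x=1
– p. 23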
The release sequence
The release sequence is a sub-sequence of the modification order following a release.

    rs_element rs_head a =
      same_thread a rs_head ∨ is_atomic_rmw a

    a_rel --release-sequence--> b =
      is_at_atomic_location b ∧ is_release a_rel ∧
      ( (b = a_rel) ∨
        (rs_element a_rel b ∧
         a_rel --modification-order--> b ∧
         (∀c. a_rel --modification-order--> c --modification-order--> b
              ⟹ rs_element a_rel c)))
– p. 24
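The read-modify-write clause of `rs_element` can be exercised with a three-thread sketch like the one below. It is an illustration under stated assumptions, not an example from the slides: the thread roles, the function name and the values are hypothetical. It deliberately relies only on the RMW clause, since the same-thread clause was narrowed in later revisions of the standard (C++20).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper; returns the payload value seen by the consumer.
int release_sequence_via_rmw() {
    int x = 0;              // non-atomic payload
    std::atomic<int> y{0};
    int r = -1;

    std::thread producer([&] {
        x = 1;
        y.store(1, std::memory_order_release);  // a_rel: head of the release sequence
    });
    std::thread bumper([&] {
        int expected = 1;
        // Relaxed RMW in another thread. It succeeds only by reading the value
        // written by a_rel, so it immediately follows a_rel in modification order.
        while (!y.compare_exchange_weak(expected, 2, std::memory_order_relaxed))
            expected = 1;  // a failed exchange overwrites expected; reset it
    });
    std::thread consumer([&] {
        while (y.load(std::memory_order_acquire) != 2) {}  // reads from the RMW
        r = x;  // a_rel synchronizes with the acquire load, so this read is ordered
    });
    producer.join();
    bumper.join();
    consumer.join();
    assert(y.load() == 2);
    return r;
}
```

The consumer never reads the release store itself, only the relaxed RMW that follows it. Synchronization still arises because that RMW belongs to the release sequence headed by the release store.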
An execution with a release sequence

Execution graph (../examples/t8a-no-sw.c):
    a:W na x=1   --sb-->        b:W REL y=1
    b:W REL y=1  --sb,mo,rs-->  c:W RLX y=2
    c:W RLX y=2  --rf-->        d:R ACQ y=2
    d:R ACQ y=2  --sb-->        e:R na x=1
– p. 25
Synchronizes-with

    a --synchronizes-with--> b =
      (* additional synchronization, from thread create etc. *)
      a --additional-synchronized-with--> b ∨
      (same_location a b ∧ a ∈ actions ∧ b ∈ actions ∧ (
        (* mutex synchronization *)
        (is_unlock a ∧ is_lock b ∧ a --sc--> b) ∨
        (* release/acquire synchronization *)
        (is_release a ∧ is_acquire b ∧ ¬ same_thread a b ∧
         (∃c. a --release-sequence--> c --rf--> b)) ∨
        [...]))
– p. 26
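The mutex clause is the one most programs rely on. The sketch below uses `std::mutex` to protect a non-atomic counter. The function name and the iteration count are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: two threads each increment a shared counter per_thread times.
int mutex_protected_count(int per_thread) {
    assert(per_thread >= 0);
    int counter = 0;  // non-atomic, only accessed while holding m
    std::mutex m;

    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            // Locking here is ordered after the previous unlock of m.
            std::lock_guard<std::mutex> guard(m);
            ++counter;
        }  // guard is destroyed at the end of each iteration, unlocking m
    };

    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Each unlock synchronizes with the next lock of the same mutex, so all increments are ordered by happens-before and `mutex_protected_count(1000)` returns 2000.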
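Release-acquire synchronization

Execution graph (../examples/t8a.c):
    a:W na x=1   --sb-->        b:W REL y=1
    b:W REL y=1  --sb,mo,rs-->  c:W RLX y=2
    c:W RLX y=2  --rf-->        d:R ACQ y=2
    b:W REL y=1  --sw-->        d:R ACQ y=2
    d:R ACQ y=2  --sb-->        e:R na x=1
– p. 27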
Happens-before (without consume)

    --simple-happens-before--> =
      (--sequenced-before--> ∪ --synchronizes-with-->)+

    consistent_simple_happens_before =
      irreflexive (--simple-happens-before-->)
– p. 28
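The two definitions can be sketched over explicit edge sets as follows. The representation is an assumption made for illustration: relations are sets of string pairs, and the closure is a naive fixed-point iteration that is only sensible for graphs of the size shown on these slides. The example graph is the four-action release-acquire chain a, b, c, d.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

using Edge = std::pair<std::string, std::string>;
using Relation = std::set<Edge>;

// Transitive closure R+ by repeated composition until nothing new is added.
Relation transitive_closure(Relation r) {
    bool changed = true;
    while (changed) {
        changed = false;
        Relation extra;
        for (const Edge& p : r)
            for (const Edge& q : r)
                if (p.second == q.first && !r.count(Edge(p.first, q.second)))
                    extra.insert(Edge(p.first, q.second));
        if (!extra.empty()) {
            r.insert(extra.begin(), extra.end());
            changed = true;
        }
    }
    return r;
}

// (sequenced-before ∪ synchronizes-with)+
Relation simple_happens_before(const Relation& sb, const Relation& sw) {
    Relation u = sb;
    u.insert(sw.begin(), sw.end());
    return transitive_closure(u);
}

bool irreflexive(const Relation& r) {
    for (const Edge& e : r)
        if (e.first == e.second) return false;
    return true;
}

// Hypothetical example: a --sb--> b --sw--> c --sb--> d.
Relation release_acquire_hb() {
    Relation sb = {{"a", "b"}, {"c", "d"}};
    Relation sw = {{"b", "c"}};
    Relation hb = simple_happens_before(sb, sw);
    assert(irreflexive(hb));  // the consistency condition holds for this chain
    return hb;
}
```

For the chain, the closure contains six edges, including a to d, which is the edge that orders the non-atomic write before the non-atomic read.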