Multicore Programming: C++0x (Mark Batty, University of Cambridge)

Multicore Programming: C++0x
Mark Batty, University of Cambridge, in collaboration with Scott Owens, Susmit Sarkar, Peter Sewell and Tjark Weber. November 2010.
– p. 1

C++0x: the next C++
- Specified by the C++ Standards Committee
- Defined in The Standard, a 1300-page prose document
- The design is a detailed compromise between:
  - performance, optimisations and hardware
  - usability
  - compatibility with the next C, C1X
  - legacy code
– p. 2

C++0x: the next C++
- Our mathematical model is faithful to the intent of, and has influenced, The Standard
- The model:
  - syntactically separates out expert features
  - is a weak memory model
  - defines a happens-before relation
  - requires non-atomic reads and writes to be data-race free (DRF)
  - provides atomic reads and writes for racy programs
– p. 3

The syntactic divide
An example of the syntax:

    // for regular programmers:
    atomic_int x = 0;
    x.store(1);
    y = x.load();

    // for experts:
    x.store(2, memory_order);
    y = x.load(memory_order);
    atomic_thread_fence(memory_order);

where memory_order is one of: mo_seq_cst, mo_release, mo_acquire, mo_acq_rel, mo_consume, mo_relaxed.
– p. 4
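A minimal sketch of the two styles using the standard `<atomic>` header. The slide's mo_* names abbreviate the standard enumerators (`std::memory_order_seq_cst`, `std::memory_order_release`, and so on); the function names `regular_style` and `expert_style` are hypothetical, and the particular orders chosen in the expert version are only one possible choice.

```cpp
#include <atomic>

// Shared atomic location, as in the slide.
std::atomic<int> x{0};

// "Regular programmer" style (hypothetical helper): no order is given,
// so every operation defaults to std::memory_order_seq_cst.
int regular_style() {
    x.store(1);
    int y = x.load();
    return y;
}

// "Expert" style (hypothetical helper): the same operations with explicit
// memory orders, plus a standalone fence shown only for its syntax.
int expert_style() {
    x.store(2, std::memory_order_release);
    int y = x.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return y;
}
```

Called from a single thread, `regular_style()` returns 1 and `expert_style()` returns 2; the memory orders only matter once other threads access `x`.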

A model of two parts
- An operational semantics:
  - processes programs, identifying memory actions
  - constructs candidate executions, E_opsem
- An axiomatic memory model:
  - judges each E_opsem paired with a memory ordering, X_witness
  - searches the consistent executions for races and unconstrained reads
– p. 5

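Judgement of the axiomatic model

$$
\mathit{cpp\_memory\_model}\ \mathit{opsem}\ (p : \mathit{program}) =
\begin{cases}
\textsf{NONE} & \text{if } \exists X \in \mathit{pre\_executions}.\ \mathit{indeterminate\_reads}\ X \neq \emptyset \lor \mathit{unsequenced\_races}\ X \neq \emptyset \lor \mathit{data\_races}\ X \neq \emptyset\\
\textsf{SOME}\ \mathit{pre\_executions} & \text{otherwise}
\end{cases}
$$

where $\mathit{pre\_executions} = \{ (E_{opsem}, X_{witness}) \mid \mathit{opsem}\ p\ E_{opsem} \land \mathit{consistent\_execution}\ (E_{opsem}, X_{witness}) \}$.

A result of NONE means the program has undefined behaviour.
– p. 6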
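The shape of the judgement can be sketched in C++. The `PreExecution` record and the `judge` function are hypothetical stand-ins, not the model's actual definitions: they assume consistency has already been checked, keep only the three sets the judgement inspects, and use an empty `std::optional` (C++17) to play the role of NONE.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a consistent pre-execution (E_opsem, X_witness):
// only the sets inspected by the judgement are kept, as action ids.
struct PreExecution {
    std::vector<int> indeterminate_reads;
    std::vector<std::pair<int, int>> unsequenced_races;
    std::vector<std::pair<int, int>> data_races;
};

// Mirrors cpp_memory_model: if any consistent execution contains an
// indeterminate read, an unsequenced race or a data race, the whole
// program is undefined (std::nullopt stands for NONE); otherwise all
// consistent executions are returned (SOME pre_executions).
std::optional<std::vector<PreExecution>>
judge(const std::vector<PreExecution>& pre_executions) {
    for (const auto& x : pre_executions) {
        if (!x.indeterminate_reads.empty() ||
            !x.unsequenced_races.empty() ||
            !x.data_races.empty()) {
            return std::nullopt;  // NONE: undefined behaviour
        }
    }
    return pre_executions;  // SOME pre_executions
}
```

Note that a single bad execution is enough to make the program undefined, even if every other execution is race-free.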

The relations of a pre-execution
An E_opsem part containing:
- sb: sequenced before, i.e. program order
- asw: additional synchronizes-with, i.e. inter-thread ordering
- dd: data dependence
An X_witness part containing:
- rf: reads from; relates a write to any reads that take its value
- sc: a total order over mo_seq_cst and mutex actions
- mo: modification order, a per-location total order of writes
– p. 7

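A single-threaded program (../examples/t1.c)

    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }

Execution:
- actions: a:W na x=2, b:W na y=0, c:R na x=2, d:R na x=2, e:W na y=1
- sb: a→b, b→c, b→d, c→e, d→e (c and d are unsequenced with respect to each other)
- rf: a→c, a→d
– p. 8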

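Location kinds

$$
\mathit{location\_kind} = \textsf{MUTEX} \mid \textsf{NON\_ATOMIC} \mid \textsf{ATOMIC}
$$

$$
\mathit{actions\_respect\_location\_kinds} = \forall a.\
\begin{cases}
\mathit{is\_lock\_or\_unlock}\ a & \text{if } \mathit{location}\ a = \textsf{SOME}\ l \text{ and } \mathit{location\_kind}\ l = \textsf{MUTEX}\\
\mathit{is\_load\_or\_store}\ a & \text{if } \mathit{location}\ a = \textsf{SOME}\ l \text{ and } \mathit{location\_kind}\ l = \textsf{NON\_ATOMIC}\\
\mathit{is\_load\_or\_store}\ a \lor \mathit{is\_atomic\_action}\ a & \text{if } \mathit{location}\ a = \textsf{SOME}\ l \text{ and } \mathit{location\_kind}\ l = \textsf{ATOMIC}\\
\top & \text{if } \mathit{location}\ a = \textsf{NONE}
\end{cases}
$$
– p. 11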
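A sketch of actions_respect_location_kinds over a hypothetical action encoding. The enums, the `Action` record and the `kind_of` lookup table are illustrative assumptions rather than the model's own definitions; in particular, `is_atomic_action` is assumed here to cover atomic loads, stores and read-modify-writes, and fences are assumed to have no location.

```cpp
#include <optional>
#include <vector>

// Hypothetical encodings of the slide's definitions.
enum class LocationKind { Mutex, NonAtomic, Atomic };
enum class ActionKind { Lock, Unlock, Load, Store, AtomicLoad, AtomicStore, AtomicRmw, Fence };

struct Action {
    ActionKind kind;
    std::optional<int> location;  // empty (NONE) for actions without a location
};

bool is_lock_or_unlock(const Action& a) {
    return a.kind == ActionKind::Lock || a.kind == ActionKind::Unlock;
}

// Non-atomic loads and stores.
bool is_load_or_store(const Action& a) {
    return a.kind == ActionKind::Load || a.kind == ActionKind::Store;
}

// Assumption: atomic actions are atomic loads, stores and RMWs.
bool is_atomic_action(const Action& a) {
    return a.kind == ActionKind::AtomicLoad || a.kind == ActionKind::AtomicStore ||
           a.kind == ActionKind::AtomicRmw;
}

// kind_of[l] gives the kind of location l (hypothetical lookup table).
bool actions_respect_location_kinds(const std::vector<Action>& actions,
                                    const std::vector<LocationKind>& kind_of) {
    for (const auto& a : actions) {
        if (!a.location) continue;  // NONE -> true
        switch (kind_of.at(*a.location)) {
            case LocationKind::Mutex:
                if (!is_lock_or_unlock(a)) return false;
                break;
            case LocationKind::NonAtomic:
                if (!is_load_or_store(a)) return false;
                break;
            case LocationKind::Atomic:
                if (!(is_load_or_store(a) || is_atomic_action(a))) return false;
                break;
        }
    }
    return true;
}
```

An atomic location still admits non-atomic loads and stores (for example its initialisation), while a non-atomic location rejects any atomic action.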

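That single-threaded program again (../examples/t1.c)

    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }

Execution:
- actions: a:W na x=2, b:W na y=0, c:R na x=2, d:R na x=2, e:W na y=1
- sb: a→b, b→c, b→d, c→e, d→e (c and d are unsequenced with respect to each other)
- rf: a→c, a→d
– p. 12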

Unsequenced race

$$
\mathit{unsequenced\_races} = \{ (a, b) \mid \mathit{is\_load\_or\_store}\ a \land \mathit{is\_load\_or\_store}\ b \land a \neq b \land \mathit{same\_location}\ a\ b \land (\mathit{is\_write}\ a \lor \mathit{is\_write}\ b) \land \mathit{same\_thread}\ a\ b \land \lnot (a \xrightarrow{\text{sequenced-before}} b \lor b \xrightarrow{\text{sequenced-before}} a) \}
$$
– p. 13
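The definition translates almost directly into nested loops. This sketch assumes a hypothetical `Act` record (all actions non-atomic loads or stores) and a sequenced-before relation that is already transitively closed; it reports each unordered pair once, whereas the set above contains both (a, b) and (b, a).

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical action record; every action is a non-atomic load or store.
struct Act {
    int id;
    int thread;
    int location;
    bool is_write;  // true for a store, false for a load
};

// A relation as a set of (from, to) action ids.
using Rel = std::set<std::pair<int, int>>;

// Same-thread accesses to the same location, at least one a write,
// unordered by sequenced-before. `sb` is assumed transitively closed.
std::vector<std::pair<int, int>> unsequenced_races(const std::vector<Act>& acts,
                                                   const Rel& sb) {
    std::vector<std::pair<int, int>> out;
    for (const auto& a : acts) {
        for (const auto& b : acts) {
            if (a.id >= b.id) continue;              // a != b, each pair once
            if (a.location != b.location) continue;  // same_location
            if (!(a.is_write || b.is_write)) continue;
            if (a.thread != b.thread) continue;      // same_thread
            if (sb.count({a.id, b.id}) || sb.count({b.id, a.id})) continue;
            out.push_back({a.id, b.id});
        }
    }
    return out;
}
```

For example, numbering the actions of the `y = (x == (x=3))` execution a to e as 0 to 4 (x is location 0, y is location 1) reports only the pair (2, 3), that is (c, d). For `y = (x == x)` nothing is reported, because both accesses to x are reads.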

An unsequenced race

    int main() {
      int x = 2;
      int y = 0;
      y = (x == (x=3));
      return 0;
    }

Execution:
- actions: a:W na x=2, b:W na y=0, c:W na x=3, d:R na x=2, e:W na y=0
- sb: a→b, b→c, b→d, c→e, d→e
- rf: a→d
- ur (unsequenced race): between c and d
– p. 14

A multi-threaded program

    void foo(int* p) {*p = 3;}
    int main() {
      int x = 2;
      int y;
      thread t1(foo, &x);
      y = 3;
      t1.join();
      return 0;
    }

becomes (../examples/t3-parallel.c):

    int main() {
      int x = 2;
      int y;
      {{{ x = 3; ||| y = 3; }}}
      return 0;
    }

Execution:
- actions: a:W na x=2, b:W na x=3, c:W na y=3
- asw: a→b, a→c
– p. 15

Synchronizes-with and happens-before The parent thread has synchronization edges, labeled asw, to its child threads. There are other ways to synchronize. We will define the happens-before relation later. It contains the transitive closure of all synchronization edges and all sequenced before edges (amongst other things). – p. 16
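The multi-threaded program above can be made runnable with `std::thread`; this is a sketch, and the wrapper name `run_parent_child` is hypothetical. Thread creation provides the parent-to-child synchronization (the asw edge); the Standard additionally makes the completion of a thread synchronize with the return from its `join()`, which is why the parent may read `x` afterwards without a data race.

```cpp
#include <thread>

// The child writes through the pointer, as in the slide.
void foo(int* p) { *p = 3; }

// Hypothetical wrapper around the slide's main().
int run_parent_child() {
    int x = 2;                // happens before everything the child does
    int y;
    std::thread t1(foo, &x);  // thread creation: parent -> child synchronization
    y = 3;                    // concurrent with the child, but a different location
    t1.join();                // child's completion synchronizes with join's return
    return x + y;             // race-free read of x, which is now 3
}
```

The call returns 6 on every run: the write x=2 happens before the child's write x=3, and that write happens before the read after `join()`.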

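Data race

$$
\mathit{data\_races} = \{ (a, b) \mid a \neq b \land \mathit{same\_location}\ a\ b \land (\mathit{is\_write}\ a \lor \mathit{is\_write}\ b) \land \lnot\, \mathit{same\_thread}\ a\ b \land \lnot (\mathit{is\_atomic\_action}\ a \land \mathit{is\_atomic\_action}\ b) \land \lnot (a \xrightarrow{\text{happens-before}} b \lor b \xrightarrow{\text{happens-before}} a) \}
$$
– p. 17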
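A direct transcription of data_races, in the same spirit as the unsequenced-race check. The `Act` record and the set-of-pairs relation are hypothetical encodings, and happens-before is passed in precomputed (and assumed transitive) rather than derived here.

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical action record.
struct Act {
    int id;
    int thread;
    int location;
    bool is_write;
    bool is_atomic;
};

// A relation as a set of (from, to) action ids.
using Rel = std::set<std::pair<int, int>>;

// Cross-thread accesses to the same location, at least one a write,
// not both atomic, and unordered by happens-before.
std::vector<std::pair<int, int>> data_races(const std::vector<Act>& acts, const Rel& hb) {
    std::vector<std::pair<int, int>> out;
    for (const auto& a : acts) {
        for (const auto& b : acts) {
            if (a.id >= b.id) continue;              // a != b, each pair once
            if (a.location != b.location) continue;  // same_location
            if (!(a.is_write || b.is_write)) continue;
            if (a.thread == b.thread) continue;      // not same_thread
            if (a.is_atomic && b.is_atomic) continue;
            if (hb.count({a.id, b.id}) || hb.count({b.id, a.id})) continue;
            out.push_back({a.id, b.id});
        }
    }
    return out;
}
```

For the data-race example that follows (actions a to d numbered 0 to 3, with happens-before a→b, a→c, a→d, c→d), the only pair reported is (1, 2), i.e. b and c; marking both of them atomic removes the race.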

A data race

    int main() {
      int x = 2;
      int y;
      {{{ x = 3; ||| y = (x == 3); }}};
      return 0;
    }

Execution:
- actions: a:W na x=2, b:W na x=3, c:R na x=2, d:W na y=0
- asw: a→b, a→c; rf: a→c; sb: c→d
- dr (data race): between b and c
– p. 18
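One way to remove this data race is to make `x` atomic, sketched below with `std::thread` standing in for the `{{{ ... ||| ... }}}` notation (the function name `racy_program_fixed` is hypothetical). The concurrent write and read are then both atomic actions and so do not race, but the result is still nondeterministic: the read may see either 2 or 3.

```cpp
#include <atomic>
#include <thread>

// Hypothetical race-free variant of the data-race example.
int racy_program_fixed() {
    std::atomic<int> x{2};  // atomic: concurrent accesses are not data races
    int y = 0;              // written only by the reader thread
    std::thread writer([&x] { x.store(3); });
    std::thread reader([&x, &y] { y = (x.load() == 3); });
    writer.join();
    reader.join();
    return y;  // 0 or 1, depending on the interleaving
}
```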

Modification order
A total order of the writes at each atomic location, similar to coherence order on Power.

    int main() {
      atomic_int x = 0;
      int y = 0;
      {{{ { x.store(1);
            x.store(2); }
      ||| { y = 1; }
      }}}
      return 0;
    }

Execution (../examples/t70-na-mo.c):
- actions: a:W na x=0, b:W na y=0, c:W SC x=1, d:W SC x=2, e:W na y=1
- sb: a→b, c→d; asw: b→c, b→e
- mo: a→c, c→d
– p. 19
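A small sketch of what modification order guarantees in practice (the function name `coherence_trial` is hypothetical). Every thread agrees on a single order of the writes to `x`, here 0, 1, 2, so a reader that loads `x` twice can never see a value that is earlier in that order on its second load. Unlike the slide's SC stores, the sketch deliberately uses `memory_order_relaxed`, since modification order exists for every atomic location whatever memory order is used.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical coherence experiment: one writer, one reader reading twice.
std::pair<int, int> coherence_trial() {
    std::atomic<int> x{0};
    int r1 = 0, r2 = 0;
    std::thread writer([&x] {
        x.store(1, std::memory_order_relaxed);  // mo: 0 -> 1
        x.store(2, std::memory_order_relaxed);  // mo: 1 -> 2
    });
    std::thread reader([&] {
        r1 = x.load(std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);  // never earlier in mo than r1
    });
    writer.join();
    reader.join();
    return {r1, r2};
}
```

Because the values written increase along the modification order, every run satisfies `r1 <= r2`; in particular (2, 1) is forbidden.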

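SC order
There is a total order, sc, over all sequentially consistent atomic actions. An SC atomic read reads the last prior write in SC order (or a non-SC write).

$$
\begin{aligned}
\mathit{consistent\_sc\_order} = {} & \textbf{let } \mathit{sc\_happens\_before} = \big(\xrightarrow{\text{happens-before}}\big)\big|_{\mathit{all\_sc\_actions}} \textbf{ in}\\
& \textbf{let } \mathit{sc\_mod\_order} = \big(\xrightarrow{\text{modification-order}}\big)\big|_{\mathit{all\_sc\_actions}} \textbf{ in}\\
& \mathit{strict\_total\_order\_over}\ \mathit{all\_sc\_actions}\ \big(\xrightarrow{\mathit{sc}}\big) \land {}\\
& \big(\xrightarrow{\mathit{sc\_happens\_before}}\big) \subseteq \big(\xrightarrow{\mathit{sc}}\big) \land \big(\xrightarrow{\mathit{sc\_mod\_order}}\big) \subseteq \big(\xrightarrow{\mathit{sc}}\big)
\end{aligned}
$$
– p. 20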
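The store-buffering litmus test is a standard way to see the total SC order at work; this is a sketch and the function name `store_buffering` is hypothetical. With the default `memory_order_seq_cst`, all four accesses must fit into one total sc order, so at least one load comes after the other thread's store and the outcome r1 == 0 && r2 == 0 is forbidden. With release/acquire or relaxed orders that outcome would be allowed.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical store-buffering experiment using seq_cst (the default order).
std::pair<int, int> store_buffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] { x.store(1); r1 = y.load(); });
    std::thread t2([&] { y.store(1); r2 = x.load(); });
    t1.join();
    t2.join();
    return {r1, r2};  // (0,1), (1,0) or (1,1); never (0,0)
}
```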

Atomic actions do not race

    int main() {
      atomic_int x;
      x.store(2, mo_seq_cst);
      int y = 0;
      {{{ x.store(3);
      ||| y = ((x.load()) == 3);
      }}};
      return 0;
    }

Execution:
- actions: a:W SC x=2, b:W na y=0, c:W SC x=3, d:R SC x=2, e:W na y=0
- sb: a→b, d→e; asw: b→c, b→d
- rf: a→d; sc: a→d→c
– p. 21

The release-acquire idiom

    // sender           // receiver
    x = ...             while (0 == y);
    y = 1;              r = x;

Execution (../examples/t15.c):
- actions: a:W na x=1, b:W REL y=1, c:R ACQ y=1, d:R na x=1
- sb: a→b, c→d; sw: b→c
– p. 22
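A runnable sketch of the idiom, with `x` as a plain non-atomic payload and `y` as an atomic flag; the value 1 follows the execution above, and the function name `message_passing` is hypothetical. Because the receiver's acquire load reads from the sender's release store, the two synchronize, so the write to `x` happens before the read of `x` and the receiver must see 1.

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing experiment (release/acquire).
int message_passing() {
    int x = 0;              // non-atomic payload
    std::atomic<int> y{0};  // flag
    int r = 0;
    std::thread sender([&] {
        x = 1;                                  // a: W na x=1
        y.store(1, std::memory_order_release);  // b: W REL y=1
    });
    std::thread receiver([&] {
        while (y.load(std::memory_order_acquire) == 0) {}  // c: R ACQ y=1
        r = x;                                  // d: R na x, must read 1
    });
    sender.join();
    receiver.join();
    return r;
}
```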

Release-acquire synchronization
Execution (../examples/t8a.c):
- actions: a:W na x=1, b:W REL y=1, c:W RLX y=2, d:R ACQ y=2, e:R na x=1
- sb: a→b, b→c, d→e; mo: b→c; rs (release sequence): b→c
- rf: c→d; sw: b→d
– p. 23

The release sequence
The release sequence is a sub-sequence of the modification order following a release.

$$
\mathit{rs\_element}\ \mathit{rs\_head}\ a = \mathit{same\_thread}\ a\ \mathit{rs\_head} \lor \mathit{is\_atomic\_rmw}\ a
$$

$$
\begin{aligned}
a_{rel} \xrightarrow{\text{release-sequence}} b = {} & \mathit{is\_at\_atomic\_location}\ b \land \mathit{is\_release}\ a_{rel} \land {}\\
& \big( (b = a_{rel}) \lor {}\\
& \quad (\mathit{rs\_element}\ a_{rel}\ b \land a_{rel} \xrightarrow{\text{modification-order}} b \land {}\\
& \quad\ \ (\forall c.\ a_{rel} \xrightarrow{\text{modification-order}} c \xrightarrow{\text{modification-order}} b \implies \mathit{rs\_element}\ a_{rel}\ c)) \big)
\end{aligned}
$$
– p. 24
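A runnable sketch of a release sequence (all names hypothetical). It extends the sequence with a relaxed read-modify-write from another thread, which is an rs_element by the second disjunct of the definition, rather than with a same-thread relaxed store as in t8a.c: later revisions of the Standard (C++20) no longer count same-thread relaxed stores as part of a release sequence, while RMWs still are. The acquire load reads the RMW's value 2 and therefore still synchronizes with the original release store.

```cpp
#include <atomic>
#include <thread>

// Hypothetical experiment: release store, relaxed RMW, acquire load.
int release_sequence_rmw() {
    int data = 0;           // non-atomic payload
    std::atomic<int> y{0};
    int r = 0;
    std::thread releaser([&] {
        data = 1;
        y.store(1, std::memory_order_release);  // head of the release sequence
    });
    std::thread incrementer([&] {
        while (y.load(std::memory_order_relaxed) != 1) {}  // wait for the release store
        y.fetch_add(1, std::memory_order_relaxed);         // RMW reads 1, writes 2: in the sequence
    });
    std::thread acquirer([&] {
        while (y.load(std::memory_order_acquire) != 2) {}  // reads from the RMW
        r = data;  // synchronizes with the release store, so reads 1
    });
    releaser.join();
    incrementer.join();
    acquirer.join();
    return r;
}
```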

An execution with a release sequence
Execution (../examples/t8a-no-sw.c):
- actions: a:W na x=1, b:W REL y=1, c:W RLX y=2, d:R ACQ y=2, e:R na x=1
- sb: a→b, b→c, d→e; mo: b→c; rs (release sequence): b→c
- rf: c→d
– p. 25

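Synchronizes-with

$$
\begin{aligned}
a \xrightarrow{\text{synchronizes-with}} b = {} & \text{(* additional synchronization, from thread create etc. *)}\\
& a \xrightarrow{\text{additional-synchronized-with}} b \lor {}\\
& \big( \mathit{same\_location}\ a\ b \land a \in \mathit{actions} \land b \in \mathit{actions} \land (\\
& \quad \text{(* mutex synchronization *)}\\
& \quad (\mathit{is\_unlock}\ a \land \mathit{is\_lock}\ b \land a \xrightarrow{\mathit{sc}} b) \lor {}\\
& \quad \text{(* release/acquire synchronization *)}\\
& \quad (\mathit{is\_release}\ a \land \mathit{is\_acquire}\ b \land \lnot\, \mathit{same\_thread}\ a\ b \land (\exists c.\ a \xrightarrow{\text{release-sequence}} c \xrightarrow{\mathit{rf}} b)) \lor {}\\
& \quad [\ldots] ) \big)
\end{aligned}
$$
– p. 26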
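The mutex case can be seen with `std::mutex` (the function name `mutex_counter` is a hypothetical example). Each unlock synchronizes with the next lock of the same mutex in sc order, so successive increments of the plain `int` are ordered by happens-before and do not race.

```cpp
#include <mutex>
#include <thread>

// Hypothetical example: a non-atomic counter protected by a mutex.
int mutex_counter(int per_thread) {
    std::mutex m;
    int counter = 0;  // plain int, only accessed while holding m
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> lock(m);  // lock here, unlock at end of scope
            ++counter;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

`mutex_counter(1000)` returns 2000 on every run; without the lock the increments would form data races and the program would be undefined.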

Release-acquire synchronization
Execution (../examples/t8a.c):
- actions: a:W na x=1, b:W REL y=1, c:W RLX y=2, d:R ACQ y=2, e:R na x=1
- sb: a→b, b→c, d→e; mo: b→c; rs (release sequence): b→c
- rf: c→d; sw: b→d
– p. 27

Happens-before (without consume)

$$
\xrightarrow{\text{simple-happens-before}} = \Big( \xrightarrow{\text{sequenced-before}} \cup \xrightarrow{\text{synchronizes-with}} \Big)^{+}
$$

$$
\mathit{consistent\_simple\_happens\_before} = \mathit{irreflexive}\big( \xrightarrow{\text{simple-happens-before}} \big)
$$
– p. 28
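A sketch of both definitions over a hypothetical adjacency-matrix encoding (`rel[i][j]` is true when i → j): simple happens-before is the transitive closure of sequenced-before union synchronizes-with, computed here with Warshall's algorithm, and consistency requires that closure to be irreflexive, i.e. free of cycles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical encoding: rel[i][j] is true when i -> j.
using Matrix = std::vector<std::vector<bool>>;

// (sb union sw)^+ via Warshall's transitive-closure algorithm.
Matrix simple_happens_before(const Matrix& sb, const Matrix& sw) {
    const std::size_t n = sb.size();
    Matrix hb(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            hb[i][j] = sb[i][j] || sw[i][j];  // start from the union
    for (std::size_t k = 0; k < n; ++k)       // allow paths through node k
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (hb[i][k] && hb[k][j]) hb[i][j] = true;
    return hb;
}

// consistent_simple_happens_before: no action happens before itself.
bool irreflexive(const Matrix& r) {
    for (std::size_t i = 0; i < r.size(); ++i)
        if (r[i][i]) return false;
    return true;
}
```

For the t8a execution (a to e numbered 0 to 4, with sb a→b, b→c, d→e and sw b→d), a happens before e, while c and d are unordered by happens-before even though d reads from c: rf on its own does not contribute to happens-before.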
