
An out-of-order thread-local semantics for something like volatile relaxed atomics in C, and the problems it highlights

Jean Pichon, 24th of September 2014

Goal

How can we avoid "out-of-thin-air" behaviour with C11's relaxed atomics?

Remark by Mark Batty: no per-candidate-execution semantics (such as the C11 standard) can at the same time allow load buffering ("LB"):

    r1 = x;   ∥   r2 = y;
    y = 42    ∥   x = 42

    r1 = 42 ∧ r2 = 42: OK

but forbid "out-of-thin-air" behaviour such as load buffering plus data dependencies ("LB+datas"):

    r1 = x;   ∥   r2 = y;
    y = r1    ∥   x = r2

    r1 = 42 ∧ r2 = 42: BAD

where the value 42 appears "out of thin air".
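To make LB+datas concrete, here is a sketch of it written against C11's `<stdatomic.h>` with relaxed loads and stores. The POSIX-threads harness and the name `run_lb_datas` are our own choices, not part of the talk. Each store writes the value its thread has just loaded and both locations start at 0, so real compilers and hardware only ever end with r1 = r2 = 0; the point of the remark above is that a per-candidate-execution model that allows LB cannot rule out r1 = r2 = 42 here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations of LB+datas; both are reset to 0 before each run. */
static atomic_int x, y;
/* Registers written by the threads and read after pthread_join. */
static int r1, r2;

/* Thread 1: r1 = x; y = r1;  (the store depends on the value loaded) */
static void *thread1(void *arg) {
    (void)arg;
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, r1, memory_order_relaxed);
    return NULL;
}

/* Thread 2: r2 = y; x = r2; */
static void *thread2(void *arg) {
    (void)arg;
    r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);
    return NULL;
}

/* Runs the litmus test once (hypothetical harness) and reports the registers. */
void run_lb_datas(int *out_r1, int *out_r2) {
    pthread_t t1, t2;
    int rc;
    atomic_store_explicit(&x, 0, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);
    rc = pthread_create(&t1, NULL, thread1, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, thread2, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *out_r1 = r1;
    *out_r2 = r2;
}
```

Running it in a loop exercises different schedules, but observing only 0 says nothing about what the C11 model itself permits.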

Contribution

1) A thread-local semantics with "the right amount" of out-of-order execution:

    thread source
      → (usual thread-local semantics) → base LTS
      → (out-of-order execution) → derived LTS
      → (+ non-multi-copy-atomic storage subsystem, as in Power) → whole-program semantics

2) Its use to illustrate problems.

Observation 1

Starting from the program

    r1 = x;
    if (r1 == 42) { y = r1 } else { y = 42 }

the base semantics gives the base LTS

    a:Rrlx x=0  → b:Wrlx y=42
    c:Rrlx x=1  → d:Wrlx y=42
    ...
    y:Rrlx x=42 → z:Wrlx y=42
    ...

The thread-local semantics does not specify what can be read; hence receptivity: there is one branch for every value the read of x could return.
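A sketch of how the base LTS of Observation 1 branches on every possible read value. The real base LTS is receptive over all values; the finite range `0..max_value` and the names `branch`, `observation1_write` and `build_base_lts` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One branch of the base LTS: a read of x followed by a write to y. */
typedef struct {
    int read_x;   /* value returned by Rrlx x */
    int write_y;  /* value stored by the subsequent Wrlx y */
} branch;

/* The thread of Observation 1, specialised to r1 = v:
   if (r1 == 42) { y = r1 } else { y = 42 } */
int observation1_write(int r1) {
    return (r1 == 42) ? r1 : 42;
}

/* Receptivity: one branch per value the read could return.
   Only the hypothetical finite range 0..max_value is enumerated. */
int build_base_lts(branch *out, int max_value) {
    assert(max_value >= 0);
    int n = 0;
    for (int v = 0; v <= max_value; v++) {
        out[n].read_x = v;
        out[n].write_y = observation1_write(v);
        n++;
    }
    return n;
}

/* Prints the branches in the slide's notation. */
void print_base_lts(const branch *b, int n) {
    for (int i = 0; i < n; i++)
        printf("Rrlx x=%d -> Wrlx y=%d\n", b[i].read_x, b[i].write_y);
}
```

Every branch ends in Wrlx y=42, which is the property Observation 2 relies on.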

Observation 2

    r1 = x;
    if (r1 == 42) { y = r1 } else { y = 42 }

    a:Rrlx x=0  → b:Wrlx y=42
    c:Rrlx x=42 → d:Wrlx y=42

The write to y can be executed before the read from x, as
◮ it happens in all the branches of the program;
◮ nothing (in particular not Power "coherence") forces us to execute the read from x first.

Observation 3

On the other hand, if the write is to x, then it cannot be executed before the read (because of Power "coherence"):

    r1 = x;
    if (r1 == 42) { x = r1 } else { x = 42 }

    a:Rrlx x=0  → b:Wrlx x=42
    c:Rrlx x=42 → d:Wrlx x=42

Observation 4

If the write is not available in all branches of the program, we cannot execute it before the read:

    r1 = x;
    if (r1 == 42) { y = r1 } else { y = 37 }

    a:Rrlx x=0  → b:Wrlx y=37
    c:Rrlx x=42 → d:Wrlx y=42
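Observations 2 to 4 can be condensed into a small decision procedure, sketched below under two simplifications of ours: only the two read values drawn on the slides (0 and 42) are enumerated, and Power coherence is reduced to "a write to the location just read cannot overtake that read". The `program` struct, which parameterises the slide programs by the written location and the `else` value, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A relaxed action label such as "Wrlx y=42". */
typedef struct { char kind; char loc; int val; } label;

/* Hypothetical parameterisation of the slide programs:
     r1 = x; if (r1 == 42) { L = r1 } else { L = else_val }
   where L is write_loc. */
typedef struct { char write_loc; int else_val; } program;

/* Read values drawn on the slides: a:Rrlx x=0 and c:Rrlx x=42. */
static const int read_values[] = { 0, 42 };

/* The write performed on the branch where the read of x returned v. */
static label write_in_branch(program p, int v) {
    label w = { 'W', p.write_loc, (v == 42) ? v : p.else_val };
    return w;
}

/* The write may be executed before the read of x iff it is not to x
   (simplified Power coherence) and the same label occurs in every branch. */
bool write_can_precede_read(program p) {
    assert(p.write_loc == 'x' || p.write_loc == 'y');
    if (p.write_loc == 'x')
        return false;                                  /* Observation 3 */
    label first = write_in_branch(p, read_values[0]);
    for (size_t i = 1; i < sizeof read_values / sizeof read_values[0]; i++) {
        label w = write_in_branch(p, read_values[i]);
        if (w.kind != first.kind || w.loc != first.loc || w.val != first.val)
            return false;                              /* Observation 4 */
    }
    return true;                                       /* Observation 2 */
}
```

With `{ 'y', 42 }`, `{ 'x', 42 }` and `{ 'y', 37 }` this reproduces Observations 2, 3 and 4 respectively.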

Idea: ticking

We execute the base LTS out of order by ticking (✔) sets of edges. On the base LTS of Observation 2 (a:Rrlx x=0 → b:Wrlx y=42 and c:Rrlx x=42 → d:Wrlx y=42), we can, as in the base LTS, have

    R x 0 (tick {a}), then W y 42 (tick {b})

But we can also have

    W y 42 (tick {b, d}), then R x 0 (tick {a})

because Wrlx y=42 is available in all branches.
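The two executions above can be replayed with a minimal sketch of ticking on the base LTS of Observation 2. Two details are our own reading rather than the slide's: a path also counts as discarded when a sibling of one of its reads is in the set being ticked (otherwise a read with siblings could never be ticked), and an edge below an already-discarded branch cannot be ticked. Coherence and fences are not modelled, since x and y differ; `tick`, `is_frontier` and `reset_ticks` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Base LTS of Observation 2: a:Rrlx x=0 -> b:Wrlx y=42 and
   c:Rrlx x=42 -> d:Wrlx y=42. The only siblings are a and c. */
enum { A, B, C, D, N_EDGES };
static const char *const label_of[N_EDGES] = {
    [A] = "R x 0", [B] = "W y 42", [C] = "R x 42", [D] = "W y 42",
};
static const int sibling_of[N_EDGES] = { [A] = C, [B] = -1, [C] = A, [D] = -1 };
static const int path_edges[2][2] = { { A, B }, { C, D } };
static bool ticked[N_EDGES];

static bool in_set(int e, const int *set, int n) {
    for (int i = 0; i < n; i++)
        if (set[i] == e) return true;
    return false;
}

/* Path p is discarded if one of its edges has a sibling that is ticked
   or (our reading) is being ticked now. */
static bool discarded(int p, const int *set, int n) {
    for (int k = 0; k < 2; k++) {
        int s = sibling_of[path_edges[p][k]];
        if (s >= 0 && (ticked[s] || in_set(s, set, n))) return true;
    }
    return false;
}

/* An edge is dead if it or an earlier edge on its path has a ticked sibling. */
static bool live(int e) {
    for (int p = 0; p < 2; p++)
        for (int k = 0; k < 2; k++)
            if (path_edges[p][k] == e) {
                for (int j = 0; j <= k; j++) {
                    int s = sibling_of[path_edges[p][j]];
                    if (s >= 0 && ticked[s]) return false;
                }
                return true;
            }
    return false;
}

bool is_frontier(const int *set, int n) {
    assert(n >= 0);
    if (n == 0) return false;
    for (int i = 0; i < n; i++) {
        if (strcmp(label_of[set[i]], label_of[set[0]]) != 0) return false; /* 1 */
        if (ticked[set[i]] || !live(set[i])) return false;      /* 2, plus liveness */
    }
    /* 3 holds trivially here: no coherence or fence constraint applies. */
    for (int p = 0; p < 2; p++) {                                /* 4 */
        if (discarded(p, set, n)) continue;
        int count = in_set(path_edges[p][0], set, n) + in_set(path_edges[p][1], set, n);
        if (count != 1) return false;
    }
    return true;
}

/* Ticks `set` if it is a frontier; returns whether it did. */
bool tick(const int *set, int n) {
    if (!is_frontier(set, n)) return false;
    for (int i = 0; i < n; i++) ticked[set[i]] = true;
    return true;
}

void reset_ticks(void) { memset(ticked, 0, sizeof ticked); }
```

Ticking {b} alone from the initial state fails, because the path c;d is not discarded and contains no edge of the set; ticking {b, d} succeeds.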

Frontier

[Figure: a larger base LTS (edges a to m) with relaxed reads of x, y and z and writes Wrlx x2=42 on several branches; the edges c:Rrlx y=42 and k:Rrlx y=42 are ticked.]

No more out-of-thin-air

LB+datas is no longer problematic:

    r1 = x;   ∥   r2 = y;
    y = r1    ∥   x = r2

yields the base LTSs

    thread 1:  a:Rrlx x=0 → b:Wrlx y=0    c:Rrlx x=42 → d:Wrlx y=42
    thread 2:  a:Rrlx y=0 → b:Wrlx x=0    c:Rrlx y=42 → d:Wrlx x=42

in which no write is available in all branches ⇒ no out-of-order execution ⇒ no out-of-thin-air behaviour.
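The argument can be checked on a toy whole-program model. As a stand-in for the Power storage subsystem, the sketch below simply interleaves the two threads over one shared memory (an assumption, much stronger than Power), and lets a thread execute its write before its read only when the write does not depend on the read, which is what "available in all branches" amounts to for these two tests. `thread_cfg` and `reaches_42_42` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of LB / LB+datas: thread 0 reads x then writes y,
   thread 1 reads y then writes x; both locations start at 0. */
typedef struct {
    bool data_dep;     /* write the value just read (LB+datas) instead of 42 (LB) */
    bool write_first;  /* the write has been ticked before the read */
} thread_cfg;

enum { X, Y };

/* Does some interleaving of the four actions end with both registers at 42?
   Interleaving over one memory replaces the Power storage subsystem. */
bool reaches_42_42(thread_cfg t0, thread_cfg t1) {
    const thread_cfg cfg[2] = { t0, t1 };
    const int read_loc[2] = { X, Y }, write_loc[2] = { Y, X };
    for (int t = 0; t < 2; t++)   /* a dependent write is not available in all branches */
        assert(!(cfg[t].data_dep && cfg[t].write_first));
    for (int mask = 0; mask < 16; mask++) {   /* bit s set: step s belongs to thread 0 */
        int bits = (mask & 1) + ((mask >> 1) & 1) + ((mask >> 2) & 1) + ((mask >> 3) & 1);
        if (bits != 2) continue;
        int mem[2] = { 0, 0 }, reg[2] = { 0, 0 }, pc[2] = { 0, 0 };
        for (int s = 0; s < 4; s++) {
            int t = ((mask >> s) & 1) ? 0 : 1;
            bool is_write = (pc[t] == 0) == cfg[t].write_first;
            if (is_write)
                mem[write_loc[t]] = cfg[t].data_dep ? reg[t] : 42;
            else
                reg[t] = mem[read_loc[t]];
            pc[t]++;
        }
        if (reg[0] == 42 && reg[1] == 42) return true;
    }
    return false;
}
```

LB reaches r1 = r2 = 42 once one thread's constant write is executed first, whereas LB+datas, where neither write can move, never does.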

Problems

Problem with (thread-local) optimisations

Each action is executed once (and only once) ⇒ the semantics is a sort of volatile: no introduction or elimination of accesses.

Jaroslav Ševčík's example:

    r2 = y;
    if (r2 == 42) { r3 = y; x = r3 } else { x = 42 }

    a:Rrlx y=0  → b:Wrlx x=42
    c:Rrlx y=42 → d:Rrlx y=0  → e:Wrlx x=0
                → f:Rrlx y=42 → g:Wrlx x=42

r2 = y and r3 = y should be mergeable, so that x = 42 is available in both branches.
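The effect of merging the two reads can be checked by enumerating branches, again only over the two values 0 and 42 drawn on the slide (an assumption). Before merging, the branch r2 = 42, r3 = 0 writes x=0, so Wrlx x=42 is not available in all branches; after replacing r3 = y by r3 = r2, it is. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Values the reads of y may return in this sketch (those drawn on the slide). */
static const int y_values[] = { 0, 42 };
#define N_VALUES 2
static_assert(sizeof y_values / sizeof y_values[0] == N_VALUES, "N_VALUES out of sync");

/* Original: r2 = y; if (r2 == 42) { r3 = y; x = r3 } else { x = 42 }
   (r3 is ignored on the else branch, where it is never read). */
static int original_x(int r2, int r3) {
    return (r2 == 42) ? r3 : 42;
}

/* After merging r3 = y into r2 = y, i.e. r3 = r2. */
static int merged_x(int r2) {
    return (r2 == 42) ? r2 : 42;
}

/* Is Wrlx x=42 the write performed on every branch of the original? */
bool original_always_writes_42(void) {
    for (int i = 0; i < N_VALUES; i++)
        for (int j = 0; j < N_VALUES; j++)
            if (original_x(y_values[i], y_values[j]) != 42)
                return false;   /* r2 = 42, r3 = 0 writes x = 0 */
    return true;
}

/* Same question after the merge. */
bool merged_always_writes_42(void) {
    for (int i = 0; i < N_VALUES; i++)
        if (merged_x(y_values[i]) != 42)
            return false;
    return true;
}
```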

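Problem with inter-thread optimisations

    r1 = x;                   ∥   r2 = y;
    if (r1 == 0) { y = 42 }   ∥   x = r2

    thread 1:  a:Rrlx x=0 → b:Wrlx y=42    c:Rrlx x=42
    thread 2:  a:Rrlx y=0 → b:Wrlx x=0     c:Rrlx y=42 → d:Wrlx x=42

Value-range analysis can determine that x can only contain 0, which removes the c:Rrlx x=42 branch of thread 1:

    thread 1:  a:Rrlx x=0 → b:Wrlx y=42
    thread 2:  a:Rrlx y=0 → b:Wrlx x=0     c:Rrlx y=42 → d:Wrlx x=42

Now b:Wrlx y=42 is available in all branches of thread 1, so it can be executed before the read of x ⇒ out-of-thin-air reappears!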
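A sketch of the pruning step: the read of x branches over a set of values, and value-range analysis shrinks that set. The domains `{0, 42}` and `{0}` mirror the slide; `branch_writes_y` and `y42_available_in_all_branches` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thread 1 of the slide, on the branch where the read of x returned v:
   if (v == 0) { y = 42 }. Returns whether y is written, and the value. */
static bool branch_writes_y(int v, int *val) {
    if (v == 0) {
        *val = 42;
        return true;
    }
    return false;
}

/* Wrlx y=42 is available in all branches iff every read value in
   `domain` leads to that same write. */
bool y42_available_in_all_branches(const int *domain, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int val = 0;
        if (!branch_writes_y(domain[i], &val) || val != 42)
            return false;
    }
    return true;
}
```

Changing the condition to `v == 0 || v == 42` and comparing the domains `{0, 37, 42}` and `{0, 42}` gives the same before/after contrast for part 2 of this problem.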

Problem with thread-locality

Variables as representations of data flow (register variables r) vs. variables as memory locations (shared variables x).

Escape analysis allows

    int f(void) {
        int x = 42;
        e1;          // no x
        g(x);
        e2;          // no x
        return x;
    }

  −→

    int f(void) {
        e1;
        g(42);
        e2;
        return 42;
    }

Optimisations are "automatic" on register variables. This interacts with the problem with intra-thread optimisations ⇒ how much escape analysis should be allowed?

Conclusion

Out-of-order execution by ticking frontiers:

    a:Rrlx x=0  → b:Wrlx y=42 ✔
    c:Rrlx x=42 → d:Wrlx y=42 ✔

It covers relaxed reads and writes, fences, and non-atomic accesses. It gives the desired results on the "out-of-thin-air test suite". ...but no optimisations (everything is volatile).


Ticking

A set of edges can be ticked iff it forms a "frontier":
1. all the edges have the same label;
2. all the edges are unticked;
3. all the edges are "executable" (not blocked by coherence or a fence);
4. in each non-discarded path, there is one (and only one) edge from the set.

A path is discarded iff one of its edges (necessarily labelled with a read) has a ticked sibling edge. For example, in

    a:Wrlx z=42 → b:Rrlx x=0 ✔ → c:Wrlx y=42
                → d:Rrlx x=42  → e:Wrlx y=42

the path through d is discarded, because d's sibling b is ticked.
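A more general sketch of the frontier check for an arbitrary finite tree, with edges stored by parent index. Counting the candidate set when deciding which paths are discarded, and refusing edges under an already-discarded branch, are our interpretation of the definition; condition 3 is approximated by coherence alone ("an unticked earlier edge on the same location blocks"), and fences are left out. `edge`, `lts` and `is_frontier` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 16

/* An edge of a finite base LTS tree: parent edge (-1 for edges leaving the
   root), label (kind, location, value), and whether it is ticked. */
typedef struct {
    int parent;
    char kind;   /* 'R' or 'W' */
    char loc;
    int val;
    bool ticked;
} edge;

typedef struct { edge e[MAX_EDGES]; int n; } lts;

static bool in_set(int i, const int *set, int k) {
    for (int j = 0; j < k; j++)
        if (set[j] == i) return true;
    return false;
}

/* Does edge i have a sibling that is ticked or (our reading) in `set`? */
static bool has_ticked_sibling(const lts *t, int i, const int *set, int k) {
    for (int s = 0; s < t->n; s++)
        if (s != i && t->e[s].parent == t->e[i].parent &&
            (t->e[s].ticked || in_set(s, set, k)))
            return true;
    return false;
}

static bool is_leaf(const lts *t, int i) {
    for (int c = 0; c < t->n; c++)
        if (t->e[c].parent == i) return false;
    return true;
}

bool is_frontier(const lts *t, const int *set, int k) {
    assert(t->n <= MAX_EDGES);
    if (k == 0) return false;
    const edge *first = &t->e[set[0]];
    for (int j = 0; j < k; j++) {
        const edge *cand = &t->e[set[j]];
        if (cand->kind != first->kind || cand->loc != first->loc ||
            cand->val != first->val)
            return false;                                   /* 1. same label */
        if (cand->ticked)
            return false;                                   /* 2. unticked */
        for (int a = set[j]; a >= 0; a = t->e[a].parent)
            if (has_ticked_sibling(t, a, NULL, 0))
                return false;                               /* under a dead branch */
        for (int a = cand->parent; a >= 0; a = t->e[a].parent)
            if (!t->e[a].ticked && t->e[a].loc == cand->loc)
                return false;                               /* 3. coherence only */
    }
    for (int leaf = 0; leaf < t->n; leaf++) {               /* 4. one per live path */
        if (!is_leaf(t, leaf)) continue;
        bool discarded = false;
        int count = 0;
        for (int a = leaf; a >= 0; a = t->e[a].parent) {
            if (has_ticked_sibling(t, a, set, k)) discarded = true;
            if (in_set(a, set, k)) count++;
        }
        if (!discarded && count != 1) return false;
    }
    return true;
}
```

On the example above, {c} is a frontier and {e} is not; on the trees of Observations 2 and 3, the pair of writes is a frontier only when the write is to y.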

Problem with inter-thread optimisations, part 2

    r1 = x;                               ∥   r2 = y;
    if (r1 == 0 || r1 == 42) { y = 42 }   ∥   x = r2

Thread 1's base LTS is transformed as follows:

    a:Rrlx x=0  → b:Wrlx y=42
    c:Rrlx x=37
    d:Rrlx x=42 → e:Wrlx y=42

  −→

    a:Rrlx x=0  → b:Wrlx y=42
    c:Rrlx x=42 → d:Wrlx y=42

Is this out-of-thin-air? For Java, no. For common sense, maybe...
