Requirements for (Weak) Memory Models

Hardware MMs [x86, Power, ARM, RISC-V] should:
1. describe real CPUs;
2. leave room for future optimizations;
3. provide reasonable guarantees for PLs.

Programming languages' MMs [C/C++, Java, JS, Wasm, OCaml] should:
1. support compiler optimizations;
2. provide efficient compilation to hardware;
3. have an easy non-expert mode.
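1. Compiler optimizations

Source:
  [x] := 1;   ||  [y] := 1;
  a := [y];   ||  b := [x];

Optimized (the two independent accesses of the first thread are reordered):
  a := [y];   ||  [y] := 1;
  [x] := 1;   ||  b := [x];

Requirement: Optimized ⊆ Source, i.e., every behavior of the optimized program must be a behavior of the source. Under SC the optimized program can produce a = b = 0 while the source cannot, so a model that supports this reordering must already allow a = b = 0 for the source.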
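To make the inclusion check concrete, here is a minimal Python sketch that enumerates every SC interleaving of the two threads. The encoding of instructions as `(kind, location, value-or-register)` triples and the helper names `interleavings` and `run_sc` are illustrative assumptions. It shows that the reordered program gains the outcome a = b = 0, which the source lacks under SC.

```python
from itertools import combinations


def interleavings(t1, t2):
    """Yield every merge of two instruction lists that keeps each thread's order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def run_sc(t1, t2):
    """Return the set of (a, b) outcomes over all SC interleavings."""
    outcomes = set()
    for schedule in interleavings(t1, t2):
        mem = {"x": 0, "y": 0}  # all locations start at 0
        regs = {}
        for op, loc, arg in schedule:
            if op == "W":
                mem[loc] = arg        # [loc] := arg
            else:
                regs[arg] = mem[loc]  # arg := [loc]
        outcomes.add((regs["a"], regs["b"]))
    return outcomes


source = ([("W", "x", 1), ("R", "y", "a")],
          [("W", "y", 1), ("R", "x", "b")])
optimized = ([("R", "y", "a"), ("W", "x", 1)],  # independent accesses swapped
             [("W", "y", 1), ("R", "x", "b")])

print(sorted(run_sc(*source)))     # [(0, 1), (1, 0), (1, 1)]
print(sorted(run_sc(*optimized)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Since (0, 0) appears only for the optimized program, SC itself rejects this reordering.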
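2. Efficient compilation to hardware

Source MM (SC):
  [x] := 1;   ||  [y] := 1;
  a := [y];   ||  b := [x];

Target MM (x86):
  [x] := 1;   ||  [y] := 1;
  mfence;     ||  mfence;
  a := [y];   ||  b := [x];

There is no compilation scheme without fences: x86 may delay a store past a later load, so without the mfences the target could produce a = b = 0, which SC forbids.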
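The need for fences can be reproduced with a toy operational model in the style of x86-TSO, where each thread owns a FIFO store buffer. This is a simplified sketch, not a full x86 semantics; the instruction encoding and the function names `upd` and `tso_outcomes` are hypothetical.

```python
def upd(pairs, key, val):
    """Functional update of a mapping stored as a sorted tuple of pairs."""
    d = dict(pairs)
    d[key] = val
    return tuple(sorted(d.items()))


def tso_outcomes(threads, out_regs):
    """All final values of out_regs under a toy x86-TSO store-buffer model."""
    results, seen = set(), set()

    def explore(pcs, bufs, mem, regs):
        if (pcs, bufs, mem, regs) in seen:
            return
        seen.add((pcs, bufs, mem, regs))
        if all(pc == len(p) for pc, p in zip(pcs, threads)) and not any(bufs):
            final = dict(regs)
            results.add(tuple(final[r] for r in out_regs))
            return
        for t, prog in enumerate(threads):
            buf = bufs[t]
            if buf:  # the oldest buffered store may reach memory at any time
                loc, val = buf[0]
                explore(pcs, bufs[:t] + (buf[1:],) + bufs[t + 1:],
                        upd(mem, loc, val), regs)
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "W":  # ("W", loc, val): enters the store buffer only
                explore(nxt, bufs[:t] + (buf + ((op[1], op[2]),),) + bufs[t + 1:],
                        mem, regs)
            elif op[0] == "R":  # ("R", loc, reg): newest own buffered store wins
                own = [v for (b_loc, v) in buf if b_loc == op[1]]
                val = own[-1] if own else dict(mem)[op[1]]
                explore(nxt, bufs, mem, upd(regs, op[2], val))
            elif op[0] == "MFENCE" and not buf:  # blocks until the buffer drains
                explore(nxt, bufs, mem, regs)

    n = len(threads)
    explore((0,) * n, ((),) * n, (("x", 0), ("y", 0)), ())
    return results


sb = [[("W", "x", 1), ("R", "y", "a")],
      [("W", "y", 1), ("R", "x", "b")]]
fenced = [[("W", "x", 1), ("MFENCE",), ("R", "y", "a")],
          [("W", "y", 1), ("MFENCE",), ("R", "x", "b")]]

print((0, 0) in tso_outcomes(sb, ["a", "b"]))      # True
print((0, 0) in tso_outcomes(fenced, ["a", "b"]))  # False
```

Without fences both loads can run while the stores still sit in the buffers; with an mfence after each store, only the three SC outcomes remain.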
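3. Easy non-expert mode

Intuition: nice program ⇒ nice behaviors.
Data-Race-Freedom (DRF) guarantee: no data races in SC executions ⇒ only SC behaviors.

  a := [x];            ||  b := [y];
  if a then [y] := 1   ||  if b then [x] := 1

In every SC execution both reads return 0, so neither write is executed, there is no data race, and the only SC outcome is a = b = 0. The C/C++ MM, however, allows a = b = 1, which is an Out-Of-Thin-Air (OOTA) outcome.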
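The premise of the DRF guarantee for this program can be checked by brute force. This Python sketch enumerates all SC interleavings and records every write that actually executes; the step encoding and the helper names `read` and `write_if` are hypothetical.

```python
from itertools import combinations


def interleavings(t1, t2):
    """Yield every merge of two step lists that keeps each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def read(loc, reg):
    """Step for reg := [loc]."""
    def step(mem, regs, log):
        regs[reg] = mem[loc]
        log.append(("R", loc))
    return step


def write_if(reg, loc, val):
    """Step for: if reg then [loc] := val."""
    def step(mem, regs, log):
        if regs[reg]:
            mem[loc] = val
            log.append(("W", loc))
    return step


t1 = [read("x", "a"), write_if("a", "y", 1)]  # a := [x]; if a then [y] := 1
t2 = [read("y", "b"), write_if("b", "x", 1)]  # b := [y]; if b then [x] := 1

outcomes, writes = set(), set()
for schedule in interleavings(t1, t2):
    mem, regs, log = {"x": 0, "y": 0}, {}, []
    for step in schedule:
        step(mem, regs, log)
    outcomes.add((regs["a"], regs["b"]))
    writes.update(acc for acc in log if acc[0] == "W")

print(outcomes)  # {(0, 0)}
print(writes)    # set()
```

No write ever runs under SC, so no pair of accesses conflicts and the program is race-free in every SC execution.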
Programming languages' MMs: overview
Criteria: Comp. Opt. (No OOTA), Eff. Comp. to Hardware, DRF, No UB, Simplicity
▶ SC [Lamport, 1979]
▶ Java MM [Manson et al., 2005]
▶ C/C++ MM [Batty et al., 2011]
▶ RC11 [Lahav et al., 2017]
▶ Promising [Kang et al., 2017, Lee et al., 2020]
▶ Weakestmo [Chakraborty and Vafeiadis, 2019]
▶ Modular Relaxed Dep. [Paviotti et al., 2020]
▶ OCaml MM [Dolan et al., 2018]
Thank you! http://podkopaev.net
SC-preserving optimizations in LLVM [Marino et al., 2011]
Average slowdown:
▶ 34% with only SC-preserving optimizations
▶ 5.5% with optimizations modified to preserve SC
Drawbacks:
▶ Hardware still allows weak behaviors, i.e., there is no end-to-end SC
▶ Requires modifying existing compilers
Validity of transformations [Ševčík and Aspinall, 2008]

Transformation                               SC    JMM
Trace-preserving transformations             ✓     ✓
Reordering normal memory accesses            ✗     ✓∗
Redundant read after read elimination        ✓     ✗
Redundant read after write elimination       ✓     ✓
Irrelevant read elimination                  ✓     ✓
Irrelevant read introduction                 ✓     ✗
Redundant write before write elimination     ✓     ✓
Redundant write after read elimination       ✓     ✗
External action reordering                   ✗     ✗
End-to-end SC via Volatile JVM [Liu et al., 2017, Liu et al., 2019]
The Java MM guarantees Data-Race-Freedom: if all shared locations are volatile (so there are no data races), the program has SC semantics.
Slowdown, in %:

                    DaCapo   spark-perf
x86      Average    28       79
         Max        81       164
ARM (1)  Average    57       85
         Max        157      ∞
ARM (2)  Average    73       125
         Max        103      ∞
The C/C++ MM allows a = b = 1, an OOTA outcome:

  a := [x];            ||  b := [y];
  if a then [y] := 1   ||  if b then [x] := 1
Executions in C/C++ MM

  a := [x];   ||  b := [y];
  [y] := 1;   ||  if b then [x] := 1

// a = 0; b = 0:  R x 0 →po W y 1;   R y 0
// a = 0; b = 1:  R x 0 →po W y 1 →rf R y 1 →po W x 1
// a = 1; b = 1:  R x 1 →po W y 1 →rf R y 1 →po W x 1 →rf R x 1   (a po ∪ rf cycle)

Axioms:
1. po ∪ rf_preserved is acyclic (rf_preserved ⊆ rf)
2. …
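The three graphs can be checked mechanically. This sketch encodes each execution as lists of po and rf edges (event names such as `Rx` and the helper `has_cycle` are illustrative; reads from initial values are omitted) and tests po ∪ rf for cycles with a depth-first search.

```python
def has_cycle(nodes, edges):
    """Detect a cycle in a directed graph via DFS with white/grey/black colours."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    colour = {n: "white" for n in nodes}

    def visit(u):
        colour[u] = "grey"  # u is on the current DFS path
        for v in succ[u]:
            if colour[v] == "grey" or (colour[v] == "white" and visit(v)):
                return True
        colour[u] = "black"
        return False

    return any(colour[n] == "white" and visit(n) for n in nodes)


T1 = ["Rx", "Wy"]  # a := [x]; [y] := 1
T2 = ["Ry", "Wx"]  # b := [y]; if b then [x] := 1
executions = {
    "a=0, b=0": (["Rx", "Wy", "Ry"], [("Rx", "Wy")], []),
    "a=0, b=1": (T1 + T2, [("Rx", "Wy"), ("Ry", "Wx")], [("Wy", "Ry")]),
    "a=1, b=1": (T1 + T2, [("Rx", "Wy"), ("Ry", "Wx")],
                 [("Wy", "Ry"), ("Wx", "Rx")]),
}

for name, (nodes, po, rf) in executions.items():
    print(name, "po U rf cyclic:", has_cycle(nodes, po + rf))
```

Only the a = b = 1 execution has a po ∪ rf cycle, which is why axiom 1 is stated over a possibly smaller rf_preserved rather than all of rf.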
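Out-Of-Thin-Air in C/C++ MM

Each of the following programs has an execution with the same graph:
R x 1 →po W y 1 →rf R y 1 →po W x 1 →rf R x 1

(1) No dependency in the first thread:
  a := [x];   ||  b := [y];
  [y] := 1;   ||  if b then [x] := 1

(2) Real ctrl dependencies in both threads:
  a := [x];            ||  b := [y];
  if a then [y] := 1   ||  if b then [x] := 1

(3) Fake ctrl dependency in the first thread ([y] := 1 is executed on both branches):
  a := [x];            ||  b := [y];
  if a then [y] := 1   ||  if b then [x] := 1
       else [y] := 1   ||

For (1) and (3), a = b = 1 is a legitimate outcome; for (2) it is OOTA. Since the graphs coincide, a model built on po and rf alone cannot tell these programs apart.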
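One way to separate real from fake dependencies is semantic: a write depends on a read if changing the value read changes which writes are performed. This sketch encodes the first thread of each program as a hypothetical Python function from the value of a to the writes it performs. It only tries the values 0 and 1, so it illustrates the idea rather than being a general dependency analysis.

```python
def real_ctrl(a):
    """a := [x]; if a then [y] := 1"""
    return [("y", 1)] if a else []


def fake_ctrl(a):
    """a := [x]; if a then [y] := 1 else [y] := 1"""
    return [("y", 1)] if a else [("y", 1)]


def no_ctrl(a):
    """a := [x]; [y] := 1"""
    return [("y", 1)]


def semantically_depends(thread, values=(0, 1)):
    """True if the writes performed change with the value returned by the read."""
    return len({tuple(thread(v)) for v in values}) > 1


for name, thread in [("real", real_ctrl), ("fake", fake_ctrl), ("none", no_ctrl)]:
    print(name, semantically_depends(thread))
# real True
# fake False
# none False
```

Syntactically both (2) and (3) contain an `if` on a, but only (2) has a write whose presence depends on the value read.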
RC11 [Lahav et al., 2017] forbids all po ∪ rf cycles.
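Forbidding po ∪ rf cycles

It is enough to respect [R] ; po ; [W], since hardware respects rf \ po. A po ∪ rf cycle must contain rf \ po edges (po alone is acyclic), and between two consecutive such edges the path goes by po from a read to a write, i.e., along [R] ; po ; [W]. So every po ∪ rf cycle is a cycle of ([R] ; po ; [W]) ∪ (rf \ po).

How?
1. Restrict compiler optimizations.
2. Put a fence between R and W.

Cheaper for C/C++ than for Java!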
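The decomposition can be checked on the OOTA execution. This sketch (the event encoding and names are illustrative assumptions) derives po from thread positions, extracts the [R] ; po ; [W] and rf \ po edges, and confirms that the po ∪ rf cycle survives using only those edges; if hardware keeps both kinds of edges ordered, the cycle cannot be realized.

```python
def has_cycle(edges):
    """DFS cycle check on a directed graph given as a set of (u, v) pairs."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    state = {}  # missing = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for v in succ.get(u, []):
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(succ))


# Event name -> (thread, position in thread, kind)
events = {"Rx": (1, 0, "R"), "Wy": (1, 1, "W"),   # a := [x]; if a then [y] := 1
          "Ry": (2, 0, "R"), "Wx": (2, 1, "W")}   # b := [y]; if b then [x] := 1
po = {(e, f) for e, (t, i, _) in events.items()
      for f, (u, j, _) in events.items() if t == u and i < j}
rf = {("Wy", "Ry"), ("Wx", "Rx")}  # the a = b = 1 execution

r_po_w = {(e, f) for e, f in po if events[e][2] == "R" and events[f][2] == "W"}
rf_ext = rf - po  # rf \ po

print(has_cycle(po | rf))          # True
print(has_cycle(r_po_w | rf_ext))  # True: the same cycle, using only these edges
```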
C/C++ has undefined behavior
Undefined Behavior and Memory Models

  int data = 0; int f = 0;
  [data] := 42;   ||  while ([f] == 0) {};
  [f] := 1;       ||  print([data]);

Java: fine, but may print 0.
C/C++: undefined behavior! Race on a normal location!

  int data = 0; atomic<int> f = 0;
  [data] := 42;     ||  while ([f]_acq == 0) {};
  [f]_rel := 1;     ||  print([data]);

With f an atomic<int> accessed with release/acquire, the accesses to data are ordered, there is no race, and the program prints 42.

Java MM vs. C/C++ MM:
▶ special locations: volatile int vs. atomic<int>
▶ data race on int: weak guarantees vs. undefined behavior
▶ access to int vs. relaxed (rlx) access to atomic<int>: subject to OOTA
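To make "race on a normal location" concrete, here is a simplified happens-before race check based on vector clocks, run on one SC interleaving of each version of the program (the loop exits on its first read). The trace format and `find_races` are hypothetical; the sketch treats a release write followed by an acquire read of the same location as synchronization and ignores relaxed accesses and release sequences.

```python
from collections import defaultdict


def find_races(trace):
    """Return the set of non-atomic locations involved in a data race.

    trace: list of (thread, op, loc, mode) in execution order, where op is
    "R" or "W" and mode is "na" (plain int), "rel", "acq" or "rlx".
    """
    clock = defaultdict(lambda: defaultdict(int))   # thread -> vector clock
    last_w = defaultdict(lambda: defaultdict(int))  # loc -> thread -> epoch of last write
    last_r = defaultdict(lambda: defaultdict(int))  # loc -> thread -> epoch of last read
    released = {}                                   # atomic loc -> clock at last release
    races = set()
    for t, op, loc, mode in trace:
        c = clock[t]
        c[t] = max(c[t], 1)  # each thread starts in epoch 1
        if mode == "na":
            # A read conflicts with earlier writes; a write also with earlier reads.
            conflicts = [last_w[loc]] + ([last_r[loc]] if op == "W" else [])
            if any(e > c[u] for acc in conflicts for u, e in acc.items() if u != t):
                races.add(loc)  # an access by another thread is not hb-before this one
            (last_w if op == "W" else last_r)[loc][t] = c[t]
        elif op == "W" and mode == "rel":
            released[loc] = dict(c)  # publish everything t has seen so far
            c[t] += 1
        elif op == "R" and mode == "acq" and loc in released:
            for u, e in released[loc].items():
                c[u] = max(c[u], e)  # join: acquire what the releaser saw
    return races


plain = [(1, "W", "data", "na"), (1, "W", "f", "na"),
         (2, "R", "f", "na"), (2, "R", "data", "na")]
atomic = [(1, "W", "data", "na"), (1, "W", "f", "rel"),
          (2, "R", "f", "acq"), (2, "R", "data", "na")]

print(sorted(find_races(plain)))   # ['data', 'f']
print(sorted(find_races(atomic)))  # []
```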
To forbid po ∪ rf cycles in C/C++, it is enough to respect [R] ; po ; [W] on atomics.
Preserving [R] ; po ; [W] for atomics in LLVM [Ou and Demsky, 2018]
1. Restrict compiler optimizations: no changes needed in LLVM
2. Put a fence between R and W:
▶ x86: no fences
▶ ARMv8: a bogus conditional branch for relaxed atomic reads
Slowdown on ARMv8: 0% on average, 6.3% max
Benchmarks: concurrent data structures (CDS) from the CDS C++, Folly, Junction, and Rigtorp libraries, and 6 benchmarks from CDSSpec
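A sketch of what these two lowering choices might look like for a relaxed atomic load. The function name, register names, and exact instruction sequences are assumptions for illustration, not the scheme from [Ou and Demsky, 2018] verbatim.

```python
def compile_relaxed_load(arch, reg, addr):
    """Hypothetical lowering of a relaxed atomic load that keeps [R];po;[W]."""
    if arch == "x86":
        # x86 never lets a load be reordered with a later store,
        # so a plain MOV already respects [R];po;[W].
        return [f"mov {reg}, [{addr}]"]
    if arch == "armv8":
        # A conditional branch on the loaded value that targets the very next
        # instruction adds a control dependency; ARMv8 does not let later
        # stores become visible before such a branch is resolved.
        return [f"ldr {reg}, [{addr}]", f"cbz {reg}, 1f", "1:"]
    raise ValueError(f"unsupported architecture: {arch}")


print(compile_relaxed_load("x86", "rax", "rdi"))   # ['mov rax, [rdi]']
print(compile_relaxed_load("armv8", "x0", "x1"))   # ['ldr x0, [x1]', 'cbz x0, 1f', '1:']
```

The branch is "bogus" because both outcomes continue at the same instruction: it costs no real control flow, only the dependency that orders the load before later stores.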