Can Seqlocks Get Along with Programming Language Memory Models?

Can Seqlocks Get Along with Programming Language Memory Models? Hans-J. Boehm HP Labs Hans-J. Boehm: Seqlocks 1

The setting
• Want fast reader-writer locks
  – Locking in shared (read) mode allows concurrent access by other readers.
  – Locking in exclusive (write) mode disallows concurrent readers or writers.
• Many more readers than writers
  – We’ll ignore write performance.
• Implementation language: C++11/C11, Java

Traditional reader-writer locks
Multiple readers:

  Core 1:                  Core 2:
  rwl.lock_shared();
  r1 = data1;              rwl.lock_shared();
  r2 = data2;              r1 = data1;
  rwl.unlock_shared();     r2 = data2;
                           rwl.unlock_shared();
  rwl.lock_shared();
  r1 = data1;
  r2 = data2;
  rwl.unlock_shared();

Annotation on the slide: Update lock state!

Cache lines needed
[Figure: the two-core reader timeline from the previous slide, annotated with cache-line states. First frame: excl., shared, shared, shared, shared. Later frames: shared, shared, excl., shared, shared.]

Seqlocks
• One common solution to this problem.
• Used in Linux kernel, jsr166e SequenceLock.
• Similar techniques used for e.g. software transactional memory implementations.
• Readers don’t update a lock data structure.
  – Check whether writer interfered.
  – If so, start over …

Seqlocks, version 0 (naïve, broken)

  atomic<unsigned long> seq(0);
  int data1, data2;

  void writer(...) {
    unsigned seq0 = seq;
    while (seq0 & 1 ||
           !seq.cmp_exc_wk(seq0, seq0+1))
      { seq0 = seq; }
    data1 = ...;
    data2 = ...;
    seq = seq0 + 2;
  }

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq;
      r1 = data1;
      r2 = data2;
      seq1 = seq;
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

C++11 version, slightly abbreviated. For Java, use java.util.concurrent.atomic.

Problem: Data races
The plain int accesses to data1 and data2 in reader() can run concurrently with the stores to the same variables in writer().

  atomic<unsigned long> seq(0);
  int data1, data2;

  void writer(...) {
    unsigned seq0 = seq;
    while (seq0 & 1 ||
           !seq.cmp_exc_wk(seq0, seq0+1))
      { seq0 = seq; }
    data1 = ...;
    data2 = ...;
    seq = seq0 + 2;
  }

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq;
      r1 = data1;
      r2 = data2;
      seq1 = seq;
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

Java version more subtly broken … stay tuned …

Seqlocks, version 1 (correct)

  atomic<unsigned long> seq;
  atomic<int> data1, data2;

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq;
      r1 = data1;
      r2 = data2;
      seq1 = seq;
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

  void writer(...) {
    unsigned seq0 = seq;
    while (seq0 & 1 ||
           !seq.cmp_exc_wk(seq0, seq0+1))
      { seq0 = seq; }
    data1 = ...;
    data2 = ...;
    seq = seq0 + 2;
  }

No data races ⇒ sequential consistency
For Java: volatile int data1, data2;
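The slide code is abbreviated. A compilable sketch of version 1 follows, assuming cmp_exc_wk stands for compare_exchange_weak. The int arguments to writer and the std::pair return type of reader are assumptions made here to replace the slide's "..." and T.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

std::atomic<unsigned long> seq(0);
std::atomic<int> data1(0), data2(0);

// Writer: wait for an even count (no active writer), make it odd to
// claim the lock, update the data, then make the count even again.
void writer(int a, int b) {
    unsigned long seq0 = seq.load();
    while ((seq0 & 1) || !seq.compare_exchange_weak(seq0, seq0 + 1)) {
        seq0 = seq.load();
    }
    data1 = a;          // all operations default to memory_order_seq_cst
    data2 = b;
    seq = seq0 + 2;
}

// Reader: never writes to seq. Retry if a writer was active (odd count)
// or if the count changed while the data was being read.
std::pair<int, int> reader() {
    int r1, r2;
    unsigned long seq0, seq1;
    do {
        seq0 = seq.load();
        r1 = data1.load();
        r2 = data2.load();
        seq1 = seq.load();
    } while (seq0 != seq1 || (seq0 & 1));
    return {r1, r2};
}
```

Every access here is a sequentially consistent atomic operation, so the program has no data races.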

Are we done?
• Bad news:
  – atomic annotations for data superficially surprising.
    • But really shouldn’t be.
    • Prevents compiler misoptimization in C and C++.
    • Provides useful properties, e.g. indivisible loads of long.
  – Overconstrains read ordering.
    • Forces data loads to become visible in order.
    • … and sometimes more.
  – Slows down readers on Power 7 by around a factor of 3.
• Good news:
  – Reasonably straightforward.
  – Works.
  – Essentially optimal on X86 and other TSO machines.

Better portable performance?
Seqlocks version 2 (broken, again)

  atomic<unsigned long> seq(0);
  atomic<int> data1, data2;

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq;
      r1 = data1.load(m_o_relaxed);
      r2 = data2.load(m_o_relaxed);
      seq1 = seq;  // m_o_seq_cst load
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

(writer unchanged)

Seqlocks version 2 (broken, again)

  atomic<unsigned long> seq;
  atomic<int> data1, data2;

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq;
      r1 = data1.load(m_o_relaxed);
      r2 = data2.load(m_o_relaxed);
      seq1 = seq;  // m_o_seq_cst load
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

• The problem (informally):
  – m_o_seq_cst guarantees s.c. for programs using only m_o_seq_cst.
  – load of r2 may become visible after load of seq1!
  – data loads can move out of “critical section”.
  – d.r.f. ⇒ invisible for data loads
• Explicit ordering is tricky.

Java: Same problem with volatile seq, non-volatile data_n.

Using C++11 fences
Seqlocks version 3 (correct)

  atomic<unsigned long> seq;
  atomic<int> data1, data2;

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq.load(m_o_acquire);
      r1 = data1.load(m_o_relaxed);
      r2 = data2.load(m_o_relaxed);
      atomic_thread_fence(m_o_acquire);
      seq1 = seq.load(m_o_relaxed);
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

(writer unchanged)

Advantage:
• Portable performance

Disadvantages:
• Correctness is subtle
• Fences overconstrain ordering
• Impossible in Java
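A compilable sketch of the fence-based reader, assuming the slide's m_o_* abbreviations stand for std::memory_order_* and that the "unchanged" writer keeps using default (seq_cst) operations. The writer arguments and the std::pair return type are assumptions made here to replace the slide's "..." and T.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

std::atomic<unsigned long> seq(0);
std::atomic<int> data1(0), data2(0);

// Writer as in version 1: every operation is memory_order_seq_cst.
void writer(int a, int b) {
    unsigned long seq0 = seq.load();
    while ((seq0 & 1) || !seq.compare_exchange_weak(seq0, seq0 + 1)) {
        seq0 = seq.load();
    }
    data1 = a;
    data2 = b;
    seq = seq0 + 2;
}

// Reader: the acquire load keeps the data loads from moving above it;
// the acquire fence keeps them from moving below the final load of seq.
std::pair<int, int> reader() {
    int r1, r2;
    unsigned long seq0, seq1;
    do {
        seq0 = seq.load(std::memory_order_acquire);
        r1 = data1.load(std::memory_order_relaxed);
        r2 = data2.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        seq1 = seq.load(std::memory_order_relaxed);
    } while (seq0 != seq1 || (seq0 & 1));
    return {r1, r2};
}
```

The data loads may still be reordered with each other, which is the freedom version 1 gave up.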

Back to read-modify-write operations
Seqlocks version 4 (correct)

  atomic<unsigned long> seq;
  atomic<int> data1, data2;

  T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
      seq0 = seq.load(m_o_acquire);
      r1 = data1.load(m_o_relaxed);
      r2 = data2.load(m_o_relaxed);
      seq1 = seq.fetch_and_add(0, m_o_release);
    } while (seq0 != seq1 || seq0 & 1);
    do something with r1 and r2;
  }

(writer unchanged)
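In standard C++ the operation the slide writes as fetch_and_add is spelled fetch_add. A compilable sketch under that assumption follows, again with hypothetical writer arguments and a std::pair return type in place of the slide's "..." and T.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

std::atomic<unsigned long> seq(0);
std::atomic<int> data1(0), data2(0);

// Writer as in version 1: every operation is memory_order_seq_cst.
void writer(int a, int b) {
    unsigned long seq0 = seq.load();
    while ((seq0 & 1) || !seq.compare_exchange_weak(seq0, seq0 + 1)) {
        seq0 = seq.load();
    }
    data1 = a;
    data2 = b;
    seq = seq0 + 2;
}

// Reader: the final check is a read-modify-write that adds 0. It
// returns the current count, and its release ordering keeps the
// preceding data loads from moving below it.
std::pair<int, int> reader() {
    int r1, r2;
    unsigned long seq0, seq1;
    do {
        seq0 = seq.load(std::memory_order_acquire);
        r1 = data1.load(std::memory_order_relaxed);
        r2 = data2.load(std::memory_order_relaxed);
        seq1 = seq.fetch_add(0, std::memory_order_release);
    } while (seq0 != seq1 || (seq0 & 1));
    return {r1, r2};
}
```

As written, fetch_add(0, ...) is still a store to seq as far as the hardware is concerned, so readers contend for its cache line unless the compiler optimizes the store away.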

Read-don’t-modify-write operations
• Advantages
  – Seems much more natural: m_o_acquire to acquire “lock”, m_o_release to release lock.
  – Works with Java and ordinary variables in “critical section”.
• Disadvantage:
  – Reintroduces store to lock and cache-line ping-ponging.
• But:
  – Store can be optimized out, at least on x86, probably on POWER.
  – Unfortunately, an extra fence remains (see paper).
  – Probably the best we can do for Java on POWER.

X86 reader performance
[Chart not reproduced. Legend: “final load” ~ seq_cst or fence version; “final fence + load” ~ optimized RMW (better than seq. cst. on Power).]

Bottom line:
• Version 1 (seq. cst. atomics for data) is easy to write, works with C++ and Java, performs well on some platforms, not others.
• Version 3 (fences) is very tricky to write correctly. Should perform well everywhere. Only for C & C++.
• Version 4 (read-don’t-modify-write) works everywhere. Scalability depends on currently unimplemented compiler optimization. With optimization: Worse than version 1 on X86, better on POWER.
• Version 2 (plain relaxed data) may be quite popular in Java, but is undeserving of its popularity.

Questions?

Backup slides
