
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY, Tim Harris (PowerPoint PPT presentation)



  1. NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 18 November 2016

  2. Lecture 7
     • Linearizability
     • Lock-free progress properties
     • Queues
     • Reducing contention
     • Explicit memory management

  3. Linearizability

  4. More generally: suppose we build a shared-memory data structure directly from read/write/CAS, rather than using locking as an intermediate layer. [Diagram: a data structure layered over locks over the H/W primitives (read, write, CAS, ...), versus a data structure built directly on those H/W primitives.]
     • Why might we want to do this?
     • What does it mean for the data structure to be correct?

  5. What we’re building: a set of integers, represented by a sorted linked list.
     • find(int) -> bool
     • insert(int) -> bool
     • delete(int) -> bool

  6. Searching a sorted list: find(20) on the list H -> 10 -> 30 -> T walks from the head comparing keys; it passes 10, reaches 30 without seeing 20, and returns find(20) -> false.
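The traversal on slide 6 can be sketched in C as follows. The node layout and function name here are our own, and this is a single-threaded sketch showing only the search logic; a concurrent version also needs the memory-management machinery discussed later in the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

// A sorted-list node; the slides' H and T sentinels are modelled
// here as the first node and NULL respectively.
typedef struct node {
    int key;
    struct node *next;
} node;

// find(20) on H -> 10 -> 30 -> T: walk until the current key is
// >= the target, then check for an exact match.
static bool list_find(node *head, int key) {
    node *cur = head;
    while (cur != NULL && cur->key < key) {
        cur = cur->next;
    }
    return cur != NULL && cur->key == key;
}
```

Because the list is sorted, the walk can stop as soon as it reaches a key greater than or equal to the target, just as in the slide's find(20) example.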

  7. Inserting an item with CAS: insert(20) on H -> 10 -> 30 -> T allocates a node for 20, sets its next pointer to the 30 node (20 -> 30), then CASes the 10 node's next pointer from 30 to the new node, giving H -> 10 -> 20 -> 30 -> T and insert(20) -> true.
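A minimal sketch of this CAS-based insert, using C11 atomics to stand in for the slides' CAS primitive. The names (anode, set_insert) are ours, and deletion is deliberately ignored here, so no mark bits are needed yet.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct anode {
    int key;
    _Atomic(struct anode *) next;
} anode;

// Insert key into the sorted list after the sentinel head; returns
// false if the key is already present.
static bool set_insert(anode *head, int key) {
    for (;;) {
        // Find the insertion point (prev -> cur with cur->key >= key).
        anode *prev = head;
        anode *cur = atomic_load(&prev->next);
        while (cur != NULL && cur->key < key) {
            prev = cur;
            cur = atomic_load(&cur->next);
        }
        if (cur != NULL && cur->key == key) return false;  // already in set

        anode *n = malloc(sizeof *n);
        n->key = key;
        atomic_init(&n->next, cur);               // e.g. 20 -> 30
        // CAS prev's next pointer (e.g. 10's) from cur (30) to the new node.
        if (atomic_compare_exchange_strong(&prev->next, &cur, n))
            return true;
        // CAS failed: another thread changed prev->next; free and retry.
        free(n);
    }
}
```

The retry loop is exactly the race from slide 8: when two inserts target the same insertion point, only one CAS succeeds and the loser re-finds its position.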

  8. Inserting an item with CAS: insert(20) and insert(25) race on H -> 10 -> 30 -> T. Each allocates its node (20 -> 30, 25 -> 30) and attempts to CAS the 10 node's next pointer from 30 to its own node; the CAS ensures only one succeeds, and the other must re-find its insertion point and retry.

  9. Searching and finding together: find(20) -> false runs concurrently with insert(20) -> true on H -> 10 -> 30 -> T. The searching thread saw that 20 was not in the set... but the inserting thread succeeded in putting it in!
     • Is this a correct implementation of a set?
     • Should the programmer be surprised if this happens?
     • What about more complicated mixes of operations?

  10. Correctness criteria. Informally: look at the behaviour of the data structure (what operations are called on it, and what their results are). If this behaviour is indistinguishable from atomic calls to a sequential implementation, then the concurrent implementation is correct.

  11. Sequential specification: ignore the list for the moment, and focus on the set, with operations find(int) -> bool, insert(int) -> bool, delete(int) -> bool. Sequential: we’re only considering one operation on the set at a time. Specification: we’re saying what a set does, not what a list does, or how it looks in memory. E.g., from {10, 20, 30}, insert(15) -> true gives {10, 15, 20, 30}; from there, delete(20) -> true gives {10, 15, 30}, while insert(20) -> false leaves the set {10, 15, 20, 30} unchanged.

  12. System model: each high-level operation (e.g., Lookup(20), Insert(15)) is carried out over time as a sequence of primitive steps (read/write/CAS): reading H, reading H -> 10, and so on, with Insert(15) additionally creating a new node and CASing it into the list, before the operation returns its result (here, True for both).

  13. High level: sequential history. No overlapping invocations: T1: insert(10) -> true (set becomes {10}); then T2: insert(20) -> true (set becomes {10, 20}); then T1: find(15) -> false (set remains {10, 20}).

  14. High level: concurrent history. Allow overlapping invocations: Thread 1 performs insert(10) -> true and then insert(20) -> true; Thread 2’s find(20) -> false overlaps them in time.

  15. Linearizability: is there a correct sequential history that
     • gives the same results as the concurrent one, and
     • is consistent with the timing of the invocations/responses?

  16. Example: linearizable. Thread 1: insert(10) -> true, then insert(20) -> true; Thread 2: find(20) -> false, overlapping both. A valid sequential history exists (with find(20) ordered before insert(20)): this concurrent execution is OK.

  17. Example: linearizable. Thread 1: insert(10) -> true, then delete(10) -> true; Thread 2: find(10) -> false, overlapping both. A valid sequential history exists (e.g. insert(10), delete(10), find(10)): this concurrent execution is OK.

  18. Example: not linearizable. Thread 1: insert(10) -> true, then insert(10) -> false; Thread 2: delete(10) -> true, completing before the second insert is invoked. No sequential history gives these results while respecting that timing: if the delete took effect before the second insert, that insert should have returned true.

  19. Returning to our example: insert(20) -> true runs concurrently with find(20) -> false on H -> 10 -> 30 -> T. A valid sequential history exists in which Thread 1’s find(20) -> false is ordered before Thread 2’s insert(20) -> true: this concurrent execution is OK.

  20. Recurring technique:
     • For updates: perform an essential step of an operation by a single atomic instruction, e.g. the CAS to insert an item into a list. This forms a “linearization point”.
     • For reads: identify a point during the operation’s execution when the result is valid; not always a specific instruction.

  21. Adding “delete”. First attempt: just use CAS. delete(10) on H -> 10 -> 30 -> T CASes the head’s next pointer from 10 to 30, unlinking the node.

  22. Delete and insert: delete(10) CASes the head’s next pointer from 10 to 30 while insert(20) CASes the 10 node’s next pointer from 30 to a new 20 node. Both CASes can succeed, but the 20 node is then reachable only from the unlinked 10 node: the insert is lost.

  23. Logical vs physical deletion: use a ‘spare’ bit in each next pointer to indicate logically deleted nodes. delete(10) first CASes the 10 node’s next pointer from 30 to 30X (setting the mark bit: logical deletion), then CASes the head’s next pointer from 10 to 30 (physical unlinking). A concurrent insert(20)’s CAS on the 10 node’s next pointer now fails, because the marked pointer no longer matches the CAS’s expected value.
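A sketch of the two deletion steps using the low bit of the next pointer as the mark, with C11 atomics standing in for the slides' CAS. The node layout and helper names are ours, and a full lock-free delete (as in Harris's list algorithm) additionally needs retry loops and safe memory reclamation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Next pointers are stored as uintptr_t so the low bit can serve as
// the "logically deleted" mark (nodes are assumed suitably aligned).
typedef struct mnode {
    int key;
    _Atomic(uintptr_t) next;   // pointer value | mark bit
} mnode;

#define MARK ((uintptr_t)1)

static mnode *ptr_of(uintptr_t v)  { return (mnode *)(v & ~MARK); }
static bool  is_marked(uintptr_t v) { return (v & MARK) != 0; }

// Step 1 (logical deletion): CAS the victim's own next pointer from
// unmarked to marked. After this, any insert's CAS on victim->next
// must fail, so no update can be lost into a dead node.
static bool logical_delete(mnode *victim) {
    uintptr_t old = atomic_load(&victim->next);
    if (is_marked(old)) return false;   // already logically deleted
    return atomic_compare_exchange_strong(&victim->next, &old, old | MARK);
}

// Step 2 (physical deletion): CAS the predecessor's next pointer
// past the marked victim, unlinking it from the list.
static bool physical_delete(mnode *prev, mnode *victim) {
    uintptr_t expect = (uintptr_t)victim;          // prev -> victim, unmarked
    uintptr_t succ = atomic_load(&victim->next) & ~MARK;
    return atomic_compare_exchange_strong(&prev->next, &expect, succ);
}
```

The key property is that both the mark and the pointer live in one word, so a single CAS atomically checks "still unmarked" and "still points where I expect".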

  24. Delete-greater-than-or-equal: deleteany() -> int removes and returns some element of the set. From {10, 20, 30}, deleteany() -> 10 leaves {20, 30}, while deleteany() -> 20 leaves {10, 30}. This is still a sequential spec... just not a deterministic one.

  25. Delete-greater-than-or-equal: DeleteGE(int x) -> int removes “x”, or the next element above “x”. E.g., on H -> 10 -> 30 -> T, DeleteGE(20) -> 30, leaving H -> 10 -> T.

  26. Does this work: DeleteGE(20) on H -> 10 -> 30 -> T?
     1. Walk down the list, as in a normal delete, finding 30 as the next element after 20.
     2. Do the deletion as normal: set the mark bit in 30, then physically unlink it.

  27. Delete-greater-than-or-equal: Thread 1 performs insert(25) -> true (A) and then insert(30) -> false (B); concurrently, Thread 2 performs deleteGE(20) -> 30 (C).
     • B must be after A (thread order).
     • C must be after B (otherwise B should have succeeded).
     • A must be after C (otherwise C should have returned 25).
     These constraints form a cycle, so no valid sequential history exists.

  28. Lock-free progress properties

  29. Progress: is this a good “lock-free” list?

     static volatile int MY_LIST = 0;

     bool find(int key) {
       // Wait until list available
       while (CAS(&MY_LIST, 0, 1) == 1) { }
       ...
       // Release list
       MY_LIST = 0;
     }

     OK, we’re not calling pthread_mutex_lock... but we’re essentially doing the same thing.

  30. “Lock-free”: a specific kind of non-blocking progress guarantee.
     • Precludes the use of typical locks, whether from libraries or “hand rolled”.
     • Often mis-used informally as a synonym for: free from calls to a locking function; fast; scalable.

  31. “Lock-free”: a specific kind of non-blocking progress guarantee.
     • Precludes the use of typical locks, whether from libraries or “hand rolled”.
     • Often mis-used informally as a synonym for: free from calls to a locking function; fast; scalable.
     The version number mechanism is an example of a technique that is often effective in practice, does not use locks, but is not lock-free in this technical sense.

  32. Wait-free: a thread finishes its own operation if it continues executing steps. [Timeline: three operations start, and all three finish.]

  33. Implementing wait-free algorithms: important in some significant niches, e.g. in real-time systems with worst-case execution time guarantees.
     • General construction techniques exist (“universal constructions”): queuing and helping strategies in which everyone ensures the oldest operation makes progress; often a high sequential overhead and limited scalability.
     • Fast-path / slow-path constructions: start out with a faster lock-free algorithm, and switch over to a wait-free algorithm if there is no progress; if done carefully, this obtains wait-free progress overall.
     • In practice, progress guarantees can vary between operations on a shared object, e.g. wait-free find + lock-free delete.

  34. Lock-free: some thread finishes its operation if threads continue taking steps. [Timeline: four operations start, but only three finish.]

  35. A (poor) lock-free counter:

     int getNext(int *counter) {
       while (true) {
         int result = *counter;
         if (CAS(counter, result, result+1)) {
           return result;
         }
       }
     }

     Not wait-free: there is no guarantee that any particular thread will succeed.
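The slide's counter can be written directly with C11 atomics, using atomic_compare_exchange_weak in place of the slides' CAS primitive. It is lock-free because a CAS only fails when some other thread's CAS succeeded, i.e. the system as a whole made progress.

```c
#include <stdatomic.h>

// Fetch-and-increment built from a CAS retry loop, as on slide 35.
static int getNext(_Atomic int *counter) {
    for (;;) {
        int result = atomic_load(counter);
        // Try to advance the counter from result to result+1.
        if (atomic_compare_exchange_weak(counter, &result, result + 1)) {
            return result;   // this thread claimed the value `result`
        }
        // CAS failed: another thread won the race (or the weak CAS
        // failed spuriously); reload and retry.
    }
}
```

In real code a plain atomic_fetch_add would do this in one instruction; the explicit loop is shown only to mirror the slide's structure and its "not wait-free" point.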

  36. Implementing lock-free algorithms:
     • Ensure that one thread (A) only has to repeat work if some other thread (B) has made “real progress”, e.g. insert(x) starts again if it finds that a conflicting update has occurred.
     • Use helping to let one thread finish another’s work, e.g. physically deleting a node on its behalf.

  37. Obstruction-free: a thread finishes its own operation if it runs in isolation. [Timeline: two operations start; interference between them can prevent any operation finishing.]

  38. A (poor) obstruction-free counter:

     int getNext(int *counter) {
       while (true) {
         int result = LL(counter);
         if (SC(counter, result+1)) {
           return result;
         }
       }
     }

     Assuming a very weak load-linked (LL) / store-conditional (SC): an LL on one thread will prevent an SC on another thread succeeding.

  39. Building obstruction-free algorithms:
     • Ensure that none of the low-level steps leave a data structure “broken”.
     • On detecting a conflict: help the other party finish, or get the other party out of the way.
     • Use contention management to reduce the likelihood of live-lock.

  40. Hashtables and skiplists

  41. Hash tables: a bucket array (8 entries in this example), where each bucket holds the list of items with hash value modulo 8 equal to that bucket’s index. Here bucket 0 holds 0 -> 16 -> 24, bucket 3 holds 3 -> 11, and bucket 5 holds 5.

  42. Hash tables: Contains(16).
     1. Hash 16; use bucket 0.
     2. Use normal list operations.

  43. Hash tables: Delete(11).
     1. Hash 11; use bucket 3.
     2. Use normal list operations.

  44. Lessons from this hashtable. Informal correctness argument:
     • Operations on different buckets don’t conflict: no extra concurrency control needed.
     • Operations appear to occur atomically at the point where the underlying list operation occurs.
     • (Not specific to lock-free lists: could use a whole-table lock, or per-list locks, etc.)
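The hashtable-over-lists design from slides 41-44 can be sketched as below: an 8-entry bucket array where bucket i holds the sorted list of keys with hash value modulo 8 == i, and every operation is "hash, pick bucket, then normal list operation". The type and function names are ours, and the per-bucket lists here are plain sequential lists for brevity; in the lecture's design each bucket would instead be one of the lock-free lists built earlier, and operations on different buckets never conflict.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8

typedef struct hnode {
    int key;
    struct hnode *next;
} hnode;

typedef struct {
    hnode *bucket[NBUCKETS];   // bucket[i]: sorted list of keys, key % 8 == i
} hashset;

static bool hs_insert(hashset *h, int key) {
    hnode **pp = &h->bucket[key % NBUCKETS];   // 1. hash: pick bucket
    while (*pp != NULL && (*pp)->key < key)    // 2. normal list insert
        pp = &(*pp)->next;
    if (*pp != NULL && (*pp)->key == key) return false;
    hnode *n = malloc(sizeof *n);
    n->key = key;
    n->next = *pp;
    *pp = n;
    return true;
}

static bool hs_contains(hashset *h, int key) {
    hnode *cur = h->bucket[key % NBUCKETS];    // 1. hash: pick bucket
    while (cur != NULL && cur->key < key)      // 2. normal list search
        cur = cur->next;
    return cur != NULL && cur->key == key;
}

static bool hs_delete(hashset *h, int key) {
    hnode **pp = &h->bucket[key % NBUCKETS];   // 1. hash: pick bucket
    while (*pp != NULL && (*pp)->key < key)    // 2. normal list delete
        pp = &(*pp)->next;
    if (*pp == NULL || (*pp)->key != key) return false;
    hnode *victim = *pp;
    *pp = victim->next;
    free(victim);
    return true;
}
```

The structure makes the slide's correctness argument visible: the only shared state an operation touches is its own bucket's list, so each operation linearizes wherever the underlying list operation does.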
