A Portable Lock-free Bounded Queue
Peter Pirkelbauer, Reed Milewicz, Juan Felipe Gonzalez
Computer and Information Sciences, University of Alabama at Birmingham
Pirkelbauer et al. (UAB), ICA3PP, December 14, 2016
Outline
1. Circular Bounded Queue
2. Mutual Exclusion
3. Lock-free objects
4. Lockfree Circular Bounded Queue
Circular Bounded Queue
Elements
    int head = 0;
    int tail = 0;
    int buf[N];

    bool enq(int elem);
    std::pair<int, bool> deq();
Synchronization Mechanisms
Mutual Exclusion Locks (Mutex)
Definition
A concurrency control mechanism that allows at most one thread to be inside a critical section at a time.
Example
    lock(mutex)
    shared memory operations; // critical section
    unlock(mutex)
Circular Bounded Queue - Single Mutex
    bool enqueue(val)
        bool res = false;
        lock(mutex);
        // is the data structure full?
        if (tail != N + head) {
            // insert the element
            buf[tail % N] = val;
            // update the tail
            ++tail;
            res = true;
        }
        unlock(mutex);
        // false when the queue was full
        return res;

    pair<int, bool> dequeue()
        pair<int, bool> res(-1, false);
        lock(mutex);
        // is the data structure empty?
        if (head != tail) {
            // read the element
            res.first = buf[head % N];
            res.second = true;
            // update the head
            ++head;
        }
        unlock(mutex);
        return res;
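The single-mutex design above can be written as compilable C++ roughly as follows. This is a minimal sketch, not the authors' code: the class name `MutexQueue`, the template parameter, and the use of `std::lock_guard` are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <utility>

// Sketch of a circular bounded queue guarded by one mutex.
// Names (MutexQueue, head_, tail_, buf_) are illustrative, not from the talk.
template <std::size_t N>
class MutexQueue {
  std::mutex mutex_;
  std::size_t head_ = 0;      // number of dequeues so far; slot is head_ % N
  std::size_t tail_ = 0;      // number of enqueues so far; slot is tail_ % N
  std::array<int, N> buf_{};

 public:
  // Returns false when the queue is full.
  bool enqueue(int val) {
    std::lock_guard<std::mutex> guard(mutex_);  // unlocks on every return path
    if (tail_ == head_ + N) return false;       // full
    buf_[tail_ % N] = val;
    ++tail_;
    return true;
  }

  // Returns {value, true} on success and {-1, false} when the queue is empty.
  std::pair<int, bool> dequeue() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (head_ == tail_) return {-1, false};     // empty
    int val = buf_[head_ % N];
    ++head_;
    return {val, true};
  }
};
```

Because `head_` and `tail_` only ever increase, the full test is `tail_ == head_ + N` and the empty test is `head_ == tail_`, matching the conditions on the slide.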
Problems with Mutual Exclusion Locks
- Priority Inversion
- Deadlock
- Livelock
- Diminished Parallelism
- Termination safety
Figure: Sojourner Rover ('97). Source: astr.ua.edu
Lock-free objects
Lock-free objects
Definition
An object is lock-free if it guarantees that at least one of the contending threads makes progress in a finite number of steps.
Lockfree Primitives
Key insight
Utilize atomic operations to manipulate the data
Read-Modify-Write Operations
- on x86: compare-and-swap (CAS)
- on ARM, PowerPC, Alpha: Load-linked / Store-conditional (LL/SC)
Semantics of Compare-and-swap
C++ Interface
    bool atomic<T>::compare_exchange_strong(T& oldval, T newval)
Definition
    // executes atomically
    bool atomic<T>::compare_exchange_strong(T& oldval, T newval) {
        if (oldval == *this) {
            *this = newval;
            return true;
        }
        oldval = *this;
        return false;
    }
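The typical way to use these semantics is a retry loop: read the current value, compute the new one, and let the CAS decide whether the update still applies. The sketch below builds an atomic add out of `compare_exchange_strong` alone; the helper name `fetch_add_with_cas` is hypothetical and chosen only for this illustration.

```cpp
#include <atomic>

// Adds delta to counter using only compare-and-swap.
// Returns the value the counter held immediately before the successful update.
inline int fetch_add_with_cas(std::atomic<int>& counter, int delta) {
  int expected = counter.load();
  // On failure, compare_exchange_strong stores the value it actually found
  // into expected, so each retry starts from fresh data.
  while (!counter.compare_exchange_strong(expected, expected + delta)) {
  }
  return expected;
}
```

The loop is lock-free in the sense defined earlier: a CAS fails only because some other thread's CAS succeeded, so at least one contender always makes progress.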
Circular Bounded Queue - Hybrid
    bool enqueue(val)
        bool res = false;
        lock(mutex);
        // is the data structure full?
        if (tail != N + head) {
            // insert the element
            buf[tail % N] = val;
            // update the tail
            ++tail;
            res = true;
        }
        unlock(mutex);
        // false when the queue was full
        return res;

    pair<int, bool> dequeue()
        pair<int, bool> res(-1, false);
        size_t oldhead = head;
        while (oldhead != tail) {
            // store the element away
            res.first = buf[oldhead % N];
            // test if successful; a failed CAS reloads oldhead
            if (head.CAS(oldhead, oldhead+1)) {
                res.second = true;
                break;
            }
        }
        return res;
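A compilable rendering of the hybrid scheme might look like the sketch below, where enqueuers still serialize on a mutex and dequeuers claim elements with a CAS on the head counter. It is an illustration under assumptions, not the authors' implementation: the class name `HybridQueue` is invented, all atomics use the default sequentially consistent ordering, and the buffer slots are made atomic so that a dequeuer's speculative read cannot race with a later enqueue.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <mutex>
#include <utility>

// Sketch: locked enqueue, CAS-based dequeue. Names are illustrative.
template <std::size_t N>
class HybridQueue {
  std::mutex mutex_;                        // serializes enqueuers only
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
  std::array<std::atomic<int>, N> buf_{};

 public:
  // Returns false when the queue is full.
  bool enqueue(int val) {
    std::lock_guard<std::mutex> guard(mutex_);
    std::size_t t = tail_.load();
    if (t == head_.load() + N) return false;  // full
    buf_[t % N].store(val);
    tail_.store(t + 1);                       // publish the element
    return true;
  }

  // Returns {value, true} on success and {-1, false} when the queue is empty.
  std::pair<int, bool> dequeue() {
    std::size_t oldhead = head_.load();
    while (oldhead != tail_.load()) {
      // Read speculatively; the value only counts if the CAS below succeeds.
      int val = buf_[oldhead % N].load();
      if (head_.compare_exchange_strong(oldhead, oldhead + 1)) {
        return {val, true};
      }
      // CAS failure refreshed oldhead; retry with the new head.
    }
    return {-1, false};
  }
};
```

The speculative read is safe because the counters only grow: slot `oldhead % N` can be overwritten only after some dequeuer has moved the head past `oldhead`, and in that case the CAS fails and the value is discarded.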
Lockfree Circular Bounded Queue
Circular Bounded Queue - Unique empty values
Problem
enqueue needs to update tail and buf[tail].
Our Solution
- Unique empty values decouple updates
- Distinguish special values (1 bit)
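One way to picture unique empty values is the encoding sketched below. The slides do not give the actual bit layout, so everything here is an assumption for illustration: the lowest bit of a 64-bit word tags it as empty, the remaining bits of an empty word hold the logical position, and the helpers `encode`, `decode`, and `isEmpty` as well as the constant `kSlots` are hypothetical names. Only `idx` and `emptyVal` appear in the talk, and their bodies here are guesses.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = 8;          // buffer size (assumed)
constexpr std::uint64_t kEmptyBit = 1;     // assumption: lowest bit marks "empty"

// Buffer slot for a logical position.
constexpr std::size_t idx(std::uint64_t pos) { return pos % kSlots; }

// Empty marker that is distinct for every logical position, even when two
// positions share a slot after the buffer wraps around.
constexpr std::uint64_t emptyVal(std::uint64_t pos) { return (pos << 1) | kEmptyBit; }

constexpr bool isEmpty(std::uint64_t word) { return (word & kEmptyBit) != 0; }

// Payloads give up one bit so they can never be mistaken for an empty marker.
constexpr std::uint64_t encode(std::uint64_t payload) { return payload << 1; }
constexpr std::uint64_t decode(std::uint64_t word) { return word >> 1; }
```

With markers like these, a CAS that expects `emptyVal(pos)` succeeds only if the slot is empty for exactly that round, which is what lets the slot update stand on its own without an atomic update of `tail`.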
Circular Bounded Queue - Nonblocking Enqueue
    bool enqueue(T val)
        size_t pos = tail;
        while (pos < head) {
            atomic<T>& e = buf[idx(pos)].val;
            ++pos;
            value_type empty = emptyVal(pos);
            bool succ = e.CAS(empty, val);
            if (succ) {
                update_counter(tail, pos);
                return true;
            }
        }
        return false;
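The slide does not show the body of `update_counter`. A plausible reading, given that several enqueuers may finish out of order, is that it raises the counter to at least `pos` and never lowers it. The sketch below implements that reading only; it is an assumption, not the authors' definition.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical update_counter: advance ctr to at least pos, never backwards.
inline void update_counter(std::atomic<std::size_t>& ctr, std::size_t pos) {
  std::size_t cur = ctr.load();
  // Stop as soon as the counter is already at or beyond pos. A failed CAS
  // reloads cur, so the comparison is always made against a fresh value.
  while (cur < pos && !ctr.compare_exchange_weak(cur, pos)) {
  }
}
```

Under this reading a thread that was delayed after filling its slot cannot pull `tail` back below a position that a faster thread has already published.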
Circular Bounded Queue - Nonblocking Dequeue
Problem
With unique empty values, dequeue needs to update two locations (head and buf[tail]).
Our Solution
- Use a descriptor to describe the work
- Other threads help interrupted threads
Circular Bounded Queue - Valid Dequeue
Circular Bounded Queue - Invalid Dequeue
Circular Bounded Queue - Helping
Problem with Helping
The original thread and the helping thread have the same codepath (bottleneck).
Our Solution
Delay helping and try to dequeue from a later position.
Circular Bounded Queue - Helping
Implementation Overview
- Implemented for the C++ relaxed memory model
- Model checked with CDSChecker
  - All two-thread cases with two operations each
  - Some three-thread cases with two operations each
  - Some four-thread cases with one operation each
Evaluation
- Three architecture families
  - Snapdragon 410 (ARM)
  - IBM Power8
  - Intel x86
- 40M operations
- Buffer size is 1024 elements
- Buffer is half-full at the beginning
- Each thread alternates enq and deq
- Each thread executes 40M / |Threads| operations
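The per-thread workload described above can be sketched as follows. The helper names `ops_per_thread` and `alternate_ops` are hypothetical, and the sketch covers only the work split and the alternation pattern; it is not the benchmark harness used for the reported measurements.

```cpp
#include <cstddef>

// Operations each thread executes when the total is split evenly.
constexpr std::size_t ops_per_thread(std::size_t total_ops, std::size_t threads) {
  return total_ops / threads;
}

// Runs ops operations, alternating enqueue and dequeue, starting with enqueue.
// enq and deq are callables wrapping whichever queue is under test.
template <class Enq, class Deq>
void alternate_ops(std::size_t ops, Enq enq, Deq deq) {
  for (std::size_t i = 0; i < ops; ++i) {
    if (i % 2 == 0) {
      enq(static_cast<int>(i));
    } else {
      deq();
    }
  }
}
```

For example, with 40M operations and 8 threads each thread would run `ops_per_thread(40000000, 8)`, that is 5,000,000 operations.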