Fast, less-complicated, lock-free Data Structures Ulrich Drepper ulrich.drepper@gs.com
Accelerate Code
● Not (much) through new hardware
● Split into independent pieces
● Splitting comes at a cost
● Marshaling between stages
● Increased latency for pipeline
● Realistically: Parallelization needed!
Parallelization
● Alternatives
● Multi-process
or
● Multi-thread
● Error prone
● High level of parallelization needed
● Keep cost of parallelization ($O_P$) low
Extended “Amdahl's Law”:
$$S = \frac{1}{(1 - P) + \frac{P}{N}\,(1 + O_P)}$$
[Plot: curve for P = 0.6; horizontal axis 1 to 16, vertical axis 0 to 2.5]
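To make the formula concrete, the sketch below evaluates it for the plotted case P = 0.6. It assumes the formula reads S = 1 / ((1 − P) + (P/N)(1 + O_P)), with N as the number of execution units. The names `speedup` and `print_curve` and any overhead values passed in are illustrative and not taken from the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Extended Amdahl's Law in the form assumed above:
//   S = 1 / ((1 - P) + (P / N) * (1 + O_P))
// P:   fraction of the work that can run in parallel
// N:   number of execution units
// O_P: relative overhead added by parallelization (0 = plain Amdahl)
double speedup(double P, int N, double O_P) {
    return 1.0 / ((1.0 - P) + (P / N) * (1.0 + O_P));
}

// Print the curve for N = 1..16, the range shown on the plot.
void print_curve(double P, double O_P) {
    for (int n = 1; n <= 16; ++n)
        std::printf("N=%2d  S=%.3f\n", n, speedup(P, n, O_P));
}
```

With O_P = 0 and P = 0.6 the speedup can never exceed 1 / (1 − P) = 2.5, which matches the top of the plot's vertical axis. Any positive overhead lowers the curve further, which is why the slide asks to keep O_P low.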
Parallelization
● Collaboration through shared memory
● Synchronized access
● Synchronized access to data structures
● Atomic data structures (mostly based on Compare-And-Swap)

    /* Semantics of the builtin; the whole body executes as one atomic step. */
    bool __sync_bool_compare_and_swap(TYPE *ptr, TYPE oldval, TYPE newval)
    {
      if (*ptr != oldval)
        return false;
      *ptr = newval;
      return true;
    }
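As an illustration of how a data structure is built on this primitive, here is a minimal sketch of a singly linked LIFO that uses the same GCC builtin. The `Node` type and the functions `push` and `pop_all` are hypothetical names chosen for this example; the slides do not show an implementation.

```cpp
#include <cassert>

// Hypothetical node type for the example.
struct Node {
    int value;
    Node *next;
};

// Push a node onto the list. The CAS installs the node only if the head
// is still the one we linked to; if another thread got in first, retry.
void push(Node **head, Node *n) {
    Node *old;
    do {
        old = __atomic_load_n(head, __ATOMIC_RELAXED);
        n->next = old;
    } while (!__sync_bool_compare_and_swap(head, old, n));
}

// Detach the whole list in one step and return its former head.
// Taking everything at once sidesteps the ABA problem that a
// single-element pop would have to handle.
Node *pop_all(Node **head) {
    Node *old;
    do {
        old = __atomic_load_n(head, __ATOMIC_RELAXED);
    } while (!__sync_bool_compare_and_swap(head, old,
                                           static_cast<Node *>(nullptr)));
    return old;
}
```

The retry loop is the common pattern: read the current state, compute the new state, and let CAS decide whether the read is still valid.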
Lock-Free Data Structures Single Double LIFO FIFO Hash Linked Linked 1:1 CAS CAS 1:N CAS No Priority N:1 CAS CAS M:N CAS 1:1 CAS CAS 1:N Priority N:1 CAS CAS M:N 5
x86 Special Single Double LIFO FIFO Hash Linked Linked 1:1 CAS CAS 1:N CAS DWCAS No Priority N:1 CAS CAS M:N CAS DWCAS 1:1 CAS CAS Double-wide CAS 1:N Priority N:1 CAS CAS M:N 6
Extended CAS
● Wider, more complicated CAS not the answer
“DCAS is not a Silver Bullet for Nonblocking Algorithm Design”, Doherty, Detlefs, Groves, Flood, Luchangco, Martin, Moir, Shavit, Steele, SPAA '04, 2004
Locking
● Bane of Programming
● Interface design: explicit or implicit locking?
● Often unnecessary overhead
● Composability problem
● AB-BA locking problem

    void move(dbllist<T> &target, dbllist<T>::it &prev,
              dbllist<T> &source, dbllist<T>::it &elem);

How to implement internal locking?
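One common answer to the AB-BA problem, sketched here under stated assumptions, is to acquire both internal locks through `std::lock`, which applies a deadlock-avoidance algorithm. The `locked_list` type and `move_front` function are hypothetical stand-ins for the slide's `dbllist<T>` and `move`. This shows one possible way to implement internal locking, not the approach the slides go on to recommend.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical container with an internal lock (not the slide's dbllist).
struct locked_list {
    std::mutex m;
    std::list<int> items;
};

// Move the first element of `source` to the end of `target`.
// Locking target.m and then source.m naively deadlocks if another thread
// runs move_front(source, target) at the same time (AB-BA).
bool move_front(locked_list &target, locked_list &source) {
    if (&target == &source) {
        // Same list: only one mutex exists, so lock it once.
        std::lock_guard<std::mutex> g(source.m);
        if (source.items.empty()) return false;
        source.items.splice(source.items.end(), source.items,
                            source.items.begin());
        return true;
    }
    // std::lock takes both mutexes without imposing a caller-visible order.
    std::lock(target.m, source.m);
    std::lock_guard<std::mutex> g1(target.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(source.m, std::adopt_lock);
    if (source.items.empty()) return false;
    target.items.splice(target.items.end(), source.items,
                        source.items.begin());
    return true;
}
```

Even this small function needs a special case and two locks for a single move, which illustrates the composability problem and the overhead the slide mentions.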
Locking and Latency
● Yes, there are spinlocks
● Fairer/more power efficient locking requires sleep
● Sleep requires wakeup
[Diagram labels: Detect Lock Collision, Wakeup Signal, Enter Kernel, Delay, Exit Kernel, Wake, Resume Lock Operation, Latency]
Way Forward
Two complementary approaches
● Improve implementation of locking to
● Reduce contention
● Reduce cost of the operation
  Hardware Lock Elision (HLE)
● Replace concept of locking
  Transactional Memory (TM)
Increase Parallelism
● Reduce lock contention
● Avoid “optimizations” like reader-writer locks
● Enable more code to be parallelized
[Plot: curves for P = 0.6 and P = 0.8; horizontal axis 1 to 16, vertical axis 0 to 4]

Running Example
Locking Hash Tables
● Designed for concurrent accesses
● In practice mostly read accesses
● Even write accesses likely will not conflict
● Locking is overkill
[Diagram: Thread 1 and Thread 2 access separate memory locations]

Hash Table With Locking
Mutually Exclusive Access
[Flowchart, shown in three steps]
● CAS(mutex, 0, 1): Mutex == 0?
● Yes: Set 1
● No: Delay, then try again
● Read Table Entry
● Update Table Entry
● Store 0 in Mutex
● Wake
● Memory touched: Mutex Memory, Hash Tab Memory
● Net Effect On Mutex: Nothing
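The flowchart can be sketched in code as follows. This is a minimal spinning version that assumes GCC or Clang builtins. The `cas_mutex` type and `update_entry` function are hypothetical names, and the "Wake" step is only noted in a comment because a real sleeping lock would need a kernel facility the slides do not name.

```cpp
#include <cassert>
#include <thread>

// Hypothetical mutex built directly on CAS(mutex, 0, 1).
struct cas_mutex {
    int state = 0;  // 0 = free, 1 = held

    bool try_lock() {
        // "Mutex == 0? Yes: Set 1" as one atomic step.
        return __sync_bool_compare_and_swap(&state, 0, 1);
    }
    void lock() {
        while (!try_lock())
            std::this_thread::yield();  // the "Delay" box
    }
    void unlock() {
        __atomic_store_n(&state, 0, __ATOMIC_RELEASE);  // "Store 0 in Mutex"
        // A sleeping lock would wake a waiter here; this sketch only spins.
    }
};

// The critical section from the flowchart: read one entry, then update it.
// Returns the value that was stored before the update.
int update_entry(cas_mutex &m, int *table, int idx, int value) {
    m.lock();
    int old = table[idx];
    table[idx] = value;
    m.unlock();
    return old;
}
```

After `update_entry` returns, `state` is 0 again, exactly as before the call. That is the "net effect on mutex: nothing" observation that lock elision builds on.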
Hardware Lock Elision

With Lock Elision
[Flowchart as in “Mutually Exclusive Access”, shown in three steps with Thread 1 and Thread 2]
● What if '1' is not written?
● Thread 1 and Thread 2 both find Mutex == 0 and proceed to the table
● No Mutual Exclusion!
No Mutual Exclusion
● Bad
● But only if
● Concurrent access to same memory location
● At least one of the accesses is a write
[Flowchart as before, with Thread 1 and Thread 2]
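The two conditions can be written down as a predicate over the locations each critical section touched. The sketch below is a software illustration only. The `access_set` type and the function names are hypothetical, and real hardware tracks cache lines rather than individual table indices.

```cpp
#include <cassert>
#include <set>

// Locations (here: table indices) touched by one critical section.
struct access_set {
    std::set<int> reads;
    std::set<int> writes;
};

// True if the two sets share at least one element.
static bool intersects(const std::set<int> &a, const std::set<int> &b) {
    for (int x : a)
        if (b.count(x)) return true;
    return false;
}

// Two concurrent critical sections collide only if they touch the same
// location and at least one of the two accesses is a write.
bool collide(const access_set &a, const access_set &b) {
    return intersects(a.writes, b.writes) ||
           intersects(a.writes, b.reads) ||
           intersects(a.reads, b.writes);
}
```

Two readers of the same entry never collide under this rule, which is why a read-mostly hash table rarely needs the mutual exclusion its lock enforces.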
Alternative
[Flowchart as before, with Thread 1 and Thread 2]
● Detect Collisions!

Intel HLE
x86 code for Hash Table
Thread 1:
    lock cmpxchg %ebx, mut
    jne 2f
    mov table+2, %edx
    mov $0, mut
    call wake
Thread 2:
    lock cmpxchg %ebx, mut
    jne 2f
    mov $4, table+5
    mov $0, mut
    call wake
[Diagram labels: L1 Data Cache; Main Memory with Hash Table (value 42) and Mutex (value 0)]
New in Intel HLE
● New Instruction Prefixes (compatible)
Thread 1:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov table+2, %edx
    xrelease mov $0, mut
    call wake
Thread 2:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov $4, table+5
    xrelease mov $0, mut
    call wake
[Diagram labels: Transaction Flag, Lock Cache, Hash Table (value 42), Mutex (value 0)]
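From C or C++, the prefixes can be requested through GCC's x86-specific memory-order flags `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`. The sketch below follows the pattern in the GCC manual and uses an atomic exchange where the slide uses `cmpxchg`. The `ELIDE_*` macros, the function names, and the table layout are assumptions made for this example. On compilers that lack the flags the macros fall back to 0, so the code still works as an ordinary spinlock without elision.

```cpp
#include <cassert>
#include <thread>

// Use the HLE flags where the compiler provides them (GCC on x86).
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define ELIDE_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define ELIDE_RELEASE __ATOMIC_HLE_RELEASE
#else
#define ELIDE_ACQUIRE 0
#define ELIDE_RELEASE 0
#endif

static int mut = 0;                                   // the mutex word
static int table[8] = {0, 0, 42, 0, 0, 0, 0, 0};      // toy hash table

// Acquire: emitted with an xacquire prefix when the flag is available.
void elided_lock(int *m) {
    while (__atomic_exchange_n(m, 1, __ATOMIC_ACQUIRE | ELIDE_ACQUIRE))
        std::this_thread::yield();
}

// Release: emitted with an xrelease prefix when the flag is available.
void elided_unlock(int *m) {
    __atomic_store_n(m, 0, __ATOMIC_RELEASE | ELIDE_RELEASE);
}

// Thread 1's critical section on the slide: read one entry.
int read_entry(int idx) {
    elided_lock(&mut);
    int v = table[idx];
    elided_unlock(&mut);
    return v;
}

// Thread 2's critical section on the slide: write one entry.
void write_entry(int idx, int value) {
    elided_lock(&mut);
    table[idx] = value;
    elided_unlock(&mut);
}
```

Because the prefixes are ignored by processors without HLE, the same binary runs everywhere. That is the compatibility the slide points out.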
Successful Concurrent Use

No Collision
Thread 1:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov table+2, %edx
    xrelease mov $0, mut
    call wake
Thread 2:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov $4, table+5
    xrelease mov $0, mut
    call wake
[Animation in eight frames; T marks an entry held under the transaction flag]
● Start: Hash Table entry holds 42, Mutex holds 0
● Thread 1, xacquire lock cmpxchg: mutex entry T 1, Old: 0
● Thread 1, mov table+2, %edx: table entry T 42
● Thread 2, xacquire lock cmpxchg: mutex entry T 1, Old: 0
● Thread 2, mov $4, table+5: table entry T 4
● Thread 1, xrelease mov $0, mut: check on the mutex entry passes (✓)
● Thread 1 entries lose the T mark (42; mutex 1 replaced by 0); Thread 2, xrelease mov $0, mut: check passes (✓)
● Thread 2 entries lose the T mark; the value 4 appears in the Hash Table; Mutex holds 0
Unsuccessful Concurrent Use

With Collision
Thread 1:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov table+2, %edx
    xrelease mov $0, mut
    call wake
Thread 2:
    xacquire lock cmpxchg %ebx, mut
    jne 2f
    mov $4, table+2
    xrelease mov $0, mut
    call wake
[Animation; T marks an entry held under the transaction flag]
● Start: Hash Table entry holds 42, Mutex holds 0
● Thread 1, xacquire lock cmpxchg: mutex entry T 1, Old: 0
● Thread 1, mov table+2, %edx: table entry T 42
● Thread 2, xacquire lock cmpxchg: mutex entry T 1, Old: 0
● Thread 2, mov $4, table+2: table entry T 4, for the same location Thread 1 holds as T 42
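The two walkthroughs can be replayed with a toy software model in which reads are tracked, writes are buffered, and a collision forces one section to be discarded and re-run. Everything here is an assumption made for illustration: the `txn` type, the function names, and in particular the policy that the first section commits and the second retries as if it had really taken the lock. The slides shown so far do not state which side the hardware aborts.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model of one elided critical section.
struct txn {
    std::set<int> reads;          // indices read speculatively
    std::map<int, int> writes;    // buffered writes: index -> value

    int load(const int *table, int idx) {
        reads.insert(idx);
        auto it = writes.find(idx);
        return it != writes.end() ? it->second : table[idx];
    }
    void store(int idx, int value) { writes[idx] = value; }
};

// Same location touched by both, and at least one access is a write.
bool conflict(const txn &a, const txn &b) {
    for (const auto &w : a.writes)
        if (b.reads.count(w.first) || b.writes.count(w.first)) return true;
    for (const auto &w : b.writes)
        if (a.reads.count(w.first)) return true;
    return false;
}

// Make the buffered writes visible in the shared table.
void commit(const txn &t, int *table) {
    for (const auto &w : t.writes) table[w.first] = w.second;
}

using section = void (*)(txn &, const int *);

// Run two sections "at the same time"; returns the number of aborts.
int run_pair(section a, section b, int *table) {
    txn ta, tb;
    a(ta, table);                 // both execute speculatively
    b(tb, table);                 // against the same starting table
    if (!conflict(ta, tb)) {
        commit(ta, table);
        commit(tb, table);
        return 0;
    }
    commit(ta, table);            // model choice: A wins
    txn retry;                    // B is discarded and re-run after A
    b(retry, table);
    commit(retry, table);
    return 1;
}

// The three critical sections from the slides.
void read_entry2(txn &t, const int *table) { t.load(table, 2); }  // mov table+2, %edx
void write4_entry5(txn &t, const int *) { t.store(5, 4); }        // mov $4, table+5
void write4_entry2(txn &t, const int *) { t.store(2, 4); }        // mov $4, table+2
```

Calling `run_pair(read_entry2, write4_entry5, table)` corresponds to the "No Collision" case and completes without an abort. Replacing the second section with `write4_entry2` reproduces the "With Collision" case.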