
Fast, less-complicated, lock-free Data Structures Ulrich Drepper - PowerPoint PPT Presentation



  1. Fast, less-complicated, lock-free Data Structures Ulrich Drepper ulrich.drepper@gs.com

  2. Accelerate Code ● Not (much) through new hardware ● Split into independent pieces ● Splitting comes at a cost ● Marshaling between stages ● Increased latency for pipeline ● Realistically: Parallelization needed!

  3. Parallelization ● Alternatives ● Multi-process or ● Multi-thread ● Error prone ● High level of parallelization needed ● Keep cost of parallelization ($O_P$) low ● Extended “Amdahl's Law”: $S = \frac{1}{(1 - P) + \frac{P}{N}(1 + O_P)}$ ● [Plot: S for 1 to 16 on the x-axis, P = 0.6]
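The formula on this slide can be evaluated directly. The following is a minimal sketch, assuming that N is the number of parallel execution units and that O_P is a relative overhead factor; the function name `extended_amdahl_speedup` is hypothetical and not from the slides.

```c
#include <assert.h>

/* Extended Amdahl's Law as written on the slide:
 *   S = 1 / ((1 - P) + (P / N) * (1 + O_P))
 * p   : fraction of the work that can run in parallel (0..1)
 * n   : number of parallel execution units (assumed meaning of N)
 * o_p : relative overhead of parallelization (assumed meaning of O_P) */
double extended_amdahl_speedup(double p, unsigned n, double o_p)
{
    assert(n > 0 && p >= 0.0 && p <= 1.0 && o_p >= 0.0);
    return 1.0 / ((1.0 - p) + (p / n) * (1.0 + o_p));
}
```

With P = 0.6, sixteen units and no overhead the result stays below 2.5, and any positive overhead lowers it further, which is the point of keeping O_P low.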

  4. Parallelization ● Collaboration through shared memory ● Synchronized access ● Synchronized access to data structures ● Atomic data structures (mostly based on Compare-And-Swap), which behaves as if the following ran as one atomic step:

     bool __sync_bool_compare_and_swap(TYPE *ptr, TYPE oldval, TYPE newval)
     {
       if (*ptr != oldval)
         return false;
       *ptr = newval;
       return true;
     }
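A typical use of Compare-And-Swap is a retry loop: read the current value, compute the new one, and try to install it, repeating if another thread changed the value in between. The sketch below uses the GCC builtin named on the slide; the helper name `cas_add` is hypothetical.

```c
#include <assert.h>

/* Atomically add delta to *ptr with a CAS retry loop; returns the new value. */
long cas_add(long *ptr, long delta)
{
    long oldval, newval;

    assert(ptr != 0);
    do {
        /* Snapshot the current value. */
        oldval = __atomic_load_n(ptr, __ATOMIC_RELAXED);
        newval = oldval + delta;
        /* Install newval only if *ptr still equals the snapshot. */
    } while (!__sync_bool_compare_and_swap(ptr, oldval, newval));
    return newval;
}
```

No thread ever blocks here; a thread that loses the race simply retries with the fresh value.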

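  5. Lock-Free Data Structures ● [Table] Data structures covered: LIFO, FIFO, Single Linked, Double Linked, Hash ● Rows: No Priority and Priority, each with 1:1, 1:N, N:1, M:N ● Entries marked CAS: No Priority 1:1 (two entries), 1:N (one), N:1 (two), M:N (one); Priority 1:1 (two), N:1 (two); Priority 1:N and M:N have none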

  6. x86 Special ● [Same table as slide 5, extended with DWCAS (Double-wide CAS)] ● Additional entries marked DWCAS: No Priority 1:N and No Priority M:N

  7. Extended CAS ● Wider, more complicated CAS not the answer ● “DCAS is not a Silver Bullet for Nonblocking Algorithm Design”, Doherty, Detlefs, Groves, Flood, Luchangco, Martin, Moir, Shavit, Steele, SPAA '04, 2004

  8. Locking ● Bane of Programming ● Interface design: explicit or implicit locking? ● Often unnecessary overhead ● Composability problem ● AB-BA locking problem ● Example: void move(dbllist<T> &target, dbllist<T>::it &prev, dbllist<T> &source, dbllist<T>::it &elem); ● How to implement internal locking?
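The AB-BA problem arises when one thread calls move(a, b) while another calls move(b, a), each taking its first list's lock and then waiting for the other. One common workaround, sketched here in C rather than the C++ of the slide, is to always take the two locks in a fixed global order such as by address. The type `locked_list` and the functions `lock_pair`, `unlock_pair` and `move_one` are hypothetical stand-ins, not the API from the talk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical list with an internal lock; count stands in for the contents. */
struct locked_list {
    pthread_mutex_t lock;
    int count;
};

/* Lock two distinct lists in address order so that concurrent
 * move_one(a, b) and move_one(b, a) agree on which lock comes first. */
static void lock_pair(struct locked_list *x, struct locked_list *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        struct locked_list *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(&x->lock);
    pthread_mutex_lock(&y->lock);
}

static void unlock_pair(struct locked_list *x, struct locked_list *y)
{
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}

/* Move one element from source to target.
 * Returns 1 on success, 0 if source was empty. */
int move_one(struct locked_list *target, struct locked_list *source)
{
    int moved = 0;

    assert(target != source);
    lock_pair(target, source);
    if (source->count > 0) {
        source->count--;
        target->count++;
        moved = 1;
    }
    unlock_pair(target, source);
    return moved;
}
```

This avoids the deadlock but not the overhead or the composability problem: every operation that touches two lists has to know about the ordering rule.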

  9. Locking and Latency ● Yes, there are spinlocks ● Fairer/more power efficient locking requires sleep ● Sleep requires wakeup ● [Timeline diagram: Detect Lock Collision, Enter Kernel, Wake, Exit Kernel, Resume Lock Operation, with wakeup signal, delay and latency labeled between the steps]

  10. Way Forward ● Two complementary approaches ● Improve implementation of locking to ● Reduce contention ● Reduce cost of the operation ● Replace concept of locking

  11. Way Forward ● Two complementary approaches ● Improve implementation of locking to ● Reduce contention ● Reduce cost of the operation ● Approach: Hardware Lock Elision (HLE) ● Replace concept of locking ● Approach: Transactional Memory (TM)

  12. Increase Parallelism ● Reduce lock contention ● Avoid “optimizations” like reader-writer locks ● Enable more code to be parallelized ● [Plot: speedup for 1 to 16 on the x-axis, curves for P = 0.6 and P = 0.8]

  13. Running Example

  14. Locking Hash Tables ● Designed for concurrent accesses ● In practice mostly read accesses ● Even write accesses likely will not conflict ● Locking is overkill ● [Diagram: Thread 1 and Thread 2 accessing Separate Memory Locations]

  15. Hash Table With locking

  16. Mutually Exclusive Access ● [Flowchart boxes: CAS(mutex, 0, 1) shown as “Mutex == 0?” and “Set 1”; on No: Delay; on Yes: Read Table Entry, Update Table Entry, Store 0 in Mutex, Wake]

  17. Mutually Exclusive Access ● [Same flowchart, annotated with Mutex Memory and Hash Tab Memory]

  18. Mutually Exclusive Access ● [Same flowchart] ● Net Effect On Mutex: Nothing
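The flowchart on these slides can be written out as a small C routine. This is a minimal sketch under stated assumptions: the table is a plain array, the "Delay" box is approximated by yielding the CPU instead of sleeping in the kernel, and the names `mutex`, `table` and `table_add` are illustrative only.

```c
#include <assert.h>
#include <sched.h>

static int mutex;        /* 0 = free, 1 = held */
static int table[8];     /* stand-in for the hash table */

/* Read a table entry, update it, and return the new value,
 * all under the CAS-based mutex from the flowchart. */
int table_add(unsigned idx, int delta)
{
    int value;

    assert(idx < 8);
    /* CAS(mutex, 0, 1): succeeds only if the mutex was 0. */
    while (!__sync_bool_compare_and_swap(&mutex, 0, 1))
        sched_yield();               /* "Delay": a real mutex would sleep here */

    value = table[idx];              /* Read Table Entry */
    value += delta;
    table[idx] = value;              /* Update Table Entry */

    __sync_lock_release(&mutex);     /* Store 0 in Mutex */
    /* "Wake": a sleeping mutex would wake one waiter at this point. */
    return value;
}
```

After the call the mutex holds 0 again, exactly as before: the net effect on the mutex is nothing, which is the observation lock elision builds on.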

  19. Hardware Lock Elision

  20. With Lock Elision ● What if '1' is not written? ● [Flowchart as on slide 16]

  21. With Lock Elision ● [Flowchart as on slide 16, with Thread 1 and Thread 2 marked]

  22. With Lock Elision ● No Mutual Exclusion! ● [Flowchart as on slide 16, with Thread 1 and Thread 2 marked]

  23. No Mutual Exclusion ● Bad ● But only if ● Concurrent access to same memory location ● At least one of the accesses is write ● [Flowchart as on slide 16, with Thread 1 and Thread 2 marked]

  24. Alternative ● Detect Collisions! ● [Flowchart as on slide 16, with Thread 1 and Thread 2 marked]
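The two conditions from slide 23 amount to a simple predicate over a pair of memory accesses. The sketch below states it in C; the `access` struct and both function names are hypothetical, and it compares exact addresses as the slides do, whereas real hardware may track conflicts at a coarser granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One memory access made inside an elided critical section. */
struct access {
    uintptr_t addr;
    bool is_write;
};

/* Two concurrent accesses collide only if they touch the same
 * location and at least one of them is a write. */
bool accesses_conflict(struct access a, struct access b)
{
    return a.addr == b.addr && (a.is_write || b.is_write);
}

/* The two cases used later in the deck: a read of table+2 against
 * a write to table+5 (no collision) and against a write to table+2. */
void conflict_examples(void)
{
    static int table[8];
    struct access read2  = { (uintptr_t)&table[2], false };
    struct access write5 = { (uintptr_t)&table[5], true };
    struct access write2 = { (uintptr_t)&table[2], true };

    assert(!accesses_conflict(read2, write5));
    assert(accesses_conflict(read2, write2));
}
```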

  25. Intel HLE

  26. x86 code for Hash Table ● Thread 1: lock cmpxchg %ebx, mut ; jne 2f ; mov table+2, %edx ; mov $0, mut ; call wake ● Thread 2: lock cmpxchg %ebx, mut ; jne 2f ; mov $4, table+5 ; mov $0, mut ; call wake ● [Diagram labels: L1 Data Cache, Main Memory, Hash Table (42), Mutex (0)]

  27. New in Intel HLE ● New Instruction Prefixes (compatible) ● Thread 1: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov table+2, %edx ; xrelease mov $0, mut ; call wake ● Thread 2: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov $4, table+5 ; xrelease mov $0, mut ; call wake ● [Diagram labels: Transaction Flag, Lock Cache, Hash Table (42), Mutex (0)]
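From C, the same prefixes can be requested through GCC's x86-specific memory-order flags instead of hand-written assembly. This is a sketch, not the code from the talk: it acquires the lock with an atomic exchange rather than the `cmpxchg` on the slide, the names `LOCK_ACQUIRE`, `LOCK_RELEASE` and `table_store` are hypothetical, and on compilers or targets without the HLE flags it falls back to an ordinary spinlock.

```c
#include <assert.h>

/* GCC on x86 defines the HLE flags; elsewhere use plain acquire/release. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define LOCK_ACQUIRE (__ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)
#define LOCK_RELEASE (__ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE)
#else
#define LOCK_ACQUIRE __ATOMIC_ACQUIRE
#define LOCK_RELEASE __ATOMIC_RELEASE
#endif

static int mut;          /* 0 = free, 1 = held */
static int table[8];     /* stand-in for the hash table */

/* Store value into table[idx] inside a critical section whose lock
 * operations carry the elision hints where the compiler supports them. */
void table_store(unsigned idx, int value)
{
    assert(idx < 8);
    /* Acquire: with the HLE flag this is emitted with an xacquire prefix. */
    while (__atomic_exchange_n(&mut, 1, LOCK_ACQUIRE) != 0) {
        /* Lock is held: spin on plain loads until it looks free. */
        while (__atomic_load_n(&mut, __ATOMIC_RELAXED) != 0)
            ;
    }

    table[idx] = value;

    /* Release: with the HLE flag this is emitted with an xrelease prefix. */
    __atomic_store_n(&mut, 0, LOCK_RELEASE);
}
```

Processors that do not implement HLE ignore the prefixes, so the same binary behaves as a normal lock there, which is what "compatible" on the slide refers to.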

  28. Successful Concurrent Use

  29. No Collision ● Thread 1: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov table+2, %edx ; xrelease mov $0, mut ; call wake ● Thread 2: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov $4, table+5 ; xrelease mov $0, mut ; call wake ● [Diagram: Hash Table 42, Mutex 0]

  30. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: T 1, Old: 0; Hash Table 42, Mutex 0]

  31. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0; Hash Table 42, Mutex 0]

  32. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0; next to Thread 2: T 1, Old: 0; Hash Table 42, Mutex 0]

  33. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0; next to Thread 2: T 4; T 1, Old: 0; Hash Table 42, Mutex 0]

  34. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: T 42; T 1 ✓, Old: 0; next to Thread 2: T 4; T 1, Old: 0; Hash Table 42, Mutex 0]

  35. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: 42; 1 0, Old: 0; next to Thread 2: T 4; T 1 ✓, Old: 0; Hash Table 42, Mutex 0]

  36. No Collision ● Code as on slide 29 ● [Diagram annotations: next to Thread 1: 42; 0; next to Thread 2: 4; 1 0, Old: 0; Hash Table 42 and 4, Mutex 0]

  37. Unsuccessful Concurrent Use

  38. With Collision ● Thread 1: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov table+2, %edx ; xrelease mov $0, mut ; call wake ● Thread 2: xacquire lock cmpxchg %ebx, mut ; jne 2f ; mov $4, table+2 ; xrelease mov $0, mut ; call wake ● [Diagram: Hash Table 42, Mutex 0]

  39. With Collision ● Code as on slide 38 ● [Diagram annotations: next to Thread 1: T 1, Old: 0; Hash Table 42, Mutex 0]

  40. With Collision ● Code as on slide 38 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0; Hash Table 42, Mutex 0]

  41. With Collision ● Code as on slide 38 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0; next to Thread 2: T 1, Old: 0; Hash Table 42, Mutex 0]

  42. With Collision ● Code as on slide 38 ● [Diagram annotations: next to Thread 1: T 42; T 1, Old: 0, with additional markers on Thread 1's code and on T 42; next to Thread 2: T 4; T 1, Old: 0; Hash Table 42, Mutex 0]

  43. With Collision ● Code as on slide 38 ● [Diagram annotations: as on slide 42, with a further marker on Thread 1's T 1]
