
UNIST UNIST Hanyang Univ. UNIST/SKKU Fast but Asymmetric



  1. Deukyeon Hwang (UNIST), Wook-Hee Kim (UNIST), Youjip Won (Hanyang Univ.), Beomseok Nam (UNIST/SKKU)

  2. Persistent memory: non-volatility, byte-addressability, large capacity, and fast but asymmetric access latency.

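  3. CPU caches (volatile) and persistent memory (non-volatile). [Figure: a node holding 10 20 30 40 is shifted in the CPU cache (10 20 30 30 40); depending on which cache line is flushed to persistent memory, key 40 can be lost ("LOST 40!").]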

  4. Inserting 25 into a node: (0) 10 20 30 40; (1) 10 20 30 40 40; (2) 10 20 30 30 40; (3) 10 20 25 30 40. A partially updated tree node (steps 1 and 2) is inconsistent → Append-Only Update.

  5. Node split: Node A (keys 10 20 30 40 60; pointers P1 P2 P3 P4 P6) is split into Node A (keys 10 20 30; pointers P1 P2 P3) and Node B (keys 40 60; pointers P4 P6). Logging → Selective Persistence (internal nodes in DRAM).

  6. ▪ Append-Only
       • Unsorted keys
     ▪ Selective Persistence
       • Internal nodes → DRAM
       • Internal nodes have to be reconstructed from leaf nodes after failures
       • Logging for leaf nodes
     ▪ Previous solutions
       • NV-Tree [FAST’15]: Append-Only leaf update + Selective Persistence
       • wB+-Tree [VLDB’15]: Append-Only node update + bitmap/slot array metadata
       • FP-Tree [SIGMOD’16]: Append-Only leaf update + fingerprints + Selective Persistence

  7. Append-Only (unsorted keys) → Failure-Atomic ShifT (FAST). Selective Persistence (DRAM + PM) → Failure-Atomic In-place Rebalancing (FAIR). Lock-Free Search.

  8. ▪ Modern processors reorder instructions to utilize the memory bandwidth
     ▪ Memory ordering in x86 and ARM

                               x86   ARM
         stores-after-stores    Y     N
         stores-after-loads     N     N
         loads-after-stores     N     N
         loads-after-loads      N     N
         Inst. w/ dependency    Y     Y

     ▪ x86 guarantees Total Store Ordering (TSO)
     ▪ Dependent instructions are not reordered

  9. ▪ Pointers in a B+-Tree store unique memory addresses
     ▪ An 8-byte pointer can be atomically updated
     ▪ Read transactions detect transient inconsistency between duplicate pointers
     ▪ Transient inconsistency
       • In-flight state partially updated by a write transaction
     Example node: keys 10 20 30 40 40; pointers P1 P2 P3 P4 P5 P5.

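  10. Store order within a shift: first duplicate the pointer (keys 10 20 30 40; pointers P1 P2 P3 P4 P5 P5), then copy the key (keys 10 20 30 40 40; pointers P1 P2 P3 P4 P5 P5). Annotations on the slide: mfence(); mfence(); TSO.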

  11. Insert (25, P6) into a node using FAST (g: garbage, ʌ: null). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ. Read transactions can succeed in finding a key even if the system crashes at any step.

  12. Insert (25, P6) into a node using FAST. Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 P5 ʌ.

  13. Insert (25, P6) into a node using FAST. Keys: 10 20 30 40 40 g. Pointers: P1 P2 P3 P4 P5 P5 ʌ.

  14. Insert (25, P6) into a node using FAST. Keys: 10 20 30 40 40 g. Pointers: P1 P2 P3 P4 P5 P5 ʌ.

  15. Insert (25, P6) into a node using FAST. Read transaction on keys: 10 20 30 40 40 g; pointers: P1 P2 P3 P4 P5 P5 ʌ. The key 40 between the duplicate pointers (P5 P5) is ignored!

  16. Insert (25, P6) into a node using FAST. Keys: 10 20 30 40 40 g. Pointers: P1 P2 P3 P4 P4 P5 ʌ. Shifting P4 invalidates the left 40.

  17. Insert (25, P6) into a node using FAST. Keys: 10 20 30 30 40 g. Pointers: P1 P2 P3 P4 P4 P5 ʌ.

  18. Insert (25, P6) into a node using FAST. Keys: 10 20 30 30 40 g. Pointers: P1 P2 P3 P3 P4 P5 ʌ.

  19. Insert (25, P6) into a node using FAST. Keys: 10 20 25 30 40 g. Pointers: P1 P2 P3 P3 P4 P5 ʌ.

  20. Insert (25, P6) into a node using FAST. Keys: 10 20 25 30 40 g. Pointers: P1 P2 P3 P6 P4 P5 ʌ. Storing P6 validates 25.
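The insertion steps on slides 11 to 20 can be replayed as a small simulation. This is only a sketch under stated assumptions: the node is modeled as two Python lists (six key slots, seven pointer slots, `None` standing for ʌ), each list assignment stands in for one 8-byte store, and the names `valid_entries` and `fast_insert` are made up for illustration; they are not taken from the authors' implementation, which would use real stores, fences, and cache-line flushes.

```python
GARBAGE = "g"  # stale slot content; readers never interpret it


def valid_entries(keys, ptrs):
    """Reader: map each valid key to its pointer, skipping keys between duplicate pointers."""
    found = {}
    i = 0
    while i < len(keys) and ptrs[i + 1] is not None:
        if ptrs[i] != ptrs[i + 1]:       # equal neighbours mark keys[i] as invalid
            found[keys[i]] = ptrs[i + 1]
        i += 1
    return found


def fast_insert(keys, ptrs, key, ptr):
    """Writer: insert (key, ptr) in sorted position, yielding a snapshot after every store."""
    n = ptrs.index(None) - 1             # number of valid keys (assumes a consistent node with free space)
    i = n - 1
    while i >= 0 and keys[i] > key:
        ptrs[i + 2] = ptrs[i + 1]        # duplicate the pointer one slot to the right
        yield list(keys), list(ptrs)
        keys[i + 1] = keys[i]            # then copy the key next to it
        yield list(keys), list(ptrs)
        i -= 1
    pos = i + 1
    ptrs[pos + 1] = ptrs[pos]            # duplicate pointers hide the slot about to be written
    yield list(keys), list(ptrs)
    keys[pos] = key
    yield list(keys), list(ptrs)
    ptrs[pos + 1] = ptr                  # one pointer store makes the new key visible
    yield list(keys), list(ptrs)


keys = [10, 20, 30, 40, GARBAGE, GARBAGE]
ptrs = ["P1", "P2", "P3", "P4", "P5", None, None]
before = valid_entries(keys, ptrs)
states = list(fast_insert(keys, ptrs, 25, "P6"))
for k, p in states:
    print(k, p, valid_entries(k, p))
```

Every intermediate snapshot yields the same key-to-pointer mapping as the original node, and only the final store of P6 adds key 25, which is the property the slides rely on for crash tolerance and lock-free reads.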

  21. ▪ It is necessary to call clflush at cache line boundaries. The node (keys 10 20 30 40 g g; pointers P1 P2 P3 P4 P5 ʌ ʌ) spans Cache Line 1 and Cache Line 2. When the shift reaches the boundary (keys 10 20 30 30 40 g; pointers P1 P2 P3 P3 P4 P5 ʌ): mfence(); clflush(Cache Line 2); mfence().
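To make the flush placement concrete, the following sketch computes which cache lines would be flushed, and in what order, for a given sequence of slot stores. It assumes 64-byte cache lines and 8-byte slots, numbers cache lines from 0 (the slide labels them 1 and 2), and uses a hypothetical helper name `flush_schedule`. The policy it encodes, flushing a line when the next store lands in a different line and once more after the last store, is one reading of the slide rather than a transcription of the authors' code.

```python
CACHE_LINE = 64   # bytes; assumed cache line size
SLOT = 8          # bytes per key or pointer slot


def flush_schedule(slot_indices, base_addr=0):
    """Return the cache lines to flush, in order, for a sequence of slot stores."""
    flushes = []
    current = None
    for idx in slot_indices:
        line = (base_addr + idx * SLOT) // CACHE_LINE
        if current is not None and line != current:
            flushes.append(current)   # real code: mfence(); clflush(line); mfence()
        current = line
    if current is not None:
        flushes.append(current)       # persist the line touched by the final store
    return flushes


# A right shift writes slots from high to low index: slots 10..5 of a node at address 0.
print(flush_schedule(range(10, 4, -1)))  # → [1, 0]
```

Stores that stay inside one cache line need no intermediate flush in this model, so the number of flushes grows with the number of cache lines touched rather than with the number of shifted slots.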

  22. ▪ Let’s avoid expensive logging by making read transactions aware of rebalancing operations ▪ B-link Tree (example keys: 10 20 30 40 70 80 90)

  23. FAIR split a node. Node A: keys 10 20 30 40 60; pointers P1 P2 P3 P4 P6. Node B: keys 40 60; pointers P4 P6. A read transaction can detect transient inconsistency if keys are out of order.

  24. FAIR split a node. Node A: keys 10 20 30; pointers P1 P2 P3 ʌ. Node B: keys 40 60; pointers P4 P6 ʌ. Setting the NULL pointer validates Node B. Node A and Node B are virtually a single node.

  25. FAIR split a node. Node A: keys 10 20 30; pointers P1 P2 P3 ʌ. Node B: keys 40 60; pointers P4 P6 ʌ. Migrated keys can be accessed via the sibling pointer.

  26. FAIR split a node. Node A: keys 10 20 30; pointers P1 P2 P3 ʌ. Node B: keys 40 50 60 (key 50 with pointer P5 has been inserted).
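The split on slides 23 to 26 can be sketched in the same simulated style. Assumptions to note: nodes are modeled as leaf-like arrays of (key, pointer) pairs terminated by `None`, the reader moves right when the search key is at least the sibling's first key, and the names `Node`, `search`, and `fair_split` are illustrative only. Each `yield` marks a point where a crash or a concurrent reader could observe the structure.

```python
class Node:
    def __init__(self, entries, capacity=6):
        # Fixed-size slot array; the first None terminates the valid entries.
        self.entries = list(entries) + [None] * (capacity - len(entries))
        self.sibling = None

    def valid(self):
        out = []
        for e in self.entries:
            if e is None:
                break
            out.append(e)
        return out


def search(node, key):
    """Scan the node, then follow the sibling pointer if the key may live to the right."""
    while node is not None:
        for k, v in node.valid():
            if k == key:
                return v
        sib = node.sibling
        first = sib.valid()[0][0] if sib is not None and sib.valid() else None
        node = sib if first is not None and key >= first else None
    return None


def fair_split(a):
    """Split `a`, yielding the new sibling after every store a reader could observe."""
    items = a.valid()
    mid = (len(items) + 1) // 2
    b = Node(items[mid:], capacity=len(a.entries))  # 1. copy the upper half into Node B
    b.sibling = a.sibling
    yield b
    a.sibling = b                                   # 2. link B; migrated keys exist in A and B
    yield b
    a.entries[mid] = None                           # 3. a single NULL store truncates A
    yield b


a = Node([(10, "P1"), (20, "P2"), (30, "P3"), (40, "P4"), (60, "P6")])
expected = dict(a.valid())
observed = []
for b in fair_split(a):
    observed.append({k: search(a, k) for k in expected})
b.entries[:3] = [(40, "P4"), (50, "P5"), (60, "P6")]  # simplified insert of key 50 into Node B
print(observed, search(a, 50))
```

In all three intermediate states a search that starts at Node A still finds every original key, and after the split key 50 is reachable from Node A through the sibling pointer alone.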

  27. Insert a key into the parent node using FAST after FAIR split. Root (Node R): keys 10 70 70; pointers C2 C3 C3. Node A: 10 20 30. Node B: 40 50 60. Node C: 70 80 90.

  28. Insert a key into the parent node using FAST after FAIR split. Root (Node R): keys 10 70 70; pointers C2 C2 C3. Node A: 10 20 30. Node B: 40 50 60. Node C: 70 80 90. Node B can be accessed from Node A.

  29. Insert a key into the parent node using FAST after FAIR split. ➢ Searching the key 50 from the root after a system crash. Root (Node R): keys 10 70 70; pointers C2 C2 C3. Node A: 10 20 30. Node B: 40 50 60. Node C: 70 80 90. The read transaction descends from the root to Node A; Node B can be accessed from Node A.

  30. Insert a key into the parent node using FAST after FAIR split. Root (Node R): keys 10 40 70; pointers C2 C4 C3. Node A: 10 20 30. Node B: 40 50 60. Node C: 70 80 90. The FAST insertion makes Node B visible atomically.

  31. Read transactions can tolerate any inconsistency caused by write transactions → Read transactions can access a transiently inconsistent tree node while it is being modified by a write transaction → Lock-Free Search

  32. [Example 1] Searching 30 (read transaction, read →) while inserting (15, P6) (write transaction, shift →). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ.

  33. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 P5 ʌ.

  34. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 30 40 40 g. Pointers: P1 P2 P3 P4 P5 P5 ʌ.

  35. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 30 40 40 g. Pointers: P1 P2 P3 P4 P4 P5 ʌ.

  36. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 30 30 40 g. Pointers: P1 P2 P3 P4 P4 P5 ʌ.

  37. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 30 30 40 g. Pointers: P1 P2 P3 P3 P4 P5 ʌ.

  38. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 20 30 40 g. Pointers: P1 P2 P3 P3 P4 P5 ʌ.

  39. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 20 30 40 g. Pointers: P1 P2 P2 P3 P4 P5 ʌ.

  40. [Example 1] Searching 30 while inserting (15, P6). Keys: 10 20 20 30 40 g. Pointers: P1 P2 P2 P3 P4 P5 ʌ. FOUND!

  41. [Example 2] Searching 30 (read transaction, read →) while deleting (20, P2) (write transaction, ← shift). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ.

  42. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 20 30 40 g g. Pointers: P1 P3 P3 P4 P5 ʌ ʌ.

  43. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 30 30 40 g g. Pointers: P1 P3 P3 P4 P5 ʌ ʌ.

  44. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 30 30 40 g g. Pointers: P1 P3 P4 P4 P5 ʌ ʌ.

  45. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 30 40 40 g g. Pointers: P1 P3 P4 P4 P5 ʌ ʌ.

  46. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 30 40 40 g g. Pointers: P1 P3 P4 P5 P5 ʌ ʌ.

  47. [Example 2] Searching 30 while deleting (20, P2). Keys: 10 30 40 40 g g. Pointers: P1 P3 P4 P5 P5 ʌ ʌ. 30 NOT FOUND: the read transaction cannot find the key 30 because the shift moved it to a slot the left-to-right scan had already passed.

  48. ▪ Direction flag (counter): • Even number: insertion shifts to the right; search must scan from left to right. • Odd number: deletion shifts to the left; search must scan from right to left. Example: counter 2; Insert 25 (shift →); Search 40 (read →). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ.

  49. ▪ Direction flag (counter): • Even number: insertion shifts to the right; search must scan from left to right. • Odd number: deletion shifts to the left; search must scan from right to left. Example: counter 3; Delete 25 (← shift); Search 40 (← read). Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ.

  50. ▪ Direction flag (counter): • Even number: insertion shifts to the right; search must scan from left to right. • Odd number: deletion shifts to the left; search must scan from right to left. Example: Search 40 starts scanning left to right (read →) with counter 2, then Delete 25 (← shift) changes the counter to 3. Keys: 10 20 30 40 g g. Pointers: P1 P2 P3 P4 P5 ʌ ʌ. The read transaction has to check the counter once again to make sure the counter has not changed. Otherwise, it searches the node again.
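The counter protocol can be sketched as a reader loop that picks its scan direction from the counter's parity and retries if the counter changed during the scan. This is a simplified, single-threaded model: `insert` and `delete` rearrange the slots in one step instead of shifting store by store, and the `before_read` hook is a hypothetical testing device that lets a writer be interleaved at a chosen point of the scan. The example replays the Example 2 situation, a delete of key 20 landing in the middle of a left-to-right search for 30.

```python
class Node:
    def __init__(self, entries, capacity=6):
        self.slots = list(entries) + [None] * (capacity - len(entries))
        self.counter = 0   # even: last writer shifted right; odd: last writer shifted left


def insert(node, key, value):
    if node.counter % 2 == 1:
        node.counter += 1            # switch to even before shifting right
    live = sorted([e for e in node.slots if e is not None] + [(key, value)])
    node.slots[:len(live)] = live    # assumes the node still has a free slot


def delete(node, key):
    if node.counter % 2 == 0:
        node.counter += 1            # switch to odd before shifting left
    live = [e for e in node.slots if e is not None and e[0] != key]
    node.slots = live + [None] * (len(node.slots) - len(live))


def search(node, key, before_read=None):
    """Return (value, passes); rescan whenever the counter changed during a pass."""
    passes = 0
    while True:
        passes += 1
        seen = node.counter
        order = range(len(node.slots))
        if seen % 2 == 1:
            order = reversed(order)  # odd counter: scan right to left
        value = None
        for i in order:
            if before_read is not None:
                before_read(i)       # hypothetical hook: a writer may run here
            entry = node.slots[i]
            if entry is not None and entry[0] == key:
                value = entry[1]
                break
        if node.counter == seen:     # unchanged counter: the scan direction was valid
            return value, passes


node = Node([(10, "P1"), (20, "P2"), (30, "P3"), (40, "P4")])
node.counter = 2
fired = []


def concurrent_delete(i):
    # Run the delete once, just before the reader looks at slot 2.
    if i == 2 and not fired:
        fired.append(i)
        delete(node, 20)


result = search(node, 30, before_read=concurrent_delete)
print(result)  # → ('P3', 2)
```

The first pass misses key 30 because the delete moved it behind the reader; the changed counter forces a second pass, which scans right to left and finds it.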

  51. Dirty reads problem. Transaction A: BEGIN, INSERT 10, SUSPENDED, WAKE UP, ABORT. Transaction B (runs while A is suspended): BEGIN, SEARCH 10 (FOUND), COMMIT. The ordering of Transaction A and Transaction B cannot be determined.

  52. Isolation levels, from highest to lowest: Serializable, Repeatable reads, Read committed, Read uncommitted. Our Lock-Free Search supports a low isolation level.
