
ADVANCED DATABASE SYSTEMS: OLTP Indexes (Trie Data Structures)

Lecture #07: ADVANCED DATABASE SYSTEMS, OLTP Indexes (Trie Data Structures). @Andy_Pavlo // 15-721 // Spring 2020. Agenda: Latches, B+Trees, Judy Array, ART, Masstree. LATCH IMPLEMENTATION GOALS: Small memory


  1. EXAMPLE #1: SEARCH 23. Tree: root A [20]; inner nodes B [10] and C [35]; leaves D [6], E [12], F [23], G [38, 44]. Take a read (R) latch on A, then a read latch on C. We can release the latch on A as soon as we acquire the latch for C.

  4. EXAMPLE #2: DELETE 44. Tree: root A [20]; inner nodes B [10] and C [35]; leaves D [6], E [12], F [23], G [38, 44]. Take a write (W) latch on A, then a write latch on C. We may need to coalesce C, so we can't release the latch on A. Take a write latch on G. G will not merge with F, so we can release the latches on A and C.

  8. EXAMPLE #3: INSERT 40. Tree: root A [20]; inner nodes B [10] and C [35]; leaves D [6], E [12], F [23], G [38, 44]. Take a write (W) latch on A, then a write latch on C. C has room if its child has to split, so we can release the latch on A. Take a write latch on G. G must split, so we can't release the latch on C. After the split, G holds [38, 40], a new leaf H holds [44], and C holds [35, 44]. The latches on C and G are then released.
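
The release rule in Examples #2 and #3 can be sketched as a small single-threaded simulation. This is only an illustration: the `Node` class, the `capacity` and `min_keys` fields, and `crab_for_update` are hypothetical names, and real latches are replaced by a list of the nodes that would still be write-latched when the leaf is reached.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    keys: list
    children: list = field(default_factory=list)  # empty for leaves
    capacity: int = 2   # assumed maximum keys per node (illustrative)
    min_keys: int = 1   # assumed minimum before a merge is needed

def is_safe(node, op):
    """A node is safe if the update cannot propagate to its parent."""
    if op == "insert":
        return len(node.keys) < node.capacity   # has room, will not split
    return len(node.keys) > node.min_keys       # delete: will not merge

def child_for(node, key):
    """Pick the child whose key range contains `key`."""
    for i, separator in enumerate(node.keys):
        if key < separator:
            return node.children[i]
    return node.children[len(node.keys)]

def crab_for_update(root, key, op):
    """Return names of nodes whose write latch is still held at the leaf."""
    held = [root]
    node = root
    while node.children:
        node = child_for(node, key)
        held.append(node)        # latch the child first...
        if is_safe(node, op):
            held = [node]        # ...then release every ancestor
    return [n.name for n in held]

def build_tree():
    d, e, f = Node("D", [6]), Node("E", [12]), Node("F", [23])
    g = Node("G", [38, 44])
    b = Node("B", [10], [d, e])
    c = Node("C", [35], [f, g])
    return Node("A", [20], [b, c])

print(crab_for_update(build_tree(), 44, "delete"))  # ['G']
print(crab_for_update(build_tree(), 40, "insert"))  # ['C', 'G']
```

With the assumed capacities, the delete ends holding only G and the insert ends holding C and G, which matches the two walkthroughs.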

  14. BETTER LATCH CRABBING: The basic latch crabbing algorithm always takes a write latch on the root for any update. → This makes the index essentially single-threaded. A better approach is to optimistically assume that the target leaf node is safe. → Take R latches as you traverse the tree to reach it and verify. → If the leaf is not safe, then fall back to the previous algorithm. Source: CONCURRENCY OF OPERATIONS ON B-TREES, ACTA INFORMATICA 1977.

  15. EXAMPLE #4: DELETE 44. Tree: root A [20]; inner nodes B [10] and C [35]; leaves D [6], E [12], F [23], G [38, 44]. Take a read (R) latch on A, then a read latch on C. We assume that C is safe, so we can release the latch on A. Acquire an exclusive (W) latch on G.
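
The optimistic scheme can be summarized as the sequence of latch acquisitions it performs. The sketch below is a simplification under stated assumptions: `optimistic_update` is a hypothetical helper, the path and the `leaf_is_safe` flag are passed in rather than computed from a tree, and latch releases are not modeled.

```python
def optimistic_update(path, leaf_is_safe):
    """Return the (node, mode) latch acquisitions for an update along `path`.

    `path` lists node names from root to leaf. `leaf_is_safe` says whether
    the leaf can absorb the update without a split or merge.
    """
    *inner, leaf = path
    trace = [(name, "R") for name in inner]   # read latches on the way down
    trace.append((leaf, "W"))                 # write latch only on the leaf
    if leaf_is_safe:
        return trace
    # The optimistic guess was wrong: restart from the root and take
    # write latches all the way down, as in basic latch crabbing.
    return trace + [("restart", None)] + [(name, "W") for name in path]

print(optimistic_update(["A", "C", "G"], leaf_is_safe=True))
# [('A', 'R'), ('C', 'R'), ('G', 'W')]
```

The first call corresponds to Example #4: only the leaf G is ever write-latched, so concurrent updates to other leaves do not serialize on the root.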

  19. VERSIONED LATCH COUPLING: Optimistic crabbing scheme where writers are not blocked on readers. Every node now has a version number (counter). → Writers increment the counter when they acquire the latch. → Readers proceed if a node's latch is available but do not acquire it. → The reader then checks whether the latch's counter has changed from when it first checked the latch. Relies on epoch GC to ensure pointers are valid. Source: THE ART OF PRACTICAL SYNCHRONIZATION, DAMON 2016.

  20. VERSIONED LATCHES: SEARCH 44. Every node carries a version: A v3 [20]; B v4 [10]; C v5 [35]; leaves D [6], E [12], F [23], G [38, 44] (leaf versions listed on the slide: v6, v9, v4, v5). The markers @A, @B, @C label the nodes the search visits in order. Steps: @A: read v3 → @A: examine node → @B: read v5 → @A: recheck v3 → @B: examine node → @C: read v9 → @B: recheck v5 → @C: examine node. The animation then replays the search with a concurrent writer: the version at @B changes from v5 to v6 after the reader has read v9 at @C, so the "@B: recheck v5" step no longer matches and the reader cannot use what it read.
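
The read/examine/recheck pattern can be sketched in a single thread. This is an illustration, not the paper's implementation: `VersionedNode`, `optimistic_descend`, and the `interfere` hook (which stands in for a concurrent writer) are hypothetical, and the final check on the last node is an assumption added here because the slide's step list stops at "examine node".

```python
class VersionedNode:
    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.locked = False

    def write_lock(self):
        self.locked = True
        self.version += 1      # writers bump the counter when they latch

    def write_unlock(self):
        self.locked = False

def optimistic_descend(path, interfere=None):
    """Walk `path` (root to leaf) with version checks instead of read latches.

    Returns True if every check passed, False if the reader must restart.
    """
    parent, parent_version = None, None
    for step, node in enumerate(path):
        if node.locked:
            return False                     # a writer holds the latch
        seen = node.version                  # "read vN"
        if interfere:
            interfere(step)                  # simulated concurrent writer
        if parent is not None and (
            parent.locked or parent.version != parent_version
        ):
            return False                     # "recheck" of the parent failed
        parent, parent_version = node, seen  # now "examine" this node
    # Assumed final validation of the last node before trusting its contents.
    return not parent.locked and parent.version == parent_version

a, b, c = VersionedNode("A", 3), VersionedNode("B", 5), VersionedNode("C", 9)
print(optimistic_descend([a, b, c]))        # True

def writer_hits_b(step):
    if step == 2:                            # right after the reader reads @C
        b.write_lock()
        b.write_unlock()

print(optimistic_descend([a, b, c], writer_hits_b))  # False
```

The second call reproduces the failing run: the writer moves the node at @B from v5 to v6, so the recheck fails and the reader has to retry.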

  34. OBSERVATION: The inner node keys in a B+tree cannot tell you whether a key exists in the index. You must always traverse to the leaf node. This means that you could have (at least) one cache miss per level in the tree.

  35. TRIE INDEX: Keys: HELLO, HAT, HAVE. Use a digital representation of keys to examine prefixes one-by-one instead of comparing the entire key. → Also known as Digital Search Tree, Prefix Tree. Paths from the root: H → A → T ¤; H → A → V → E ¤; H → E → L → L → O ¤ (¤ marks the pointer stored at the end of each key).

  37. TRIE INDEX PROPERTIES: Shape only depends on key space and lengths. → Does not depend on existing keys or insertion order. → Does not require rebalancing operations. All operations have O(k) complexity where k is the length of the key. → The path to a leaf node represents the key of the leaf. → Keys are stored implicitly and can be reconstructed from paths.
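
A minimal sketch of these properties, assuming one character per digit and a Python dict per node (both are illustrative choices, not how a DBMS would lay out nodes). `TrieNode`, `insert`, `lookup`, and `shape` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # digit (here: one character) -> TrieNode
        self.is_key = False   # True if a key ends at this node
        self.value = None     # stands in for the tuple pointer

def insert(root, key, value):
    node = root
    for ch in key:                       # O(k) steps for a key of length k
        node = node.children.setdefault(ch, TrieNode())
    node.is_key, node.value = True, value

def lookup(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value if node.is_key else None

def shape(node):
    """Nested dict of digits only, used to compare two tries structurally."""
    return {ch: shape(child) for ch, child in sorted(node.children.items())}

keys = ["HELLO", "HAT", "HAVE"]
first, second = TrieNode(), TrieNode()
for i, k in enumerate(keys):
    insert(first, k, i)
for i, k in enumerate(reversed(keys)):
    insert(second, k, i)

print(lookup(first, "HAVE"), lookup(first, "HA"))  # 2 None
print(shape(first) == shape(second))               # True
```

The last line illustrates that the shape is the same regardless of insertion order, and `lookup` never compares a whole key: it only follows one digit per level.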

  38. TRIE KEY SPAN: The span of a trie level is the number of bits that each partial key / digit represents. → If the digit exists in the corpus, then store a pointer to the next level in the trie branch. Otherwise, store null. This determines the fan-out of each node and the physical height of the tree. → n-way Trie = Fan-Out of n.
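
The relationship between span, fan-out, and height can be worked through with a few lines of arithmetic. The helper names (`digits`, `fan_out`, `height`) are made up for this sketch, and it assumes fixed-width integer keys whose width is a multiple of the span.

```python
def digits(key, key_bits, span):
    """Split a `key_bits`-wide integer into digits of `span` bits, MSB first."""
    assert key_bits % span == 0, "sketch assumes the span divides the key width"
    mask = (1 << span) - 1
    return [(key >> shift) & mask
            for shift in range(key_bits - span, -1, -span)]

def fan_out(span):
    return 1 << span            # a span of s bits gives 2**s branches per node

def height(key_bits, span):
    return key_bits // span     # one level per digit

print(digits(25, 16, 8))                 # [0, 25]
print(fan_out(1), height(16, 1))         # 2 16
print(fan_out(8), height(16, 8))         # 256 2
```

For 16-bit keys, a 1-bit span gives a binary trie that is 16 levels deep, while an 8-bit span gives a 256-way trie with only 2 levels.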

  39. TRIE KEY SPAN: 1-bit Span Trie. Keys: K10, K25, K31. K10 → 00000000 00001010; K25 → 00000000 00011001; K31 → 00000000 00011111. Each node has a 0 branch and a 1 branch; ¤ is a pointer (node pointer or tuple pointer) and Ø is null. In the top levels, where all three keys share a 0 bit, the 0 branch holds a pointer and the 1 branch is null (the slide abbreviates this run as "Repeat 10x"); the branches diverge in the low-order bits. Later builds of the slide drop the 0/1 labels, since a branch's position in the node already implies its bit. [Figure: 1-bit span trie for K10, K25, K31; legend: Tuple Pointer, Node Pointer.]

  48. RADIX TREE: 1-bit Span Radix Tree. Omit all nodes with only a single child. → Also known as Patricia Tree. Can produce false positives, so the DBMS always checks the original tuple to see whether a key matches. [Figure: the 1-bit span trie with single-child nodes removed; legend: Tuple Pointer, Node Pointer.]
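
The false-positive behavior can be seen in a small sketch. It assumes 16-bit integer keys and a crit-bit style layout in which each inner node remembers only the one bit position it tests; that layout is one possible way to omit single-child nodes and is an assumption of this example, as are all names in it.

```python
KEY_BITS = 16

def bit(key, pos):
    """Bit `pos` of `key`, counting from the most significant bit (pos 0)."""
    return (key >> (KEY_BITS - 1 - pos)) & 1

class Leaf:
    def __init__(self, key):
        self.key = key              # stands in for the tuple with the full key

class Inner:
    def __init__(self, pos, zero, one):
        self.pos = pos              # the only bit examined at this node
        self.child = [zero, one]

def find_leaf(node, key):
    """Follow only the stored bit positions; skipped bits are never compared."""
    while isinstance(node, Inner):
        node = node.child[bit(key, node.pos)]
    return node

def insert(root, key):
    if root is None:
        return Leaf(key)
    near = find_leaf(root, key)
    if near.key == key:
        return root
    # First bit where the new key differs from its closest existing key.
    diff = next(p for p in range(KEY_BITS) if bit(key, p) != bit(near.key, p))
    parent, node = None, root
    while isinstance(node, Inner) and node.pos < diff:
        parent, node = node, node.child[bit(key, node.pos)]
    leaf = Leaf(key)
    inner = Inner(diff, node, leaf) if bit(key, diff) else Inner(diff, leaf, node)
    if parent is None:
        return inner
    parent.child[bit(key, parent.pos)] = inner
    return root

def lookup(root, key, verify=True):
    leaf = find_leaf(root, key)
    if leaf is None:
        return False
    return leaf.key == key if verify else True

root = None
for k in (10, 25, 31):
    root = insert(root, k)

print(lookup(root, 27, verify=False))  # True  (false positive)
print(lookup(root, 27, verify=True))   # False (caught by checking the tuple)
```

Key 27 agrees with 25 on every bit the tree actually tests, so only the comparison against the stored key rejects it.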

  49. TRIE VARIANTS: Judy Arrays (HP), ART Index (HyPer), Masstree (Silo).

  50. JUDY ARRAYS: Variant of a 256-way radix tree. First known radix tree that supports adaptive node representation. Three array types: → Judy1: Bit array that maps integer keys to true/false. → JudyL: Map integer keys to integer values. → JudySL: Map variable-length keys to integer values. Open-source implementation (LGPL). Patented by HP in 2000. Expires in 2022. → Not an issue according to authors. → http://judy.sourceforge.net/

  51. JUDY ARRAYS: Do not store meta-data about a node in its header. → This could lead to additional cache misses. Pack meta-data about a node in 128-bit "Judy Pointers" stored in its parent node. → Node Type → Population Count → Child Key Prefix / Value (if only one child below) → 64-bit Child Pointer. Source: A COMPARISON OF ADAPTIVE RADIX TREES AND HASH TABLES, ICDE 2015.

  52. JUDY ARRAYS: NODE TYPES. Every node can store up to 256 digits. Not all nodes will be 100% full though. Adapt a node's organization based on its keys. → Linear Node: Sparse Populations → Bitmap Node: Typical Populations → Uncompressed Node: Dense Population. Source: A COMPARISON OF ADAPTIVE RADIX TREES AND HASH TABLES, ICDE 2015.

  53. JUDY ARRAYS: LINEAR NODES. Store a sorted list of partial prefixes, up to two cache lines. → Original spec was one cache line. Store a separate array of pointers to children, ordered according to the sorted prefixes. Layout: Sorted Digits in slots 0, 1, ..., 5 (e.g. K0, K2, ..., K8), followed by Child Pointers in slots 0, 1, ..., 5. Size: 6 × 1 byte = 6 bytes of digits, plus 6 × 16 bytes = 96 bytes of pointers, for 102 bytes in total, padded to 128 bytes.
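
The two parallel arrays and the size arithmetic can be sketched as follows. `LinearNode` and `padded_size` are hypothetical names, Python lists stand in for the packed byte arrays, and the 64-byte cache line is an assumption that is consistent with the slide's 128-byte figure.

```python
from bisect import bisect_left

class LinearNode:
    def __init__(self):
        self.digits = []      # sorted 1-byte partial keys
        self.children = []    # children[i] belongs to digits[i]

    def insert(self, digit, child):
        i = bisect_left(self.digits, digit)
        if i < len(self.digits) and self.digits[i] == digit:
            self.children[i] = child          # overwrite existing entry
        else:
            self.digits.insert(i, digit)      # keep both arrays aligned
            self.children.insert(i, child)

    def find(self, digit):
        i = bisect_left(self.digits, digit)
        if i < len(self.digits) and self.digits[i] == digit:
            return self.children[i]
        return None

def padded_size(slots, digit_bytes=1, pointer_bytes=16, line=64):
    """Raw and cache-line-padded size of a node with `slots` entries."""
    raw = slots * digit_bytes + slots * pointer_bytes
    return raw, -(-raw // line) * line        # round up to whole cache lines

node = LinearNode()
for d in (8, 0, 2):
    node.insert(d, f"child-{d}")
print(node.digits, node.find(2), node.find(5))  # [0, 2, 8] child-2 None
print(padded_size(6))                           # (102, 128)
```

The same offset indexes both arrays, so a lookup touches the small digit array first and dereferences only one pointer slot.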

  57. JUDY ARRAYS: BITMAP NODES. 256-bit map to mark whether a prefix is present in the node. The bitmap is divided into eight segments, each with a pointer to a sub-array with pointers to child nodes. Layout: Prefix Bitmaps paired with Sub-Array Pointers, e.g. digits 0-7: 01000110 ¤; digits 8-15: 00000000 ¤; ...; digits 248-255: 00100100 ¤. Each sub-array holds the Child Pointers for the digits set in its bitmap. Digit offsets within a segment: 0 → 00000000, 1 → 00000001, 2 → 00000010, 3 → 00000011, 4 → 00000100, 5 → 00000101, 6 → 00000110, 7 → 00000111.
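
A lookup in such a node amounts to a bit test followed by a population count. The sketch below follows the slide's diagram, which labels segments by 8-digit ranges (0-7, 8-15, ..., 248-255); the segment width, the bit order within a segment (lowest digit in the least significant bit), and all names are assumptions for illustration, not Judy's actual layout.

```python
SEGMENT_BITS = 8                       # assumed: one segment per 8 digits
NUM_SEGMENTS = 256 // SEGMENT_BITS

class BitmapNode:
    def __init__(self):
        self.bitmaps = [0] * NUM_SEGMENTS                   # presence bits
        self.subarrays = [[] for _ in range(NUM_SEGMENTS)]  # packed children

    def _locate(self, digit):
        seg, off = divmod(digit, SEGMENT_BITS)
        below = self.bitmaps[seg] & ((1 << off) - 1)
        rank = bin(below).count("1")   # set bits below `off` = sub-array index
        return seg, off, rank

    def insert(self, digit, child):
        seg, off, rank = self._locate(digit)
        if (self.bitmaps[seg] >> off) & 1:
            self.subarrays[seg][rank] = child
        else:
            self.bitmaps[seg] |= 1 << off
            self.subarrays[seg].insert(rank, child)

    def find(self, digit):
        seg, off, rank = self._locate(digit)
        if not (self.bitmaps[seg] >> off) & 1:
            return None                # digit absent: no pointer is stored
        return self.subarrays[seg][rank]

node = BitmapNode()
for d in (6, 1, 5, 250):
    node.insert(d, f"child-{d}")
print(node.subarrays[0])               # ['child-1', 'child-5', 'child-6']
print(node.find(5), node.find(2))      # child-5 None
```

Absent digits cost one bit instead of a null pointer, which is why this layout suits nodes that are neither sparse nor dense.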

  61. ADAPTIVE RADIX TREE (ART): Developed for the TUM HyPer DBMS in 2013. 256-way radix tree that supports different node types based on its population. → Stores meta-data about each node in its header. Concurrency support was added in 2015. Source: THE ADAPTIVE RADIX TREE: ARTFUL INDEXING FOR MAIN-MEMORY DATABASES, ICDE 2013.

  62. ART vs. JUDY: Difference #1: Node Types → Judy has three node types with different organizations. → ART has four node types that (mostly) vary in the maximum number of children. Difference #2: Purpose → Judy is a general-purpose associative array. It "owns" the keys and values. → ART is a table index and does not need to cover the full keys. Values are pointers to tuples.

  63. ART: INNER NODE TYPES (1). Node4 and Node16: Store only the 8-bit digits that exist at a given node in a sorted array. The offset in the sorted digit array corresponds to the offset in the value array. Node4: Sorted Digits in slots 0-3 (e.g. K0, K2, K3, K8), followed by child pointers in slots 0-3. Node16: the same layout with slots 0-15 (e.g. K0, K2, ..., K8).
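
A minimal sketch of the Node4/Node16 layout, with one assumption made explicit: when a Node4 fills up, this example copies its entries into a Node16. The slide only states that node types depend on population, so the growth step, `ArtNode`, and `add_child` are illustrative rather than the paper's code.

```python
from bisect import bisect_left

class ArtNode:
    def __init__(self, capacity):
        self.capacity = capacity   # 4 for Node4, 16 for Node16
        self.digits = []           # sorted 8-bit digits present at this node
        self.children = []         # children[i] pairs with digits[i]

    def find(self, digit):
        i = bisect_left(self.digits, digit)
        if i < len(self.digits) and self.digits[i] == digit:
            return self.children[i]
        return None

def add_child(node, digit, child):
    """Insert in sorted order; return the node to use afterwards."""
    i = bisect_left(node.digits, digit)
    if i < len(node.digits) and node.digits[i] == digit:
        node.children[i] = child
        return node
    if len(node.digits) == node.capacity:
        if node.capacity != 4:
            raise OverflowError("larger node types are outside this sketch")
        bigger = ArtNode(16)       # assumed growth step: Node4 -> Node16
        bigger.digits, bigger.children = node.digits[:], node.children[:]
        node = bigger
    node.digits.insert(i, digit)
    node.children.insert(i, child)
    return node

node = ArtNode(4)
for d in (8, 0, 3, 2):
    node = add_child(node, d, f"child-{d}")
print(node.capacity, node.digits)    # 4 [0, 2, 3, 8]
node = add_child(node, 5, "child-5")
print(node.capacity, node.digits)    # 16 [0, 2, 3, 5, 8]
```

Because `add_child` may return a different object, the caller has to store the returned node back into the parent's pointer slot.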
