Succinct Trie Indexes Made Practical


  1. Succinct Trie Indexes Made Practical. Huanchen Zhang, David G. Andersen, Michael Kaminsky, Andrew Pavlo, Kimberly Keeton

  2. DRAM price won’t fall forever. [Figure: DRAM price plotted against year.]

  3. Memory-efficient data structures are helpful. Smaller data structures → more data resident in faster memory → better performance + lower costs.


  7. The limit: information-theoretic lower bound (ITLB). The minimum # of bits required to distinguish any object in a class.
     For a class S with |S| = n: $\log_2 n$ bits.
     For n-node tries of degree k: the class has $\binom{kn+1}{n} / (kn+1)$ members, which gives approximately $n\,(k \log_2 k - (k-1) \log_2 (k-1))$ bits.
     With k = 256: 9.44n bits. FST = 10n bits.
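The 9.44n figure can be checked numerically. The sketch below evaluates the per-node bound from the slide and compares it with the exact count for one finite n; the function names are illustrative and not from the talk.

```python
import math

def trie_itlb_bits_per_node(k: int) -> float:
    """Large-n approximation from the slide: k*log2(k) - (k-1)*log2(k-1)."""
    return k * math.log2(k) - (k - 1) * math.log2(k - 1)

def exact_bits_per_node(k: int, n: int) -> float:
    """log2 of the exact class size C(kn+1, n) / (kn+1), divided by n."""
    count = math.comb(k * n + 1, n) // (k * n + 1)
    return math.log2(count) / n

print(round(trie_itlb_bits_per_node(256), 2))  # 9.44
```

For finite n the exact value sits slightly below the approximation and approaches it as n grows.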

  8. Succinct Data Structures. Use a # of bits close to the ITLB. Suppose ITLB = L bits. Implicit: L + O(1). Succinct: L + o(L). Compact: O(L). Label: FST.

  9. Why aren’t succinct data structures popular? Read-only; log-structured design; slow; complex.

  10. Existing succinct tries are slow. [Figure: two bar charts for 50M 64-bit integer keys, memory in GB (including key suffixes) and lookup latency in µs, comparing ART, tx-trie, and PDT.]

  11. Fast Succinct Trie (FST) is fast and small. [Figure: the same two bar charts for 50M 64-bit integer keys, memory in GB (including key suffixes) and lookup latency in µs, comparing ART, tx-trie, PDT, and FST.]

  12. Encoding Mechanism

  13. 3 ways to succinctly encode ordinal trees. Ordinal tree: a rooted tree where each node can have an arbitrary # of children in order. [Figure: example ordinal tree with 15 nodes labeled 0-9 and A-E.]

  14. 3 ways to succinctly encode ordinal trees. |n-node ordinal tree| = $C_n = \frac{1}{n+1}\binom{2n}{n}$, so $\log_2 C_n \approx 2n$ bits. [Figure: the same 15-node example tree.]
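The 2n-bit figure follows from the growth of the Catalan numbers. A minimal sketch, using the slide's formula for $C_n$ (the function names are illustrative), shows the bits-per-node ratio approaching 2 from below as n grows:

```python
import math

def ordinal_tree_count(n: int) -> int:
    """C_n = binom(2n, n) / (n + 1), as written on the slide."""
    return math.comb(2 * n, n) // (n + 1)

def bits_per_node(n: int) -> float:
    """log2(C_n) / n, the lower bound expressed per node."""
    return math.log2(ordinal_tree_count(n)) / n

for n in (10, 100, 10_000):
    print(n, round(bits_per_node(n), 3))
```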

  15. 3 ways to succinctly encode ordinal trees. LOUDS: level-ordered unary degree sequence. Each node carries its degree in unary, level by level: node 0 = 110; nodes 1, 2 = 10, 110; nodes 3, 4, 5 = 1110, 110, 110; nodes 6, 7, 8, 9, A, B, C = 0, 10, 0, 0, 0, 10, 0; nodes D, E = 0, 0.

  16. 3 ways to succinctly encode ordinal trees. LOUDS: 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0. [Figure: the example tree with each node's unary degree code.]
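A minimal sketch of LOUDS encoding: visit the nodes breadth-first and emit one 1 per child followed by a 0. The `TREE` dictionary is a reconstruction of the example tree from the per-node codes on the slide, and the names are illustrative.

```python
from collections import deque

# Children of each internal node, reconstructed from the slide's degree codes.
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}

def louds(tree, root):
    """Return the unary degree code of every node in level order."""
    codes = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = tree.get(node, [])
        codes.append("1" * len(kids) + "0")  # one 1 per child, then a 0
        queue.extend(kids)
    return codes

print(" ".join(louds(TREE, "0")))
# 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
```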

  17. 3 ways to succinctly encode ordinal trees. BP: balanced parentheses. [Figure: the example tree.]

  18. 3 ways to succinctly encode ordinal trees. BP: ( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) ). [Figure: the example tree.]

  19. 3 ways to succinctly encode ordinal trees. BP: ( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) ). Each opening parenthesis corresponds to one node, in depth-first order: 0 1 3 6 7 D 8 2 4 9 A 5 B E C. [Figure: the example tree.]
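A minimal sketch of the BP encoding: a depth-first walk that writes "(" on entering a node and ")" on leaving it. The `TREE` dictionary is a reconstruction of the example tree from the slides, and the names are illustrative.

```python
# Children of each internal node in the 15-node example tree (reconstructed).
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}

def balanced_parens(tree, node):
    """Open a parenthesis for the node, encode its subtrees, then close it."""
    inner = "".join(balanced_parens(tree, child) for child in tree.get(node, []))
    return "(" + inner + ")"

print(balanced_parens(TREE, "0"))
# (((()(())()))((()())((())())))
```

The result has 2 bits per node: 30 parentheses for 15 nodes.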

  20. 3 ways to succinctly encode ordinal trees. DFUDS: depth-first unary degree sequence. [Figure: the example tree.]

  21. 3 ways to succinctly encode ordinal trees. DFUDS: ( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) ). [Figure: the example tree.]

  22. 3 ways to succinctly encode ordinal trees. DFUDS: ( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) ). The unary degree codes belong to the nodes in depth-first order: 0 1 3 6 7 D 8 2 4 9 A 5 B E C. [Figure: the example tree.]
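A minimal sketch of the DFUDS encoding: visit the nodes depth-first and emit one "(" per child followed by a ")". The `TREE` dictionary is a reconstruction of the example tree from the slides, and the names are illustrative. Many descriptions of DFUDS prepend one extra "(" so that the sequence is balanced; the sketch follows the sequence as it appears on the slide, without that prefix.

```python
# Children of each internal node in the 15-node example tree (reconstructed).
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}

def dfuds(tree, node):
    """Return (node, code) pairs in depth-first (preorder) order."""
    kids = tree.get(node, [])
    pairs = [(node, "(" * len(kids) + ")")]  # degree in unary, closed by ")"
    for child in kids:
        pairs.extend(dfuds(tree, child))
    return pairs

pairs = dfuds(TREE, "0")
print(" ".join(node for node, _ in pairs))  # 0 1 3 6 7 D 8 2 4 9 A 5 B E C
print("".join(code for _, code in pairs))
```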


  24. LOUDS-Sparse: succinctly encode tries. The example trie is stored as four sequences (positions 0-17 for L, HC, and S):
      L:  f s t $ a o r r s t y p i y $ t e p
      HC: 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0
      S:  1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
      V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
      Why LOUDS? 1. Fast tree nav. 2. Good label locality. 3. Easy implementation.
      [Figure: the example trie with leaf values v1-v11.]

  25. Rank & select on bit-vectors. bv (positions 0-17): 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0. rank(bv, i) = # of 1’s in bv up to position i. select(bv, i) = position of the ith 1 in bv. Examples: rank(bv, 7) = 4, select(bv, 7) = 14.
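The two definitions can be written directly as linear scans. This is only a reference sketch for checking the slide's examples (positions are 0-based and inclusive for rank, and select counts 1s from 1); it is not the constant-time method.

```python
BV = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive."""
    return sum(bv[: i + 1])

def select(bv, i):
    """Position of the i-th 1 in bv, counting from i = 1."""
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == i:
            return pos
    raise ValueError("bv has fewer than i ones")

print(rank(BV, 7))    # 4
print(select(BV, 7))  # 14
```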


  37. Compute rank & select in constant time. The classic algorithm for computing rank on an n-bit vector bv:
      Super block = $\log^2 n$ bits; basic block = $\frac{1}{2}\log n$ bits.
      Per super block: cumulative rank. $n / \log^2 n$ entries, $O(n / \log n)$ bits.
      Per basic block: rank in super block. $n / (\frac{1}{2}\log n)$ entries, $O(n \log\log n / \log n)$ bits.
      Within basic block: all possible queries on the remaining bits. $O(\sqrt{n}\,\log n\,\log\log n)$ bits.
      O(1) time; space: o(n).
      Select is similar but trickier, often based on rank structures.
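The two-level directory can be sketched as follows. This is an illustration under simplifying assumptions, not the classic construction itself: the block sizes are fixed parameters rather than $\log^2 n$ and $\frac{1}{2}\log n$, and the in-block step scans the few remaining bits instead of consulting a precomputed table of all possible queries. The class and parameter names are illustrative.

```python
class RankIndex:
    """Two-level rank directory over a list of 0/1 bits (illustrative sizes)."""

    def __init__(self, bits, basic=8, basics_per_super=4):
        self.bits = bits
        self.basic = basic                        # bits per basic block
        self.super = basic * basics_per_super     # bits per super block
        self.super_ranks = []  # 1s before each super block (cumulative)
        self.basic_ranks = []  # 1s before each basic block, within its super block
        total = 0
        in_super = 0
        for start in range(0, len(bits), basic):
            if start % self.super == 0:           # a new super block begins
                self.super_ranks.append(total)
                in_super = 0
            self.basic_ranks.append(in_super)
            ones = sum(bits[start:start + basic])
            total += ones
            in_super += ones

    def rank(self, i):
        """Number of 1s in bits[0..i], inclusive."""
        block = i // self.basic
        # Stand-in for the lookup table: scan the bits left in this basic block.
        within = sum(self.bits[block * self.basic : i + 1])
        return self.super_ranks[i // self.super] + self.basic_ranks[block] + within

bv = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(RankIndex(bv, basic=2, basics_per_super=2).rank(7))  # 4
```

A query adds three numbers: one per-super-block entry, one per-basic-block entry, and the count inside the basic block, which is why the lookup cost does not depend on the length of the bit-vector once the in-block step is constant time.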

  38. Tree navigation relies on rank & select. The example trie in LOUDS-Sparse form (positions 0-17 for L, HC, and S):
      L:  f s t $ a o r r s t y p i y $ t e p
      HC: 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0
      S:  1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
      V:  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
      child(i) = select(S, rank(HC, i) + 1)
      parent(i) = select(HC, rank(S, i) - 1)
      value(i) = i - rank(HC, i)
      [Figure: the example trie with leaf values v1-v11.]
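A minimal sketch of navigation over the four sequences, using linear-scan rank and select for clarity. The arrays are copied from the slide. The parent step here selects on HC (the has-child bit that created the node) and is defined for non-root positions only; that choice, the `lookup` routine, and all helper names are assumptions of this sketch rather than the talk's code.

```python
L = list("fst$aorrstypiy$tep")                 # labels, as on the slide
HC = [int(b) for b in "101011101000100000"]    # 1 = this label has a child node
S = [int(b) for b in "100101010010101010"]     # 1 = first label of a node
V = [f"v{j}" for j in range(1, 12)]            # values v1..v11

def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive."""
    return sum(bv[: i + 1])

def select(bv, k):
    """Position of the k-th 1 in bv, counting from k = 1."""
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == k:
            return pos
    raise ValueError("not enough ones")

def child(i):
    return select(S, rank(HC, i) + 1)

def parent(i):
    return select(HC, rank(S, i) - 1)

def value(i):
    return V[i - rank(HC, i)]

def node_end(start):
    """Exclusive end of the node whose first label is at `start`."""
    end = start + 1
    while end < len(S) and S[end] == 0:
        end += 1
    return end

def find_label(pos, ch):
    """Position of label `ch` inside the node starting at `pos`, or None."""
    return next((p for p in range(pos, node_end(pos)) if L[p] == ch), None)

def lookup(key):
    pos = 0  # the root node starts at position 0
    for depth, ch in enumerate(key):
        hit = find_label(pos, ch)
        if hit is None:
            return None
        if HC[hit] == 0:
            # Leaf: a match only if the whole key has been consumed.
            return value(hit) if depth == len(key) - 1 else None
        pos = child(hit)
    # Key consumed at an internal node: it is stored only if "$" is a label here.
    hit = find_label(pos, "$")
    return value(hit) if hit is not None else None

print(lookup("fast"), lookup("fas"), lookup("s"), lookup("fa"))
# v9 v8 v1 None
```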

