Succinct Trie Indexes Made Practical Huanchen Zhang David G. Andersen, Michael Kaminsky, Andrew Pavlo, Kimberly Keeton
DRAM price won’t fall forever
[Figure: DRAM price (y-axis) by year (x-axis)]
Memory-efficient data structures are helpful
Smaller data structures; more data resident in faster memory; better performance + lower costs
The limit: information-theoretic lower bound (ITLB)
The minimum # of bits required to distinguish any object in a class
For a class S with |S| = n: $\log_2 n$ bits
For n-node tries of degree k: |n-node trie of degree k| = $\frac{1}{kn+1}\binom{kn+1}{n}$, so ITLB ≈ $n\,(k \log_2 k - (k-1)\log_2 (k-1))$ bits
For k = 256: ITLB ≈ 9.44n bits; FST = 10n bits
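The 9.44n figure for k = 256 can be checked numerically. The sketch below evaluates the per-node bound $k \log_2 k - (k-1)\log_2(k-1)$ and, for comparison, the exact $\log_2$ of the trie count divided by n. The function names are illustrative and not from the talk; the formula assumes k ≥ 2.

```python
import math


def trie_itlb_bits_per_node(k: int) -> float:
    """Asymptotic ITLB per node for an n-node trie of degree k (k >= 2)."""
    return k * math.log2(k) - (k - 1) * math.log2(k - 1)


def exact_trie_bits_per_node(n: int, k: int) -> float:
    """log2 of the exact number of n-node tries of degree k, divided by n."""
    count = math.comb(k * n + 1, n) // (k * n + 1)
    return math.log2(count) / n


print(f"{trie_itlb_bits_per_node(256):.2f}")  # 9.44
print(exact_trie_bits_per_node(1000, 256))    # slightly below the asymptotic value
```

The exact value falls a little short of the asymptotic one for finite n because of lower-order terms that vanish as n grows.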
Succinct Data Structures
Use # of bits close to ITLB
Suppose ITLB = L bits
Implicit: L + O(1)
Succinct: L + o(L)
Compact: O(L)
FST
Why aren’t succinct data structures popular? Read-only Log-structured design Slow Complex
Existing succinct tries are slow
[Figure: two bar charts for 50M 64-bit integer keys, including key suffixes. Panels: Memory (GB) and Lookup Latency (us). Bars: ART, tx-trie, PDT]
Fast Succinct Trie (FST) is fast and small
[Figure: two bar charts for 50M 64-bit integer keys, including key suffixes. Panels: Memory (GB) and Lookup Latency (us). Bars: ART, tx-trie, PDT, FST]
Encoding Mechanism
3 ways to succinctly encode ordinal trees
Ordinal tree: a rooted tree where each node can have an arbitrary # of children in order
|n-node ordinal tree| = $C_n = \frac{1}{n+1}\binom{2n}{n}$, so ITLB = $\log_2 C_n \approx 2n$ bits
[Figure: example ordinal tree with nodes 0 1 2 3 4 5 6 7 8 9 A B C D E]
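A quick numeric check of the "≈ 2n bits" claim: the sketch below computes $C_n$ exactly and reports $\log_2 C_n / n$, which approaches 2 from below as n grows. The helper names are illustrative only.

```python
import math


def catalan(n: int) -> int:
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


def bits_per_node(n: int) -> float:
    """Information-theoretic lower bound per node, log2(C_n) / n."""
    return math.log2(catalan(n)) / n


for n in (15, 1_000, 10_000):
    print(n, round(bits_per_node(n), 4))
```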
3 ways to succinctly encode ordinal trees
LOUDS: level-ordered unary degree sequence
Per node, in level order (node: code): 0: 110, 1: 10, 2: 110, 3: 1110, 4: 110, 5: 110, 6: 0, 7: 10, 8: 0, 9: 0, A: 0, B: 10, C: 0, D: 0, E: 0
LOUDS: 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
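The LOUDS sequence can be reproduced with a breadth-first traversal that writes each node's degree in unary (one 1 per child, then a 0). The sketch below is a minimal illustration; the `TREE` dictionary is a hand transcription of the example tree implied by the per-node codes, and the function name is an assumption.

```python
from collections import deque

# Children of each internal node of the example tree (leaves are omitted).
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}


def louds(tree: dict, root: str) -> str:
    """Level-order unary degree sequence, one space-separated code per node."""
    codes = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = tree.get(node, [])
        codes.append("1" * len(children) + "0")  # degree in unary
        queue.extend(children)                   # visit children level by level
    return " ".join(codes)


print(louds(TREE, "0"))
# 110 10 110 1110 110 110 0 10 0 0 0 10 0 0 0
```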
3 ways to succinctly encode ordinal trees
BP: balanced parentheses
BP: ( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) )
Nodes in the order of their opening parentheses (depth-first): 0 1 3 6 7 D 8 2 4 9 A 5 B E C
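The BP sequence follows from a depth-first traversal: write "(" when entering a node and ")" when leaving it. A minimal sketch, with the example tree transcribed by hand into a `TREE` dictionary (the names are illustrative):

```python
# Children of each internal node of the example tree (leaves are omitted).
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}


def balanced_parentheses(tree: dict, root: str) -> str:
    """Open a parenthesis on entering a node, close it on leaving."""
    def visit(node: str) -> str:
        inner = "".join(visit(child) for child in tree.get(node, []))
        return "(" + inner + ")"
    return visit(root)


print(" ".join(balanced_parentheses(TREE, "0")))
# ( ( ( ( ) ( ( ) ) ( ) ) ) ( ( ( ) ( ) ) ( ( ( ) ) ( ) ) ) )
```

Each node contributes exactly one matching pair, so an n-node tree takes 2n symbols.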
3 ways to succinctly encode ordinal trees
DFUDS: depth-first unary degree sequence
DFUDS: ( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) )
Nodes in depth-first order: 0 1 3 6 7 D 8 2 4 9 A 5 B E C
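DFUDS uses the same unary degree codes as LOUDS but emits them in depth-first order, with "(" per child and ")" as the terminator. The sketch below reproduces the sequence shown above; `TREE` is a hand transcription of the example tree and the function name is illustrative. The sequence on the slide has 29 symbols, so no extra leading parenthesis is emitted here; formulations that prepend one to balance the sequence differ only in that first symbol.

```python
# Children of each internal node of the example tree (leaves are omitted).
TREE = {
    "0": ["1", "2"], "1": ["3"], "2": ["4", "5"],
    "3": ["6", "7", "8"], "4": ["9", "A"], "5": ["B", "C"],
    "7": ["D"], "B": ["E"],
}


def dfuds(tree: dict, root: str) -> str:
    """Depth-first unary degree sequence: '(' per child, then ')'."""
    codes = []

    def visit(node: str) -> None:
        children = tree.get(node, [])
        codes.append("(" * len(children) + ")")
        for child in children:
            visit(child)

    visit(root)
    return "".join(codes)


print(" ".join(dfuds(TREE, "0")))
# ( ( ) ( ) ( ( ( ) ) ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ( ) ) )
```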
LOUDS-Sparse: succinctly encode tries
L: f s t $ a o r r s t y p i y $ t e p
HC: 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0
S: 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
V: v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
[Figure: the example trie with edge labels and leaf values v1 to v11]
Why LOUDS?
1. Fast tree nav.
2. Good label locality
3. Easy implementation
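To make the three sequences concrete, here is a minimal sketch that builds them from a key set by walking the trie level by level. Reading L as the branch labels, HC as "this branch has a child node", and S as "this label starts a new node" is an interpretation of the slide. The key list is inferred from the labels rather than stated in the talk, "$" marks a key that is a prefix of another key, and sibling labels are emitted in sorted order, which is an assumption of this sketch (so the order of two sibling labels in its L output may differ from the transcription above).

```python
from collections import deque

# Key set inferred from the example labels (an assumption, not from the talk).
KEYS = ["f", "far", "fas", "fast", "fat", "s", "top", "toy", "trie", "trip", "try"]


def build_louds_sparse(keys):
    """Return (labels, has_child, node_start) in level order."""
    labels, has_child, node_start = [], [], []
    queue = deque([("", sorted(set(keys)))])  # (prefix of node, keys below it)
    while queue:
        prefix, group = queue.popleft()
        depth = len(prefix)
        branches = {}
        for key in group:
            label = key[depth] if len(key) > depth else "$"  # "$": key ends here
            branches.setdefault(label, []).append(key)
        for i, label in enumerate(sorted(branches)):
            below = branches[label]
            internal = label != "$" and any(len(k) > depth + 1 for k in below)
            labels.append(label)
            has_child.append(1 if internal else 0)
            node_start.append(1 if i == 0 else 0)  # first label of a node
            if internal:
                queue.append((prefix + label, below))
    return labels, has_child, node_start


L, HC, S = build_louds_sparse(KEYS)
print("L: ", " ".join(L))
print("HC:", " ".join(map(str, HC)))
print("S: ", " ".join(map(str, S)))
```

Each branch costs one label plus two bits. With 8-bit labels that would be 10 bits per entry, which presumably corresponds to the "FST = 10n" figure quoted earlier.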
Rank & select on bit-vectors
bv: 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 (positions 0 to 17)
rank(bv, i) = # of 1’s in bv up to and including position i (positions start at 0)
select(bv, i) = position of the ith 1 in bv (the first 1 is i = 1)
Examples: rank(bv, 7) = 4, select(bv, 7) = 14
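The two definitions can be pinned down with a naive linear-time reference implementation. This sketch only fixes the indexing conventions implied by the examples (0-based positions, inclusive rank, 1-based count for select); it is not a constant-time structure.

```python
BV = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]


def rank(bv, i):
    """Number of 1s in bv[0..i], inclusive."""
    return sum(bv[: i + 1])


def select(bv, i):
    """Position of the i-th 1 in bv, counting from i = 1."""
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == i:
            return pos
    raise ValueError(f"bv has fewer than {i} ones")


print(rank(BV, 7), select(BV, 7))  # 4 14
```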
Compute rank & select in constant time
The classic algorithm for computing rank
Split bv (n bits) into super blocks of $\lg^2 n$ bits; split each super block into basic blocks of $\frac{1}{2}\lg n$ bits
Three lookup tables:
per super block: cumulative rank, $O(n / \lg n)$ bits
per basic block: rank in super block, $O(n \lg\lg n / \lg n)$ bits
within basic block: all possible queries on the remaining $\frac{1}{2}\lg n$ bits, $O(\sqrt{n}\,\lg n \lg\lg n)$ bits
rank = sum of the three lookups: O(1) time, o(n) space
Select is similar but trickier, often based on rank structures
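The two-level layout can be sketched as follows. This is a simplified illustration under stated assumptions, not the classic structure verbatim: block sizes are fixed parameters (`basic`, `per_super`) instead of $\frac{1}{2}\lg n$ and $\lg^2 n$, and the final step scans the few remaining bits directly rather than consulting a precomputed table of all possible queries. The class and attribute names are hypothetical.

```python
class RankIndex:
    """Two-level rank directory over a list of 0/1 bits."""

    def __init__(self, bits, basic=8, per_super=4):
        self.bits = list(bits)
        self.basic = basic
        self.super_size = basic * per_super
        self.super_ranks = []  # 1s before each super block (cumulative)
        self.basic_ranks = []  # 1s before each basic block, within its super block
        total = within = 0
        for start in range(0, len(self.bits), basic):
            if start % self.super_size == 0:  # a new super block begins
                self.super_ranks.append(total)
                within = 0
            self.basic_ranks.append(within)
            ones = sum(self.bits[start:start + basic])
            total += ones
            within += ones

    def rank(self, i):
        """Number of 1s in bits[0..i], inclusive."""
        block = i // self.basic
        start = block * self.basic
        return (self.super_ranks[i // self.super_size]  # lookup 1
                + self.basic_ranks[block]               # lookup 2
                + sum(self.bits[start:i + 1]))          # remaining bits


bv = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
index = RankIndex(bv, basic=4, per_super=2)
print(index.rank(7))  # 4
```

Because the per-basic-block counts restart at every super block, they stay small, which is what keeps the second table compact in the asymptotic analysis.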
Tree navigation relies on rank & select
L: f s t $ a o r r s t y p i y $ t e p (positions 0 to 17)
HC: 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0
S: 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0
V: v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
child(i) = select(S, rank(HC, i) + 1)
parent(i) = select(HC, rank(S, i) - 1)
value(i) = i - rank(HC, i)
[Figure: the example trie with edge labels and leaf values v1 to v11]
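The navigation formulas can be exercised on the example sequences. The sketch below uses naive linear-time rank and select, transcribes L, HC, and S from the slide, and adds a hypothetical `lookup` helper that walks the trie one character at a time. `parent` is written as select(HC, rank(S, i) - 1), the form that inverts `child` on this example. `value` returns a 0-based index into V, and "$" is treated as a reserved terminator label.

```python
L = "f s t $ a o r r s t y p i y $ t e p".split()
HC = [int(b) for b in "1010 1 110 100 0 10 000 0".replace(" ", "")]
S = [int(b) for b in "1001 0 101001 0 10 101 0".replace(" ", "")]


def rank(bv, i):
    return sum(bv[: i + 1])  # 1s in bv[0..i], inclusive


def select(bv, i):
    seen = 0
    for pos, bit in enumerate(bv):
        seen += bit
        if bit and seen == i:
            return pos  # position of the i-th 1 (i starts at 1)
    raise ValueError("not enough ones")


def child(i):
    return select(S, rank(HC, i) + 1)  # first label of the child node


def parent(i):
    return select(HC, rank(S, i) - 1)  # branch leading to i's node


def value(i):
    return i - rank(HC, i)  # 0-based index into V


def node_range(pos):
    end = pos + 1
    while end < len(L) and S[end] == 0:  # node ends before the next 1 in S
        end += 1
    return range(pos, end)


def lookup(key):
    """Return the 0-based value index for key, or None if absent."""
    pos = 0  # the root node starts at position 0
    for depth, ch in enumerate(key):
        hit = next((j for j in node_range(pos) if L[j] == ch), None)
        if hit is None:
            return None
        if HC[hit] == 0:  # leaf branch: must be the key's last character
            return value(hit) if depth == len(key) - 1 else None
        pos = child(hit)
    return value(pos) if L[pos] == "$" else None  # key is a prefix of others


print(child(0), parent(3), lookup("fast"), lookup("fa"))  # 3 0 8 None
```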