SuRF: Practical Range Filtering with Fast Succinct Tries
Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo
Filters answer approximate membership queries
Example query: is this person in the set of billionaires?
For an actual billionaire, the filter answers YES 100% of the time: there are no false negatives.
For anyone else, the filter answers NO 99% of the time and YES 1% of the time. That 1% is the false positive rate.
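To make "no false negatives, small false positive rate" concrete, here is a minimal Bloom filter sketch. It is a generic illustration, not SuRF; the sizes (10 bits per key, 7 hash functions) and the salted SHA-256 hashing are assumptions chosen only for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k salted hashes."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by hashing the key with k different salts.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


if __name__ == "__main__":
    n = 10_000
    bf = BloomFilter(m_bits=10 * n, k=7)  # about 10 bits per key
    members = [f"member-{i}" for i in range(n)]
    for key in members:
        bf.add(key)
    # No false negatives: every inserted key is reported as present.
    assert all(bf.may_contain(key) for key in members)
    # Estimate the false positive rate on keys that were never inserted.
    probes = [f"other-{i}" for i in range(n)]
    fpr = sum(bf.may_contain(key) for key in probes) / n
    print(f"measured FPR: {fpr:.3%}")
```

With these parameters the textbook estimate is just under 1%, so the measured rate should land near the 1% used on the slide; the exact value depends on the hash.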
Filters pre-reject most negative queries
Queries first consult a filter held in local memory. A NO answer avoids the slow device entirely; only a "probably YES" answer is forwarded to the slow device.
Existing filters only support point filtering
Point filtering: SELECT * FROM Billionaires WHERE LastName = 'Pavlo'
Range filtering: SELECT * FROM Billionaires WHERE LastName LIKE 'Pav%'
Bloom Filter (1970), Quotient Filter (2012), and Cuckoo Filter (2014) handle point filtering but not range filtering.
Our solution: Succinct Range Filters (SuRF)
The first practical, general-purpose range filter.
SMALL: close to the theoretical minimum; ≈ 12 bits per key (64-bit integer keys, 1% false positive rate)
FAST: comparable to the fastest trees; ≈ 200 ns per query (100 million 64-bit integer keys)
USEFUL: evaluated in RocksDB, where it speeds up range queries by up to 5x
Starting point: a complete trie
[Figure: a complete trie storing every character of the keys SIGKDD, SIGMOD, and SIGOPS. Verdict: TOO BIG.]
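A minimal sketch of the complete trie in the figure, built as nested Python dicts. The keys are read off the figure; the dict representation and the "$" end-of-key marker are illustrative choices, not SuRF's encoding. Counting nodes shows why a complete trie is large: it stores one node per character of every key beyond the shared prefix.

```python
def build_trie(keys):
    """Build a complete (untruncated) trie as nested dicts; '$' marks end of key."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def count_nodes(node):
    """Count character nodes below the root, ignoring end-of-key markers."""
    return sum(1 + count_nodes(child) for ch, child in node.items() if ch != "$")


keys = ["SIGKDD", "SIGMOD", "SIGOPS"]
trie = build_trie(keys)
print(count_nodes(trie))  # 12: S, I, G shared, then three nodes per key
```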
Make it smaller: a truncated trie
[Figure: the trie truncated to the shortest prefixes that distinguish the keys (SIGK, SIGM, SIGO). The query SIGMETRICS shares the stored prefix SIGM with SIGMOD, so the filter reports a false positive.]
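The truncation step can be sketched as follows: keep, for each key, only the shortest prefix that no other key starts with, and accept any query that extends a kept prefix. This is an illustrative list-based version, not SuRF's trie layout, and it assumes no key is a prefix of another key.

```python
def truncate(keys):
    """Shortest prefix of each key that no other key starts with.

    Assumes no key is a prefix of another key (otherwise no such prefix
    exists and this sketch keeps the whole key).
    """
    prefixes = []
    for key in keys:
        others = [k for k in keys if k != key]
        for n in range(1, len(key) + 1):
            if not any(o.startswith(key[:n]) for o in others):
                prefixes.append(key[:n])
                break
        else:
            prefixes.append(key)
    return prefixes


def may_contain(prefixes, query):
    """Point lookup on the truncated trie: accept any query extending a stored prefix."""
    return any(query.startswith(p) for p in prefixes)


keys = ["SIGKDD", "SIGMOD", "SIGOPS"]
prefixes = truncate(keys)
print(prefixes)                             # ['SIGK', 'SIGM', 'SIGO']
print(may_contain(prefixes, "SIGMETRICS"))  # True: a false positive
print(may_contain(prefixes, "SIGIR"))       # False
```

The trade-off is visible: the stored prefixes need 6 trie nodes instead of 12, but every key under SIGM is now indistinguishable from SIGMOD.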
Use suffix bits to reduce false positive rate
[Figure: the truncated trie with extra suffix bits stored per key, in two variants. Hashed suffix bits store hash bits of each key (0xC8, 0x20, 0x06); the query SIGMETRICS hashes to 0x18, which does not match the stored value, so it is rejected. Real suffix bits store the key's next bits after the prefix (D, O, P); SIGMETRICS has E where SIGMOD has O, so it is also rejected.]
Hashed suffix bits: each bit reduces the FPR by half, but they cannot help range queries.
Real suffix bits: they benefit both point and range queries, but have weaker distinguishability.
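A minimal sketch of both suffix variants on the truncated trie. Assumptions for illustration only: CRC-32 stands in for the hash, the hashed suffix is 8 bits wide (HASH_BITS is a hypothetical parameter), and the real suffix is one whole character, whereas the RocksDB evaluation configures a 4-bit real suffix. Keys are assumed prefix-free.

```python
import zlib

HASH_BITS = 8  # hypothetical hashed-suffix width


def truncated_prefixes(keys):
    """Map each key to its shortest prefix not shared by any other key.

    Assumes no key is a prefix of another key.
    """
    out = {}
    for key in keys:
        others = [k for k in keys if k != key]
        n = next(n for n in range(1, len(key) + 1)
                 if not any(o.startswith(key[:n]) for o in others))
        out[key] = key[:n]
    return out


def hashed_suffix(key):
    # Low HASH_BITS bits of a CRC-32 of the whole key (CRC-32 is a stand-in hash).
    return zlib.crc32(key.encode()) & ((1 << HASH_BITS) - 1)


def real_suffix(key, prefix_len):
    # The key's next character after the truncated prefix ('' if none).
    return key[prefix_len:prefix_len + 1]


class SuffixFilter:
    def __init__(self, keys, mode):
        assert mode in ("hash", "real")
        self.mode = mode
        self.entries = []  # (truncated prefix, stored suffix)
        for key, prefix in truncated_prefixes(keys).items():
            if mode == "hash":
                suffix = hashed_suffix(key)
            else:
                suffix = real_suffix(key, len(prefix))
            self.entries.append((prefix, suffix))

    def may_contain(self, query):
        # Truncated prefixes are prefix-free, so at most one entry can match.
        for prefix, suffix in self.entries:
            if query.startswith(prefix):
                if self.mode == "hash":
                    return hashed_suffix(query) == suffix
                return real_suffix(query, len(prefix)) == suffix
        return False


keys = ["SIGKDD", "SIGMOD", "SIGOPS"]
real = SuffixFilter(keys, "real")
hashed = SuffixFilter(keys, "hash")
print(real.may_contain("SIGMETRICS"))  # False: 'E' differs from stored 'O'
print(real.may_contain("SIGMOBILE"))   # True: shares 'O', a false positive
```

SIGMOBILE shows the weaker distinguishability of real suffixes: keys that agree on the next character still collide. A hashed suffix separates such keys with roughly a 2^-HASH_BITS chance of collision, but because real suffix bits are actual key bits they preserve key order and can be compared against range bounds, which hash bits cannot.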
Succinct Data Structure
"… uses an amount of space that is 'close' to the information-theoretic lower bound, but still allows efficient query operations." [Wikipedia]
SuRF's encoding is small and fast
Small: ≈ 10 + suffix bits per key for 64-bit integers; ≈ 14 + suffix bits per key for emails
Fast: matches state-of-the-art pointer-based trees
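Succinct tries are typically navigated with two bit-vector primitives: rank1(i), the number of 1 bits before position i, and select1(k), the position of the k-th 1 bit. The slides do not spell out SuRF's encoding, so this is a generic sketch of those primitives rather than SuRF's layout. The block size of 64 is an assumption, and a Python list of ints is used for readability, not space efficiency.

```python
class RankBitVector:
    """Bit vector with a sampled rank directory (one cumulative count per block)."""

    BLOCK = 64  # hypothetical block size

    def __init__(self, bits):
        self.bits = list(bits)
        # samples[j] = number of 1 bits in bits[0 : j * BLOCK]
        self.samples = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            block_ones = sum(self.bits[start:start + self.BLOCK])
            self.samples.append(self.samples[-1] + block_ones)

    def rank1(self, i):
        """Number of 1 bits in positions [0, i): one sample plus a partial block scan."""
        block = i // self.BLOCK
        return self.samples[block] + sum(self.bits[block * self.BLOCK:i])

    def select1(self, k):
        """Position of the k-th 1 bit (1-based), via binary search over rank1.

        Returns len(bits) if there are fewer than k 1 bits.
        """
        lo, hi = 0, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0] * 20)
print(bv.rank1(8), bv.select1(4))  # 4 6
```

The design point is that the directory adds only one counter per block, so rank stays cheap while the structure remains close to the raw bit count.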
Bloom filters speed up point queries in RocksDB
[Figure: an LSM tree with levels N-2, N-1, and N; each SSTable has a Bloom filter (B) cached in memory. Visible SSTable keys: Level N-2: …, 6, 20, …; Level N-1: …, 12, 21, …; Level N: …, 11, 19, ….]
GET(16): the cached Bloom filters answer NO, so the SSTables do not need to be read.
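A toy model of the point-lookup path in the figure, using only the keys visible on the slide (the elided keys are not modeled). The SSTable class and get function are hypothetical, and an exact set stands in for each Bloom filter; a real Bloom filter could additionally answer "maybe" for an absent key, which would cost one unnecessary read.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical model of one SSTable plus its cached point filter."""
    level: str
    keys: frozenset
    reads: int = 0  # number of simulated disk reads

    def filter_may_contain(self, key: int) -> bool:
        # Exact stand-in for a Bloom filter: a real Bloom filter can also
        # answer "maybe" for an absent key (a false positive), never the reverse.
        return key in self.keys

    def read(self, key: int):
        self.reads += 1  # simulate fetching a block from the slow device
        return key if key in self.keys else None


def get(tables, key):
    """Point lookup, newest level first; skip any SSTable whose filter says NO."""
    for table in tables:
        if not table.filter_may_contain(key):
            continue
        if table.read(key) is not None:
            return key
    return None


tables = [
    SSTable("Level N-2", frozenset({6, 20})),
    SSTable("Level N-1", frozenset({12, 21})),
    SSTable("Level N", frozenset({11, 19})),
]
print(get(tables, 16), sum(t.reads for t in tables))  # None 0
```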
Bloom filters can't help range queries in RocksDB
[Figure: the same levels and cached Bloom filters.]
SEEK(14, 18): a Bloom filter only answers membership for individual keys, so it cannot rule out a key in [14, 18], and the SSTables must still be searched.
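The only way to answer a range query with a point filter is to probe every possible key in the range. The sketch below (the function name is hypothetical) does that for small integer ranges and counts the probes; it shows why the approach collapses for realistic keys. For example, if timestamps were in nanoseconds (an assumption for illustration), a one-hour window would need 3.6 × 10^12 probes, and for string keys the set of possible keys in a range is unbounded.

```python
def range_may_overlap_via_points(may_contain, lo, hi):
    """Answer 'any key in [lo, hi]?' using only a point filter.

    Probes every integer in the range and returns (answer, probes).
    Only feasible for narrow integer ranges.
    """
    probes = 0
    for k in range(lo, hi + 1):
        probes += 1
        if may_contain(k):
            return True, probes
    return False, probes


level_n_minus_2 = {6, 20}  # keys visible on the slide for Level N-2
print(range_may_overlap_via_points(lambda k: k in level_n_minus_2, 14, 18))
# (False, 5): SEEK(14, 18) already costs five point probes
```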
SuRFs can benefit both point and range queries
[Figure: the same levels with a cached SuRF (S) per SSTable in place of each Bloom filter.]
GET(16) and SEEK(14, 18): the SuRFs answer NO, so neither query needs to read an SSTable.
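A toy stand-in for a per-SSTable range filter, using the keys visible on the slide. SortedKeyRangeFilter is hypothetical: it keeps every key sorted and answers exactly with binary search, whereas SuRF stores truncated keys in a succinct trie and may answer "maybe" for an empty range. The point is the interface: one filter answers both GET and SEEK.

```python
import bisect


class SortedKeyRangeFilter:
    """Hypothetical exact range filter over integer keys (a stand-in for SuRF)."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def may_contain(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def may_contain_range(self, lo, hi):
        # Find the smallest stored key >= lo; the range [lo, hi] is
        # non-empty exactly when that key exists and is <= hi.
        i = bisect.bisect_left(self.keys, lo)
        return i < len(self.keys) and self.keys[i] <= hi


levels = {
    "Level N-2": SortedKeyRangeFilter([6, 20]),
    "Level N-1": SortedKeyRangeFilter([12, 21]),
    "Level N": SortedKeyRangeFilter([11, 19]),
}
to_read_get = [name for name, f in levels.items() if f.may_contain(16)]
to_read_seek = [name for name, f in levels.items() if f.may_contain_range(14, 18)]
print(to_read_get, to_read_seek)  # [] []
```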
Evaluation setup: a time-series benchmark
Key: 64-bit timestamp + 64-bit sensor ID
Value: 1KB payload
Queries: SEEK(t1, t2) over a time range from t1 to t2; GET(t) for a single timestamp t
System config: dataset ≈ 100 GB on SSD; DRAM: 32 GB
Filter config: Bloom filter: 14 bits per key; SuRF: 4-bit real suffix
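One way to lay out the 128-bit benchmark key so that a sorted store orders it by time, sketched under an assumption the slides do not state: the key is the 8-byte big-endian timestamp followed by the 8-byte big-endian sensor ID. RocksDB's default comparator orders keys bytewise, and big-endian unsigned integers make bytewise order match numeric order. The helper names are hypothetical.

```python
import struct

MAX_U64 = (1 << 64) - 1


def encode_key(timestamp: int, sensor_id: int) -> bytes:
    """16-byte key: big-endian timestamp, then big-endian sensor ID.

    Big-endian unsigned encoding makes bytewise (lexicographic) order
    equal numeric order on (timestamp, sensor_id).
    """
    return struct.pack(">QQ", timestamp, sensor_id)


def seek_bounds(t1: int, t2: int) -> tuple[bytes, bytes]:
    """Inclusive key range covering every sensor reading with t1 <= timestamp <= t2."""
    return encode_key(t1, 0), encode_key(t2, MAX_U64)


lo, hi = seek_bounds(10, 20)
print(lo <= encode_key(15, 7) <= hi)  # True
```

Under this layout, SEEK(t1, t2) becomes a single contiguous key range, which is exactly the kind of query a range filter can answer per SSTable.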
SuRFs still benefit point queries in RocksDB
[Figure: throughput (Kops/s) for all-false point queries with No Filter, Bloom Filter, and SuRF, with the difference between the Bloom filter and SuRF labeled "Worst-case Gap".]
SuRFs speed up range queries in RocksDB
[Figure: range-query throughput (Kops/s) versus the percent of queries with empty results (10% to 99%), for SuRF and for No Filter/Bloom Filter, which are plotted as one series.]