Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen. Parallel Data Laboratory, Carnegie Mellon University
Part I: Initial Exploration of Hybrid Indexes [SIGMOD’16]
You are running out of memory. Buy more?
[Figure: TPC-C on H-Store, memory limit = 5GB. Top panel: throughput (20K to 60K) vs. transactions executed (0 to 10M). Bottom panel: memory (GB) vs. transactions executed, broken down into indexes, in-memory tuples, and disk tuples.]
The better way: Use memory more efficiently
Indexes are LARGE
Benchmark   % space for index   Hybrid Index
TPC-C       58%                 34%
Voter       55%                 41%
Articles    34%                 18%
Our Contributions [SIGMOD’16]
- The hybrid index architecture
- The Dual-Stage Transformation
- Applied to 4 index structures
  - B+tree
  - Skip List
  - Masstree
  - Adaptive Radix Tree (ART)
Performance; Space: 30 – 70%
Did we solve this problem? Stay tuned. [Figure: throughput (txn/s) vs. transactions executed, TPC-C on H-Store.]
How do hybrid indexes achieve memory savings? Static
Hybrid Index: a dual-stage architecture [Diagram: dynamic stage, static stage]
Inserts are batched in the dynamic stage [Diagram: write, merge; dynamic stage, static stage]
Reads search the stages in order [Diagram: dynamic stage, static stage]
A Bloom filter improves read performance [Diagram: read; dynamic stage, static stage]
Memory-efficient, skew-aware [Diagram: read, write, merge; dynamic stage, static stage]
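The dual-stage read and write paths described on the slides above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `HybridIndex`, `BloomFilter`, and `merge_threshold` are hypothetical, a Python `dict` stands in for the dynamic-stage index, a pair of sorted lists stands in for the compact static stage, and the size-based merge trigger is an assumption.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny Bloom filter (illustrative sizing, not tuned)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions per key from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class HybridIndex:
    """Hypothetical dual-stage index: small dynamic stage, compact static stage."""

    def __init__(self, merge_threshold=4):
        self.dynamic = {}            # stand-in for the write-optimized stage
        self.bloom = BloomFilter()   # covers keys currently in the dynamic stage
        self.static_keys = []        # sorted, read-only
        self.static_values = []
        self.merge_threshold = merge_threshold  # assumed size-based trigger

    def insert(self, key, value):
        # Writes are batched in the dynamic stage.
        self.dynamic[key] = value
        self.bloom.add(key)
        if len(self.dynamic) >= self.merge_threshold:
            self.merge()

    def get(self, key):
        # Reads search the stages in order; the Bloom filter lets most
        # lookups for static-stage keys skip the dynamic stage.
        if self.bloom.might_contain(key) and key in self.dynamic:
            return self.dynamic[key]
        i = bisect.bisect_left(self.static_keys, key)
        if i < len(self.static_keys) and self.static_keys[i] == key:
            return self.static_values[i]
        return None

    def merge(self):
        # Blocking merge: fold the dynamic stage into the static stage.
        merged = dict(zip(self.static_keys, self.static_values))
        merged.update(self.dynamic)  # newer dynamic entries win
        self.static_keys = sorted(merged)
        self.static_values = [merged[k] for k in self.static_keys]
        self.dynamic = {}
        self.bloom = BloomFilter()
```

With `merge_threshold=4`, the fourth insert triggers a merge, so the first four keys end up in the sorted static arrays and later inserts start a fresh dynamic stage.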
The Dual-Stage Transformation [Diagram: merge; dynamic stage, static stage]
The Dynamic-to-Static Rules: Compaction, Reduction, Compression
Compaction: minimize # of memory blocks [Figure: a tree over keys 1 to 12 (values a to n) before and after compaction. After compaction, the inner node holds 3, 6, 9 above the leaves (1 2 3), (4 5 6), (7 8 9), (10 11 12).]
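A rough sketch of what compaction could look like for sorted leaf nodes: all entries are repacked so that every node except possibly the last is completely full, which reduces the number of memory blocks. The helper names `compact_leaves` and `separators`, and the choice of each leaf's largest key as its separator, are assumptions made for illustration.

```python
def compact_leaves(leaves, capacity):
    """Repack partially filled, sorted leaves into completely full leaves."""
    entries = [entry for leaf in leaves for entry in leaf]
    return [entries[i:i + capacity] for i in range(0, len(entries), capacity)]


def separators(leaves):
    """Largest key of every leaf except the last, used as inner-node keys."""
    return [leaf[-1] for leaf in leaves[:-1]]


# Six half-empty leaves (capacity 3 assumed) become four full ones.
before = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
after = compact_leaves(before, capacity=3)
```

In this example `after` holds four full leaves and `separators(after)` yields 3, 6, 9, the same inner-node keys that appear in the compacted tree on the slide.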
Reduction: minimize structural overhead [Figure: the compacted tree (inner keys 3, 6, 9; leaves 1 to 12; values a to n) before and after reduction.]
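One way to picture reduction is a read-only tree whose nodes live in flat arrays, so that a child's location is computed from its position instead of being stored as a pointer. The sketch below assumes that interpretation; `StaticTree` and `fanout` are hypothetical names, and the layout is a simplification rather than the layout used in the talk.

```python
import bisect


class StaticTree:
    """Pointer-free, read-only two-level tree over sorted (key, value) pairs."""

    def __init__(self, sorted_items, fanout=3):
        self.fanout = fanout
        self.keys = [k for k, _ in sorted_items]
        self.values = [v for _, v in sorted_items]
        # One separator per full implicit leaf: the largest key in that leaf.
        self.separators = self.keys[fanout - 1::fanout]

    def get(self, key):
        # Pick the leaf by searching the separators, then compute its offset
        # arithmetically instead of following a stored child pointer.
        leaf = bisect.bisect_left(self.separators, key)
        start = leaf * self.fanout
        end = min(start + self.fanout, len(self.keys))
        for i in range(start, end):
            if self.keys[i] == key:
                return self.values[i]
        return None


tree = StaticTree(list(zip(range(1, 13), "abcdefghijkl")), fanout=3)
```

Because the structure is never modified in place, no slack space or per-node pointers are needed; the cost is that updates have to go through a rebuild or merge.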
The merge routine is a blocking process [Diagram: merge; dynamic stage, static stage; annotations: ?, Size, %]
Did we solve this problem? Yes, we improved the DBMS’s capacity! [Figure: throughput (txn/s) vs. transactions executed, TPC-C on H-Store, B+tree vs. Hybrid.]
[Figure: TPC-C on H-Store, B+tree vs. Hybrid. Top panels: throughput (txn/s) vs. transactions executed. Bottom panels: memory (GB) vs. transactions executed, broken down into indexes, in-memory tuples, and disk tuples.]
Take Away: Memory saved by indexes. Larger working set in memory. Higher throughput.
Part I Recap
- The hybrid index architecture: GENERAL
- The Dual-Stage Transformation: PRACTICAL
- Applied to 4 index structures: USEFUL
  - B+tree
  - Skip List
  - Masstree
  - Adaptive Radix Tree (ART)
Part II: Concurrent hybrid indexes with non-blocking merge
Building Concurrent Hybrid Index? [Diagram: merge, write; dynamic stage, static stage]
Use concurrent data structures for dynamic-stage
Static-stage is perfectly concurrent by default
Challenge: efficient non-blocking merge algorithm
Merge Algorithm Requirements
- Non-blocking
  - All existing items are accessible during merge
  - New items can still enter
- Efficient
  - Fast
  - Bounded temporary memory use
Naïve Solution 1: Coarse-grained Locking [Diagram: merge, write; dynamic stage, static stage]
The intermediate stage unblocks write traffic [Diagram: write, freeze, merge; dynamic stage, intermediate stage, static stage]
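The freeze step might be sketched as below: the full dynamic stage is frozen into a read-only intermediate stage and an empty dynamic stage takes over new writes while the merge runs. This is an illustrative sketch only. The class `ThreeStageIndex` and its method names are hypothetical, dicts stand in for all three stages, and `finish_merge` uses a plain full copy as a placeholder for the merge itself, which is not how the incremental merge described later in the talk works.

```python
import threading


class ThreeStageIndex:
    """Hypothetical index with dynamic, intermediate, and static stages."""

    def __init__(self):
        self.dynamic = {}          # accepts new writes
        self.intermediate = None   # frozen, read-only stage being merged
        self.static = {}           # stand-in for the compact static stage
        self._lock = threading.Lock()  # guards stage swaps and dict writes

    def insert(self, key, value):
        with self._lock:
            self.dynamic[key] = value

    def freeze(self):
        # Swap in an empty dynamic stage so writers are not blocked by merge.
        with self._lock:
            if self.intermediate is not None:
                raise RuntimeError("a merge is already in progress")
            self.intermediate, self.dynamic = self.dynamic, {}

    def finish_merge(self):
        # Placeholder merge: build the new static stage off to the side,
        # then publish it and drop the intermediate stage.
        new_static = dict(self.static)
        new_static.update(self.intermediate)
        with self._lock:
            self.static = new_static
            self.intermediate = None

    def get(self, key):
        # Snapshot the stage references, then search newest to oldest.
        with self._lock:
            stages = (self.dynamic, self.intermediate, self.static)
        for stage in stages:
            if stage is not None and key in stage:
                return stage[key]
        return None
```

Between `freeze()` and `finish_merge()`, inserts land in the new dynamic stage and lookups still see the frozen items through the intermediate stage.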
How do we unblock reads during merge? [Diagram: merge; intermediate stage, static stage]
Naïve Solution 2: Full Copy-on-write [Diagram: merge; intermediate stage, static stage]
Key Observation: Merged-in items in the static-stage will NOT be accessed until the intermediate-stage is deleted. Merge incrementally!
Our Solution: Incremental Copy-on-write with Rapid GC [Diagram: parent, old, new]
When can we safely reclaim the garbage? When no thread still holds a reference to it!
Thread-local counters: C_1, C_2, C_3, ..., C_n, with C_min and C_max the smallest and largest of them
Counter update: ++C_i = MAX(C_i, C_max) + 1
GC condition: C_min > garbage tag
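The counter rule and GC condition on this slide could be simulated as follows. Several details are assumptions that the slide does not state: that a thread bumps its counter when it begins an operation, that a thread holds references only for the duration of one operation, and that garbage is tagged with the value of C_max at the moment it is unlinked. The class `RapidGC` and its methods are hypothetical names, and the sketch is a single-threaded simulation with no real synchronization.

```python
class RapidGC:
    """Single-threaded simulation of counter-based garbage reclamation."""

    def __init__(self, num_threads):
        self.counters = [0] * num_threads  # C_1 .. C_n, one slot per thread
        self.garbage = []                  # (tag, object) pairs awaiting reclaim

    def begin_operation(self, thread_id):
        # ++C_i = MAX(C_i, C_max) + 1
        c_max = max(self.counters)
        self.counters[thread_id] = max(self.counters[thread_id], c_max) + 1

    def retire(self, obj):
        # Assumption: the garbage tag is C_max at the time of unlinking.
        self.garbage.append((max(self.counters), obj))

    def collect(self):
        # GC condition: C_min > garbage tag, i.e. every thread has started
        # a new operation since the object was unlinked.
        c_min = min(self.counters)
        reclaimed = [obj for tag, obj in self.garbage if c_min > tag]
        self.garbage = [(tag, obj) for tag, obj in self.garbage if c_min <= tag]
        return reclaimed
```

Under these assumptions, an unlinked node stays in the garbage list until every simulated thread has begun at least one operation after the unlink, at which point `collect()` returns it.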
A Quick Recap of the Merge Algorithm
- The intermediate stage separates writes from the merge process
- The incremental merge algorithm with rapid GC is non-blocking and space-efficient
What we are building now [Diagram: Bwtree / Skiplist / Masstree; Non-blocking Merge; Compact Radix Tree]
Part III: Super-compact static-stage
Go “crazy” on space-efficiency
Succinct Data Structures
- Use Z + o(Z) space, where Z is the information-theoretic lower bound
- Still allow for efficient query operations
Example bit sequence: 100011010000101…
- rank_1(x) = # of 1’s up to position x
- select_1(x) = position of the x-th occurrence of 1
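The two primitives can be illustrated with a naive linear-time version over the visible prefix of the slide's bit sequence. The indexing conventions here are assumptions: positions are 0-based, rank counts inclusively up to position x, and select counts occurrences from 1. Real succinct structures answer both queries in constant or near-constant time using small auxiliary tables, which this sketch does not attempt.

```python
def rank1(bits, x):
    """Number of 1 bits in positions 0..x (inclusive, 0-based)."""
    return bits[: x + 1].count("1")


def select1(bits, x):
    """0-based position of the x-th 1 bit (x counts from 1)."""
    seen = 0
    for pos, bit in enumerate(bits):
        if bit == "1":
            seen += 1
            if seen == x:
                return pos
    raise ValueError("fewer than x one-bits in the sequence")


bits = "100011010000101"  # visible prefix of the sequence on the slide
```

The two operations are inverses in the sense that `rank1(bits, select1(bits, k)) == k` for every valid k.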
Encoding Radix Tree [Figure: a radix tree in which each node is encoded as a label string (such as $ab, $lnr$a, $iio$, i$n, $$) paired with a bit sequence.]
Memory Savings with the New Encoding: 50M email keys with average length = 20 bytes [Bar chart: memory (MB, 0 to 1000) for ART vs. Our Encoding, annotated 84%.]
The Takeaway Message: Hybrid indexes can save precious memory resources with minimal performance penalty.
Toll-Free Hotline: 1-844-88-CMUDB
Back-up Slides
Latency (ms)   B+tree   Hybrid
50%            10       10
99%            50       52
MAX            115      611