Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 2
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 3
Indexes: Pros & Cons • Pros • Fast lookups • Fast ordered range scans ➔ Best supported by bulk loading a perfect secondary b-tree • Cons • Maintenance cost • Robustness of performance over time 4
Creation of a Perfect B-tree Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3 15, 41, 100 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 5
Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 6
Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 4, 5, 8, 10 14, 17, 19, 22 41, 50, 60, 90 100, 102, 120, 142, 150 Node Split Node Split Node Split Node Split => Immediate , widespread node splits after index creation 7
Problem of Subsequent Insertions • Splits of almost all leaves within a short time period 100% • high I/O load • low buffer utilization Split • low query performance due to contention 50% 50% • Status quo database solution: Leave free space (e.g. 30%) • Oracle, SQL Server, DB2, … 8
Creation of a Perfect B-tree with free space Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3, Utilization = 70% 8, 17, 41, 90, 120 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 9
Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 10
Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 10
Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 12
Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5, 6 8, 10, 13, 14 17, 19, 22, 38 41, 50, 55, 60 90, 95, 100, 102 120, 136, 142, 150 Node Split Node Split Node Split Node Split Node Split Node Split => Delayed , widespread node splits after index creation 11
Limitations of the status quo • The problem of splits is merely delayed • Moreover, the problem occurs in waves 12
Problem Assessment – When does it occur? • Loading Distribution = Insert Distribution • E.g.: Hash-Keys 13
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 14
Basic Idea • Do not leave constant free space while loading Data Bulk Loading 70% 50% 90% 60% 95% 15
Basic Idea – Insert Batch Insert Batch 70% 50% 90% 60% 95% Node Split Node Split => Distributing node splits over time 16
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 17
Ideal solution for predictable splits Moving hills into valleys 18
Ideal solution (Leaf Nodes) 1 Item q B B B/2 B/2 B-5 • q B = Probability of a split after insertion 19
Ideal solution (Leaf Nodes) • Fringe Analysis: 1 • Insert-Operation: 𝑟 𝑜 ∗ 𝐽 + Ԧ 𝑜+1 𝑈 = Ԧ 𝑟 𝑜 + 1 • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state 20
Ideal solution (Leaf Nodes) • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state • Analyze Transition: • Formula holds for q j = 1/(j+1) 21
Ideal solution (Leaf Nodes) • Intuition: • Few full pages split immediately • Many half full pages eventually • Ideal solution • …for expected B -tree utilization • ...i.e., for Utilization of ln(2) = 69% 22
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 23
Practical Remedies – Random • While loading: Randomly pick around target utilization 80%+x nextPage() 80% 80%-y 24
Practical Remedies – Suffix Truncation • While loading: Search for shortest key within range 100% "cattle" ... nextPage() "catchweight" "catchphrase" 80% • Added compression effect 25
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 26
Experimental Evaluation – Setup • Procedure: • Records: 21 integers (84 bytes), normal distribution • Loading b-tree with 100,000 pages of 8KB • Inserting batches of 10,000 records • Workstation: • AMD Ryzen7 2700X • 16GB memory • Java indexing library XXL 27
Experimental Evaluation – Random 28
Experimental Evaluation – Buffer Utilization 29
Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 30
Waves of Misery after index creation • Loading secondary b-tree index in... • Write-intensive workloads • Loading distribution = Insert distribution • Want to achieve predictable split performance: Don't just leave constant free space in your tree nodes! Work towards starting in the steady state of the b-tree. 31
Thank you for your attention! 32
Recommend
More recommend