after index creation
play

after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , - PowerPoint PPT Presentation

Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1 Outline Problem Assessment Basic Solution Ideal Solution Practical Remedies Experimental


  1. Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1

  2. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 2

  3. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 3

  4. Indexes: Pros & Cons • Pros • Fast lookups • Fast ordered range scans ➔ Best supported by bulk loading a perfect secondary b-tree • Cons • Maintenance cost • Robustness of performance over time 4

  5. Creation of a Perfect B-tree Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3 15, 41, 100 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 5

  6. Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 6

  7. Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 4, 5, 8, 10 14, 17, 19, 22 41, 50, 60, 90 100, 102, 120, 142, 150 Node Split Node Split Node Split Node Split => Immediate , widespread node splits after index creation 7

  8. Problem of Subsequent Insertions • Splits of almost all leaves within a short time period 100% • high I/O load • low buffer utilization Split • low query performance due to contention 50% 50% • Status quo database solution: Leave free space (e.g. 30%) • Oracle, SQL Server, DB2, … 8

  9. Creation of a Perfect B-tree with free space Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3, Utilization = 70% 8, 17, 41, 90, 120 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 9

  10. Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 10

  11. Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 10

  12. Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 12

  13. Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5, 6 8, 10, 13, 14 17, 19, 22, 38 41, 50, 55, 60 90, 95, 100, 102 120, 136, 142, 150 Node Split Node Split Node Split Node Split Node Split Node Split => Delayed , widespread node splits after index creation 11

  14. Limitations of the status quo • The problem of splits is merely delayed • Moreover, the problem occurs in waves 12

  15. Problem Assessment – When does it occur? • Loading Distribution = Insert Distribution • E.g.: Hash-Keys 13

  16. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 14

  17. Basic Idea • Do not leave constant free space while loading Data Bulk Loading 70% 50% 90% 60% 95% 15

  18. Basic Idea – Insert Batch Insert Batch 70% 50% 90% 60% 95% Node Split Node Split => Distributing node splits over time 16

  19. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 17

  20. Ideal solution for predictable splits Moving hills into valleys 18

  21. Ideal solution (Leaf Nodes) 1 Item q B B B/2 B/2 B-5 • q B = Probability of a split after insertion 19

  22. Ideal solution (Leaf Nodes) • Fringe Analysis: 1 • Insert-Operation: 𝑟 𝑜 ∗ 𝐽 + Ԧ 𝑜+1 𝑈 = Ԧ 𝑟 𝑜 + 1 • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state 20

  23. Ideal solution (Leaf Nodes) • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state • Analyze Transition: • Formula holds for q j = 1/(j+1) 21

  24. Ideal solution (Leaf Nodes) • Intuition: • Few full pages split immediately • Many half full pages eventually • Ideal solution • …for expected B -tree utilization • ...i.e., for Utilization of ln(2) = 69% 22

  25. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 23

  26. Practical Remedies – Random • While loading: Randomly pick around target utilization 80%+x nextPage() 80% 80%-y 24

  27. Practical Remedies – Suffix Truncation • While loading: Search for shortest key within range 100% "cattle" ... nextPage() "catchweight" "catchphrase" 80% • Added compression effect 25

  28. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 26

  29. Experimental Evaluation – Setup • Procedure: • Records: 21 integers (84 bytes), normal distribution • Loading b-tree with 100,000 pages of 8KB • Inserting batches of 10,000 records • Workstation: • AMD Ryzen7 2700X • 16GB memory • Java indexing library XXL 27

  30. Experimental Evaluation – Random 28

  31. Experimental Evaluation – Buffer Utilization 29

  32. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 30

  33. Waves of Misery after index creation • Loading secondary b-tree index in... • Write-intensive workloads • Loading distribution = Insert distribution • Want to achieve predictable split performance: Don't just leave constant free space in your tree nodes! Work towards starting in the steady state of the b-tree. 31

  34. Thank you for your attention! 32

Recommend


More recommend