
Class 14: Log-Structured-Merge Trees. Instructor: Manos Athanassoulis

CAS CS 460 [Fall 2019] - https://midas.bu.edu/classes/CS460/ - Manos Athanassoulis CS460: Intro to Database Systems Class 14: Log-Structured-Merge Trees Instructor: Manos Athanassoulis https://midas.bu.edu/classes/CS460/ based on slides from Niv


  1. Basic LSM-tree. Design principle #1: optimize for insertions by buffering. Design principle #2: optimize for lookups by sort-merging arrays. [Figure: inserts enter the buffer (level 0); the buffer is sorted and flushed to level 1; levels 1, 2, 3, … hold sorted arrays, which are sort-merged into the next level while eliminating duplicates and discarding the original arrays.]

  2. Basic LSM-tree – Example. [Figure: an empty tree with a buffer (level 0) and sorted-array levels 1, 2, 3.]

  3. Basic LSM-tree – Example. Inserts arrive. Buffer: 4 6 9.

  4. Basic LSM-tree – Example. Sort & flush buffer. Level 1: [4 6 9].

  5. Basic LSM-tree – Example. Buffer: empty. Level 1: [4 6 9].

  6. Basic LSM-tree – Example. Inserts arrive. Buffer: 3 4 8. Level 1: [4 6 9].

  7. Basic LSM-tree – Example. Sort & flush buffer. Level 1: [4 6 9] [3 4 8].

  8. Basic LSM-tree – Example. Buffer: empty. Level 1: [4 6 9] [3 4 8].

  9. Basic LSM-tree – Example. Sort-merge. Level 1: [4 6 9] [3 4 8]. Level 2: [3 4 6 8 9].

  10. Basic LSM-tree – Example. Sort-merge & eliminate duplicates: the key 4, present in both arrays, appears once in the merged array.

  11. Basic LSM-tree – Example. Sort-merge & eliminate duplicates & discard original arrays. Level 2: [3 4 6 8 9].

  12. Basic LSM-tree – Example. Level 1: empty. Level 2: [3 4 6 8 9].

  13. Basic LSM-tree – Example. Inserts arrive. Buffer: 2 7 8. Level 2: [3 4 6 8 9].

  14. Basic LSM-tree – Example. Sort & flush buffer. Level 1: [2 7 8]. Level 2: [3 4 6 8 9].

  15. Basic LSM-tree – Example. Buffer: empty. Level 1: [2 7 8]. Level 2: [3 4 6 8 9].
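The insert example on slides 2 to 15 can be replayed with a short sketch. This is an illustration under stated assumptions, not the course's reference implementation: the function name `flush`, the list-of-levels representation, and the keys-only entries (no values, no deletes) are all hypothetical simplifications.

```python
from heapq import merge


def flush(levels, buffer):
    """Sort the buffer and push it into level 1, cascading sort-merges.

    `levels[i]` models level i + 1 of the tree: either None (empty)
    or a single sorted array of keys. Hypothetical representation.
    """
    run = sorted(set(buffer))
    depth = 0
    while True:
        if depth == len(levels):
            levels.append(None)
        if levels[depth] is None:
            levels[depth] = run          # free slot: the array rests here
            return
        # Slot occupied: sort-merge the two arrays, eliminate duplicates,
        # discard the originals, and carry the result one level down.
        merged = []
        for key in merge(levels[depth], run):
            if not merged or merged[-1] != key:
                merged.append(key)
        levels[depth] = None
        run = merged
        depth += 1


levels = []
for batch in ([4, 6, 9], [3, 4, 8], [2, 7, 8]):
    flush(levels, batch)
print(levels)  # [[2, 7, 8], [3, 4, 6, 8, 9]]
```

The final state matches slide 15: level 1 holds [2 7 8] and level 2 holds [3 4 6 8 9].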

  16. Basic LSM-tree. Levels have exponentially increasing capacities. [Figure: level 0 (buffer) has capacity 1; levels 1, 2, 3 (sorted arrays) have capacities 2, 4, 8.]

  17. Basic LSM-tree – Lookup cost. Lookup method? Search youngest to oldest: $O(\log_2(N/B))$ levels. How? Binary search: $O(\log_2(N/B))$ per level. Lookup cost? $O(\log_2(N/B)^2)$. [Figure: levels 0 to 3 with capacities 1, 2, 4, 8.]
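The lookup method on slide 17 (search youngest to oldest, binary search within each sorted array) might look like the sketch below. The function name `lookup`, its return strings, and the in-memory lists standing in for on-disk arrays are assumptions made for illustration.

```python
from bisect import bisect_left


def lookup(buffer, levels, key):
    """Search youngest to oldest; binary-search each sorted array.

    `buffer` is an unsorted list of recent keys; `levels[i]` is the sorted
    array at level i + 1, or None if that level is empty (hypothetical model).
    Returns where the key was found, or None.
    """
    if key in buffer:                     # youngest data first
        return "buffer"
    for depth, run in enumerate(levels, start=1):
        if run is None:
            continue
        i = bisect_left(run, key)         # binary search within the array
        if i < len(run) and run[i] == key:
            return f"level {depth}"       # stop at the youngest match
    return None


state = [[2, 7, 8], [3, 4, 6, 8, 9]]     # levels 1 and 2 from the example
print(lookup([], state, 8))               # level 1
print(lookup([], state, 4))               # level 2
print(lookup([], state, 5))               # None
```

Key 8 exists at both levels, and the search returns the level-1 copy because younger levels are probed first.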

  18. Basic LSM-tree – Insertion cost. How many times is each entry copied? $O(\log_2(N/B))$. What is the price of each copy? $O(1/B)$. Total insert cost? $O(\frac{1}{B} \cdot \log_2(N/B))$. [Figure: levels 0 to 3 with capacities 1, 2, 4, 8.]

  19. Results Catalogue

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(\log_2(N/B))$ | $O(N/B)$ |
      | Log | $O(N/B)$ | $O(1/B)$ |
      | B-tree | $O(\log_B(N/B))$ | $O(\log_B(N/B))$ |
      | Basic LSM-tree | $O(\log_2(N/B)^2)$ | $O(\frac{1}{B} \cdot \log_2(N/B))$ |
      | Leveled LSM-tree | | |
      | Tiered LSM-tree | | |

  20. Results Catalogue (same table). The basic LSM-tree has better insert cost and worse lookup cost compared with B-trees.

  21. Results Catalogue (same table). The basic LSM-tree has better insert cost and worse lookup cost compared with B-trees. Can we improve lookup cost?

  22. Declining Main Memory Cost

  23. Declining Main Memory Cost. Store, in main memory, a fence pointer for every block of the sorted array. [Figure: fence pointers 1, 10, 15, … point to Block 1 (1 3 6), Block 2 (10 11 13), Block 3 (15 16 18), ….]
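A minimal sketch of a fence-pointer lookup, using the three blocks from slide 23. The names `blocks`, `fences`, and `find` are hypothetical, and the Python lists stand in for disk blocks; the point is that the search over fence pointers happens in memory, so only one block needs to be read.

```python
from bisect import bisect_left, bisect_right

# Blocks of the sorted array (on disk in a real system).
blocks = [[1, 3, 6], [10, 11, 13], [15, 16, 18]]

# One fence pointer per block, kept in main memory: the block's smallest key.
fences = [block[0] for block in blocks]   # [1, 10, 15]


def find(key):
    """Return (block index, offset) of `key`, or None if it is absent."""
    # In-memory step: the last block whose fence pointer is <= key.
    b = bisect_right(fences, key) - 1
    if b < 0:
        return None                       # key is below the first fence
    block = blocks[b]                     # the single block read (one I/O)
    i = bisect_left(block, key)
    if i < len(block) and block[i] == key:
        return (b, i)
    return None


print(find(11))  # (1, 1)
print(find(7))   # None
```

Because exactly one block is fetched per sorted array, the lookup cost of a sorted array drops from $O(\log_2(N/B))$ to $O(1)$ I/Os.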

  24. Results Catalogue – with fence pointers

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(\log_2(N/B))$ | $O(N/B)$ |
      | Log | $O(N/B)$ | $O(1/B)$ |
      | B-tree | $O(\log_B(N/B))$ | $O(\log_B(N/B))$ |
      | Basic LSM-tree | $O(\log_2(N/B)^2)$ | $O(\frac{1}{B} \cdot \log_2(N/B))$ |
      | Leveled LSM-tree | | |
      | Tiered LSM-tree | | |

  25. Results Catalogue – with fence pointers. (Table unchanged.)

  26. Results Catalogue – with fence pointers. Sorted array lookup cost becomes $O(1)$.

  27. Results Catalogue – with fence pointers. (Table unchanged.)

  28. Results Catalogue – with fence pointers. (Table unchanged.)

  29. Results Catalogue – with fence pointers. B-tree lookup cost and insertion cost become $O(1)$.

  30. Results Catalogue – with fence pointers. (Table unchanged.)

  31. Results Catalogue – with fence pointers. Basic LSM-tree lookup cost becomes $O(\log_2(N/B))$.

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(1)$ | $O(N/B)$ |
      | Log | $O(N/B)$ | $O(1/B)$ |
      | B-tree | $O(1)$ | $O(1)$ |
      | Basic LSM-tree | $O(\log_2(N/B))$ | $O(\frac{1}{B} \cdot \log_2(N/B))$ |
      | Leveled LSM-tree | | |
      | Tiered LSM-tree | | |

  32. Results Catalogue – with fence pointers (same table as slide 31). Quick sanity check: suppose $N = 2^{42}$ and $B = 2^{10}$.

  33. Results Catalogue – with fence pointers. Quick sanity check: suppose $N = 2^{42}$ and $B = 2^{10}$, so $N/B = 2^{32}$ and $\log_2(N/B) = 32$.

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(1)$ | $O(2^{32})$ |
      | Log | $O(2^{32})$ | $O(2^{-10})$ |
      | B-tree | $O(1)$ | $O(1)$ |
      | Basic LSM-tree | $O(32)$ | $O(2^{-10} \cdot 32)$ |
      | Leveled LSM-tree | | |
      | Tiered LSM-tree | | |
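The sanity check can be reproduced by plugging $N = 2^{42}$ and $B = 2^{10}$ into the catalogue formulas. The sketch below treats each big-O expression as an exact count with constants dropped, which is an assumption made only for this back-of-the-envelope comparison; the variable names are hypothetical.

```python
import math

N = 2**42                 # number of entries
B = 2**10                 # entries per block
blocks = N // B           # N/B = 2**32
levels = math.log2(blocks)  # log2(N/B) = 32.0

# (lookup cost, insertion cost) with fence pointers, constants dropped.
costs = {
    "Sorted array": (1, blocks),
    "Log": (blocks, 1 / B),
    "B-tree": (1, 1),
    "Basic LSM-tree": (levels, levels / B),
}

for name, (lookup_cost, insert_cost) in costs.items():
    print(f"{name:15s} lookup={lookup_cost:<12g} insert={insert_cost:g}")
```

With these numbers, the basic LSM-tree's per-insert cost is a small fraction of one I/O, while its lookup cost is a few dozen I/Os rather than the single I/O of a sorted array or B-tree.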

  34. Leveled LSM-tree. [Figure axes: lookup cost, update cost.]

  35. Leveled LSM-tree. Lookup cost depends on the number of levels. How to reduce it? Increase the size ratio T. [Figure: level 0 (buffer) and levels 1, 2, 3 (sorted arrays) have capacities $T^0$, $T^1$, $T^2$, $T^3$.]

  36. Leveled LSM-tree. E.g. size ratio of 4: levels 0, 1, 2, 3 have capacities 1, 4, 16, 64.

  37. Leveled LSM-tree (size ratio 4). Inserts arrive in the buffer.

  38. Leveled LSM-tree (size ratio 4). Flush: level 1 holds one buffer's worth of entries.

  39. Leveled LSM-tree (size ratio 4). Flush & sort-merge: level 1 holds two buffers' worth of entries in one sorted array.

  40. Leveled LSM-tree (size ratio 4). Flush & sort-merge: level 1 holds three buffers' worth of entries.

  41. Leveled LSM-tree (size ratio 4). Flush & sort-merge: level 1 holds four buffers' worth of entries and is full.

  42. Leveled LSM-tree (size ratio 4). Move: the full level-1 array moves to level 2.

  43. Leveled LSM-tree (size ratio 4). Level 1 is empty; level 2 holds the four buffers' worth of entries.
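The leveled flush sequence on slides 37 to 43 can be sketched as follows. This is a simplified model, not the lecture's implementation: `leveled_flush`, `merge_runs`, the keys-only entries, and measuring capacity in entries (`buffer_size * T ** level`) are assumptions for illustration.

```python
from heapq import merge


def merge_runs(old, new):
    """Sort-merge two sorted arrays, keeping each key once."""
    out = []
    for key in merge(old, new):
        if not out or out[-1] != key:
            out.append(key)
    return out


def leveled_flush(levels, buffer, T, buffer_size):
    """Leveling: every level holds ONE sorted array.

    `levels[i]` models level i + 1. An incoming array is sort-merged into
    the level's array; a level that reaches capacity moves down a level.
    """
    run = sorted(buffer)
    depth = 0
    while True:
        if depth == len(levels):
            levels.append([])
        levels[depth] = merge_runs(levels[depth], run)
        capacity = buffer_size * T ** (depth + 1)
        if len(levels[depth]) < capacity:
            return
        run, levels[depth] = levels[depth], []   # full: move to next level
        depth += 1


levels = []
for start in range(0, 12, 3):                    # four buffers of three keys
    leveled_flush(levels, [start + 2, start, start + 1], T=4, buffer_size=3)
print(levels)  # [[], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]
```

After the fourth flush, level 1 reaches its capacity of four buffers and its array moves to level 2, as on slide 43. Note that each flush rewrites the whole level-1 array, which is where the factor T in the leveled insertion cost comes from.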

  44. Leveled LSM-tree. Lookup cost? $O(\log_T(N/B))$. Insertion cost? $O(\frac{T}{B} \cdot \log_T(N/B))$. [Figure: size ratio 4; level 1 empty, level 2 holding four buffers' worth of entries.]

  45. Leveled LSM-tree. Lookup cost? $O(\log_T(N/B))$. Insertion cost? $O(\frac{T}{B} \cdot \log_T(N/B))$. What happens as we increase the size ratio T? What happens when the size ratio T is set to N/B? Lookup cost becomes $O(1)$. Insert cost becomes $O(N/B^2)$. The LSM-tree becomes a sorted array!
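The limiting case follows by substituting $T = N/B$ into the two leveled cost formulas, since $\log_{N/B}(N/B) = 1$. Written out as a short derivation using only the symbols from the slides:

```latex
\begin{align*}
\text{lookup:} \quad & O\!\left(\log_T \tfrac{N}{B}\right)
  = O\!\left(\log_{N/B} \tfrac{N}{B}\right) = O(1) \\
\text{insert:} \quad & O\!\left(\tfrac{T}{B} \cdot \log_T \tfrac{N}{B}\right)
  = O\!\left(\tfrac{N/B}{B} \cdot 1\right) = O\!\left(\tfrac{N}{B^2}\right)
\end{align*}
```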

  46. [Figure: plot with axes lookup cost and insertion cost, with labels for the basic LSM-tree, leveling, and the sorted array.]

  47. Results Catalogue – with fence pointers

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(1)$ | $O(N/B)$ |
      | Log | $O(N/B)$ | $O(1/B)$ |
      | B-tree | $O(1)$ | $O(1)$ |
      | Basic LSM-tree | $O(\log_2(N/B))$ | $O(\frac{1}{B} \cdot \log_2(N/B))$ |
      | Leveled LSM-tree | $O(\log_T(N/B))$ | $O(\frac{T}{B} \cdot \log_T(N/B))$ |
      | Tiered LSM-tree | | |

  48. Tiered LSM-tree. [Figure axes: lookup cost, insertion cost.]

  49. Tiered LSM-tree. Reduce the number of levels by increasing the size ratio. Do not merge within a level. [Figure: level 0 (buffer) and levels 1, 2, 3 (sorted arrays) have capacities $T^0$, $T^1$, $T^2$, $T^3$.]

  50. Tiered LSM-tree. E.g. size ratio of 4: levels 0, 1, 2, 3 have capacities 1, 4, 16, 64.

  51. Tiered LSM-tree (size ratio 4). Inserts arrive; flush: level 1 holds one sorted array.

  52. Tiered LSM-tree (size ratio 4). Flush: level 1 holds two separate sorted arrays.

  53. Tiered LSM-tree (size ratio 4). Flush: level 1 holds three separate sorted arrays.

  54. Tiered LSM-tree (size ratio 4). Flush: level 1 holds four separate sorted arrays and is full.

  55. Tiered LSM-tree (size ratio 4). Sort-merge: the four level-1 arrays are merged into one sorted array at level 2.

  56. Tiered LSM-tree (size ratio 4). Level 1 is empty; level 2 holds one sorted array of four buffers' worth of entries.
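The tiered flush sequence on slides 51 to 56 differs from leveling only in when merging happens. A sketch under stated assumptions: `tiered_flush` is a hypothetical name, entries are keys only, and a level is modeled as a list of sorted arrays that is merged once it contains T of them.

```python
from heapq import merge


def tiered_flush(levels, buffer, T):
    """Tiering: a level collects up to T sorted arrays WITHOUT merging them.

    `levels[i]` models level i + 1 as a list of sorted arrays. Only when a
    level holds T arrays are they sort-merged into one array for the next level.
    """
    run = sorted(buffer)
    depth = 0
    while True:
        if depth == len(levels):
            levels.append([])
        levels[depth].append(run)            # no merge within a level
        if len(levels[depth]) < T:
            return
        merged = []
        for key in merge(*levels[depth]):    # T-way sort-merge
            if not merged or merged[-1] != key:
                merged.append(key)
        levels[depth] = []                   # discard the original arrays
        run = merged
        depth += 1


levels = []
for start in range(0, 12, 3):                # four buffers of three keys
    tiered_flush(levels, [start + 2, start, start + 1], T=4)
print(levels)  # [[], [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]]
```

Each entry is written once per level, so inserts are cheaper than with leveling, but a lookup may have to probe up to T arrays in every level.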

  57. Tiered LSM-tree. Lookup cost? $O(T \cdot \log_T(N/B))$. Insertion cost? $O(\frac{1}{B} \cdot \log_T(N/B))$. [Figure: size ratio 4; level 1 empty, level 2 holding one sorted array.]

  58. Tiered LSM-tree. Lookup cost? $O(T \cdot \log_T(N/B))$. Insertion cost? $O(\frac{1}{B} \cdot \log_T(N/B))$. What happens as we increase the size ratio T? What happens when the size ratio T is set to N/B? Lookup cost becomes $O(N/B)$. Insert cost becomes $O(1/B)$. The tiered LSM-tree becomes a log!
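As with leveling, the limiting case follows from substituting $T = N/B$ into the tiered cost formulas, using $\log_{N/B}(N/B) = 1$:

```latex
\begin{align*}
\text{lookup:} \quad & O\!\left(T \cdot \log_T \tfrac{N}{B}\right)
  = O\!\left(\tfrac{N}{B} \cdot 1\right) = O\!\left(\tfrac{N}{B}\right) \\
\text{insert:} \quad & O\!\left(\tfrac{1}{B} \cdot \log_T \tfrac{N}{B}\right)
  = O\!\left(\tfrac{1}{B} \cdot 1\right) = O\!\left(\tfrac{1}{B}\right)
\end{align*}
```

These are the lookup and insertion costs listed for the log in the results catalogue.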

  59. [Figure: plot with axes lookup cost and insertion cost, with labels for the log, tiering, the basic LSM-tree, leveling, and the sorted array.]

  60. Results Catalogue – with fence pointers

      | Data structure | Lookup cost | Insertion cost |
      |---|---|---|
      | Sorted array | $O(1)$ | $O(N/B)$ |
      | Log | $O(N/B)$ | $O(1/B)$ |
      | B-tree | $O(1)$ | $O(1)$ |
      | Basic LSM-tree | $O(\log_2(N/B))$ | $O(\frac{1}{B} \cdot \log_2(N/B))$ |
      | Leveled LSM-tree | $O(\log_T(N/B))$ | $O(\frac{T}{B} \cdot \log_T(N/B))$ |
      | Tiered LSM-tree | $O(T \cdot \log_T(N/B))$ | $O(\frac{1}{B} \cdot \log_T(N/B))$ |

  61. Bloom filters

  62. Declining Main Memory Cost
