
Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky

SIGMOD17 / SIGMOD18, Niv Dayan


  1. Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky SIGMOD17 / SIGMOD18 Niv Dayan

  2. Log-Structured KV-Stores

  3. Log-Structured KV-Stores

  4. Why Log-Structured KV-Stores?

  5. Why Log-Structured KV-Stores? fast writes

  6. Why Log-Structured KV-Stores? memory storage

  7. Why Log-Structured KV-Stores?

  8. Why Log-Structured KV-Stores?

  9. Why Log-Structured KV-Stores? byte-addressable vs. block-addressable

  10. write data

  11. write data

  12. write data

  13. In-Place Writes write data

  14. In-Place Writes B-trees write data

  15. In-Place Writes B-trees write data

  16. Log-Structured Writes

  17. Log-Structured Writes buffer writes

  18. Log-Structured Writes buffer writes

  19. Log-Structured Writes buffer writes

  20. Log-Structured Writes buffer writes

  21. Log-Structured Writes buffer writes

  22. Log-Structured KV-Stores fast writes buffer writes

  23. Log-Structured KV-Stores fast writes fast reads massive data

  24. Background

  25. Background: The Log-Structured Merge-Tree (buffer)

  26. Background: LSM-tree (buffer)

  27. buffer

  28. writes buffer

  29. buffer: key-value pairs

  30. buffer: key-value pairs. Sherlock: a fictional detective. Waldo: an inconspicuous traveler

  31. buffer gets full

  32. levels 0 (buffer) and 1: sort & flush

  33. levels 0 (buffer) and 1: sort & flush … sorted runs

  34. levels 0 (buffer), 1, 2: sort-merge

  35. levels 0 (buffer), 1, 2, 3: exponentially increasing capacities; one I/O per run
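The buffer, flush, and sort-merge steps on the slides above can be sketched in a few lines of Python. This is a toy illustration only, not code from the talk: the class name `TinyLSM`, the helper `merge_runs`, and the choice of one sorted run per level with capacity `buffer_capacity * size_ratio ** level` are all assumptions made for the example.

```python
from bisect import bisect_left


def merge_runs(newer, older):
    """Sort-merge two sorted runs; on duplicate keys the newer value wins."""
    merged = dict(older)
    merged.update(newer)
    return sorted(merged.items())


class TinyLSM:
    """Toy LSM-tree: an in-memory buffer (level 0) plus one sorted run per level."""

    def __init__(self, buffer_capacity=2, size_ratio=2):
        self.buffer = {}
        self.buffer_capacity = buffer_capacity
        self.size_ratio = size_ratio
        self.levels = []  # levels[i] holds the sorted run of level i + 1

    def capacity(self, level):
        # Assumed capacity rule: each level is size_ratio times larger than the last.
        return self.buffer_capacity * self.size_ratio ** level

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        # Sort the full buffer into a run, then push it down until it fits.
        run = sorted(self.buffer.items())
        self.buffer = {}
        level = 1
        while True:
            if level > len(self.levels):
                self.levels.append([])
            run = merge_runs(run, self.levels[level - 1])
            if len(run) <= self.capacity(level):
                self.levels[level - 1] = run
                return
            self.levels[level - 1] = []  # level overflowed: move the run deeper
            level += 1

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.levels:  # shallow (newest) to deep (oldest)
            i = bisect_left(run, (key,))  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Searching from the shallowest level first means the most recent version of a key is always found before any older one.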

  36. where’s Waldo? levels 0 (buffer), 1, 2, 3: binary searching

  37. where’s Waldo? levels 0 (buffer), 1, 2, 3: pointers, one I/O per run

  38. where’s Waldo? levels 0 (buffer), 1, 2, 3: Bloom filters, pointers

  39. where’s Waldo? Bloom filters, pointers: true negative (level 1)

  40. where’s Waldo? Bloom filters, pointers: true negative (level 1), false positive (level 2)

  41. where’s Waldo? Bloom filters, pointers: true negative (level 1), false positive (level 2), true positive (level 3)
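The lookup on the slides above, where each run's Bloom filter answers with a true negative, a false positive, or a true positive, can be sketched as follows. This is a hedged illustration: `BloomFilter`, `build_run`, and `lookup` are hypothetical names, a Python dict stands in for the on-storage run, and SHA-256 with a seed prefix is just one convenient way to obtain several hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter; one byte per bit keeps the sketch readable."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def build_run(entries, bits_per_entry=10):
    """Pair a run (a dict here) with a Bloom filter sized at bits_per_entry."""
    num_bits = max(1, bits_per_entry * len(entries))
    num_hashes = max(1, round(bits_per_entry * math.log(2)))
    bloom = BloomFilter(num_bits, num_hashes)
    for key in entries:
        bloom.add(key)
    return bloom, dict(entries)


def lookup(key, runs):
    """Probe runs from shallowest to deepest; return (value, I/Os spent)."""
    ios = 0
    for bloom, data in runs:
        if not bloom.might_contain(key):
            continue  # negative: skip the run without any I/O
        ios += 1  # positive: pay one I/O to read the run
        if key in data:
            return data[key], ios  # true positive
        # otherwise it was a false positive and the I/O was wasted
    return None, ios
```

A Bloom filter never returns a false negative, so the only wasted work in `lookup` comes from false positives.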

  42. levels 0 (buffer), 1, 2, 3: Bloom filters, pointers, merging frequency

  43. merging writes reads

  44. merging writes reads

  45. merging: Leveling (read-optimized) vs. Tiering (write-optimized)

  46. Leveling (read-optimized) vs. Tiering (write-optimized)

  47. Leveling (read-optimized) vs. Tiering (write-optimized): gather

  48. Leveling (read-optimized) vs. Tiering (write-optimized): gather, merge & flush

  49. Leveling (read-optimized) vs. Tiering (write-optimized): gather

  50. Leveling (read-optimized) vs. Tiering (write-optimized): gather, merge

  51. Leveling (read-optimized) vs. Tiering (write-optimized): gather, merge, flush

  52. Leveling (read-optimized) vs. Tiering (write-optimized): gather, merge

  53. Leveling (read-optimized) vs. Tiering (write-optimized): $\log_R(N)$

  54. Leveling (read-optimized): 1 run per level. Tiering (write-optimized): R runs per level. $\log_R(N)$, size ratio R

  55. Leveling (read-optimized): 1 run per level. Tiering (write-optimized): R runs per level. $\log_R(N)$, size ratio R

  56. Leveling (read-optimized): 1 run per level. Tiering (write-optimized): R runs per level. Size ratio R

  57. Leveling (read-optimized): 1 run per level. Tiering (write-optimized): 1 run per level. Size ratio R

  58. Leveling (read-optimized): 1 run per level. Tiering (write-optimized): T runs per level. Size ratio R

  59. Tiering (write-optimized): O(l Nl ) runs per level, log. Leveling (read-optimized): 1 run per level, sorted array. Size ratio R

  60. log, Tiering, Leveling, sorted array

  61. log, Tiering, Leveling, sorted array; size ratio R

  62. log, Tiering, Leveling, sorted array; size ratio R

  63. log (R), Tiering, Leveling, sorted array (R); size ratio R
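To make the role of the size ratio R concrete, the sketch below counts levels and the maximum number of runs under each merge policy. It rests on assumptions not stated on the slides: level i is taken to hold `buffer_entries * size_ratio ** i` entries, and the function names `num_levels` and `lsm_shape` are hypothetical.

```python
def num_levels(num_entries, buffer_entries, size_ratio):
    """Levels needed so that the largest level can hold num_entries."""
    if size_ratio < 2:
        raise ValueError("size_ratio must be at least 2")
    levels, capacity = 1, buffer_entries * size_ratio
    while capacity < num_entries:
        capacity *= size_ratio
        levels += 1
    return levels


def lsm_shape(num_entries, buffer_entries, size_ratio, policy):
    """Return (levels, maximum total runs) for 'leveling' or 'tiering'."""
    if policy not in ("leveling", "tiering"):
        raise ValueError("policy must be 'leveling' or 'tiering'")
    levels = num_levels(num_entries, buffer_entries, size_ratio)
    # Leveling keeps 1 run per level; tiering gathers up to R runs per level.
    runs_per_level = 1 if policy == "leveling" else size_ratio
    return levels, levels * runs_per_level


if __name__ == "__main__":
    for ratio in (10, 1000):
        for policy in ("leveling", "tiering"):
            print(ratio, policy, lsm_shape(1_000_000, 1_000, ratio, policy))
```

With one million entries and a 1,000-entry buffer, raising the ratio from 10 to 1,000 collapses the tree to a single level: one run under leveling (a sorted array) and up to 1,000 runs under tiering (a log).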

  64. Monkey Dostoevsky

  65. Monkey: Optimal Navigable Key-Value Store, SIGMOD17

  66. Monkey: Optimal Navigable Key-Value Store, SIGMOD17. Niv Dayan, Manos Athanassoulis, Stratos Idreos

  67. Monkey: Optimal Navigable Key-Value Store, SIGMOD17. Bloom filters, data

  68. Bloom filters, data: x bits/entry for every filter

  69. Bloom filters, data: x bits/entry for every filter

  70. Bloom filters, data: false positive rate $O(e^{-x})$ for every filter

  71. Bloom filters: false positive rates $O(e^{-x}) + O(e^{-x}) + O(e^{-x}) = O(e^{-x} \cdot \log_R(N))$ I/O

  72. Bloom filters: false positive rates $O(e^{-x}) + O(e^{-x}) + O(e^{-x}) = O(e^{-x} \cdot \log_R(N))$ I/O

  73. Bloom filters: false positive rate $O(e^{-x})$ for every filter; most memory

  74. Bloom filters: false positive rate $O(e^{-x})$ for every filter; most memory saves at most 1 I/O!
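The cost of giving every filter the same x bits/entry can be checked numerically. The sketch below uses the standard Bloom filter approximation, false positive rate = e^(-x · ln(2)^2) with an optimal number of hash functions, and assumes level i holds a number of entries proportional to `size_ratio ** i`; the function names are hypothetical.

```python
import math


def bloom_fpr(bits_per_entry):
    """Approximate false positive rate with an optimal number of hash functions."""
    return math.exp(-bits_per_entry * math.log(2) ** 2)


def uniform_lookup_cost(bits_per_entry, num_levels):
    """Expected wasted I/Os per lookup when every level uses the same bits/entry."""
    return num_levels * bloom_fpr(bits_per_entry)


def memory_share_of_largest_level(size_ratio, num_levels):
    """Fraction of filter memory spent on the largest level under uniform bits/entry."""
    sizes = [size_ratio ** i for i in range(1, num_levels + 1)]
    return sizes[-1] / sum(sizes)


if __name__ == "__main__":
    print(bloom_fpr(10))                         # per-filter false positive rate
    print(uniform_lookup_cost(10, 3))            # grows linearly with the level count
    print(memory_share_of_largest_level(10, 3))  # share taken by the largest level
```

With a size ratio of 10 and three levels, the largest level's filter takes about 90% of the memory, yet no single filter can save more than one I/O per lookup.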

  75. reallocate

  76. reallocate

  77. same memory - fewer false positives reallocate

  78. relax: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$

  79. relax, model: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= f(p_0, p_1, \dots)$; memory footprint $= f(p_0, p_1, \dots)$

  80. relax, model: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= \sum_{i=1}^{L} p_i$; memory footprint $= -\frac{N}{\ln(2)^2} \sum_{i=1}^{L} \frac{\ln(p_i)}{T^{L-i}}$

  81. relax, model, optimize: false positive rates $0 < p_0 < 1$, $0 < p_1 < 1$, $0 < p_2 < 1$; read cost $= \sum_{i=1}^{L} p_i$; memory footprint $= -\frac{N}{\ln(2)^2} \sum_{i=1}^{L} \frac{\ln(p_i)}{T^{L-i}}$, in terms of $p_0, p_1, \dots$

  82. false positive rate: $p_0 \approx O(e^{-x}/R^2)$, $p_1 \approx O(e^{-x}/R^1)$, $p_2 \approx O(e^{-x}/R^0)$

  83. false positive rate: $O(e^{-x}/R^2) + O(e^{-x}/R^1) + O(e^{-x}/R^0)$, a geometric progression, $= O(e^{-x})$ I/O

  84. $O(e^{-x} \cdot \log_R(N)) > O(e^{-x})$ I/O

  85. $O(e^{-x} \cdot \log_R(N))$, $O(e^{-x})$ I/O

  86. [Figure: read latency (ms) vs. number of entries (log scale); RocksDB $O(e^{-x} \cdot \log_R(N))$, Monkey $O(e^{-x})$ I/O]
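The reallocation idea can be tried out numerically under the model read cost = sum of the p_i and memory footprint = -N / ln(2)^2 · sum of ln(p_i) / T^(L-i). The sketch below is an illustration under stated assumptions, not Monkey's actual tuning procedure: T is taken to be the size ratio, the proportional rule p_i ∝ 1 / T^(L-i) is what a Lagrange-multiplier argument yields for this model when every p_i stays below 1, and all function names are hypothetical.

```python
import math

LN2_SQ = math.log(2) ** 2


def level_weights(size_ratio, num_levels):
    """w_i = 1 / T^(L - i) for i = 1..L: the fraction of N held at level i."""
    return [size_ratio ** -(num_levels - i) for i in range(1, num_levels + 1)]


def memory_bits(fprs, num_entries, size_ratio):
    """Memory footprint of the model: -N / ln(2)^2 * sum(w_i * ln(p_i))."""
    weights = level_weights(size_ratio, len(fprs))
    return -num_entries / LN2_SQ * sum(w * math.log(p) for w, p in zip(weights, fprs))


def uniform_fprs(bits_budget, num_entries, size_ratio, num_levels):
    """Same false positive rate at every level, spending exactly bits_budget."""
    weights = level_weights(size_ratio, num_levels)
    p = math.exp(-bits_budget * LN2_SQ / (num_entries * sum(weights)))
    return [p] * num_levels


def proportional_fprs(bits_budget, num_entries, size_ratio, num_levels):
    """p_i = c * w_i, with c chosen so the footprint equals bits_budget."""
    weights = level_weights(size_ratio, num_levels)
    weighted_logs = sum(w * math.log(w) for w in weights)
    log_c = (-bits_budget * LN2_SQ / num_entries - weighted_logs) / sum(weights)
    fprs = [math.exp(log_c) * w for w in weights]
    if max(fprs) >= 1:
        raise ValueError("budget too small for this sketch: some p_i would reach 1")
    return fprs


if __name__ == "__main__":
    n, ratio, levels, budget = 1_000_000, 10, 3, 5_000_000
    print(sum(uniform_fprs(budget, n, ratio, levels)))       # read cost, uniform
    print(sum(proportional_fprs(budget, n, ratio, levels)))  # read cost, reallocated
```

Both allocations spend the same memory, but the proportional one yields a smaller sum of false positive rates because the rates shrink geometrically toward the smaller levels.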

  87. Existing Monkey

  88. Existing Monkey Dostoevsky

  89. tiering Monkey leveling

  90. I/O overheads with leveling: point, long range, short range, writes

  91. point: false positive rates $O(e^{-x}/R^2)$, $O(e^{-x}/R)$, $O(e^{-x})$, exponentially decreasing

  92. point: false positive rates $O(e^{-x}/R^2)$, $O(e^{-x}/R)$, $O(e^{-x})$ at the largest level

  93. point, long range, short range, writes: largest level, $O(e^{-x})$
