design tradeoffs of data access methods

Design Tradeoffs of Data Access Methods, by Manos Athanassoulis and Stratos Idreos


  1. Design Tradeoffs of Data Access Methods Manos Athanassoulis and Stratos Idreos

  2. declarative interface: ask “what” you want; the db system decides “how” to best store and access data

  3. applications; api/sql; algorithms/operators; cpu; memory hierarchy (data at every level); system kernel: a collection of access methods

  4. an access method is a way to store and access data: layout, structure, navigation

  5. an access method is a way to store and access data: layout (e.g., array), structure (unordered), navigation (scan)

  6. an access method is a way to store and access data: layout (e.g., array), structure (unordered), navigation (scan); or layout (e.g., array), structure (ordered), navigation (binary search)

  7. TREES, HASH TABLES, SLOTTED PAGES, COLUMN-GROUPS, TRIES, COLUMNS, ARRAYS, LOG-STRUCTURED TREES, MULTI-DIMENSIONAL

  8. isn’t this a solved problem?

  9. isn’t this a solved problem? access method design is now as important as ever

  10. data grows daily (2.5 […], [IB…]); today, data systems are nearly everywhere…; continuous need for new and tailored data systems

  11. data grows daily (2.5 […], [IB…]); today, data systems are nearly everywhere…; tomorrow: continuous need for new and tailored data systems

  13. [figure: columns A, B, C, D; storage levels: disk, memory]

  14. [figure: disk, memory; option 1: row-store engine over A, B, C, D]

  15. [figure: disk, memory; option 1: row-store engine over A, B, C, D; option 2: column-store engine over A]
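
The two options on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the table contents, the `sum_a_*` helper names, and the "values touched" counter are all hypothetical and stand in for the data that a real engine would move from disk to memory.

```python
# Hypothetical 3-row table with columns A, B, C, D (values are made up).
rows = [
    (1, "b1", "c1", "d1"),
    (2, "b2", "c2", "d2"),
    (3, "b3", "c3", "d3"),
]

# Option 1, row-store: the attributes of each record are stored together.
row_store = list(rows)

# Option 2, column-store: the values of each column are stored together.
column_store = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("A", "B", "C", "D"))
}


def sum_a_row_store(store):
    """Sum column A and count every attribute value touched on the way."""
    touched = 0
    total = 0
    for record in store:
        touched += len(record)  # the whole record is brought in
        total += record[0]
    return total, touched


def sum_a_column_store(store):
    """Sum column A; only the values of column A are touched."""
    column = store["A"]
    return sum(column), len(column)


row_total, row_touched = sum_a_row_store(row_store)
col_total, col_touched = sum_a_column_store(column_store)
print(row_total, row_touched)  # 6 12
print(col_total, col_touched)  # 6 3
```

Both layouts return the same answer; they differ in how much data a query over a single column has to touch.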

  16. how many more new access methods to design?

  17. how many more new access methods to design? it is not about radical new designs only! design, tuning and variations

  18. say the workload (read/write ratio) shifts (e.g., due to app features): should we use a different data layout for base data - diff updates? should we use different indexing or no indexing?

  19. say the workload (read/write ratio) shifts (e.g., due to app features): should we use a different data layout for base data - diff updates? should we use different indexing or no indexing? say we buy new hardware X (flash/memory): should we change the size of b-tree nodes? should we change the merging strategy in our LSM-tree?

  20. say the workload (read/write ratio) shifts (e.g., due to app features): should we use a different data layout for base data - diff updates? should we use different indexing or no indexing? say we buy new hardware X (flash/memory): should we change the size of b-tree nodes? should we change the merging strategy in our LSM-tree? say we want to improve response time: would it be beneficial if we would buy faster flash disks? would it be beneficial if we buy more memory?

  21. conflicting goals: application requirements, budget, performance, hardware, energy profile; moving target (hardware and requirements change continuously and rapidly)

  22. move from design based on intuition & experience only to a more formal and systematic way to design systems

  23. goals and structure of the tutorial: structure design space & tradeoffs; highlight open problems; towards easy to design methods

  24. goals and structure of the tutorial: structure design space & tradeoffs; highlight open problems; towards easy to design methods; basic tradeoffs, goals & vision (~30 min); design space (~40 min) [slides available at daslab.seas.harvard.edu]

  25. target audience = beginner to expert; no new designs but new connections & structure

  26. NOT JUST SQL + operating systems, no sql, sciences

  27. hardware is a big driver of access method (re)design (and it continuously evolves)

  28. memory hierarchy, faster toward the CPU and cheaper toward disk: registers, on chip cache, on board cache (SRAM), memory wall, memory (DRAM), disk; latency labels: ~1ns, ~10ns, ~100ns; it is not just memory and disk; we want to move as few data items as possible all the way up to the CPU

  29. random access & page-based access: need to only read data value x … but have to read all of page 1 [figure: page1 (holds x), page2, page3]
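
The cost of page-based access can be made concrete with a little arithmetic. The sketch below assumes 4096-byte pages and 8-byte values; both sizes and the helper names are assumptions chosen for illustration, and pages are numbered from zero here, whereas the slide starts at page 1.

```python
PAGE_SIZE = 4096   # bytes per page (assumed)
VALUE_SIZE = 8     # bytes per data value (assumed)


def page_of(value_index, page_size=PAGE_SIZE, value_size=VALUE_SIZE):
    """Return the zero-based number of the page holding the given value."""
    return (value_index * value_size) // page_size


def read_amplification(page_size=PAGE_SIZE, value_size=VALUE_SIZE):
    """Bytes read from storage per byte needed when fetching one value."""
    return page_size / value_size


print(page_of(0), page_of(511), page_of(512))  # 0 0 1
print(read_amplification())                    # 512.0
```

Under these assumptions, reading a single value drags in 512 times as many bytes as the query needs, which is why layouts try to place values that are read together on the same page.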

  30. what is the perfect access method?

  31. what is the perfect access method? no single answer; it depends

  32. what is the perfect access method? no single answer; it depends: what is the application? read patterns, write patterns, reads/writes ratios, hardware (CPU, memory, etc.), SLAs

  33. a perfect access method for reads (point queries): find(x) asks an oracle, which points to x

  34. a perfect access method for reads (point queries): find(x) asks an oracle, which points to x [figure: effect on reads, updates, memory]
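
One way to approximate the oracle is a direct-address table with one slot per possible key. This is a minimal sketch under that assumption, not necessarily the construction the authors have in mind; the class name and the `memory_overhead` metric are hypothetical.

```python
class DirectAddressTable:
    """One slot per possible key, so find(x) is a single array access."""

    def __init__(self, key_space_size):
        # Memory grows with the size of the key domain, not with the data.
        self.slots = [None] * key_space_size

    def update(self, key, value):
        self.slots[key] = value

    def find(self, key):
        return self.slots[key]

    def memory_overhead(self):
        """Slots allocated per slot that actually holds a value."""
        used = sum(1 for slot in self.slots if slot is not None)
        return len(self.slots) / used if used else float("inf")


table = DirectAddressTable(key_space_size=1_000)
table.update(17, "x")
table.update(905, "y")
print(table.find(17))           # x
print(table.memory_overhead())  # 500.0
```

Reads touch exactly one slot, but with two keys stored in a domain of 1,000 the structure allocates 500 slots per stored value.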

  38. a perfect access method for reads (point queries) but with no memory overhead: binary search over sorted data to find(x)

  39. a perfect access method for reads (point queries) but with no memory overhead: binary search over sorted data to find(x) [figure: effect on reads, updates, memory]
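
A sorted array with binary search can be sketched with Python's standard `bisect` module. The `SortedArray` class and its shift counter are hypothetical names added for illustration; the counter shows the price paid on the update side for keeping the data sorted.

```python
import bisect


class SortedArray:
    """Keys kept in sorted order; no auxiliary index is stored."""

    def __init__(self):
        self.keys = []

    def find(self, key):
        """Binary search: O(log n) comparisons."""
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def insert(self, key):
        """Insert in sorted position; return how many existing keys shifted."""
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        return len(self.keys) - 1 - i


arr = SortedArray()
for k in (10, 20, 30, 40):
    arr.insert(k)

print(arr.find(30), arr.find(25))  # True False
print(arr.insert(5))               # 4
```

Inserting the smallest key shifts every existing key, so the structure saves memory and keeps reads logarithmic at the expense of updates.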

  43. a perfect access method for writes (point writes): update(x) appends x to an update log

  44. a perfect access method for writes (point writes): update(x) appends x to an update log [figure: effect on reads, updates, memory]
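
An append-only update log can be sketched as follows. The `UpdateLog` class and the "entries examined" counter are hypothetical; the sketch assumes that the most recent entry for a key is the valid one.

```python
class UpdateLog:
    """Append-only log of (key, value) pairs in arrival order."""

    def __init__(self):
        self.entries = []

    def update(self, key, value):
        """Append only: constant work per write, old versions stay in the log."""
        self.entries.append((key, value))

    def find(self, key):
        """Scan from newest to oldest; return (value, entries_examined)."""
        for examined, (k, v) in enumerate(reversed(self.entries), start=1):
            if k == key:
                return v, examined
        return None, len(self.entries)


log = UpdateLog()
log.update("x", 1)
log.update("y", 2)
log.update("x", 3)

print(log.find("x"))     # (3, 1)
print(log.find("y"))     # (2, 2)
print(log.find("z"))     # (None, 3)
print(len(log.entries))  # 3
```

Writes never touch existing data, but reads may scan the whole log and stale versions keep occupying memory until something cleans them up.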

  48. design space: it all starts with how we store data; every bit matters

  49. basic tradeoffs: Reads, Updates, Memory (RUM conjecture, EDBT 2016)

  50. [figure: RUM triangle with corners Read, Update, Memory; each axis runs from min to max]

  51. [figure: RUM triangle with corners Read, Update, Memory; each axis runs from min to max; regions: read-optimized, update-optimized, memory-optimized]

  52. study basic access method design components (partitioning, fractional cascading, log-structured updates, differential updates, logarithmic design, sparse indexing): how they affect the RUM tradeoffs; how they are combined in existing access methods [figure: RUM triangle with corners Read, Update, Memory]

  53. Part 2: study basic access method design components (partitioning, fractional cascading, log-structured updates, differential updates, logarithmic design, sparse indexing): how they affect the RUM tradeoffs; how they are combined in existing access methods [figure: RUM triangle with corners Read, Update, Memory]
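
As an example of one such component, sparse indexing can be sketched as an index that keeps only the first key of each page. The page capacity, function names, and sample keys below are assumptions for illustration and are not taken from the slides.

```python
import bisect


def build_sparse_index(sorted_keys, page_capacity):
    """Split sorted keys into pages; index only the first key of each page."""
    pages = [sorted_keys[i:i + page_capacity]
             for i in range(0, len(sorted_keys), page_capacity)]
    index = [page[0] for page in pages]
    return pages, index


def find(pages, index, key):
    """Return True if key is stored; at most one page is searched."""
    p = bisect.bisect_right(index, key) - 1  # last page whose first key <= key
    if p < 0:
        return False
    page = pages[p]
    i = bisect.bisect_left(page, key)
    return i < len(page) and page[i] == key


keys = list(range(0, 100, 5))  # 20 sorted keys: 0, 5, ..., 95
pages, index = build_sparse_index(keys, page_capacity=4)

print(index)                   # [0, 20, 40, 60, 80]
print(find(pages, index, 45))  # True
print(find(pages, index, 46))  # False
```

The index holds 5 entries for 20 keys, trading a small amount of memory for reads that touch a single page.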

  54. can we make it easy to design/tune access methods?

  55. (1) easily utilize past concepts [figure: …, disk, memory, flash]

  56. (2) do not miss out on cool ideas and concepts [figure: # of citations per year, 1996 to 2014 (y-axis 0 to 35), for P. O’Neil, E. Cheng, D. Gawlick, E. O’Neil, The log-structured merge-tree (LSM-tree), Acta Informatica 33 (4): 351–385, 1996]

  57. (2) do not miss out on cool ideas and concepts [figure: # of citations per year, 1996 to 2014 (y-axis 0 to 35), for P. O’Neil, E. Cheng, D. Gawlick, E. O’Neil, The log-structured merge-tree (LSM-tree), Acta Informatica 33 (4): 351–385, 1996; annotation: Google publishes BigTable]

  58. move from design based on intuition & experience only to a more formal and systematic way to design systems

  59. construct access methods out of basic components (and their tradeoffs) e.g., scan*, tree*, bloom filters, bitmaps, hash tables, etc.
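
Composing basic components might look like the sketch below, which places a bloom filter in front of an unordered scan. The class names, the filter size, and the use of SHA-256 to derive bit positions are all assumptions made for illustration, not a design from the tutorial.

```python
import hashlib


class BloomFilter:
    """Probabilistic membership test: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class FilteredScan:
    """Composition: a bloom filter guards an unordered scan."""

    def __init__(self):
        self.filter = BloomFilter()
        self.data = []
        self.scans = 0  # how many lookups actually scanned the data

    def insert(self, key):
        self.filter.add(key)
        self.data.append(key)

    def find(self, key):
        if not self.filter.might_contain(key):
            return False  # definite miss: the scan is skipped
        self.scans += 1
        return key in self.data


store = FilteredScan()
for k in range(100):
    store.insert(k)

print(store.find(42))      # True
print(store.find(10_000))  # False
```

The filter spends extra memory and a little extra work per insert so that most lookups for absent keys can skip the scan, which is the kind of tradeoff the components carry with them.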

  60. data system designer INTERACTIVE DATA SYSTEM DESIGN/TUNING/TESTING

  61. possible opportunities once we have a “complete” & navigable set of design modules; learn from: s/w engineering, modular dbs, compilers (goes all the way back to basic texts)

  62. possible opportunities once we have a “complete” & navigable set of design modules; learn from: s/w engineering, modular dbs, compilers (goes all the way back to basic texts); easy to change/adapt; easy to design

  63. possible opportunities once we have a “complete” & navigable set of design modules; learn from: s/w engineering, modular dbs, compilers (goes all the way back to basic texts); easy to change/adapt; easy to design; universal development platform; testing

  64. possible opportunities once we have a “complete” & navigable set of design modules; learn from: s/w engineering, modular dbs, compilers (goes all the way back to basic texts); easy to change/adapt; easy to design; discovery of new combinations of design options; universal development platform; testing

  65. Part 2: observe how papers fill in gaps in the structure and existing open gaps
