
  1. DATA ANALYTICS USING DEEP LEARNING
     GT 8803 // FALL 2019 // JOY ARULRAJ
     LECTURE #06: DISK-CENTRIC AND IN-MEMORY DATABASE SYSTEMS

  2. ADMINISTRIVIA
     • Project ideas
       – List shared on Piazza
       – Start looking for team-mates!
       – Sign up for discussion slots during office hours
     GT 8803 // Fall 2019 2

  3. LAST CLASS
     • History of DBMSs
       – In a way, though, it was really a history of data models
     • Data models
       – Hierarchical data model (tree) (IMS)
       – Network data model (graph) (CODASYL)
       – Relational data model (tables) (System R, INGRES)
     • Overarching theme of all these systems
       – They were all disk-based DBMSs

  4. TODAY’S AGENDA
     • Disk-centric DBMSs
     • In-memory DBMSs

  5. DISK-CENTRIC DBMSs 5 GT 8803 // Fall 2018

  6. ANATOMY OF A DATABASE SYSTEM
     [Figure: DBMS architecture diagram]
     • Process Manager: Connection Manager + Admission Control
     • Query Processor: Query Parser, Query Optimizer, Query Executor
     • Transactional Storage Manager: Lock Manager (Concurrency Control), Access Methods (or Indexes), Buffer Pool Manager, Log Manager
     • Shared Utilities: Memory Manager + Disk Manager, Networking Manager
     Source: Anatomy of a Database System

  7. ANATOMY OF A DATABASE SYSTEM
     • Process Manager
       – Manages client connections
     • Query Processor
       – Parses, plans, and executes queries on top of the storage manager
     • Transactional Storage Manager
       – Knits together buffer management, concurrency control, logging, and recovery
     • Shared Utilities
       – Manage hardware resources across threads

  8. TOPICS
     • Implications of large DRAM capacities for database systems:
       – Buffer management
       – Query processing
       – Concurrency control
       – Logging and recovery

  9. BACKGROUND
     • Much of the history of DBMSs is about dealing with the limitations of hardware.
     • Hardware was much different when the original DBMSs were designed:
       – Uniprocessor (single-core CPU)
       – RAM was severely limited (a few MB).
       – The database had to be stored on disk.
       – Disk is slow. No, seriously, I mean really slow.

  10. BACKGROUND
     • But now DRAM capacities are large enough that most databases can fit in memory.
       – Structured data sets are smaller (e.g., tables with numeric data).
       – Unstructured data sets are larger (e.g., videos).
     • So why not just use a "traditional" disk-oriented DBMS with a really large cache?

  11–18. DISK-ORIENTED DBMS OVERHEAD
     [Figure: Measured CPU instructions, broken down by component; the slides build the chart one component at a time]
     • Buffer pool: 34%
     • Latching: 14%
     • Locking: 16%
     • Logging: 12%
     • B-tree keys: 16%
     • Real work: 7%
     Source: OLTP Through the Looking Glass, and What We Found There. SIGMOD, pp. 981-992, 2008.

  19. BUFFER MANAGEMENT
     • The primary storage location of the database is non-volatile storage (e.g., SSD).
       – On disk, the database is stored in files as a collection of fixed-length blocks, laid out as slotted pages.
     • The system uses a volatile in-memory buffer pool to cache blocks fetched from disk.
       – Its job is to manage the movement of those blocks back and forth between disk and memory.
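
A slotted page is commonly laid out with a small header and a slot directory at the front, while tuple bytes are packed from the end of the page toward the middle; a record id is then (page id, slot #). The sketch below is a toy illustration under assumptions: the 4096-byte page, 8-byte header, and 4-byte slot entry are made-up sizes, the class name `SlottedPage` is hypothetical, and for brevity the slot directory lives in a Python list instead of being serialized into the page bytes.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes
HEADER_SIZE = 8   # assumed header size (e.g., slot count + free-space pointer)
SLOT_SIZE = 4     # assumed slot entry size (e.g., 2-byte offset + 2-byte length)


class SlottedPage:
    """Toy slotted page: the slot directory grows forward, tuple bytes grow backward."""

    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)
        self.slots: List[Tuple[int, int]] = []  # slot # -> (offset, length)
        self.free_end = PAGE_SIZE  # start of the packed tuple region

    def free_space(self) -> int:
        # Bytes between the end of the slot directory and the first tuple.
        directory_end = HEADER_SIZE + len(self.slots) * SLOT_SIZE
        return self.free_end - directory_end

    def insert(self, record: bytes) -> Optional[int]:
        """Store a record and return its slot number, or None if the page is full."""
        if len(record) + SLOT_SIZE > self.free_space():
            return None
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def get(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


page = SlottedPage()
slot = page.insert(b"alice")
record_id = (7, slot)  # (page id, slot #); page id 7 is made up
print(record_id, page.get(slot))  # (7, 0) b'alice'
```

Because callers hold only the slot number, the page can later move tuple bytes around (e.g., to compact free space) by updating the slot entry, without invalidating record ids.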

  20. BUFFER MANAGEMENT
     • When a query accesses a page, the DBMS first checks whether that page is already in the buffer pool.
       – If it is not, the DBMS retrieves it from disk and copies it into a free frame in the buffer pool.
       – If there are no free frames, the DBMS picks a page to evict, guided by the page replacement policy.
       – If the evicted page is dirty, the DBMS must write it back to disk to ensure the durability (the "D" in ACID) of the data.
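
The hit / free-frame / evict / write-back sequence above can be sketched as a toy buffer pool. This is a minimal illustration under assumptions, not how any particular DBMS implements it: the class name `BufferPool` is hypothetical, a Python dict stands in for the on-disk database, and LRU (one of the policies listed on the next slide) is assumed as the replacement policy. Pinning is omitted here.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class BufferPool:
    """Toy buffer pool: page table + LRU replacement + dirty write-back."""

    def __init__(self, num_frames: int, disk: Dict[int, bytes]) -> None:
        self.disk = disk  # stand-in for the on-disk database: page_id -> bytes
        self.frames: List[Optional[bytearray]] = [None] * num_frames
        self.page_table = OrderedDict()  # page_id -> frame index, oldest use first
        self.dirty = [False] * num_frames
        self.reads = 0   # pages copied from disk into the pool
        self.writes = 0  # dirty pages written back to disk

    def fetch_page(self, page_id: int) -> bytearray:
        # 1. Hit: the page is already in the buffer pool.
        if page_id in self.page_table:
            self.page_table.move_to_end(page_id)  # now most recently used
            return self.frames[self.page_table[page_id]]
        # 2. Miss: use a free frame, or evict the least recently used page.
        frame = self._free_frame()
        if frame is None:
            victim, frame = self.page_table.popitem(last=False)
            if self.dirty[frame]:
                # Write the dirty victim back so its changes are not lost.
                self.disk[victim] = bytes(self.frames[frame])
                self.writes += 1
                self.dirty[frame] = False
        # 3. Copy the requested page from disk into the chosen frame.
        self.frames[frame] = bytearray(self.disk[page_id])
        self.reads += 1
        self.page_table[page_id] = frame
        return self.frames[frame]

    def mark_dirty(self, page_id: int) -> None:
        self.dirty[self.page_table[page_id]] = True

    def _free_frame(self) -> Optional[int]:
        used = set(self.page_table.values())
        for i in range(len(self.frames)):
            if i not in used:
                return i
        return None


disk = {i: bytes([i]) * 4 for i in range(5)}  # five tiny fake pages
pool = BufferPool(num_frames=2, disk=disk)
pool.fetch_page(0)
pool.fetch_page(1)
page0 = pool.fetch_page(0)  # hit: page 0 becomes most recently used
page0[0] = 99               # modify page 0 in memory...
pool.mark_dirty(0)          # ...and record that it is dirty
pool.fetch_page(2)          # evicts page 1 (clean, least recently used)
pool.fetch_page(3)          # evicts page 0 (dirty) and writes it back
print(pool.reads, pool.writes, disk[0])  # 4 1 b'c\x00\x00\x00'
```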

  21. BUFFER MANAGEMENT
     • The page replacement policy is a differentiating factor between open-source and commercial DBMSs. It may consider:
       – What kind of data does the page contain?
       – Is the page dirty?
       – How likely is the page to be accessed in the near future?
       – Examples: LRU, LFU, CLOCK, ARC
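
Of the listed policies, CLOCK is the simplest to sketch: it approximates LRU with one reference bit per frame and a "clock hand" that sweeps the frames, giving each recently referenced frame a second chance. This sketch only models recency; it does not weigh the other factors above (data kind, dirtiness). The class name `ClockReplacer` is hypothetical.

```python
from typing import List


class ClockReplacer:
    """CLOCK replacement: approximates LRU with one reference bit per frame."""

    def __init__(self, num_frames: int) -> None:
        self.ref_bit: List[bool] = [False] * num_frames
        self.hand = 0  # next frame the clock hand will inspect

    def access(self, frame: int) -> None:
        # Called on every access to the page in this frame.
        self.ref_bit[frame] = True

    def pick_victim(self) -> int:
        # Sweep the hand: clear set bits (second chance), evict the first unset frame.
        # Terminates within two sweeps because every pass clears the bits it sees.
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.ref_bit)
            if self.ref_bit[frame]:
                self.ref_bit[frame] = False
            else:
                return frame


clock = ClockReplacer(3)
clock.access(0)
clock.access(2)
print(clock.pick_victim())  # 1: frame 0 gets a second chance, frame 1 is unreferenced
print(clock.pick_victim())  # 0: its bit was cleared by the previous sweep
```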

  22. BUFFER MANAGEMENT
     • Once the page is in memory, the DBMS translates any on-disk addresses to their in-memory addresses.
       – E.g., page identifier #100 maps to page pointer 0x5050.
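
The translation step amounts to a lookup in the page table followed by an offset addition. The sketch below reuses the slide's example mapping (#100 to 0x5050); treating an address as "frame base + offset within the page" and the function name `translate` are illustrative assumptions.

```python
# Hypothetical page table: page id -> base address of the frame holding the page.
# The single entry reuses the slide's example (#100 -> 0x5050).
page_table = {100: 0x5050}


def translate(page_id: int, offset_in_page: int) -> int:
    """Return the in-memory address of a byte inside an on-disk page."""
    base = page_table.get(page_id)
    if base is None:
        # Not resident: a real buffer pool would fetch the page from disk here.
        raise KeyError(f"page {page_id} is not in the buffer pool")
    return base + offset_in_page


print(hex(translate(100, 0x10)))  # 0x5060
```

In a disk-oriented DBMS this lookup happens on every access, which is part of the buffer pool overhead measured earlier.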

  23–33. BUFFER MANAGEMENT
     [Figure: Index, Page Table, Buffer Pool (frames: page6, page0, page2), and Database (On-Disk) stored as Slotted Pages; the index supplies a Page Id + Slot #. Across the animation, page2 is evicted from the buffer pool and page1 is copied from disk into its frame.]

  34. BUFFER MANAGEMENT
     • Every tuple access has to go through the buffer pool manager, even if the data is always in memory.
       – The DBMS always has to translate a tuple's record id to its memory location.
       – A worker thread has to pin the pages it needs so that they are not evicted to disk.
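
Pinning can be sketched as a per-page pin count that the replacement policy must respect: a page with a nonzero count is skipped when choosing a victim. This is a minimal illustration; the class name `PinCounts` is hypothetical, and the candidate order is assumed to come from whatever replacement policy the DBMS uses.

```python
from typing import Dict, Iterable, Optional


class PinCounts:
    """Pin bookkeeping: a page with pin count > 0 must not be evicted."""

    def __init__(self) -> None:
        self.count: Dict[int, int] = {}

    def pin(self, page_id: int) -> None:
        # A worker pins a page before reading or writing its tuples.
        self.count[page_id] = self.count.get(page_id, 0) + 1

    def unpin(self, page_id: int) -> None:
        # The worker unpins the page once it no longer needs it.
        if self.count.get(page_id, 0) == 0:
            raise ValueError(f"page {page_id} is not pinned")
        self.count[page_id] -= 1

    def pick_victim(self, candidates: Iterable[int]) -> Optional[int]:
        # Walk candidates in replacement-policy order, skipping pinned pages.
        for page_id in candidates:
            if self.count.get(page_id, 0) == 0:
                return page_id
        return None  # every candidate is pinned


pins = PinCounts()
pins.pin(1)
print(pins.pick_victim([1, 2, 3]))  # 2: page 1 is pinned
pins.unpin(1)
print(pins.pick_victim([1, 2, 3]))  # 1
```

Even when every page is resident, each access still pays for this pin/unpin bookkeeping (and the latching around it in a multi-threaded system).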

  35. BUFFER MANAGEMENT

  36. BUFFER MANAGEMENT
     • Q: What do we gain by managing an in-memory buffer?
       – A: Faster query processing, by keeping frequently accessed pages in fast memory.
     • Q: Can we “learn” an optimal page replacement policy?
       – A: A recent paper from Google learns memory access patterns using LSTM models.
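
For context on what "optimal" means here: Belady's MIN policy, which evicts the page whose next use is farthest in the future, is the offline optimum, but it needs the future access sequence, which is why predicting future accesses (e.g., with a learned model) is attractive. The sketch below compares miss counts of LRU and MIN on a made-up cyclic trace; the function names and the trace are illustrative assumptions.

```python
from collections import OrderedDict
from typing import List


def lru_misses(trace: List[int], capacity: int) -> int:
    cache = OrderedDict()  # page -> None, least recently used first
    misses = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)
            continue
        misses += 1
        if len(cache) == capacity:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = None
    return misses


def belady_misses(trace: List[int], capacity: int) -> int:
    cache = set()
    misses = 0
    for i, page in enumerate(trace):
        if page in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            # Evict the page whose next use is farthest in the future (or never).
            def next_use(p: int) -> float:
                for j in range(i + 1, len(trace)):
                    if trace[j] == p:
                        return j
                return float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses


trace = [1, 2, 3, 4] * 3  # cyclic scan slightly larger than the cache
print(lru_misses(trace, 3), belady_misses(trace, 3))  # 12 6
```

A cyclic scan one page larger than the cache is LRU's worst case: every access misses, while MIN misses only half the time on this trace.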
