CSE 232A Database System Implementation Arun Kumar Topic 9: ML - PowerPoint PPT Presentation

CSE 232A Database System Implementation
Arun Kumar
Topic 9: ML for RDBMSs
Optional; this topic is not included in the final exam!

ML for Systems
Q: Why bother applying ML to well-studied systems issues?
❖ Jeff Dean’s rationales (from NIPS MLSys’17 keynote):
  ❖ Hand-crafted heuristics are pervasive but not very adaptive; data-driven ML can improve system metrics
  ❖ User-tunable knobs have exploded and are painful
  ❖ Hardware has caught up with ML/DL demands; cloud resources are cheap and widely available
  ❖ Automated ML simplifies use of ML for systems
❖ Also, cynically: “ML for Systems” is a hot/controversial topic for publications! May get a lot of (not all wanted) attention! :)
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf

ML for Systems
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf

ML for an RDBMS
Q: Where may ML be helpful in an RDBMS?
❖ Natural language interfaces (NLIs)
❖ Learned Query Processing and Opt.
❖ Learned Access Methods
❖ Learned Caching/Scheduling Policies
❖ ML for Knob Tuning and Resource Management

ML for Knob Tuning/Resource Mgmt
❖ Motivation: Modern RDBMSs have 100s of config parameters (buffers for external merge sort (EMS), degree of parallelism, etc.)
❖ Mixture of continuous and discrete parameters
❖ Effects on query latency, etc. can be non-monotonic
❖ Optimal settings are highly dependent on schema properties, database instance, hardware, auxiliary data structures, and query workload properties
❖ Impossible for DBAs to keep up, especially in the cloud
❖ Why ML? Adapt quickly to instance/query workload/etc.; flexible choice of target (latency/utilization/etc.); can be more accurate
❖ “Autonomous”/“Self-driving” are the industry buzzwords
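To make the bullets above concrete, here is a minimal sketch of the search problem a knob tuner faces: one continuous knob, one discrete knob, and a latency that is non-monotonic in both. Everything here is an assumption made for illustration: the knob names, their ranges, and the `synthetic_latency` formula are made up, and plain random search stands in for the model-based search an ML tuner would use.

```python
import random

# Hypothetical knob space: one continuous knob and one discrete knob.
BUFFER_MB_RANGE = (64.0, 4096.0)
PARALLELISM_CHOICES = [1, 2, 4, 8, 16]


def synthetic_latency(buffer_mb, parallelism):
    """Made-up latency model in ms; non-monotonic in both knobs."""
    # Latency is lowest near 1024 MB and grows on either side of it.
    buffer_term = ((buffer_mb - 1024.0) / 512.0) ** 2
    # More workers help at first, then coordination overhead dominates.
    parallel_term = 100.0 / parallelism + 6.0 * parallelism
    return 50.0 + buffer_term + parallel_term


def random_search(trials, seed=0):
    """Sample configurations at random and keep the lowest-latency one."""
    rng = random.Random(seed)
    best_config, best_latency = None, float("inf")
    for _ in range(trials):
        config = (rng.uniform(*BUFFER_MB_RANGE), rng.choice(PARALLELISM_CHOICES))
        latency = synthetic_latency(*config)
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config, best_latency


if __name__ == "__main__":
    config, latency = random_search(trials=200)
    print(config, latency)
```

A learned tuner would replace the blind sampling with a model that predicts latency from past observations and proposes the next configuration to try; the evaluation loop stays the same.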

Example
https://www.cs.cmu.edu/~pavlo/papers/p1009-van-aken.pdf

Natural Language Interfaces (NLIs)
❖ Motivation: SQL is too hard for non-technical business users (sales, marketing, etc.) and lay public
❖ NLIs allow more people to exploit relational databases
❖ No need to learn complex syntax or even schema details
❖ Regular conversational style interactions
❖ Why ML? State-of-the-art in natural language processing (NLP) is DL-based; pure parsing/rule-based is too brittle
❖ Extremely challenging to automatically infer both structure and literals from NL query to translate to proper SQL!
❖ AFAIK, no robust open-domain commercial system today

Example
https://arxiv.org/pdf/1804.00401.pdf

Learned Scheduling/Caching Policies
❖ Motivation: Existing heuristic policies may not exploit data/query distributions well and thus waste runtime
❖ Why ML? By learning the underlying data/workload distributions, ML can help reduce runtimes/resource wastage
❖ Learned schedulers: better load balancing to reduce worker idle times to improve utilization and/or latency
❖ Learned caching/buffering: better retention and eviction decisions to increase cache hits and reduce latency
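The caching bullet above can be illustrated with a small sketch that compares LRU against a policy that uses predicted access frequencies for retention and eviction. This is an assumption-laden toy: the "model" is just a `Counter` over a hypothetical past trace, the function names are made up, and the trace is chosen so that cold one-off keys pollute an LRU cache.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits under plain LRU eviction."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits


def predicted_hits(trace, capacity, predicted_freq):
    """Count hits when eviction and admission follow predicted frequencies."""
    cache = {}
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            continue
        if len(cache) < capacity:
            cache[key] = True
            continue
        # Candidate victim: the cached key predicted to be accessed least.
        victim = min(cache, key=lambda k: predicted_freq.get(k, 0))
        # Admit the new key only if it is predicted to be hotter than the victim.
        if predicted_freq.get(key, 0) > predicted_freq.get(victim, 0):
            del cache[victim]
            cache[key] = True
    return hits


if __name__ == "__main__":
    history = list("ababab")          # hypothetical past workload
    model = Counter(history)          # stand-in for a learned model
    trace = list("abcabdab")          # hot keys a, b with cold keys c, d mixed in
    print(lru_hits(trace, 2), predicted_hits(trace, 2, model))
```

On this trace the cold keys `c` and `d` keep pushing the hot keys out of the LRU cache, while the frequency-informed policy refuses to admit them.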

Examples
http://alexbeutel.com/papers/CIDR2019_SageDB.pdf
https://arxiv.org/pdf/1907.02394.pdf

Learned Access Methods
❖ Motivation: Existing access methods may be wasting some system resources (memory, storage, runtime, etc.) because they do not exploit database instance distributions
❖ Why ML? By learning/approximating the underlying data distributions, ML can help reduce resource demands
❖ Resource reduction target depends on use-case
❖ Learned index structures: reduce memory/storage footprint of index, while maintaining or reducing query latency
❖ Learned compression formats: reduce memory/storage footprint and file I/O time
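As a rough illustration of the learned index idea above, the sketch below fits a single linear model from key to position over a sorted array, records the worst prediction error, and then searches only within that error window. It is a simplified stand-in, not the design from the linked papers; the class name `LinearLearnedIndex` is hypothetical and the keys are assumed to be distinct numbers.

```python
import bisect


class LinearLearnedIndex:
    """One linear model mapping key -> position in a sorted list of distinct keys."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_key = sum(sorted_keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in sorted_keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(sorted_keys))
        # Ordinary least-squares fit of position against key.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case distance between predicted and true position.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Restrict the binary search to the model's error window.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    index = LinearLearnedIndex([3, 8, 15, 21, 22, 40, 57, 58, 90])
    print(index.lookup(40), index.lookup(4))
```

The model itself is only two floats plus an error bound, which is where the footprint savings in the bullet come from; the closer the key distribution is to linear, the smaller the window that has to be searched.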

Examples
https://www.cl.cam.ac.uk/~ey204/teaching/ACS/R244_2018_2019/papers/Kraska_SIGMOD_2018.pdf
https://arxiv.org/pdf/1905.08898.pdf
https://arxiv.org/pdf/1912.01668.pdf
https://ieeexplore.ieee.org/document/8712659?denied=

Learned Query Processing
❖ Motivation: Existing physical operator implementations do not exploit database instance distributions well; doing so can save some runtime or improve runtime predictability
❖ Why ML? By learning/approximating the underlying data distributions, ML can reduce runtimes/improve accuracy
❖ Learned sorting: the closer the distribution is to pre-sorted, the less time we can spend on sorting
❖ Learned joins: learn the distribution and location of the join attributes to reduce hash lookup and/or sorting needs
❖ Learned query plans: Improve runtime predictability
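One way to picture the learned sorting bullet above: approximate the data's CDF from a small sample, use it to drop each value near its final position, and finish with an insertion sort that has little left to do. The sketch below is an assumption, not the algorithm from the linked papers; the function name, the sample size, and the bucket count are all made up.

```python
import bisect
import random


def learned_sort(values, sample_size=32, num_buckets=8, seed=0):
    """Sort by placing values with an approximate CDF, then fixing up locally."""
    if len(values) < 2:
        return list(values)
    rng = random.Random(seed)
    # "Model": empirical CDF taken from a small sorted sample of the input.
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        cdf = bisect.bisect_right(sample, v) / len(sample)  # estimate in [0, 1]
        b = min(int(cdf * num_buckets), num_buckets - 1)
        buckets[b].append(v)
    # The CDF estimate is monotone, so buckets are already in order relative
    # to each other; only values inside the same bucket can be out of place.
    out = [v for bucket in buckets for v in bucket]
    # Insertion sort: cheap when the list is already nearly sorted.
    for i in range(1, len(out)):
        x = out[i]
        j = i - 1
        while j >= 0 and out[j] > x:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = x
    return out


if __name__ == "__main__":
    data = [random.Random(1).randint(0, 1000) for _ in range(5)]
    print(learned_sort(data))
```

The better the model approximates the true distribution, the shorter the distances the final pass has to move each value.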

Examples
http://alexbeutel.com/papers/CIDR2019_SageDB.pdf
http://www.vldb.org/pvldb/vol12/p1733-marcus.pdf

Learned Query Optimizers
❖ Motivation: Existing optimizers have many heuristics (join orders, plan selection, cardinality estimation, etc.)
❖ Why ML? By learning/approximating the underlying data distributions, ML can reduce runtimes for final plan
❖ Learned join order: Use join attribute distribution info and reinforcement learning to find better join orders
❖ Learned plan rewrites: Use database instance properties and attribute distributions to rewrite plans
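To show what a join-order decision looks like, here is a small sketch of the underlying search problem: enumerate left-deep join orders and score each by the sum of estimated intermediate result sizes. It is not reinforcement learning and not the method of the linked papers; the table names, cardinalities, and selectivities are hypothetical, and in a learned optimizer the estimates, the search, or both would come from a model.

```python
from itertools import permutations

# Hypothetical base-table cardinalities.
CARD = {"orders": 1000, "customers": 100, "items": 5000}

# Hypothetical join selectivities; a missing pair means a cross product (1.0).
SEL = {
    frozenset(("orders", "customers")): 0.01,
    frozenset(("orders", "items")): 0.001,
}


def plan_cost(order):
    """Cost of a left-deep plan: sum of estimated intermediate result sizes."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for other in joined:
            sel *= SEL.get(frozenset((other, table)), 1.0)
        rows = rows * CARD[table] * sel
        cost += rows
        joined.append(table)
    return cost


def best_left_deep_order():
    """Exhaustively pick the cheapest left-deep join order."""
    return min(permutations(CARD), key=plan_cost)


if __name__ == "__main__":
    order = best_left_deep_order()
    print(order, plan_cost(order))
```

Exhaustive enumeration is only feasible for a handful of tables, which is one reason heuristics, and more recently learned policies, are used to search this space.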

Examples
http://www.vldb.org/pvldb/vol12/p1705-marcus.pdf
https://arxiv.org/pdf/1808.03196.pdf

Takeaways: ML for RDBMSs
❖ Many parts of the RDBMS stack can benefit from ML/DL:
  ❖ ML for Knob Tuning and Resource Management
  ❖ Natural language interfaces (NLIs)
  ❖ Learned Caching/Scheduling Policies
  ❖ Learned Access Methods
  ❖ Learned Query Processing and Opt.
  ❖ …
❖ Apart from the above, note that ML is already common in other data systems settings: data integration, data cleaning, etc.
❖ Data systems will keep evolving due to the evolution of hardware, cloud, and ML capabilities; stay informed of the latest research!

Please fill out the course evaluation form.
Thank you for taking CSE 232A. All the best for your future endeavors!
