12 Part I Intro to Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019

2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm Project #2 is due Sun Oct 20 th @ 11:59pm CMU 15-445/645 (Fall 2019)

3 Q UERY PLAN SELECT R.id, S.cdate The operators are arranged in a tree. FROM R JOIN S ON R.id = S.id Data flows from the leaves of the tree WHERE S.value > 100 p up towards the root. R.id, S.value The output of the root node is the ⨝ R.id=S.id result of the query. s value>100 R S CMU 15-445/645 (Fall 2019)

4 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2019)

5 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Approach #1: Iterator Model Approach #2: Materialization Model Approach #3: Vectorized / Batch Model CMU 15-445/645 (Fall 2019)

6 ITERATO R M O DEL Each query plan operator implements a Next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2019)

7 ITERATO R M O DEL SELECT R.id, S.cdate Next() for t in child.Next() : FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 Next() for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id Next() for t in child.Next() : s if evalPred (t): emit (t) value>100 Next() Next() for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : Single Tuple s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

8 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators have to block until their children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. CMU 15-445/645 (Fall 2019)

9 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" its output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. → Can send either a materialized row or a single column. The output can be either whole tuples (NSM) or subsets of columns (DSM) CMU 15-445/645 (Fall 2019)

10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) All Tuples value>100 return out out = [ ] out = [ ] R S 3 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

11 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries only access a small number of tuples at a time. → Lower execution / coordination overhead. → Fewer function calls. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2019)

12 VECTO RIZATIO N M O DEL Like the Iterator Model where each operator implements a Next function in this model. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2019)

13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)

13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] 4 for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 5 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)

14 VECTO RIZATIO N M O DEL Ideal for OLAP queries because it greatly reduces the number of invocations per operator. Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2019)

12 Part I Intro to Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019 2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm

Semantic Patches for specifying and automating Collateral Evolutions Yoann Padioleau Ecole des

Asynchronous processing of proof documents rethinking interactive theorem proving Makarius

Dropwizard A RESTful Love Story Ryan Kennedy (@rckenned)

Introduction to Abstract Data Types Introduction to Abstract Data Types Abstract Data Type (ADT)

Wha What is s a modul dule? Technically, it is any file containing Python code Can

Repeated Arguments 7 January 2019 OSU CSE 1 Sources of Aliasing Aliased references for

Bounded independence plus noise fools products Chin Ho Lee Northeastern University Elad

Verification Techniques for Object Invariants Peter Mller Microsoft Research Joint work with

Atheist Day What Type of Fool Are You? The Fool Proper The fool says in his heart, There

Scientific Computing I Conclusion and Outlook Michael Bader Lehrstuhl Informatik V Winter

Adversarial Regression with Multiple Learners Liang Tong 1 Sixie Yu 1 Scott Alfeld 2

CSE 610 Special Topics: System Security - Attack and Defense for Binaries Instructor: Dr. Ziming

OpenStack based telco clouds Todays Panel Sunil Sood Moderator Ericsson VP, IT & Cloud

Words & their Meaning: Distributional Semantics CMSC 470 Marine Carpuat Slides credit: Dan

by learning the Distributions of Adversarial Examples Boqing Gong Joint work with Yandong Li,

CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples Honggang Yu 1 ,

Five ways not to fool yourself Tim Harris 23-Jun-18 Five ways not to fool yourself A

Outline Introduction Dynamic Symbolic Execution Binsec/SE Demo CEA - - 2/11 Introduction The

Joey Stanley University of Georgia joeystan@uga.edu @joey_stan joeystanley.com Special thanks

Robustness and geometry of deep neural networks Alhussein Fawzi DeepMind May 23rd 2019 The

Agile Adoption and Parenting Max Keeler September 28, 2009 The Goal 2 September 28, 2009

Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification Lianhui

Making Default Address Selection More Robust FoolProof

Fool Proof Luke 24:13-35 April 1, 2018 1957 Swiss Spaghetti Harvest 1976 Zero-G Day

12 Part I Intro to Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019 2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm

Semantic Patches for specifying and automating Collateral Evolutions Yoann Padioleau Ecole des

Asynchronous processing of proof documents rethinking interactive theorem proving Makarius

Dropwizard A RESTful Love Story Ryan Kennedy (@rckenned)

Introduction to Abstract Data Types Introduction to Abstract Data Types Abstract Data Type (ADT)

Wha What is s a modul dule? Technically, it is any file containing Python code Can

Repeated Arguments 7 January 2019 OSU CSE 1 Sources of Aliasing Aliased references for

Bounded independence plus noise fools products Chin Ho Lee Northeastern University Elad

Verification Techniques for Object Invariants Peter Mller Microsoft Research Joint work with

Atheist Day What Type of Fool Are You? The Fool Proper The fool says in his heart, There

Scientific Computing I Conclusion and Outlook Michael Bader Lehrstuhl Informatik V Winter

Adversarial Regression with Multiple Learners Liang Tong 1 Sixie Yu 1 Scott Alfeld 2

CSE 610 Special Topics: System Security - Attack and Defense for Binaries Instructor: Dr. Ziming

OpenStack based telco clouds Todays Panel Sunil Sood Moderator Ericsson VP, IT &amp; Cloud

Words &amp; their Meaning: Distributional Semantics CMSC 470 Marine Carpuat Slides credit: Dan

by learning the Distributions of Adversarial Examples Boqing Gong Joint work with Yandong Li,

CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples Honggang Yu 1 ,

Five ways not to fool yourself Tim Harris 23-Jun-18 Five ways not to fool yourself A

Outline Introduction Dynamic Symbolic Execution Binsec/SE Demo CEA - - 2/11 Introduction The

Joey Stanley University of Georgia joeystan@uga.edu @joey_stan joeystanley.com Special thanks

Robustness and geometry of deep neural networks Alhussein Fawzi DeepMind May 23rd 2019 The

Agile Adoption and Parenting Max Keeler September 28, 2009 The Goal 2 September 28, 2009

Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification Lianhui

Making Default Address Selection More Robust FoolProof

Fool Proof Luke 24:13-35 April 1, 2018 1957 Swiss Spaghetti Harvest 1976 Zero-G Day

OpenStack based telco clouds Todays Panel Sunil Sood Moderator Ericsson VP, IT & Cloud

Words & their Meaning: Distributional Semantics CMSC 470 Marine Carpuat Slides credit: Dan