Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018

2 ADM IN ISTRIVIA Project #2 – Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on Wednesday October 17 th (in class) CMU 15-445/645 (Fall 2018)

3 UPCO M IN G DATABASE EVEN TS SQream DB Tech Talk → Thursday Oct 4 th @ 12:00pm → CIC 4 th Floor CMU 15-445/645 (Fall 2018)

4 Q UERY PLAN SELECT A.id, B.value The operators are arranged in a tree. FROM A, B Data flows from the leaves toward the WHERE A.id = B.id AND B.value > 100 root. p A.id, B.value The output of the root node is the result of the query. ⨝ A.id=B.id s value>100 A B CMU 15-445/645 (Fall 2018)

5 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2018)

6 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Three approaches: → Iterator Model → Materialization Model → Vectorized / Batch Model CMU 15-445/645 (Fall 2018)

7 ITERATO R M O DEL Each query plan operator implements a next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Top-down plan processing. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2018)

8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 5 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

9 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators will block until children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. → Limit CMU 15-445/645 (Fall 2018)

10 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" it output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. Bottom-up plan processing. CMU 15-445/645 (Fall 2018)

11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } 5 FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p 4 for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

12 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries typically only access a small number of tuples at a time. → Lower execution / coordination overhead. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2018)

13 VECTO RIZATIO N M O DEL Like Iterator Model, each operator implements a next function. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2018)

14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } 4 for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 5 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

15 VECTO RIZATIO N M O DEL Ideal for OLAP queries → Greatly reduces the number of invocations per operator. → Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2018)

16 PRO CESSIN G M O DELS SUM M ARY Iterator / Volcano → Direction: Top-Down → Emits: Single Tuple → Target: General Purpose Vectorized Materialization → Direction: Top-Down → Direction: Bottom-Up → Emits: Tuple Batch → Emits: Entire Tuple Set → Target: OLAP → Target: OLTP CMU 15-445/645 (Fall 2018)

17 ACCESS M ETH O DS SELECT A.id, B.value An access method is a way that the FROM A, B DBMS can access the data stored in a WHERE A.id = B.id AND B.value > 100 table. p → Not defined in relational algebra. A.id, B.value Three basic approaches: ⨝ A.id=B.id → Sequential Scan s → Index Scan → Multi-Index / "Bitmap" Scan value>100 A B CMU 15-445/645 (Fall 2018)

18 SEQ UEN TIAL SCAN For each page in the table: for page in table.pages: → Retrieve it from the buffer pool. for t in page.tuples: → Iterate over each tuple and check whether if evalPred (t): to include it. // Do Something! The DBMS maintains an internal cursor that tracks the last page / slot it examined. CMU 15-445/645 (Fall 2018)

19 SEQ UEN TIAL SCAN : O PTIM IZATIO N S This is almost always the worst thing that the DBMS can do to execute a query. Sequential Scan Optimizations: → Prefetching → Parallelization → Buffer Pool Bypass → Zone Maps → Late Materialization → Heap Clustering CMU 15-445/645 (Fall 2018)

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.

Efficient query processing Efficient scoring, distributed query processing Web Search 1 Ranking

Query Processing Query Processing Steps balance < 2500 ( balance ( account)) balance

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

The SDP Content Attribute draft-ietf-mmusic-sdp-media-content-02.txt Jani.Hautakorpi@ericsson.com

Attribute-Based Cryptography Lecture 21 And Pairing-Based Cryptography 1 Identity-Based

MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017

Ninon Burgos CNRS Researcher Brain and Spine Institute Aramis Lab Paris, France 2 Research

Attribute-Based Signatures [Maji et al. 2008] . Users have attributes (e.g. Departmental

The Syntax of Classes and Objects in Python Defining a Class - "Inventing a Composite Data

ECS 235B, Lecture 23 March 6, 2019 March 6, 2019 ECS 235B, Foundations of Computer and

Building a model So far, we have talked about prediction , where the purpose of learning is to be

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit &amp; Continue 5.

Efficient query processing Efficient scoring, distributed query processing Web Search 1 Ranking

Query Processing Query Processing Steps balance &lt; 2500 ( balance ( account)) balance

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

The SDP Content Attribute draft-ietf-mmusic-sdp-media-content-02.txt Jani.Hautakorpi@ericsson.com

Attribute-Based Cryptography Lecture 21 And Pairing-Based Cryptography 1 Identity-Based

MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017

Ninon Burgos CNRS Researcher Brain and Spine Institute Aramis Lab Paris, France 2 Research

Attribute-Based Signatures [Maji et al. 2008] . Users have attributes (e.g. Departmental

The Syntax of Classes and Objects in Python Defining a Class - &quot;Inventing a Composite Data

ECS 235B, Lecture 23 March 6, 2019 March 6, 2019 ECS 235B, Foundations of Computer and

Building a model So far, we have talked about prediction , where the purpose of learning is to be

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.

Query Processing Query Processing Steps balance < 2500 ( balance ( account)) balance

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

The Syntax of Classes and Objects in Python Defining a Class - "Inventing a Composite Data