Query Optimization Database Management Systems 3ed, R. Ramakrishnan - PDF document

Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples Sailors ( sid : integer, sname : string, rating : integer, age : real) Reserves ( sid : integer, bid : integer, day : dates, rname : string) � Similar to old schema; rname added for variations. � Reserves: � Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. � Sailors: � Each tuple is 50 bytes long, 80 tuples per page, 500 pages. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 Motivating Example (On-the-fly) SELECT S.sname Plan: sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND rating > 5 (On-the-fly) bid=100 R.bid=100 AND S.rating>5 (Page Nested Loops) RA Tree: sname sid=sid rating > 5 bid=100 Reserves Sailors � Cost: 500+500*1000 I/Os sid=sid � By no means the worst plan! � But can do better (how?) Reserves Sailors Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

sname (On-the-fly) Alternative Plans 1 (No Indexes) (Sort-Merge Join) sid=sid (Scan; (Scan; rating > 5 write to write to bid=100 temp T2) temp T1) � Main difference: push selects. Reserves Sailors � With 5 buffers, cost of plan: � Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution). � Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings). � Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250) � Total: 3560 page I/Os. � If we used BNL join, join cost = 10+4*250, total cost = 2770. � If we `push’ projections, T1 has only sid , T2 only sid and sname : � T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4 Alternative Plans 2 sname (On-the-fly) With Indexes (On-the-fly) rating > 5 � With clustered index on bid of (Index Nested Loops, Reserves, we get 100,000/100 = sid=sid with pipelining ) 1000 tuples on 1000/100 = 10 pages. (Use hash index; do bid=100 Sailors � INL with pipelining (outer is not not write result to materialized). temp) Reserves –Projecting out unnecessary fields from outer doesn’t help. � Join column sid is a key for Sailors. –At most one matching tuple, unclustered index on sid OK. � Decision not to push rating>5 before the join is based on availability of sid index on Sailors. � Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Overview of Query Optimization � Plan : Tree of R.A. ops, with choice of alg for each op. � Each operator typically implemented using a `pull ’ interface: when an operator is `pulled ’ for the next output tuples, it `pulls ’ on its inputs and computes them. � Two main issues: � For a given query, what plans are considered? • Algorithm to search plan space for cheapest (estimated) plan. � How is the cost of a plan estimated? � Ideally: Want to find best plan. Practically: Avoid worst plans! � We will study the System R approach. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

Outline � Relational algebra equivalences � Statistics and size estimation � Plan enumeration and cost estimation � Nested queries Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 Relational Algebra Equivalences � Allow us to choose different join orders and to `push ’ selections and projections ahead of joins. ( ) ( ) ( ) � Selections : ( Cascade ) σ ≡ σ σ R ... R ∧ ∧ c 1 ... cn c 1 cn ( ) ( ) ( ) ( ) σ σ ≡ σ σ ( Commute ) R R c 1 c 2 c 2 c 1 ( ) ( ) ( ) ( ) π ≡ π π � Projections: (Cascade) R ... R a 1 a 1 an ≡ (Associative) � Joins: R (S T) (R S) T > < > < > < > < ≡ (Commute) (R S) (S R) > < > < ≡ R (S T) (T R) S � Show that: > < > < > < > < Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8 More Equivalences � A projection commutes with a selection that only uses attributes retained by the projection. � Selection between attributes of the two arguments of a cross-product converts cross-product to a join. � A selection on just attributes of R commutes with ≡ R S. (i.e., (R S) (R) S ) σ σ > < > < > < � Similarly, if a projection follows a join R S, we can > < `push ’ it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

Outline � Relational algebra equivalences � Statistics and size estimation � Plan enumeration and cost estimation � Nested queries Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10 Example Plan sname (On-the-fly) (Sort-Merge Join) sid=sid (Scan; (Scan; write to bid=100 rating > 5 write to temp T1) temp T2) Reserves Sailors Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Statistics and Catalogs � Need information about the relations and indexes involved. Catalogs typically contain at least: � # tuples (NTuples) and # pages (NPages) for each relation. � # distinct key values (NKeys) and NPages for each index. � Index height, low/high key values (Low/High) for each tree index. � Catalogs updated periodically. � Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok. � More detailed information (e.g., histograms of the values in some field) are sometimes stored. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Example Plan sname (On-the-fly) (Sort-Merge Join) sid=sid (Scan; (Scan; write to write to bid=100 rating > 5 temp T2) temp T1) Reserves Sailors Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Size Estimation and Reduction Factors SELECT attribute list FROM relation list � Consider a query block: WHERE term1 AND ... AND termk � What is maximum # tuples possible in result? � Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size. Result cardinality = Max # tuples * product of all RF’s. � Implicit assumption that terms are independent! � Term col=value has RF 1/NKeys(I), given index I on col � Term col1=col2 has RF 1/ MAX (NKeys(I1), NKeys(I2)) � Term col>value has RF (High(I)-value)/(High(I)-Low(I)) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14 Reduction Factors & Histograms � For better estimation, use a histogram No. of Values 2 3 3 1 8 2 1 equiwidth Value 0-.99 1-1.99 2-2.99 3-3.99 4-4.99 5-5.99 6-6.99 No. of Values 2 3 3 3 3 2 4 Value 0-.99 1-1.99 2-2.99 3-4.05 4.06-4.67 4.68-4.99 5-6.99 equidepth Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15

Outline � Relational algebra equivalences � Statistics and size estimation � Plan enumeration and cost estimation � Nested queries Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16 Enumeration of Alternative Plans � There are two main cases: � Single-relation plans � Multiple-relation plans � For queries over a single relation, queries consist of a combination of selects, projects, and aggregate ops: � Each available access path (file scan / index) is considered, and the one with the least estimated cost is chosen. � Pipelined to other selections, projections, aggregates. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17 Queries Over Multiple Relations � Fundamental decision in System R: only left-deep join trees are considered. D D C C A B C D B A A B Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

Query Optimization Database Management Systems 3ed, R. Ramakrishnan - PDF document

Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples Sailors ( sid : integer, sname : string, rating : integer, age : real) Reserves ( sid : integer, bid : integer, day : dates, rname : string)

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Introduction Query Execution Engine Implements a set of physical operators 2 key

CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and

Query Optimization Through the Looking Glass Some Lessons From Building an LLVM-Based Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

PostgreSQL Query Optimization Step by step techniques Ilya Kosmodemiansky (ik@dataegret.com)

15-780: Optimization J. Zico Kolter March 14-16, 2015 1 Outline Introduction to optimization

SAMPLING-BASED QUERY RE- OPTIMIZATION Wentao Wu Microsoft Research Background 2 Query

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Normal Forms 340151 Big Databases & Cloud Services (P. Baumann) 1 The Evils of Redundancy

Schema Design and Normal Forms Database Management Systems, 2 nd Edition. R. Ramakrishnan and J.

Empirics of Executive Compensation: What Determines CEO Pay? Stanimir Morfov (HSE, Moscow)

Wage progression of low-educated workers Philippe Aghion Antonin Bergeaud Richard Blundell

Set-difference () Tuples in reln.1,butnotin reln.2. Union

State Prof. Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon,

Example 1.11 I Fig 1.12 September 13, 2020 1 / 12 Example 1.11 II s a b q 1 a r 1 b a a

Outline The Simple Linear Regression Model (12.1) Fitting the Regression Line (12.2)

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Query Optimization Database Management Systems 3ed, R. Ramakrishnan - PDF document

Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples Sailors ( sid : integer, sname : string, rating : integer, age : real) Reserves ( sid : integer, bid : integer, day : dates, rname : string)

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Introduction Query Execution Engine Implements a set of physical operators 2 key

CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and

Query Optimization Through the Looking Glass Some Lessons From Building an LLVM-Based Query

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

PostgreSQL Query Optimization Step by step techniques Ilya Kosmodemiansky (ik@dataegret.com)

15-780: Optimization J. Zico Kolter March 14-16, 2015 1 Outline Introduction to optimization

SAMPLING-BASED QUERY RE- OPTIMIZATION Wentao Wu Microsoft Research Background 2 Query

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

Normal Forms 340151 Big Databases &amp; Cloud Services (P. Baumann) 1 The Evils of Redundancy

Schema Design and Normal Forms Database Management Systems, 2 nd Edition. R. Ramakrishnan and J.

Empirics of Executive Compensation: What Determines CEO Pay? Stanimir Morfov (HSE, Moscow)

Wage progression of low-educated workers Philippe Aghion Antonin Bergeaud Richard Blundell

Set-difference () Tuples in reln.1,butnotin reln.2. Union

State Prof. Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon,

Example 1.11 I Fig 1.12 September 13, 2020 1 / 12 Example 1.11 II s a b q 1 a r 1 b a a

Outline The Simple Linear Regression Model (12.1) Fitting the Regression Line (12.2)

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Normal Forms 340151 Big Databases & Cloud Services (P. Baumann) 1 The Evils of Redundancy