Query Processing and Optimization Rose-Hulman Institute of Technology Curt Clifton

Outline  Basic Optimization Approach  Algorithms for Processing Queries  Pipelining  Techniques for Automatic Query Optimization

Introduction to Query Processing
• What is query optimization?
• Typically the intermediate form is a query tree

From SQL to Relational Algebra  Query block : the basic unit that can be translated into the algebraic operators and optimized  Nested queries become separate query blocks  Aggregate operators in SQL require extended algebra  Example…

Example Translation
• Original query:
  SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > (SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5);
• Inner query block:
  SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5
  ℱ MAX SALARY (σ DNO=5 (EMPLOYEE))
• Outer query block, where C stands for the result of the inner block:
  SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C
  π LNAME, FNAME (σ SALARY>C (EMPLOYEE))
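The two query blocks can be evaluated one after the other, as in this minimal sketch. The in-memory EMPLOYEE rows and the function names are invented for illustration; they are not part of the original slides.

```python
# Hypothetical EMPLOYEE relation; the rows are made up for this example.
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
]


def inner_block(employee):
    """Inner block: aggregate MAX(SALARY) over the selection DNO = 5."""
    return max(row["SALARY"] for row in employee if row["DNO"] == 5)


def outer_block(employee, c):
    """Outer block: project LNAME, FNAME from the selection SALARY > c."""
    return [(row["LNAME"], row["FNAME"]) for row in employee if row["SALARY"] > c]


# The inner block runs once; its result is the constant C of the outer block.
c = inner_block(EMPLOYEE)
result = outer_block(EMPLOYEE, c)
```

Because the inner block does not depend on the outer one, it only needs to be evaluated once.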

What Next?  Queries reduced to query trees in relational algebra  DBMS considers various algorithms for processing query  Rewrites tree to use “best” algorithms  Variety of algorithms exist to solve various query problems

Problem: Sorting Huge Datasets
• Use external sorting
• Phase 1: Create sorted runs
  • Load as many pages as fit into memory (n pages form one “run”)
  • Sort them and save the run back to disk
  • Repeat until every page of the file belongs to a sorted run
• Phase 2: Perform an (n-1)-way merge
  • One buffer page holds the “top” of each of the n-1 runs being merged
  • One buffer page collects the “bottom” of the merge results
  • Repeat until a single sorted run remains
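The two phases can be sketched as follows. This is a simplified illustration, not a DBMS implementation: `run_size` counts values rather than pages, the runs are plain text files in the system temp directory, and all runs are merged in a single pass, which assumes the number of runs does not exceed the merge fan-in.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Sort ints by writing sorted runs to disk, then merging the runs."""
    run_paths = []

    def flush(run):
        # Phase 1: sort one memory-sized run and save it back to disk.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    run = []
    for v in values:
        run.append(v)
        if len(run) == run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    # Phase 2: merge the runs; each file is read lazily, line by line.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        # The result is materialized as a list only to keep the demo simple.
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)` creates three runs and merges them into one sorted list.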

Problem: Selecting Subset of Rows  Linear search:  Last resort, unless file is small  Binary search:  For ordered data without an index  Using an index for equality comparisons:  Just look up the record
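The three access methods above can be compared in a small sketch. The record layout, the sample rows, and the dictionary standing in for an index are assumptions made for illustration.

```python
def linear_search(records, field, value):
    """Scan every record; works on any file but touches all of it."""
    return [r for r in records if r[field] == value]


def binary_search(sorted_records, field, value):
    """Find matches in records already ordered on `field`, without an index."""
    lo, hi = 0, len(sorted_records)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_records[mid][field] < value:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first position whose key is >= value; collect equal keys.
    matches = []
    while lo < len(sorted_records) and sorted_records[lo][field] == value:
        matches.append(sorted_records[lo])
        lo += 1
    return matches


def build_index(records, field):
    """A dictionary stands in for an index used for equality comparisons."""
    index = {}
    for r in records:
        index.setdefault(r[field], []).append(r)
    return index


# Hypothetical sample data, ordered on "ssn".
employees = [
    {"ssn": 101, "name": "Ann"},
    {"ssn": 205, "name": "Bob"},
    {"ssn": 333, "name": "Cy"},
    {"ssn": 480, "name": "Di"},
]
ssn_index = build_index(employees, "ssn")
```

All three return the same rows; they differ only in how much of the file they have to touch.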

Problem: Selecting Subset of Rows  Using a primary index for order comparisons:  Find edge of range using index  Scan from there  Using a secondary index for order comparisons:  Find edge of range using index  Scan leaf nodes of index from there, loading data based on pointers
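A rough sketch of the secondary-index case: a sorted list of (key, pointer) entries plays the role of the index leaf level, and list positions play the role of record pointers. The data and names are hypothetical.

```python
from bisect import bisect_right

# Unordered data file; a record's position in the list acts as its pointer.
records = [("Ann", 52000), ("Bob", 31000), ("Cy", 47000), ("Di", 60000), ("Ed", 28000)]

# Secondary index on salary: sorted (salary, pointer) entries.
salary_index = sorted((salary, ptr) for ptr, (_, salary) in enumerate(records))


def select_salary_greater_than(threshold):
    """Find the edge of the range in the index, then scan entries from there."""
    # (threshold, inf) sorts after every entry whose salary equals threshold.
    start = bisect_right(salary_index, (threshold, float("inf")))
    # Follow each pointer to load the actual record.
    return [records[ptr] for _, ptr in salary_index[start:]]
```

With a primary index the data file itself would be ordered on the key, so the scan could read the records directly instead of following pointers.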

Select With Complex Condition  Simple conjunctive selection:  Pick one condition for which some previous method would work  Use brute force to filter those results based on other conditions  Conjunctive selection with a composite index:  Works if index covers all attributes in the complex condition
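Simple conjunctive selection might look like the sketch below, assuming an index exists on only one of the attributes. The table, the `dno_index` dictionary, and the function name are illustrative.

```python
from collections import defaultdict

# Hypothetical data file.
employees = [
    {"name": "Ann", "dno": 5, "salary": 52000},
    {"name": "Bob", "dno": 5, "salary": 31000},
    {"name": "Cy", "dno": 4, "salary": 47000},
    {"name": "Di", "dno": 5, "salary": 60000},
]

# Assumed index on dno only: value -> positions of matching records.
dno_index = defaultdict(list)
for pos, row in enumerate(employees):
    dno_index[row["dno"]].append(pos)


def select_dno_and_salary(dno, min_salary):
    """Evaluate dno = ? AND salary > ? using the index for one conjunct."""
    # Step 1: use the index for the condition it supports.
    candidates = (employees[pos] for pos in dno_index.get(dno, []))
    # Step 2: brute-force filter the candidates on the remaining condition.
    return [row["name"] for row in candidates if row["salary"] > min_salary]
```

The fewer rows the indexed condition lets through, the less work the brute-force filter has to do.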

Select With Complex Condition
• Conjunctive selection by intersection of record pointers:
  • Suppose:
    • Secondary indexes exist on several fields in the condition
    • The indexes include record pointers
  • Then:
    • Use the indexes to get the set of record pointers for each conjunct
    • Take the intersection of the pointer sets
    • Then retrieve the actual records
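A minimal sketch of pointer intersection, assuming equality conditions only and using Python sets of list positions as the record pointers. The helper names are hypothetical.

```python
def build_index(rows, field):
    """Secondary index: field value -> set of record pointers (positions)."""
    index = {}
    for ptr, row in enumerate(rows):
        index.setdefault(row[field], set()).add(ptr)
    return index


def select_by_intersection(rows, indexes, conditions):
    """conditions maps field -> required value; every field must be indexed."""
    # One pointer set per conjunct, obtained from the indexes alone.
    pointer_sets = [indexes[f].get(v, set()) for f, v in conditions.items()]
    if not pointer_sets:
        return list(rows)
    matching = set.intersection(*pointer_sets)
    # Only now are the actual records retrieved.
    return [rows[ptr] for ptr in sorted(matching)]


# Hypothetical data and indexes.
rows = [
    {"name": "Ann", "dno": 5, "city": "Houston"},
    {"name": "Bob", "dno": 5, "city": "Stafford"},
    {"name": "Cy", "dno": 4, "city": "Houston"},
]
indexes = {"dno": build_index(rows, "dno"), "city": build_index(rows, "city")}
```

The data file is touched only for records that satisfy every conjunct.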

Problem: Joining Two Tables  Nested-loop join (brute force):  Last resort unless tables are small  Single-loop join when one table has index  Loop over one table  Use index to find matches in other table
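Both join strategies can be sketched in a few lines. In a DBMS the index would already exist; here it is built on the fly as a dictionary, which is a simplifying assumption, as are the table contents.

```python
def nested_loop_join(r, s, r_key, s_key):
    """Brute force: compare every row of r with every row of s."""
    return [{**a, **b} for a in r for b in s if a[r_key] == b[s_key]]


def single_loop_join(r, s_index, r_key):
    """Loop over r only; use an index on s to find the matching rows."""
    return [{**a, **b} for a in r for b in s_index.get(a[r_key], [])]


def build_index(s, s_key):
    """Stand-in for an existing index on the join attribute of s."""
    index = {}
    for b in s:
        index.setdefault(b[s_key], []).append(b)
    return index


# Hypothetical tables.
employee = [
    {"name": "Bob", "dno": 2},
    {"name": "Ann", "dno": 1},
    {"name": "Cy", "dno": 1},
]
department = [{"dnumber": 1, "dname": "Research"}, {"dnumber": 3, "dname": "Sales"}]
```

The nested-loop version examines len(r) * len(s) pairs, while the single-loop version does one index probe per row of r.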

Problem: Joining Two Tables  Sort-merge join when both tables sorted by join attributes  Scan both files matching the records that have the same values for join attributes
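A sketch of the merge step follows. The slide assumes both files are already sorted on the join attributes; to stay self-contained, this example sorts its inputs first, which a real plan would skip when the order already exists.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Join r and s by scanning both in join-attribute order."""
    r = sorted(r, key=lambda row: row[r_key])  # skipped if already sorted
    s = sorted(s, key=lambda row: row[s_key])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][r_key] < s[j][s_key]:
            i += 1
        elif r[i][r_key] > s[j][s_key]:
            j += 1
        else:
            value = r[i][r_key]
            # Find the end of the group of s rows sharing this value.
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == value:
                j_end += 1
            # Pair every r row in the group with every s row in the group.
            while i < len(r) and r[i][r_key] == value:
                for b in s[j:j_end]:
                    out.append({**r[i], **b})
                i += 1
            j = j_end
    return out
```

Each input is scanned once, apart from re-reading a group of duplicates on the second table.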

Problem: Combining Multiple Ops.
• Generating and saving temporary files is expensive in time
• So, avoid constructing temporary results
• Pipeline the data through multiple operations:
  • Pass the result of a previous operator to the next
  • Page-by-page instead of operation-by-operation
• Example…

Pipelining Example
• SELECT (FName + ' ' + LName) AS Name
  FROM Employee e JOIN Department d ON e.DNo = d.DNumber
  WHERE e.Salary < 50000 AND d.Location <> 'Houston'
• What are the individual operations for this?
• How many ways could this be pipelined?
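One possible pipelining of this query can be sketched with Python generators. This is only one of several valid plans; it passes rows one at a time rather than page by page, and the operator names and sample rows are assumptions.

```python
def scan(table):
    for row in table:
        yield row


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def index_join(left_rows, right_index, left_key):
    for left in left_rows:
        for right in right_index.get(left[left_key], []):
            yield {**left, **right}


def project(rows, expression):
    for row in rows:
        yield expression(row)


# Hypothetical tables.
employee = [
    {"FName": "Ann", "LName": "Lee", "DNo": 1, "Salary": 40000},
    {"FName": "Bob", "LName": "Ray", "DNo": 2, "Salary": 45000},
    {"FName": "Cy", "LName": "Fox", "DNo": 1, "Salary": 70000},
]
department = [
    {"DNumber": 1, "Location": "Stafford"},
    {"DNumber": 2, "Location": "Houston"},
]


def run_query():
    # Build side: filter departments, then hash them on DNumber.
    dept_index = {}
    for d in select(scan(department), lambda d: d["Location"] != "Houston"):
        dept_index.setdefault(d["DNumber"], []).append(d)
    # Probe side: each employee flows through select, join, and project
    # without any intermediate result being stored.
    low_paid = select(scan(employee), lambda e: e["Salary"] < 50000)
    joined = index_join(low_paid, dept_index, "DNo")
    return project(joined, lambda r: r["FName"] + " " + r["LName"])
```

Nothing is computed until the result is consumed, for example with `list(run_query())`.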

Picking Algorithms and Plans  Heuristics  Cost estimation

Using Heuristics  Uses pattern matching to transform parts of query tree to a “best” shape  Patterns based on transformations that are likely to be more efficient:  E.g., Apply selection before applying join  Why is that likely (naively) to be more efficient?
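A small experiment hints at why applying selection first tends to help: it shrinks the input to the join. The sketch below counts how many row pairs a brute-force join examines under each ordering; the data is synthetic and the function names are made up for illustration.

```python
def join_then_select(emps, depts, max_salary):
    """Join everything first, filter on salary while joining."""
    pairs_examined = 0
    out = []
    for e in emps:
        for d in depts:
            pairs_examined += 1
            if e["dno"] == d["dnumber"] and e["salary"] < max_salary:
                out.append((e["id"], d["dnumber"]))
    return out, pairs_examined


def select_then_join(emps, depts, max_salary):
    """Apply the selection first, then join only the surviving rows."""
    selected = [e for e in emps if e["salary"] < max_salary]
    pairs_examined = 0
    out = []
    for e in selected:
        for d in depts:
            pairs_examined += 1
            if e["dno"] == d["dnumber"]:
                out.append((e["id"], d["dnumber"]))
    return out, pairs_examined


# Synthetic data: 100 employees, 10 departments, 20 employees under 50000.
emps = [{"id": i, "dno": i % 10, "salary": 30000 + 1000 * i} for i in range(100)]
depts = [{"dnumber": n} for n in range(10)]

late_rows, late_pairs = join_then_select(emps, depts, 50000)
early_rows, early_pairs = select_then_join(emps, depts, 50000)
```

Both orderings return the same rows, but the early selection examines 200 pairs instead of 1000, at the price of one pass over the employees.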

Cost-based Optimization  Estimate the costs of a variety of different versions of the query based on:  Available indexes  Specificity of conditions  Statistics on data  Disk speed  Memory available  Block and record sizes  Index blocking factors
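A toy cost model shows the idea of comparing plans by estimated block accesses. The two formulas are simplified, textbook-style assumptions rather than what any particular DBMS computes: they ignore buffering and the cost of writing the result, and they assume each index probe fetches one matching record.

```python
def nested_loop_cost(outer_blocks, inner_blocks):
    """Assumed estimate: read the outer file once, the inner file once per outer block."""
    return outer_blocks + outer_blocks * inner_blocks


def index_join_cost(outer_blocks, outer_rows, index_levels):
    """Assumed estimate: one index descent plus one record fetch per outer row."""
    return outer_blocks + outer_rows * (index_levels + 1)


def cheapest_plan(stats):
    """Pick the plan with the lowest estimated cost from catalog statistics."""
    costs = {
        "nested-loop join": nested_loop_cost(
            stats["outer_blocks"], stats["inner_blocks"]
        ),
        "index join": index_join_cost(
            stats["outer_blocks"], stats["outer_rows"], stats["index_levels"]
        ),
    }
    best = min(costs, key=costs.get)
    return best, costs
```

Changing a single statistic, such as the number of rows in the outer table, can flip the choice, which is why the accuracy of statistics matters.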

Issues in Cost-based Optimization
• Accuracy of statistics
• Cost of calculating costs
• Accuracy of estimates of disk speed, memory available
• Sheer number of possible execution strategies

Which is Used?  Cost-based optimization is “taking over”  SQL Server uses cost-based optimization  Does NOT try to minimize total cost!  Tries to minimize time to initial results
