MariaDB Optimizer in 10.3, where does it stand? Santa Clara, - PowerPoint PPT Presentation

MariaDB Optimizer in 10.3, where does it stand? Santa Clara, California | April 23th – 25th, 2018 Santa Clara, California | April 23th – 25th, 2018 Sergey Petrunia MariaDB Project Sergey Petrunia MariaDB Project Vicen iu Ciorbaru MariaDB Foundation ț Vicen iu Ciorbaru MariaDB Foundation ț

Agenda ● New releases of MySQL and MariaDB – MariaDB 10.2 and 10.3 – MySQL 8.0 ● Optimizer related features – Histograms – Non-recursive CTEs ● Derived table optimizations – Window Functions ● Let’s look and compare – Also look at PostgreSQL and SQL Server 2

Histograms

Condition Selectivity Query optimizer needs to decide on a plan to execute the query Goal is to get the shortest running time • Chose access method - Index Access, Hash Join, BKA, etc. • Choose correct join order to minimize the cost of reading rows - Usually, minimizing rows read minimizes execution time - Sometimes reading more rows is advantageous, if table / index is all in memory 4

Condition Selectivity Query optimizer needs to decide on a plan to execute the query Goal is to get the shortest running time • Chose access method - Index Access, Hash Join, BKA, etc. • Choose correct join order to minimize the cost of reading rows - Usually, minimizing rows read minimizes execution time - Sometimes reading more rows is advantageous, if table / index is all in memory Use a cost model to estimate how long an execution plan would take 5

Condition Selectivity Query optimizer needs to decide on a plan to execute the query Goal is to get the shortest running time • Chose access method - Index Access, Hash Join, BKA, etc. • Choose correct join order to minimize the cost of reading rows - Usually, minimizing rows read minimizes execution time - Sometimes reading more rows is advantageous, if table / index is all in memory Use a cost model to estimate how long an execution plan would take For each condition in the where clause (and having) we compute • Condition selectivity - How many rows of the table is this condition going to accept? 10%, 20%, 90% ? 6

Condition Selectivity Query optimizer needs to decide on a plan to execute the query Goal is to get the shortest running time • Chose access method - Index Access, Hash Join, BKA, etc. • Choose correct join order to minimize the cost of reading rows - Usually, minimizing rows read minimizes execution time - Sometimes reading more rows is advantageous, if table / index is all in memory Use a cost model to estimate how long an execution plan would take For each condition in the where clause (and having) we compute • Condition selectivity - How many rows of the table is this condition going to accept? 10%, 20%, 90% ? Getting the estimates 7 right is important!

Condition Selectivity Suppose we have query with 10 tables: T1, T2, T3, … T10 8

Condition Selectivity Suppose we have query with 10 tables: T1, T2, T3, … T10 Query optimizer will: • Estimate the number of rows that it will read from each table • Based on the conditions in the where (and having) clause 9

Condition Selectivity Suppose we have query with 10 tables: T1, T2, T3, … T10 Query optimizer will: • Estimate the number of rows that it will read from each table • Based on the conditions in the where (and having) clauses Assume estimates have an average error coefficient e • Total number of estimated rows read is: - (e * #T1) * (e * #T2) * (e * #T3) * … * (e * #T10) • Where #T1..#T10 is the actual number of rows read for each table 10

Condition Selectivity Suppose we have query with 10 tables: T1, T2, T3, … T10 Query optimizer will: • Estimate the number of rows that it will read from each table • Based on the conditions in the where (and having) clauses Assume estimates have an average error coefficient e • Total number of estimated rows read is: - (e * #T1) * (e * #T2) * (e * #T3) * … * (e * #T10) • Where #T1..#T10 is the actual number of rows read for each table The estimation error is amplified, the more tables there are in a join • If we under/over estimate by a factor of 2 final error factor is 1024 ! • If error is only 1.5 (off by 50%), final error factor is ~60 11

Condition Selectivity How does optimizer produce estimates? • Condition analysis: - Is it possible to satisfy conditions? t1.a > 10 and t1.a < 5 - Equality condition on a distinct column? • Index dives to get number of rows in a range • Guesstimates (MySQL) • Histograms for non-indexed columns 12

Histograms Histograms estimate a distribution 13

Histograms Histograms estimate a distribution Multiple types of histograms • Equi-Width Histograms 14

Histograms Histograms estimate a distribution Multiple types of histograms • Equi-Width Histograms - Not uniform information - Many values in one bucket (5) - Other buckets take few values (1) 15

Histograms Histograms estimate a distribution Multiple types of histograms • Equi-Width Histograms - Not uniform information - Many values in one bucket (5) - Other buckets take few values (1) 16

Histograms Histograms estimate a distribution Multiple types of histograms • Equi-Width Histograms - Not uniform information - Many values in one bucket (5) - Other buckets take few values (1) • Equi-Height Histograms - All bins have same #values - More bins where there are more Values 17

Histograms Histograms estimate a distribution Multiple types of histograms • Equi-Width Histograms - Not uniform information - Many values in one bucket (5) - Other buckets take few values (1) • Equi-Height Histograms - All bins have same #values - More bins where there are more Values • Most Common Values Histograms - Useful for ENUM columns - One bin per value 18

Histograms in MariaDB MariaDB histograms are collected by doing a full table scan • Needs to be done manually using ANALYZE TABLE … PERSISTENT Stored inside • mysql. table_stats , mysql. column_stats , mysql. index_stats • As a binary value (max 255 bytes), single / double precision • Special function to decode, decode_histogram() Can be manually updated • One can run data collection on a slave, then propagate results Not enabled by default, needs a few switches turned on to work 19

Histograms in MySQL MySQL histograms are collected by doing a full table scan • Needs to be done manually using ANALYZE TABLE … UPDATE HISTOGRAM • Can collect all data or perform sampling by skipping rows, based on max memory allocation Stored inside data dictionary • Can be viewed through INFORMATION_SCHEMA. column_statistics • Stored as Equi-Width (Singleton) or Equi-Height • Visible as JSON Can not be manually updated • No obvious easy way to share statistics Enabled by default, will be used when available 20

Histograms in PostgreSQL PostgreSQL histograms are collected by doing a true random read • Can be collected manually with ANALYZE • Also collected automatically when VACUUM runs Stores equal-height and most common values at the same time • Equal-height histogram doesn’t cover MCV Can be manually updated • One could import histograms from slave instances • VACUUM auto-collection seems to cover the use case 21

Using Histograms Histograms are useful for range conditions • Equi-width or equi-height: - COLUMN > constant • Most Common Values (Singleton): - COLUMN = constant Problematic when multiple columns are involved: • t1.COL1 > 100 AND t1.COL2 > 1000 Most optimizers assume column values are independent • P(A ∩ B) = P(A) * P(B) vs P(A ∩ B) = P(A) * P(B | A) PostgreSQL 10 has added support for multi-variable distributions. MySQL assumes independent values. MariaDB doesn’t handle multi-variable case well either. 22

Using Histograms Sample database world : select city.name from city where (city.population > 10 mil or city.population < 10 thousand) MariaDB MySQL PostgreSQL Estimated Rows Filtered 1.95% 1.09% 1.05% Actual Rows Filtered 1.05 % 23

Using Histograms Table with 2 columns A and B • t1.a always equals t1.b • 10 distinct values, each value occurs with 10% probability select t1.A, t1.B from t1 where t1.A = t1.B and t1.A = 5 MariaDB MySQL PostgreSQL Estimated Rows Filtered 1.03% 1% 10% Actual Rows Filtered 10% 24

Conclusions MariaDB • Slightly less precise than MySQL, but smaller in size • Same problem with correlated data as MySQL • Performs full-table-scan, no sampling support • Easy to share between instances MySQL • Histograms provide good estimates for real world data • Poor performance with highly correlated data • Performs full-table-scan, supports sampling PostgreSQL • Estimates on par with MySQL and MariaDB • Support for multi-variable distributions! 25 • True sampling

Optimizations for derived tables and non-recursive CTEs

A set of related optimizations Some are new, some are old: ● Derived table merge ● Condition pushdown – Condition pushdown through window functions ● GROUP BY splitting 27

Background – derived table merge ● “VIP customers and their big orders from October” select * from vip_customer, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customer.customer_id 28

MariaDB Optimizer in 10.3, where does it stand? Santa Clara, - PowerPoint PPT Presentation

MariaDB Optimizer in 10.3, where does it stand? Santa Clara, California | April 23th 25th, 2018 Santa Clara, California | April 23th 25th, 2018 Sergey Petrunia MariaDB Project Sergey Petrunia MariaDB Project Vicen

MariaDB State of MariaDB Michael Monty Widenius MariaDB hacker monty@askmonty.org

Histograms in MariaDB, MySQL and PostgreSQL Sergei Petrunia, MariaDB Sergei Petrunia, MariaDB

FOSDEM MariaDB 10 - The Spider Storage Engine (a sharding plugin for MySQL/MariaDB) Stphane

The MySQL Query Optimizer Explained Through Optimizer Trace ystein Grvlen Senior Staff

9/3/2020 Pretest What does NRS stand for? What does EFL stand for? What does PoP stand

Explaining the Postgres Query Optimizer B RUCE M OMJIAN The optimizer is the "brain" of

MariaDB security features and best practices Robert Bindar Software Developer @MariaDB

ALTER TABLE IMPROVEMENTS IN MARIA DB SERVER Marko Mkel Lead Developer InnoDB MariaDB

The MariaDB/MySQL Query Executor In-depth Presented by: Timour Katchaounov Optimizer team: Igor

How a database optimizer gets your data, fast Viceniu Ciorbaru Software Developer Team Lead

BOOSTING PERFORMANCE OF ORDER BY LIMIT QUERIES Varun Gupta Optimizer Developer MariaDB

Understanding and control of MySQL Query Optimizer traditional and novel tools and techniques

using Amazon Key Management Service Jan Lindstrm, Principal Engineer, MariaDB Corporation

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

The Volcano Optimizer Generator Generator: Object-oriented and scientific Extensibility and

The Volcano Optimizer Generator: Extensibility and Efficient Search Presentation: Mirna Limic

Diabetic amputations When is Primary Amputation Better for the Patient UCSF Vascular Symposium

Far more than Petaflops: The Jlich Supercomputing Centre ScicomP 15 & SP-XXL Thomas

Logic-based modeling and analysis of timed systems Matteo Rossi Dipartimento di Elettronica ed

Simulating Timed UML2 Sequence Diagrams with Timed CSP M Roggenbach (Swansea, Wales, UK)

Completeness for Concurrent Kleene Algebra e 1 Paul Brunet 1 Alexandra Silva 1 Fabio Zanasi 1

Concurrent Kleene Algebra: Free Model and Completeness e 1 Paul Brunet 1 Alexandra Silva 1 Fabio

Forensic and I nvestigative Speaker Recognition: Situation BKA Odyssey 2016 June 21-24, Bilbao,

Hash-based Signatures Andreas Hlsing Eindhoven University of Technology Executive School on

MariaDB Optimizer in 10.3, where does it stand? Santa Clara, - PowerPoint PPT Presentation

MariaDB Optimizer in 10.3, where does it stand? Santa Clara, California | April 23th 25th, 2018 Santa Clara, California | April 23th 25th, 2018 Sergey Petrunia MariaDB Project Sergey Petrunia MariaDB Project Vicen

MariaDB State of MariaDB Michael Monty Widenius MariaDB hacker monty@askmonty.org

Histograms in MariaDB, MySQL and PostgreSQL Sergei Petrunia, MariaDB Sergei Petrunia, MariaDB

FOSDEM MariaDB 10 - The Spider Storage Engine (a sharding plugin for MySQL/MariaDB) Stphane

The MySQL Query Optimizer Explained Through Optimizer Trace ystein Grvlen Senior Staff

9/3/2020 Pretest What does NRS stand for? What does EFL stand for? What does PoP stand

Explaining the Postgres Query Optimizer B RUCE M OMJIAN The optimizer is the &quot;brain&quot; of

MariaDB security features and best practices Robert Bindar Software Developer @MariaDB

ALTER TABLE IMPROVEMENTS IN MARIA DB SERVER Marko Mkel Lead Developer InnoDB MariaDB

The MariaDB/MySQL Query Executor In-depth Presented by: Timour Katchaounov Optimizer team: Igor

How a database optimizer gets your data, fast Viceniu Ciorbaru Software Developer Team Lead

BOOSTING PERFORMANCE OF ORDER BY LIMIT QUERIES Varun Gupta Optimizer Developer MariaDB

Understanding and control of MySQL Query Optimizer traditional and novel tools and techniques

using Amazon Key Management Service Jan Lindstrm, Principal Engineer, MariaDB Corporation

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

The Volcano Optimizer Generator Generator: Object-oriented and scientific Extensibility and

The Volcano Optimizer Generator: Extensibility and Efficient Search Presentation: Mirna Limic

Diabetic amputations When is Primary Amputation Better for the Patient UCSF Vascular Symposium

Far more than Petaflops: The Jlich Supercomputing Centre ScicomP 15 &amp; SP-XXL Thomas

Logic-based modeling and analysis of timed systems Matteo Rossi Dipartimento di Elettronica ed

Simulating Timed UML2 Sequence Diagrams with Timed CSP M Roggenbach (Swansea, Wales, UK)

Completeness for Concurrent Kleene Algebra e 1 Paul Brunet 1 Alexandra Silva 1 Fabio Zanasi 1

Concurrent Kleene Algebra: Free Model and Completeness e 1 Paul Brunet 1 Alexandra Silva 1 Fabio

Forensic and I nvestigative Speaker Recognition: Situation BKA Odyssey 2016 June 21-24, Bilbao,

Hash-based Signatures Andreas Hlsing Eindhoven University of Technology Executive School on

Explaining the Postgres Query Optimizer B RUCE M OMJIAN The optimizer is the "brain" of

Far more than Petaflops: The Jlich Supercomputing Centre ScicomP 15 & SP-XXL Thomas