PostgreSQL query planner’s internals: How I Learned to Stop Worrying and Love the Planner. Alexey Ermakov, alexey.ermakov@dataegret.com
2 Why this talk? • Why is this query so slow? • Why is the planner not using my index? • What can we do about it? dataegret.com
3 Where are we going? • How the planner works • How we can affect its work • When it can go wrong • Known limitations dataegret.com
4 The Path of a Query Connection ↓ Parser ↓ Rewrite system ↓ Planner/Optimizer ↓ Executor ↔ [Workers] ↓ Send results. Everything runs in a single process (the backend), apart from background workers (parallel seq scan, 9.6+) dataegret.com
5 EXPLAIN command explain (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING¹) select * from t1; QUERY PLAN ------------------------------------------------------------------- Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8) (actual time=0.218..2316.688 rows=10000000 loops=1) Output: f1, f2 Buffers: shared read=44248 I/O Timings: read=322.714² Planning time: 0.024 ms Execution time: 3852.588 ms ¹ COSTS and TIMING options are on by default ² I/O Timings are shown when track_io_timing is enabled dataegret.com
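The I/O Timings line above only appears when timing of block reads and writes is being collected; a minimal sketch of turning it on (changing it per session requires superuser, otherwise set it in postgresql.conf):

set track_io_timing = on;
explain (analyze, buffers) select * from t1;   -- plan output now includes an I/O Timings line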
6 The planner has to guess Seq Scan on public.t1 (cost=0.00..104424.80 rows=10000000 width=8) • startup cost • total cost • rows • average row width dataegret.com
7 Cost stability principles Quote from “Common issues with planner statistics” by Tomas Vondra³: • correlation to query duration: the estimated cost is correlated with the duration of the query, i.e. higher cost means longer execution. • estimation stability: a small difference in estimation causes only a small difference in costs, i.e. a small error in estimation causes only small cost differences. • cost stability: a small cost difference means a small difference in duration. • cost comparability: for a given query, two plans with (almost) the same costs should result in (almost) the same duration. ³ https://blog.pgaddict.com/posts/common-issues-with-planner-statistics dataegret.com
8 Data retrieval methods • seq scan – sequential scan of the whole table • index scan – random I/O (read index + read table) • index only scan – read only the index (9.2+)⁴ • bitmap index scan – something in between seq scan and index scan; makes it possible to use several indexes at the same time for OR/AND conditions ⁴ https://wiki.postgresql.org/wiki/Index-only_scans dataegret.com
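A minimal sketch of how the choice typically shifts with selectivity on the t1 table from the EXPLAIN slide, assuming a hypothetical index t1_f1_idx that is not part of the original example:

create index t1_f1_idx on t1 (f1);            -- hypothetical index on t1.f1
analyze t1;
explain select * from t1 where f1 = 42;       -- very selective: usually an index scan
explain select * from t1 where f1 < 100000;   -- moderately selective: often a bitmap index scan
explain select * from t1;                     -- whole table: seq scan
explain select f1 from t1 where f1 < 100000;  -- only indexed columns referenced: index only scan possible

The plan actually chosen depends on the data distribution and the visibility map, so the comments describe tendencies, not guarantees.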
9 Join methods • nested loop – optimal for small relations • hash join – optimal for big relations • merge join – optimal for big relations if they’re sorted dataegret.com
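One rough way to see all three methods on the same query is to switch off the planner’s preferred ones; a sketch with hypothetical tables a and b joined on id:

explain (analyze) select * from a join b using (id);  -- planner’s own choice, e.g. hash join
set enable_hashjoin = off;                            -- make hash join look very expensive
set enable_mergejoin = off;                           -- same for merge join
explain (analyze) select * from a join b using (id);  -- typically a nested loop now
reset enable_hashjoin;
reset enable_mergejoin;

The enable_* settings don’t forbid a method outright, they just penalize its cost heavily.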
10 Aggregate methods • aggregate • hash aggregate • group aggregate dataegret.com
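A quick way to see the difference between the aggregate strategies, using the test_ndistinct table defined on the later slides (any table with a groupable column would do):

explain select f1, count(*) from test_ndistinct group by f1;  -- typically HashAggregate
set enable_hashagg = off;                                     -- discourage hashing
explain select f1, count(*) from test_ndistinct group by f1;  -- Sort + GroupAggregate instead
reset enable_hashagg;
explain select count(*) from test_ndistinct;                  -- plain Aggregate, no GROUP BY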
11 Planner Cost Constants #seq_page_cost = 1.0 # cost of a sequentially-fetched disk page #random_page_cost = 4.0 # cost of a non-sequentially-fetched disk page #cpu_tuple_cost = 0.01 # cost of processing each row during a query #cpu_index_tuple_cost = 0.005 # cost of processing each index entry #cpu_operator_cost = 0.0025 # cost of processing each operator or function so basically the cost is just Σᵢ cᵢ·nᵢ (cost constants times estimated counts). How hard could it be? dataegret.com
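As a sanity check of that sum, the seq scan cost from the test_ndistinct plans on the later slides can be reproduced from pg_class and two of the constants (a sketch; the numbers will of course differ for other tables):

select relpages * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 as seq_scan_total_cost
from pg_class
where relname = 'test_ndistinct';
-- 35314 pages * 1.0 + 10000067 rows * 0.01 = 135314.67, the total cost shown in the later plans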
12 Well, kind of hard • How many rows will we get when we filter the table by this condition? • How many pages is that? Will we read them sequentially or not? • How many rows will we get when we join two relations? dataegret.com
13 We have stats! • pg_statistic • pg_stats view – the same but human-readable dataegret.com
14 pg_stats pgday=# \d pg_stats Column | Type | ------------------------+----------+---------------------------------------------------------------- tablename | name | name of the table or functional index attname | name | name of the column or index column null_frac | real | fraction of column entries that are null avg_width | integer | average width in bytes of column’s entries n_distinct | real | number (or fraction of number of rows) of distinct values most_common_vals | anyarray | list of the most common values in the column most_common_freqs | real[] | list of the frequencies of the most common values histogram_bounds | anyarray | list of intervals with approximately equal population correlation | real | correlation between physical row ordering and logical ordering most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] | dataegret.com
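These columns are what the row estimates are computed from. For an equality predicate the logic (simplified from the eqsel/var_eq_const selectivity code) is: if the constant is one of the most_common_vals its stored frequency is used directly, otherwise the probability mass left after nulls and MCVs is spread evenly over the remaining distinct values. A sketch with hypothetical table/column names, assuming a positive n_distinct (a negative value is a fraction of reltuples and would need converting first):

select (1 - null_frac
          - coalesce((select sum(freq) from unnest(most_common_freqs) as mcf(freq)), 0))
       / greatest(n_distinct - coalesce(array_length(most_common_freqs, 1), 0), 1)
       as est_selectivity   -- multiply by reltuples to get the estimated row count
from pg_stats
where tablename = 'mytable' and attname = 'mycol';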
15 Analyze ANALYZE samples the table: it picks 300*stats_target random pages (8Kb each) from the table, and from those pages picks 300*stats_target random rows⁵. ⁵ Algorithm Z from Vitter, Jeffrey S. (1 March 1985), “Random sampling with a reservoir” dataegret.com
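The sample size is therefore controlled by the statistics target; a sketch of the two usual ways to raise it (the values here are arbitrary):

set default_statistics_target = 500;                             -- session-wide target: 300*500 rows sampled
analyze verbose test_ndistinct;
alter table test_ndistinct alter column f1 set statistics 1000;  -- per-column target, stored in the catalog
analyze test_ndistinct;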
16 Analyze [diagram] The sampled column values are split into nulls (counted into null_frac) and non-null values; the non-null values are sorted and used to compute n_distinct (shown as -0.2, i.e. a fraction of the row count), the MCV list (most_common_vals {1,3,6} with most_common_freqs {0.24,0.24,0.24}) and the histogram_bounds {2,5,8,10}. A value enters the MCV list only if its sample frequency exceeds p_cutoff = MIN(1/stats_target, 1.25·p_avg). dataegret.com
17 autoanalyze • inserted + updated + deleted > threshold ⇒ run autoanalyze • threshold = autovacuum_analyze_threshold + reltuples*autovacuum_analyze_scale_factor • autovacuum_analyze_scale_factor (default = 0.1) • autovacuum_analyze_threshold (default = 50) • default_statistics_target (default = 100) • rows in sample = 300 * stats_target dataegret.com
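For the 10M-row test table on the following slides that works out to roughly 50 + 0.1 * 10,000,000 ≈ 1,000,050 modified rows before autoanalyze runs again. A sketch of the per-table threshold, assuming the default settings and ignoring per-table storage-parameter overrides:

select relname,
       reltuples,
       50 + 0.1 * reltuples as autoanalyze_threshold   -- rows changed before the next autoanalyze
from pg_class
where relkind = 'r'
order by reltuples desc
limit 10;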
18 n_distinct underestimation example select setseed(0.5); create table test_ndistinct as select (case when random() < 0.1 then f1 end)::int f1 from normal_rand(10000000, 50000, 50000/3) as nr(f1); 10M rows, 90% nulls, ≈ 99.7% of the non-null values between 0 and 100000 dataegret.com
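normal_rand() is not built in; it comes from the tablefunc contrib extension, so reproducing the example assumes:

create extension if not exists tablefunc;   -- provides normal_rand(count, mean, stddev)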
19 n_distinct underestimation example # analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 30000 of 35314 pages, containing 8495268 live rows and 0 dead rows; 30000 rows in sample, 10000067 estimated total rows select * from pg_stats where tablename = 'test_ndistinct' and attname = 'f1'; ... null_frac | 0.904067 avg_width | 4 n_distinct | 3080 most_common_vals | most_common_freqs | histogram_bounds | {-8505,10072,15513,18933,21260,22574,24082,25695,26953,27898,28645... correlation | -0.00286606 dataegret.com
20 n_distinct underestimation example # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN --------------------------------------------------------------------------------------- HashAggregate (cost=160314.84..160345.64 rows=3080 width=4) (actual time=2558.751..2581.286 rows=90020 loops=1) Group Key: f1 -> Seq Scan on test_ndistinct (cost=0.00..135314.67 rows=10000067 width=4) (actual time=0.045..931.687 rows=10000000 loops=1) Planning time: 0.048 ms Execution time: 2586.550 ms dataegret.com
21 n_distinct underestimation example # set default_statistics_target = 50; # analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247361 live rows and 0 dead rows; 15000 rows in sample, 9999792 estimated total rows # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN --------------------------------------------------------------------------------------- HashAggregate (cost=160311.40..160328.51 rows=1711 width=4) (actual time=2436.392..2455.851 rows=90020 loops=1) Group Key: f1 -> Seq Scan on test_ndistinct (cost=0.00..135311.92 rows=9999792 width=4) (actual time=0.029..892.596 rows=10000000 loops=1) Planning time: 0.096 ms Execution time: 2461.160 ms dataegret.com
22 n_distinct underestimation example # explain analyze select * from test_ndistinct where f1 < 5000; QUERY PLAN --------------------------------------------------------------------------------------- Seq Scan on test_ndistinct (cost=0.00..160316.36 rows=99 width=4) (actual time=2.325..1436.792 rows=3480 loops=1) Filter: (f1 < 5000) Rows Removed by Filter: 9996520 Planning time: 0.058 ms Execution time: 1437.424 ms dataegret.com
23 n_distinct underestimation example alter table test_ndistinct alter column f1 set (n_distinct = 100000); analyze verbose test_ndistinct; INFO: analyzing "public.test_ndistinct" INFO: "test_ndistinct": scanned 15000 of 35314 pages, containing 4247670 live rows and 0 dead rows; 15000 rows in sample, 10000012 estimated total rows ANALYZE dataegret.com
24 n_distinct underestimation example # explain analyze select distinct f1 from test_ndistinct ; QUERY PLAN ------------------------------------------------------------------------------------------- Unique (cost=1571431.43..1621431.49 rows=100000 width=4) (actual time=4791.872..7551.150 rows=90020 loops=1) -> Sort (cost=1571431.43..1596431.46 rows=10000012 width=4) (actual time=4791.870..6893.413 rows=10000000 loops=1) Sort Key: f1 Sort Method: external merge Disk: 101648kB -> Seq Scan on test_ndistinct (cost=0.00..135314.12 rows=10000012 width=4) (actual time=0.041..938.093 rows=10000000 loops=1) Planning time: 0.099 ms Execution time: 7714.701 ms dataegret.com