billion goods in few categories how histograms save a life
play

Billion Goods in Few Categories: how Histograms Save a Life? - PowerPoint PPT Presentation

Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova Table of Contents The Case The Cardinality: Two Levels ANALYZE TABLE Limitations Solutions in Percona Server 5.7 Histograms


  1. Example • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 20 = 10,000 24

  2. Example 2 • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 5,000 • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 5,000 = 40 25

  3. Use Larger STATS SAMPLE PAGES? • Time consuming mysql> select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec) 26

  4. Use Larger STATS SAMPLE PAGES? • Time consuming • With default STATS SAMPLE PAGES mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.32 sec) 26

  5. Use Larger STATS SAMPLE PAGES? • Time consuming • With bigger number mysql> alter table goods STATS_SAMPLE_PAGES=5000; Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (27.13 sec) 26

  6. Use Larger STATS SAMPLE PAGES? • Time consuming • With bigger number • 27.13/0.32 = 85 times slower! 26

  7. Does ANALYZE TABLE Block Reads? User Manual claims it does not During the analysis, the table is locked with a read lock for InnoDB and MyISAM. 27

  8. Does ANALYZE TABLE Block Reads? User Manual claims it does not • But! 27

  9. Does ANALYZE TABLE Block Reads? User Manual claims it does not Sometimes it blocks all subsequent queries +------+-------------------------+---------------------------------+ | Time | State | Info | +------+-------------------------+---------------------------------+ | 32 | Writing to net | select * from t where c > ’%0%’ | | 12 | Waiting for table flush | select * from test.t where i=1 | | 12 | Waiting for table flush | select * from test.t where i=2 | | 12 | Waiting for table flush | select * from test.t where i=3 | | 11 | Waiting for table flush | select * from test.t where i=7 | | 10 | Waiting for table flush | select * from test.t where i=11 | ... 27

  10. Simply Increasing STATS SAMPLE PAGES Is not a solution 28

  11. Solutions in Percona Server 5.7

  12. Blocking ANALYZE TABLE Considered as a bug • jira.percona.com/browse/PS-2503 • lp:1704195 • bugs.mysql.com/87065 30

  13. Blocking ANALYZE TABLE Considered as a bug • Fixed in Percona Server 5.6.38-83.0/5.7.20-18 30

  14. Non-Blocking ANALYZE TABLE • Before the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

  15. Non-Blocking ANALYZE TABLE • After the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

  16. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats 32

  17. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable 32

  18. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics 32

  19. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics • Hack • Not documented • Not recommended • Can stop working any time 32

  20. 5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot 33

  21. 5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot • Manual update allows to fix statistics • Not recommended • Can stop working any time 33

  22. Histograms

  23. What are the Histograms? • Optimizer Column Statistics • Engine-independent • No fancy calculations • Knows about data distribution 35

  24. Number of Values in Each Bucket 800 600 400 200 0 36 1 2 3 4 5 6 7 8 9 10

  25. Data in the Histogram 1 0 . 8 0 . 6 0 . 4 0 . 2 0 37 1 2 3 4 5 6 7 8 9 10

  26. How Histograms are Helpful? • Accurate statistics • Truly persistent • No extra calculations on access • Optimizer knows about data distribution • Without touching the table! 38

  27. Filtered Rows • Example data mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec) 39

  28. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  29. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  30. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  31. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  32. Filtered Rows • With the histogram mysql> analyze table example update histogram on f1 with 3 buckets; +-----------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------+-----------+----------+------------------------------+ | hist_ex.example | histogram | status | Histogram statistics created for column ’f1’. | +-----------------+-----------+----------+------------------------------+ 1 row in set (0.03 sec) 39

  33. Filtered Rows • With the histogram mysql> select * from information_schema.column_statistics -> where table_name=’example’\G *************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: "buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 1 row in set (0.00 sec) 39

  34. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows 39 Extra: Using where

  35. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows 39 Extra: Using where

  36. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row 39 Extra: Using where

  37. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 - one row 39 Extra: Using where

  38. Example • EXPLAIN without histograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range -> order by goods.cat_id -> limit 10\G -- We ask for 10 rows only! 40

  39. Example • EXPLAIN without histograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; 40 Using temporary; Using filesort

  40. Example • EXPLAIN without histograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 40 2 rows in set, 1 warning (0.01 sec)

  41. Example • Execution time without histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) 40

  42. Example • Engine statistics without histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) 40

  43. Example • Now lets add the histogram mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec) 40

  44. Example • EXPLAIN with the histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10\G 40

  45. Example • EXPLAIN with the histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where 40

  46. Example • EXPLAIN with the histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 40 2 rows in set, 1 warning (0.01 sec)

Recommend


More recommend