reducing risk when upgrading your mysql environment
play

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp - PowerPoint PPT Presentation

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My Experience as MySQL Consultant On Upgrading MySQL it's quite complex... Kenny Gryp MySQL Practice Manager Table of Contents The Ocial Documentation


  1. Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager

  2. My Experience as MySQL Consultant On Upgrading MySQL it's quite complex... Kenny Gryp MySQL Practice Manager

  3. Table of Contents The O�cial Documentation Make Your Own Documentation Potential Risks Establish Upgrade Method For A Single Server Rollback Scenario Testing Test Writes Test Individual Reads Workload Testing Establish (& Test) Migration Process Migration In Production (Rollback) Post-Migration Assessment 3 / 77

  4. The O�cial Documentation 4 / 77

  5. Oracle's Recommended Process Backup your data Read all release notes and assess https://dev.mysql.com/doc/relnotes/mysql/5.7/en/ Read Changes Affecting Upgrades to MySQL 5.7 https://dev.mysql.com/doc/refman/5.7/en/upgrading-from- previous-series.html 5 / 77

  6. Oracle's Recommended Process Upgrade Slaves First 6 / 77

  7. Oracle's Recommended Process Upgrade Slaves First In-Place Upgrade: Clean shutdown ( innodb_fast_shutdown=0 ) Run mysql_upgrade Logical Upgrade: mysqldump data Import data again Run mysql_upgrade to �x mysql schema http://dev.mysql.com/doc/refman/5.7/en/upgrading.html 7 / 77

  8. Oracle's Recommended Process (cont.) A Lot of Risk: No guarantee queries will execute the same No guarantee queries will be same speed or faster No guarantee all your queries will still work (new default stricter sql_mode ) There is no o�cial support to upgrade from <5.6 to 5.7 but we might actually be able to do that 8 / 77

  9. do-it-yourself Documenting The Process 9 / 77

  10. Documenting The Process PEBKAC: Human errors happen and create issues import data using wrong character set setting up replica using wrong binlog �le/pos ... Document every step , we need to repeat it multiple times 10 / 77

  11. making you afraid to upgrade by describing Potential Risks 11 / 77

  12. Optimizer Changes Example: index_merge_intersection Often seen during migrations to MySQL 5.6 Affects environments with sub-optimal indexing Queries with c1='a' AND c2='b' when composite index (c1,c2) is missing Is often slower when selectivity with 1 of the 2 columns is bad (and it happens frequently) Result: a lot of queries were slower in new environment Need SELECT performance tests between versions https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/ 12 / 77

  13. New Defaults In MySQL 5.7 The new defaults in MySQL 5.7 make a lot of sense: More use of available features and performance enhancements out of the box More strictness with data/query validation New Reserved words Applications might not be ready for it. Drupal 7 - https://www.drupal.org/node/2545480 They will/might break the application more easily: sql_mode=ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION innodb_strict_mode=1 Needs SELECT & DML query validity tests between versions 13 / 77

  14. Other Changes in MySQL 5.7 Passwords that use the older pre-4.1 password hashing format is removed. 14 / 77

  15. Minor Versions Also At Risk CREATE TABLE date (d DATE); INSERT INTO date VALUES ('2017-04-19'); SELECT COUNT(*) FROM date WHERE d < NOW()-INTERVAL 1 DAY; MySQL 5.0.37 MySQL 5.0.45 +-------+ +-------+ | 0 | | 1 | +-------+ +-------+ Seen with DELETE FROM date WHERE d < NOW()- INTERVAL 1 DAY in binlog_format=STATEMENT environments. Needs SELECT & DML query result tests between versions 15 / 77

  16. Workload SYNC_BINLOG=1 in MySQL 5.7 Can impact certain environments, might not be noticed when looking at a single query InnoDB LRU Flushing changes require tuning for heavy workloads in 5.6 ( innodb_lru_scan_depth ) When switching to MySQL 8.0 with the new data dictionary ... Need to do Workload Testing between versions http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html http://lefred.be/content/sync_binlog-1-in-5-7/ 16 / 77

  17. How Do We Reduce All This Risk? 17 / 77

  18. Testing! 18 / 77

  19. establish Upgrade Method For A Single Server 19 / 77

  20. Upgrade Method For A Single Server Follow MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/upgrading.html Ensure to document every command Restore from backup Or take a replica you can miss 20 / 77

  21. Upgrade Method For A Single Server 21 / 77

  22. Upgrade Method For A Single Server 22 / 77

  23. Replication Consistency Testing Writes 23 / 77

  24. Writes - Replication Consistency pt-table-checksum : validate consistency in a replication topology Identify problems caused by PEBKAC Ensure events replicate properly ( binlog_format=STATEMENT ) Upgrade a replica or add a replica which is using the modi�ed version. Do it on production, will have no result in test/staging https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html 24 / 77

  25. Writes - Replicate Test Server 25 / 77

  26. often left behind is Rollback Scenario Testing 26 / 77

  27. Rollback Scenario Testing Possibility to fall back in case something went wrong during migration Can be done using replication, but has to be tested! 27 / 77

  28. Writes - Rollback Testing 28 / 77

  29. Rollback Scenario Testing You might need to change some settings to your new my.cnf to be able to support replicating back. Example: binlog_checksum = NONE binlog_row_image = FULL binlog_rows_query_log_events = OFF log_bin_use_v1_row_events = 1 gtid_mode = OFF log_slave_updates=1 skip-slave-start 29 / 77

  30. Writes - Checksums - GTID 30 / 77

  31. Writes - Checksums - Non-GTID 31 / 77

  32. Writes - Checkums - ROW 32 / 77

  33. Writes - Checkums - ROW 33 / 77

  34. Where To Run pt-table-checksum ? GTID: pt-table-checksum can only be run on Master (Errant Transactions) Or scratch the pt-table-checksum host after tests non-GTID: pt-table-checksum can be run on intermediate master binlog_format=ROW : only 1 tier below can be checksummed run on every tier that has a replica (for rollback) pt-table-checksum can bring prod overhead when run on active master Let replication run for a while before checksumming 34 / 77

  35. pt-table-checksum results On every replica (including rollback): SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksum WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +----+-----------------+------------+--------+ | db | tbl | total_rows | chunks | +----+-----------------+------------+--------+ | db | telephone_debit | 44342 | 1 | | db | orderline | 21451 | 3 | | db | orders | 25125215 | 12 | +----+-----------------+------------+--------+ 35 / 77

  36. pt-table-checksum - Analysis Troubleshooting starts now... What went wrong? 36 / 77

  37. pt-table-checksum - Analysis Which chunks failed? db: db tbl: telephone_debit chunk: 100 chunk_time: 0.4956125 chunk_index: PRIMARY lower_boundary: 5014733 upper_boundary: 5059074 this_crc: 7fd37eb9 this_cnt: 44342 master_crc: b7babd94 master_cnt: 44342 ts: 2013-02-05 01:59:48 37 / 77

  38. pt-table-checksum - Analysis Which chunks failed? db: db tbl: telephone_debit chunk: 100 chunk_time: 0.4956125 chunk_index: PRIMARY lower_boundary: 5014733 upper_boundary: 5059074 this_crc: 7fd37eb9 this_cnt: 44342 master_crc: b7babd94 master_cnt: 44342 ts: 2013-02-05 01:59:48 38 / 77

  39. pt-table-checksum - Analysis SELECT * INTO outfile '/tmp/telephone_debit_mysql56' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; SELECT * INTO outfile '/tmp/telephone_debit_mysql57' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; # diff -u /tmp/telephone_debit_mysql5{6,7} 39 / 77

  40. pt-table-checksum - Analysis SELECT * INTO outfile '/tmp/telephone_debit_mysql56' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; SELECT * INTO outfile '/tmp/telephone_debit_mysql57' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; # diff -u /tmp/telephone_debit_mysql5{6,7} Use twindb_table_compare ! https://github.com/twindb/twindb_table_compare 40 / 77

  41. pt-table-checksum - Analysis Wrong upgrade method backups wrong replication �le/pos ... binlog_format=STATEMENT using ( UUID() ...) Common Seen Issues replicating older versions: Floating point differences: Storing currencies in a DOUBLE Temporal data types Invalid dates converted to zero dates Trailing spaces in CHAR �elds 41 / 77

  42. Testing Writes Consistency Checks Process: Checksum Check for differences On new environment On rollback environment For each inconsistency Analyze diff Find root cause Fix problem Document problem & solution Repeat checksum again 42 / 77

  43. Testing Individual Reads 43 / 77

  44. Testing Reads - Collect Queries 44 / 77

  45. Testing Reads - Collect Queries Collection Techniques: Slow Query Log long_query_time=0 Careful when ~+10000 QPS Percona Server: log_slow_rate_limit tcpdump 'packets lost' in libpcap Application/Load Balancer queries Ensure: Get the full workload (long enough) Get data from Master & Replicas Collect batchjob queries running at night https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html 45 / 77

  46. Testing Reads - Setup 2 Environments 46 / 77

Recommend


More recommend