Demystifying MySQL Replication Crash Safety Presented at Percona - PowerPoint PPT Presentation

Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com

Introducing MessageBird 225+ Direct-to-Carrier Agreements With operators from around the world MessageBird is a cloud communications platform founded in Amsterdam 2011. 15,000+ Customers Examples of our messaging and voice SaaS: In over 60+ countries SMS in and out, call in (IVR) and out (alert), SIP, WhatsApp, Facebook, Telegram, Twitter, WeChat, … 180+ Employees Omni-Channel Conversation Engineering office in Amsterdam Sales and support offices worldwide Details at www.messagebird.com We are expanding : {Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers {Team, Tech, Product} Leads, Product Owners, Customer Support {Commercial, Connectivity, Partnership} Managers 2 www.messagebird.com/careers

Summary (Demystifying MySQL Replication Crash Safety – PLEU2018) • Helicopter view of – and then Zoom in – Replication and Crash Safety • MySQL 5.6 solution (and its problems) • Complexifying things with GTIDs and Multi-Threaded Slave ( MTS ) • Impacts of reducing / compromising durability ( sync_binlog != 1 and trx_commit != 1 ) • Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs • Closing, links, bugs and questions 3

Overview of MySQL Replication (Demystifying MySQL Replication Crash Safety – PLEU2018) One master with one or more slaves: • The master records transactions in a journal (binary logs); each slave: • Downloads the journal and saves it locally in the relay logs (IO thread) • Executes the relay logs on its local database (SQL thread) • Could also produce binary logs to be a master ( log-slave-updates – lsu )

Replication Crash Safety (Demystifying MySQL Replication Crash Safety – PLEU2018) What do I mean by Replication Crash Safety ? • When a slave crashes, it is able to resume replication after recovery (OK if rewinds its state after recovery, as long as it is eventually consistent) • When a master crashes, slaves are able to resume replicating from it • All above without sacrificing data consistency • In other words: A C I D is not compromised by a slave or a master crash (Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM) Intermediate masters ( IM ) qualify both as master and slave Slaves are potential master (and IM) in some failover strategy (Proving replication crash un-safety is easy, proving safety is hard) 5

State of the Dolphin and of the Sea Lion (Demystifying MySQL Replication Crash Safety – PLEU2018) State of the Dolphin in Replication Crash Safety: • MySQL 5.5 is not crash safe • MySQL 5.6 can be made crash safe (it is not by default) • MySQL 5.7 is mostly the same as 5.6 (with complexity added by Logical Lock parallel replication) • MySQL 8.0 is crash safe by default (but it can be made unsafe by “ tuning ” the configuration) Quick state of the Sea Lion: • MariaDB 5.5 is not replication crash safe • MariaDB 10.x can be made crash safe 6

Zoom in the details [1 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) More details about replication: • The IO Thread stores its state in master info ( also configuration stored there ) • The SQL Thread in relay log info slave1 [localhost] {msandbox} ((none)) > show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event [...] Master_Log_File: mysql-bin.000001 <-------+-- master info ( persisted state ) Read_Master_Log_Pos: 25489 <-------+ Relay_Log_File: mysql-relay.000002 <--+ Relay_Log_Pos: 10788 <--+ Relay_Master_Log_File: mysql-bin.000001 <--+-- relay log info ( persisted state ) [...] | Exec_Master_Log_Pos: 10575 <--+ [...] 7 1 row in set (0.00 sec)

Zoom in the details [2 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) More parameters: sync_master_info, sync_relay_log and sync_relay_log_info In MySQL 5.5, master info and relay log info are files: • No atomicity of “making progress” and “state tracking” for IO & SQL Threads • Consistency of actual vs registered state is compromised after a crash Ø This is why replication is not crash-safe in MySQL 5.5 8

Zoom in the details [3 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) Even more parameters: • sync_binlog ( and innodb_flush_log_at_trx_commit – trx_commit ): • Binlogs are synchronised to disk after every N writes/transactions (default 0 in My|SQL 5.5 and 5.6; and in 5.7 and 8.0 it is 1 which is full A C I D ) • trx_commit = 1: logs written and flushed each trx ( full A C I D and default ) = 0: written and flushed once per second (not crash safe) = 2: written after each trx and flushed once per second (mysqld crash safe, not OS crash safe) 9

MySQL 5.6 solution [1 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) Reminder: problems making MySQL 5.5 Replication Crash Un-Safe: • The position of the SQL Thread cannot be trusted • The position of the IO Thread cannot be trusted • The content of the Relay Logs cannot be trusted 10

MySQL 5.6 solution [2 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) The MySQL 5.6 solution • Atomicity for SQL Thread: relay - log - info - repository = TABLE ( default = FILE ) • Useless for crash safety: a parameter to store master info in a table: • master-info-repository = TABLE (default = FILE ) • Providing a way to “fix” the relay logs: relay - log - recovery = 1 ( default = 0 )

MySQL 5.6 solution [3 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) More details about Relay Log Recovery : • relay-log-recovery is only used on mysqld startup ( dynamic would be useless ) • If relay-log-recovery = 0 , nothing special done ( and a new relay log is created ) • If relay-log-recovery = 1 : • The position of the IO Thread is set to the position of the SQL Thread • The position of the SQL Thread is set to the newly created relay log • If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup ( relay-log-recovery does not delete anything: easy to test with skip-slave-start ) Ø Said otherwise, the previous relay logs are skipped ! (those relay logs are considered improper for SQL Thread consumption) • This will happen even if MySQL (or the IO Thread) did not crash OK for 1 st implementation but a waste of perfectly good relay logs 12

MySQL 5.6 solution [4 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) In MySQL 5.7: • No change of defaults ( for replication crash safety ) • Relay log recovery still simplistic K In MySQL 8.0: • Still simplistic relay log recovery L • New defaults: • relay-log-info-repository = TABLE J • relay-log-recovery = 1 J • master-info-repository = TABLE ( not sure this is very useful ) Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave Bug#74321: Execute relay-log-recovery only when needed 13

Adding complexity with GTIDs [1 of 2] (Demystifying MySQL Replication Crash Safety – PLEU2018) Not only MySQL 5.6 introduces replication crash safety, it also introduced Global Transaction IDs ( GTIDs ) • This tags every transaction with an ID when writing to the binlogs • The GTID state of the master and slaves are tracked in the binlogs Ø IO and SQL Thread states are now partially in the binlogs (and relay logs) • Optionally, slaves can use GTID to replicate ( instead of file+position ) • This allows easier repointing of slaves to a new master ( including fail over ) • This heavily relies on precise tracking of GTID states on master and slaves Ø As this tracking is in the binlogs, this is impeded when sync_binlog != 1 Bug#70659: Make crash safe slave with gtid + less durable settings 14

Adding complexity with GTIDs [2 of 2] (Demystifying MySQL Replication Crash Safety – PLEU2018) To make replication crash safe with GTIDs in MySQL 5.6: • relay-log-info-repository = TABLE (default = FILE ) • relay-log-recovery = 1 (default = 0) – (Bug#92093) • sync_binlog = 1 (default = 0) • In 5.7, the default is sync_binlog = 1 J ( two other unchanged K ) • In 8.0, all the defaults are good for crash safe replication with GTID J J • MySQL 5.7 adds a table for storing the GTID state of slaves: • Allows GTIDS slaves without log-slave-updates ( lsu ) • With lsu, this table ( mysql.gtid_executed ) is not updated after each trx Ø Missed opportunity for OS crash safety with sync_binlog != 1 L L L Bug#92109: Make GTID replication crash safe with less durable setting 15

Master Replication Crash Safety [1 of 5] (Demystifying MySQL Replication Crash Safety – PLEU2018) Relaxing durability of the binlogs implies losing GTID state ( after an OS crash ) • What about the consequence on the master ? With and without GTID ? • If sync_binlog != 1 on the master, an OS crash will lose binlogs • With sync_binlog != 1 , usually trx_commit != 1 ( normally 2, but can be 0 ) • trx_commit = 2 preserves data on mysqld crashes, 0 does not ( à 2 is better ) Ø InnoDB will also lose transactions on an OS crash Ø After an OS crash, InnoDB will be out-of-sync with the binlogs Ø And we cannot trust the binlogs on such master (trx gap or ghost trx) The failure mode will be different depending on the configuration 16

Demystifying MySQL Replication Crash Safety Presented at Percona - PowerPoint PPT Presentation

Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-Franois Gagn Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com Introducing MessageBird 225+

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Replication and HA at Facebook Part-II Jeff Jiang Production Engineer Facebook, Inc

New features in MySQL Replication Lars Thalmann, Development Manager, Replication & Backup

MySQL Replication Tutorial Mats Kindahl Senior Software Engineer Replication Technology Lars

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

Not Your Grandpas Replication The New Wave of MySQL Replication and How It Helps Your

Asynchronous Replication and Bayou Asynchronous Replication and Bayou Asynchronous Replication

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

An API for Reading the MySQL Binary Log Lars Thalmann Mats Kindahl Development Director, MySQL

Why MySQL Replication Fails, and How to Get it Back September, 26, 2017 Sveta Smirnova Sveta

Inflectional periphrasis as collocation Olivier Bonami 1 Gert Webelhuth 2 1 Universit Paris

Intrusion Prevention and Detection in Grid computing - The ALICE Case 21st International

Compact Memory and the Frankfurt Digital Judaica Collection Dr. Rachel Heuberger, Judaica

Structural Rewriting in the -Calculus David Sabel Goethe-University, Frankfurt, Germany

Auto in Agda joint work with Pepijn Kokke APLS Frankfurt, December 2015 Per Martin-Lf The

RESTFUL APIS IN GO Frankfurter Entwicklertag 2018 Ralf Wirdemann Navigate : Space / Arrow

Energy Dependence of Multiplicity Fluctuations in Heavy Ion Collisions Benjamin Lungwitz, IKF

1" DataXFormer:"Leveraging"the"Web"

Demystifying MySQL Replication Crash Safety Presented at Percona - PowerPoint PPT Presentation

Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-Franois Gagn Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com Introducing MessageBird 225+

MySQL Group Replication &amp; MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

MySQL Replication Update MySQL 5.5 (GA) &amp; MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Replication and HA at Facebook Part-II Jeff Jiang Production Engineer Facebook, Inc

New features in MySQL Replication Lars Thalmann, Development Manager, Replication &amp; Backup

MySQL Replication Tutorial Mats Kindahl Senior Software Engineer Replication Technology Lars

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

Not Your Grandpas Replication The New Wave of MySQL Replication and How It Helps Your

Asynchronous Replication and Bayou Asynchronous Replication and Bayou Asynchronous Replication

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

An API for Reading the MySQL Binary Log Lars Thalmann Mats Kindahl Development Director, MySQL

Why MySQL Replication Fails, and How to Get it Back September, 26, 2017 Sveta Smirnova Sveta

Inflectional periphrasis as collocation Olivier Bonami 1 Gert Webelhuth 2 1 Universit Paris

Intrusion Prevention and Detection in Grid computing - The ALICE Case 21st International

Compact Memory and the Frankfurt Digital Judaica Collection Dr. Rachel Heuberger, Judaica

Structural Rewriting in the -Calculus David Sabel Goethe-University, Frankfurt, Germany

Auto in Agda joint work with Pepijn Kokke APLS Frankfurt, December 2015 Per Martin-Lf The

RESTFUL APIS IN GO Frankfurter Entwicklertag 2018 Ralf Wirdemann Navigate : Space / Arrow

Energy Dependence of Multiplicity Fluctuations in Heavy Ion Collisions Benjamin Lungwitz, IKF

1&quot; DataXFormer:&quot;Leveraging&quot;the&quot;Web&quot;

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

New features in MySQL Replication Lars Thalmann, Development Manager, Replication & Backup

1" DataXFormer:"Leveraging"the"Web"