Continuous MySQL Restores Divij Rajkumar (divij@fb.com) Production Engineer, MySQL Infrastructure, Facebook
Continuous Restores – Why? •Verify backup integrity • Haven’t tested your backups? You don’t have them •Understand resource requirements for restores • Time, server capacity, network capacity •Test restore orchestration • Don’t panic during a disaster event
Backups
Backups •Everything, every day
[Diagram: one server runs multiple mysql instances, each instance holding many shards; users (User 1 … User 1000) map onto individual shards, e.g. shard 86]
Backups •Everything, every day •Tens of thousands of servers, many, many shards •Stored in HDFS •3 types: •Full dumps •Differential backups •Binary Logs
Backups – Full dumps •mysqldump •--single-transaction •--set-gtid-purged=COMMENTED, stored in metadata •github.com/facebook/mysql-5.6 •Per shard, not per instance •Shards aren’t pinned to an instance •Easier to locate
Backups – Full dumps •Logical, not physical •Easy single-table restore •No indexes backed up, reduced size •Easy to debug •Better compression •Every 5 days
Backups – Differential backups •Diff between 2 full dumps •2 files generated - Rows deleted, Rows inserted •Based off of existing dump in HDFS •Take full dump -> compute diff -> upload diff
Backups – Differential backups
Backups – Differential backups •Diff between 2 full dumps •2 files generated - Rows deleted, Rows inserted •Based off of existing dump in HDFS •Take full dump -> compute diff -> upload diff •Possible because of logical full dumps
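A minimal sketch of how a differential backup could be computed, assuming each logical dump can be treated as a sorted list of row strings (the function and data names here are hypothetical, not Facebook's actual tooling):

```python
# Hypothetical sketch: a differential backup as two outputs
# (rows deleted, rows inserted) computed from two logical dumps.
def compute_diff(old_rows, new_rows):
    """Return (deleted, inserted) row lists given two dumps' rows."""
    old_set, new_set = set(old_rows), set(new_rows)
    deleted = sorted(old_set - new_set)   # rows present only in old dump
    inserted = sorted(new_set - old_set)  # rows present only in new dump
    return deleted, inserted

old = ["(1,'a')", "(2,'b')", "(3,'c')"]
new = ["(1,'a')", "(2,'B')", "(4,'d')"]
deleted, inserted = compute_diff(old, new)
# an updated row shows up as one delete plus one insert
```

This only works because the dumps are logical: row-level text can be diffed, which is impossible with physical (page-level) backups.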
Backups – Binary Logs •Raw binlogs from mysql •Record Previous-GTIDs in metadata •Using mysqlbinlog: • --start-position=4 • --stop-position=121 •Continuously uploaded •Binlog Server to interface
Backups •Everything, every day •Tens of thousands of servers, many, many shards •Stored in HDFS •3 types: •Full dumps •Differential backups •Binary Logs
Restore Infrastructure
[Diagram: CRT sends restore Requests to Warchief (ORC); Warchief Schedules jobs onto Peons, which Sync state to the ORC DB; each Peon manages a MySQL instance]
ORC - Peons
Restore Infrastructure - Peons •Python process •Exposes Thrift interface •Manages MySQL instance •Multiple peons per host •Work on restore jobs •1 database == 1 job [Diagram: each host runs several peons; each peon manages a mysql instance holding restored databases]
Restore Jobs – State Machine 1. SELECT – Select backup to restore 2. DOWNLOAD – Download backup to disk 3. LOAD – Load tables in parallel 4. VERIFY – Verify loaded full/diff dump 5. REPLAY – Replay binlogs + FAILED states
Restore Jobs – mysqldump index •Custom index added to mysqldump output •Locate tables using per-table byte offsets •Preserved across diff backups
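A sketch of what such an index could look like, assuming the standard `-- Table structure for table` comment that mysqldump emits before each table (the `build_index` helper is hypothetical):

```python
import io

# Hypothetical sketch of a per-table byte-offset index over a mysqldump
# file: record where each table's section starts, so a restore can seek
# straight to one table instead of scanning the whole dump.
MARKER = b"-- Table structure for table `"

def build_index(dump_bytes):
    index, offset = {}, 0
    for line in io.BytesIO(dump_bytes):
        if line.startswith(MARKER):
            name = line[len(MARKER):].split(b"`", 1)[0].decode()
            index[name] = offset
        offset += len(line)
    return index

dump = (b"-- Table structure for table `users`\nCREATE TABLE users ...;\n"
        b"-- Table structure for table `posts`\nCREATE TABLE posts ...;\n")
idx = build_index(dump)
# seeking to idx["posts"] lands exactly on the posts section
```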
Restore Jobs – SELECT •Pre-defined search range per job •1 day == restore most recent backup only •Use metadata to ignore “bad” backups •Only full/diff backups selected here
Restore Jobs – DOWNLOAD •For full dumps: •Download mysqldump from HDFS to disk •For diff backups: •Stream diffs + full dump to perform 3-way merge •Store recreated mysqldump to disk •Single mysqldump output at the end
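The 3-way merge can be sketched as follows, assuming the base dump and both diff files are sorted consistently (names are hypothetical; real inputs stream from HDFS rather than living in memory):

```python
import heapq

# Hypothetical sketch of the 3-way merge that rebuilds a newer logical
# dump from a base dump plus a diff (rows deleted, rows inserted).
# All three inputs must share one sort order for the merge to be valid.
def merge_dump(base, deleted, inserted):
    deleted = set(deleted)
    kept = (row for row in base if row not in deleted)
    # heapq.merge streams two sorted iterables without materializing them
    return list(heapq.merge(kept, iter(inserted)))

base = ["(1,'a')", "(2,'b')", "(3,'c')"]
deleted = ["(2,'b')", "(3,'c')"]
inserted = ["(2,'B')", "(4,'d')"]
rebuilt = merge_dump(base, deleted, inserted)
# rebuilt is the newer dump: ["(1,'a')", "(2,'B')", "(4,'d')"]
```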
Restore Jobs – LOAD •Split backup into per-table streams using index •Load tables in parallel
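A minimal sketch of the LOAD step, using the per-table byte-offset index to cut the dump into chunks and a thread pool to process them in parallel (`load_table` here is a hypothetical stand-in for piping one table's SQL into mysql):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: split a dump into per-table chunks via the
# byte-offset index, then hand each chunk to a worker in parallel.
def load_all(index, dump, load_table, workers=4):
    offsets = sorted(index.items(), key=lambda kv: kv[1])
    chunks = {}
    for (name, start), nxt in zip(offsets, offsets[1:] + [(None, len(dump))]):
        chunks[name] = dump[start:nxt[1]]  # one table's slice of the dump
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(chunks, pool.map(load_table, chunks.values())))

result = load_all({"users": 0, "posts": 40}, b"x" * 80,
                  lambda chunk: len(chunk))
# each of the two tables gets a 40-byte chunk here
```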
Restore Jobs – VERIFY •Perform sanity checks on loaded data •Possible verification methods: •Compare checksums •Compare number of tables loaded •Best verification: replay binlogs
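The checksum/table-count comparison could look like this sketch (the `verify` helper and the use of MD5 over concatenated rows are assumptions for illustration, not the actual verification format):

```python
import hashlib

# Hypothetical sketch of a VERIFY sanity check: confirm the loaded set
# of tables matches the backup, then compare per-table checksums.
def verify(expected_checksums, loaded_rows):
    if set(expected_checksums) != set(loaded_rows):
        return False  # wrong set of tables loaded
    for table, checksum in expected_checksums.items():
        digest = hashlib.md5("".join(loaded_rows[table]).encode()).hexdigest()
        if digest != checksum:
            return False  # data mismatch in this table
    return True

rows = {"users": ["(1,'a')", "(2,'b')"]}
expected = {"users": hashlib.md5(b"(1,'a')(2,'b')").hexdigest()}
# verify(expected, rows) passes; any changed row flips it to False
```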
Restore Jobs – REPLAY •Replay X seconds/minutes/hours worth of binlogs •Enables point in time restores •Verify loaded data can be written to
Restore Jobs – State Machine [Diagram: SELECT → DOWNLOAD → LOAD (parallel LOAD TABLE tasks) → VERIFY → REPLAY]
Restore Jobs – Pipeline •Allow processing many jobs concurrently •Peon has 1 handler per job state •Limit per-state concurrency
Restore Jobs – Pipeline [Diagram: pipeline states SELECT, DOWNLOAD, LOAD, LOAD TABLE, VERIFY, REPLAY, each with slots 1–5]
Restore Jobs – Pipeline [Diagram: the same pipeline with slots occupied by concurrent jobs, e.g. db7's tables (t1, t2, t5) and db982's tables (t3, t9) filling parallel LOAD TABLE slots while other databases (db142, db1963, db1000, db165, db2362, …) occupy SELECT, DOWNLOAD, VERIFY, and REPLAY slots]
Restore Jobs – Pipeline •Allow processing many jobs concurrently •Peon has 1 handler per job state •LOAD -> LOAD + LOAD_TABLE •Limit per-state concurrency •Slot capacity can be tuned
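Per-state concurrency limits can be sketched with a semaphore per state (the `Pipeline` class and slot counts are hypothetical, for illustration only):

```python
import threading

# Hypothetical sketch of per-state concurrency limits: each pipeline
# state gets a fixed number of slots, modeled here as a semaphore.
class Pipeline:
    def __init__(self, slots):
        self.slots = {state: threading.BoundedSemaphore(n)
                      for state, n in slots.items()}

    def run(self, state, fn, *args):
        with self.slots[state]:  # blocks while all slots for state are busy
            return fn(*args)

# e.g. many cheap LOAD TABLE slots, few expensive REPLAY slots
pipe = Pipeline({"DOWNLOAD": 2, "LOAD_TABLE": 5, "REPLAY": 1})
out = pipe.run("DOWNLOAD", lambda job: f"downloaded {job}", "db7")
```

Tuning the slot counts per state is how capacity is balanced: I/O-heavy states get few slots, cheap states get many.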
Restore Jobs – Selecting Binlogs •Full/diff backups can be taken from slaves •Binlogs always taken from master •Need to uniquely identify transactions… •GTIDs!
Restore Jobs – Selecting Binlogs •Use GTIDs purged from mysqldump: • --set-gtid-purged=COMMENTED •Use GTIDs purged from mysqlbinlog: • --start-position=4 • --stop-position=121 •These are GTIDs purged, not GTIDs contained
Restore Jobs – Selecting Binlogs •GTIDs purged from LOAD state (dump_gtids) •GTIDs purged in binlogs (binlog_gtids) [Diagram: consecutive binlogs containing GTIDs 451–529, 530–774, and 775–…; the first binlog to replay is the first whose purged set is a superset of dump_gtids]
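The "first superset" selection can be sketched as below, using plain integer sets in place of real GTID interval sets (a simplification: real GTID sets are per-server-UUID interval lists):

```python
# Hypothetical sketch of binlog selection. Each binlog records its
# Previous-GTIDs (everything purged before it); replay starts at the
# first binlog whose Previous-GTIDs cover the dump's gtid_purged set.
def first_binlog_to_replay(dump_gtids, binlogs):
    for name, prev_gtids in binlogs:
        if prev_gtids >= dump_gtids:  # first superset of dump_gtids
            return name
    return None

dump_gtids = set(range(1, 530))                  # dump applied GTIDs 1-529
binlogs = [("binlog.001", set(range(1, 451))),   # contains 451-529
           ("binlog.002", set(range(1, 530))),   # contains 530-774
           ("binlog.003", set(range(1, 775)))]   # contains 775-...
start = first_binlog_to_replay(dump_gtids, binlogs)
# start == "binlog.002": the first binlog with transactions past the dump
```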
Restore Jobs – Replaying Binlogs •Filter binlog events with mysqlbinlog: • --database • --skip-gtids • --skip-empty-trans • github.com/facebook/mysql-5.6 •Last transaction to replay: • --stop-datetime
Restore Job – Lifecycle [Diagram: a single job, db100, moving through the pipeline: its tables (db100-t1, db100-t2, db100-t3) occupy LOAD TABLE slots in parallel, then the job proceeds to VERIFY and REPLAY]
CRT
[Diagram: CRT sends restore Requests to Warchief (ORC); Warchief Schedules jobs onto Peons, which Sync state to the ORC DB; each Peon manages a MySQL instance]
CRT – Continuous Restore Tier •Create restore jobs for shards •Periodically poll backup metadata to find backups •Thrift call to Warchief to issue restore job •Monitor restore progress •Peons expose job and health stats via Thrift •Detailed stats written to Scuba
CRT – Continuous Restore Tier •Failure handling •Retry on transient errors •Mark “unstable” •Notify oncall •Orchestration component •Resolve dependency graph •Request restores in order of dependency
On-demand Restores
“oops, I accidentally ran UPDATE without WHERE”
“oops, I thought I was on my dev database”
“oops”
[Diagram: the same restore infrastructure, with a UI/CLI that also sends restore Requests to Warchief alongside CRT; Warchief Schedules jobs onto Peons, which Sync state to the ORC DB; each Peon manages a MySQL instance]
Discover Edge Cases
War Stories
War Stories – Collation Change • COLLATE=latin1_bin •Required for MyRocks •Schema change deployed •Most diff backup restores fail overnight •Thanks CRT Monitoring!
200x increase in DOWNLOAD_FAILED!
War Stories – Collation Change •DifferentialBackup used default collation •Sorting order broken in diff files •Consistent sorting is required for 3-way merge •3-way merges failed for affected backups •Wouldn’t have been caught without continuous restores •DifferentialBackup fixed to understand table collation
War Stories – RBR Binlogs •Row-based replication deployed •Lots of REPLAY failures overnight: “The database used for the current transaction has changed since BEGIN. This is not supported!” •2 common patterns: •Shards belonged to same database tier •Failure only happened after master promotion
War Stories – RBR Binlogs •SBR binlogs: •Each BEGIN contains session database •RBR binlogs: •On master, each BEGIN contains session database •On slave, BEGIN has no session database
War Stories – RBR Binlogs •With mysqlbinlog --skip-gtids --skip-empty-trans •Expected: session databases don’t change across events •Binlogs not rotated after master promotion •For some part of binlog, instance is a slave •For some part of binlog, session database is empty •For some part of binlog, session databases don’t match → “The database used for the current transaction has changed since BEGIN. This is not supported!”
War Stories – RBR Binlogs •Possible solutions: •Rotate binlogs during promotion •Fix MySQL server behaviour •Patch mysqlbinlog •???
Thanks! Divij Rajkumar (divij@fb.com)