MySQL Backup and Restore at Facebook Scale

Ola Berjak, Production Engineer at MySQL Infrastructure, Facebook London


  1. MySQL Backup and Restore at Facebook Scale. Ola Berjak, Production Engineer at MySQL Infrastructure, Facebook London

  2. MySQL Backup and Restore at Facebook Scale …and how it’s not rocket science. Ola Berjak, Production Engineer at MySQL Infrastructure, Facebook London

  4. When do we need backups? How do we perform backups? How do we restore backups?

  7. When do we need backups?

  8. MALICIOUS ATTACKER, HARDWARE FAILURE, HUMAN ERROR

  10. FULL DUMPS, DIFFS

  11. How do we perform backups?

  12. Every database, every day

  13. LOGICAL BACKUPS compared with PHYSICAL BACKUPS (logical / physical):
      CUSTOMER LOGIC: Easy / Complex
      DEBUGGING: Easy / Complex
      SINGLE TABLE RESTORE: Easy / Complex
      PORTABILITY: Consistent / Inconsistent
      BACKUP AND RESTORE DURATION: Long / Short

  14. Technical setup, FULL DUMPS: mysqldump; number of rows for each table; zstd compression; trailing index

  15. mysqldump --single-transaction --skip-lock-tables (...)

  16. mysqldump --single-transaction --skip-lock-tables (...) | compress and add index

  17. mysqldump --single-transaction --skip-lock-tables (...) | compress and add index | upload
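
The pipe shown on slides 15 to 17 could be driven from a script roughly as follows. This is a minimal sketch, not the tooling described in the talk: `dump_command` and `run_pipeline` are hypothetical helper names, the `zstd -c` stage assumes the zstd command-line tool is installed, and the "add index" and "upload" steps are left out because the slides do not show how they are done.

```python
import subprocess


def dump_command(database: str) -> list[str]:
    """mysqldump invocation using the two flags shown on the slides."""
    return ["mysqldump", "--single-transaction", "--skip-lock-tables", database]


def run_pipeline(commands: list[list[str]], out_path: str) -> None:
    """Chain commands like a shell pipe and write the last stdout to out_path."""
    procs = []
    prev_stdout = None
    with open(out_path, "wb") as out:
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy so the upstream process notices a dead reader.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        # A backup is only trustworthy if every stage exited cleanly.
        for proc, cmd in zip(procs, commands):
            if proc.wait() != 0:
                raise RuntimeError(f"pipeline stage failed: {cmd[0]}")


# Example (requires mysqldump and zstd on PATH, so it is not run here):
# run_pipeline([dump_command("mydb"), ["zstd", "-c"]], "mydb.sql.zst")
```

Checking every exit code, not just the last one, matters here: a failed mysqldump followed by a successful compressor would otherwise produce a valid-looking but truncated file.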

  18. Trailing index: { "size": 7331, "offset": 1337, "table_name": "foo" }, { "size": 223, "offset": 8668, "table_name": "bar" }
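
In the slide's example, 1337 + 7331 = 8668, so the two table chunks appear to sit back to back in the file. The sketch below shows one way such a trailing index could make single-table restores cheap. Several things are assumptions for illustration: the 8-byte footer holding the index length, the helper names, and the use of Python's standard `zlib` as a stand-in for zstd (which is not in the standard library).

```python
import json
import struct
import zlib

# Assumed footer layout: length of the JSON index as an 8-byte little-endian int.
FOOTER = struct.Struct("<Q")


def write_backup(path, tables):
    """tables: dict of table_name -> SQL text. Returns the index written."""
    index = []
    with open(path, "wb") as f:
        for name, sql in tables.items():
            # Each table is compressed on its own so it can be read in isolation.
            chunk = zlib.compress(sql.encode("utf-8"))
            index.append({"size": len(chunk), "offset": f.tell(), "table_name": name})
            f.write(chunk)
        raw_index = json.dumps(index).encode("utf-8")
        f.write(raw_index)
        f.write(FOOTER.pack(len(raw_index)))
    return index


def read_index(path):
    """Read the footer, then the JSON index that precedes it."""
    with open(path, "rb") as f:
        f.seek(-FOOTER.size, 2)
        (index_len,) = FOOTER.unpack(f.read(FOOTER.size))
        f.seek(-(FOOTER.size + index_len), 2)
        return json.loads(f.read(index_len))


def read_table(path, table_name):
    """Seek straight to one table's chunk without scanning the whole dump."""
    for entry in read_index(path):
        if entry["table_name"] == table_name:
            with open(path, "rb") as f:
                f.seek(entry["offset"])
                return zlib.decompress(f.read(entry["size"])).decode("utf-8")
    raise KeyError(table_name)
```

Putting the index at the end means the dump can be streamed and compressed in one pass, since offsets are only known once each chunk has been written.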

  19. Open source: mysqldump (https://github.com/facebook/mysql-5.6); zstd (https://github.com/facebook/zstd)

  20. Open source tooling will get the job done

  21. Open source tooling will get the job done. automysqlbackup: scheduling, email notifications, custom backup rotation

  22. Open source tooling will get the job done. automysqlbackup: scheduling, email notifications, custom backup rotation. Monitoring tools: alerting

  24. Technical setup, DIFFS: full dump format; 2 files for a single diff backup (rows removed; rows inserted and updated)

  25. DiffDatabase: inputs are the most recent dump backup (“base dump”) and the new dump; outputs are diff1 and diff2. Two dump files are shown, each containing: CREATE TABLE foo; INSERT INTO foo; -- rows for foo: 1337 in one file, 7331 in the other; CREATE TABLE bar; INSERT INTO bar

  26. MergeDatabase: inputs are the base dump, diff1 and diff2; output is the new dump. Two dump files are shown, each containing: CREATE TABLE foo; INSERT INTO foo; -- rows for foo: 1337; CREATE TABLE bar; INSERT INTO bar
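
The diff and merge steps can be illustrated on in-memory data. This is only a sketch under stated assumptions: the function names are hypothetical (not the talk's DiffDatabase and MergeDatabase tools), each dump is modelled as a dict of table name to rows keyed by primary key, and schema changes such as dropped tables are not handled.

```python
def diff_database(base, new):
    """Compare two dumps, each a dict: table -> {primary_key: row}.

    Returns the two parts of a diff backup:
    removed  -- table -> sorted list of primary keys no longer present
    upserted -- table -> {primary_key: row} for inserted or updated rows
    """
    removed, upserted = {}, {}
    for table in base.keys() | new.keys():
        old_rows = base.get(table, {})
        new_rows = new.get(table, {})
        gone = sorted(k for k in old_rows if k not in new_rows)
        changed = {
            k: v for k, v in new_rows.items()
            if k not in old_rows or old_rows[k] != v
        }
        if gone:
            removed[table] = gone
        if changed:
            upserted[table] = changed
    return removed, upserted


def merge_database(base, removed, upserted):
    """Rebuild the newer dump from the base dump plus the two diff parts."""
    merged = {table: dict(rows) for table, rows in base.items()}
    for table, keys in removed.items():
        for key in keys:
            merged[table].pop(key, None)
    for table, rows in upserted.items():
        merged.setdefault(table, {}).update(rows)
    return merged
```

For example, with `base = {"foo": {1: ("a",), 2: ("b",)}}` and `new = {"foo": {1: ("a",), 3: ("c",)}}`, the diff records key 2 as removed and key 3 as inserted, and merging those back into `base` reproduces `new`.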

  27. F D D D D F D D D D F (F = full dump, D = diff)

  28. 2-3x+ less space used
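
To make the F/D schedule concrete, the sketch below works out which backups are needed to restore a given day. It assumes each diff is taken against the most recent full dump (the "base dump"), which the slides suggest but do not state outright; `backups_needed` is a hypothetical helper.

```python
def backups_needed(schedule: str, day: int) -> list[int]:
    """Return the indices of the backups required to restore `day`.

    schedule is a string such as "FDDDDFDDDDF" (F = full dump, D = diff).
    Assumption: every diff is relative to the most recent full dump.
    """
    if schedule[day] == "F":
        return [day]
    base = schedule.rfind("F", 0, day)
    if base == -1:
        raise ValueError("no full dump precedes this diff")
    return [base, day]


schedule = "F D D D D F D D D D F".replace(" ", "")
print(backups_needed(schedule, 3))  # [0, 3]
print(backups_needed(schedule, 8))  # [5, 8]
```

Under that assumption a restore never needs more than two backups, at the cost of diffs growing until the next full dump.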

  29. Explore the open source tooling

  30. Explore the open source tooling. automysqlbackup: scheduling, email notifications, custom backup rotation, differential backups

  31. Due diligence checklist

  32. Due diligence checklist • verify the size

  33. Due diligence checklist • verify the size • set up alerting

  34. Due diligence checklist • verify the size • set up alerting • store checksums and metadata
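
Two of the checklist items, verifying the size and storing checksums and metadata, could look roughly like this. It is a sketch only: the metadata fields, the SHA-256 choice, and the 50% size tolerance are assumptions for illustration, not values from the talk.

```python
import hashlib
import os


def backup_metadata(path):
    """Collect the size and a SHA-256 checksum to store next to the backup."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large dumps do not need to fit in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            sha.update(block)
    return {
        "file": os.path.basename(path),
        "size": os.path.getsize(path),
        "sha256": sha.hexdigest(),
    }


def size_looks_sane(size, previous_size, tolerance=0.5):
    """True if size is within +/- tolerance (as a fraction) of the previous backup.

    With no usable previous size, only an empty backup is rejected.
    """
    if previous_size <= 0:
        return size > 0
    return abs(size - previous_size) <= tolerance * previous_size
```

A check like `size_looks_sane` is what an alert would hang off: a backup that suddenly shrinks to a fraction of yesterday's size is worth a page even if the job exited successfully.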

  35. MALICIOUS ATTACKER, HARDWARE FAILURE, HUMAN ERROR

  36. Technical setup, BINARY LOGS: all transactions for all databases from master; compressed using zstd; metadata stored

  37. Due diligence checklist • verify the size • set up alerting • store checksums and metadata

  38. Due diligence checklist • verify the size • set up alerting • store checksums and metadata • detect gaps in transactions backed up

  39. Due diligence checklist • verify the size • set up alerting • store checksums and metadata • detect gaps in transactions backed up • monitor the “backup lag”
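
The last two checklist items can be sketched as simple interval arithmetic. This is an illustration under simplifying assumptions: each backed-up binary log is described by an inclusive range of integer transaction numbers (real MySQL GTIDs also carry a server UUID), and "backup lag" is taken to mean the time since the newest backed-up transaction. Both helper names are hypothetical.

```python
def find_gaps(ranges):
    """Return missing (first, last) transaction ranges.

    ranges: iterable of inclusive (first, last) transaction numbers,
    one per backed-up binary log, in any order.
    """
    gaps = []
    expected = None  # next transaction number we expect to see covered
    for first, last in sorted(ranges):
        if expected is not None and first > expected:
            gaps.append((expected, first - 1))
        expected = last + 1 if expected is None else max(expected, last + 1)
    return gaps


def backup_lag_seconds(last_backed_up_ts, now_ts):
    """Seconds between now and the newest transaction known to be backed up."""
    return max(0.0, now_ts - last_backed_up_ts)


print(find_gaps([(1, 100), (101, 250), (300, 400)]))  # [(251, 299)]
print(backup_lag_seconds(1000.0, 1090.0))  # 90.0
```

Both values are natural alerting inputs: any non-empty gap list means point-in-time recovery across that range is impossible, and a growing lag means recent writes are not yet protected.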

  40. How do we restore backups?

  41. Continuous restore pipeline

  42. Continuous restore pipeline: Scheduler

  43. Continuous restore pipeline: Warchief Loadbalancer, Scheduler

  44. Continuous restore pipeline: Loadbalancer, Scheduler, DB

  45. Continuous restore pipeline: Loadbalancer, Scheduler, DB, Worker ×4, MySQL ×4

  46. SELECT DOWNLOAD LOAD CHECKSUM VERIFY REPLAY
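
A restore worker that walks through these stages could be structured as below. This is a sketch, not the pipeline from the talk: the stage names are taken from the slide in the order they appear, treating CHECKSUM as its own stage is an assumption, and the per-stage handlers are hypothetical callables supplied by the caller.

```python
from typing import Callable

# Stage names as they appear on the slide; the exact stage boundaries are assumed.
STAGES = ["SELECT", "DOWNLOAD", "LOAD", "CHECKSUM", "VERIFY", "REPLAY"]


def run_restore(handlers: dict[str, Callable[[dict], None]], job: dict) -> dict:
    """Run every stage in order and stop at the first one that raises.

    handlers maps each stage name to a callable taking the job dict.
    The result records which stage failed so it can be alerted on.
    """
    for stage in STAGES:
        handler = handlers[stage]  # a missing handler is a programming error
        try:
            handler(job)
        except Exception as exc:
            return {"ok": False, "failed_stage": stage, "error": str(exc)}
    return {"ok": True, "failed_stage": None, "error": None}
```

Recording the failing stage, not just a pass or fail flag, makes it possible to tell a bad backup (a VERIFY failure) from an infrastructure problem (a DOWNLOAD failure).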

  52. Start small and build up

  53. DEVELOPMENT TIME, DATA RESILIENCE, BUSINESS PRIORITIES, BUSINESS CONTINUITY

  54. Today’s agenda: Why do we need backups? Backups and restores made easy. How to make sure our backups don’t go to “/dev/null”?

  58. “The best outages are the ones that don’t happen.” PRETTY MUCH EVERY PRODUCTION ENGINEER I KNOW

  59. Thank you

  60. Ola Berjak, aberjak@fb.com, @Lexxzor
