MongoDB Backup and Recovery Field Guide Tim Vaillancourt Sr Technical Operations Architect, Percona
`whoami` { name: "tim", lastname: "vaillancourt", employer: "percona", techs: [ "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr", "python", "golang" ] }
Agenda ● History ● Methods ○ Logical ○ Binary ■ Cold ■ LVM ■ Hot Backup ● Integrity / Consistency ○ mongodb_consistent_backup ● Architecture ● Restore and Validation
History ● 3000-4000 BC: Culturally significant data backed up in a universal format ● 1400: The Printing Press ● 1600-1800: Chapultepec Aqueduct ● 1990s: Floppy and Zip Disks ● 2000s: No more Floppy/Zip Disks ● Present: All my data is on Google Drive and I have 7 days of hourly Time Machine backups! ● Future: ?
Replication != Backup ● Replication is not a backup! ○ Replication is High Availability ○ Including ■ Binary/Statement-based Replication of any type ■ Delayed Replication*** ■ RAID Arrays ● <EOF>
Backup Methods
Logical Backups ● Tools ○ mongodump ■ Uses find() queries with $snapshot to back up all collections ■ Supports Gzip and Threading in 3.2+ ■ Outputs a directory containing bson files in various subdirectories ○ Custom Queries ■ The client API could be used similarly to mongodump to perform logical backups ● Benefits ○ Reduced storage footprint ○ Replication awareness ○ Compatibility ● Drawbacks
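A minimal sketch of the mongodump invocation described above, written in Python so it can be dropped into a backup script. The host name and output directory are hypothetical placeholders; `--gzip` and `--numParallelCollections` are the mongodump 3.2+ options behind the "Gzip and Threading" bullet. The sketch only builds the argument list and does not contact a server.

```python
import shlex


def build_mongodump_command(host, out_dir, parallel_collections=4, gzip=True):
    """Return an argv list for a compressed, multi-threaded mongodump run."""
    cmd = [
        "mongodump",
        "--host", host,
        "--out", out_dir,
        # Number of collections dumped in parallel (mongodump 3.2+).
        "--numParallelCollections", str(parallel_collections),
    ]
    if gzip:
        # Compress each output file as it is written (mongodump 3.2+).
        cmd.append("--gzip")
    return cmd


# Hypothetical host and output path, for illustration only.
command = build_mongodump_command("mongo1.example.com:27017", "/backup/dump")
print(shlex.join(command))
```

To actually run the dump, the list can be passed to `subprocess.run(command, check=True)` on a machine where mongodump is installed.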
Binary Backups: Cold Backup ● Very simple process ● Causes full outage to MongoDB instance! ● Process ○ Stop mongod ○ Copy and archive dbPath ○ Start mongod
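The three-step cold backup process could be sketched as follows. How mongod is stopped and started is an assumption left to the caller (for example a `systemctl stop mongod` wrapper, where the service name depends on the installation), so the function takes the two steps as callables. The demo uses a temporary directory standing in for dbPath.

```python
import os
import tarfile
import tempfile


def cold_backup(db_path, archive_path, stop_mongod, start_mongod):
    """Stop mongod, archive dbPath, then restart mongod even if archiving fails."""
    stop_mongod()
    try:
        with tarfile.open(archive_path, "w:gz") as archive:
            # Store files under the dbPath directory name, not the full path.
            archive.add(db_path, arcname=os.path.basename(db_path.rstrip("/")))
    finally:
        # The instance is down for the whole copy, so always bring it back.
        start_mongod()
    return archive_path


# Demo with a placeholder dbPath and no-op stop/start hooks.
events = []
with tempfile.TemporaryDirectory() as workdir:
    db_path = os.path.join(workdir, "db")
    os.mkdir(db_path)
    with open(os.path.join(db_path, "collection-0.wt"), "w") as handle:
        handle.write("placeholder data")
    archive_path = cold_backup(
        db_path,
        os.path.join(workdir, "backup.tar.gz"),
        stop_mongod=lambda: events.append("stop"),
        start_mongod=lambda: events.append("start"),
    )
    with tarfile.open(archive_path) as archive:
        archived_names = archive.getnames()

print(events, archived_names)
```

The `try`/`finally` reflects the main risk of this method: the outage lasts as long as the copy, so a failed copy should not leave mongod stopped.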
Binary Backups: LVM / Filer / Cloud Disk ● Process ○ If non-journaled ■ db.fsyncLock() ■ Keep session open ○ Create block-device snapshot ○ Unlock the database ■ db.fsyncUnlock() ○ Copy or archive the snapshot directory ○ Remove block-device snapshot (as quickly as possible!) ● LVM ○ Snapshots have been demonstrated to cause up to 30%* write latency impact to disk due to copy-on-write (COW)
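For the LVM case, the snapshot, copy, and removal steps might look like the command plan below. The volume group, logical volume, snapshot name, size, and paths are all hypothetical, and mount options depend on the filesystem (XFS, for instance, typically needs `nouuid` to mount a snapshot). The `db.fsyncLock()` / `db.fsyncUnlock()` steps for non-journaled deployments are left out; they would wrap the `lvcreate` call only.

```python
import shlex


def lvm_snapshot_plan(volume_group, logical_volume, snapshot_name,
                      snapshot_size, mount_point, archive_path):
    """Return the ordered shell commands for an LVM snapshot backup."""
    origin = f"/dev/{volume_group}/{logical_volume}"
    snapshot = f"/dev/{volume_group}/{snapshot_name}"
    return [
        # 1. Create the copy-on-write snapshot of the volume holding dbPath.
        ["lvcreate", "--snapshot", "--size", snapshot_size,
         "--name", snapshot_name, origin],
        # 2. Mount the snapshot read-only and archive its contents.
        ["mount", "-o", "ro", snapshot, mount_point],
        ["tar", "-czf", archive_path, "-C", mount_point, "."],
        # 3. Remove the snapshot promptly to end the write-latency penalty.
        ["umount", mount_point],
        ["lvremove", "--force", snapshot],
    ]


# Hypothetical names, for illustration only.
plan = lvm_snapshot_plan("vg0", "mongodb", "mongodb-backup-snap", "10G",
                         "/mnt/mongodb-snap", "/backup/mongodb-snap.tar.gz")
for step in plan:
    print(shlex.join(step))
```

Returning the plan as data rather than executing it keeps the sketch safe to run anywhere; each step could be handed to `subprocess.run(step, check=True)` as root on the database host.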
Binary Backups: Hot Backup ● PSMDB or MongoDB Enterprise ○ Pay $$$ for MongoDB Enterprise or download PSMDB for free(!) ○ db.adminCommand({ createBackup: 1, backupDir: "/data/mongodb/backup" }) ○ Copy/archive the output path ○ Delete the backup output path ○ NOTE: ■ RocksDB-based createBackup creates filesystem hardlinks whenever possible! ■ Delete RocksDB backupDir as soon as possible to reduce bloom filter overhead!
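The hot backup steps (run createBackup, archive the output path, delete it) could be scripted roughly as below. This is a sketch under assumptions: `admin_db` is expected to behave like pymongo's `client.admin`, the script runs on the database host because backupDir is a server-side path, and `FakeAdminDb` is a hypothetical stand-in so the example runs without a server.

```python
import os
import shutil
import tempfile


def hot_backup(admin_db, backup_dir, archive_base):
    """Run createBackup, archive the output path, then delete it."""
    # Sends { createBackup: 1, backupDir: <backup_dir> } to the admin database.
    admin_db.command("createBackup", backupDir=backup_dir)
    try:
        # Produces <archive_base>.tar.gz containing the backup files.
        return shutil.make_archive(archive_base, "gztar", root_dir=backup_dir)
    finally:
        # Delete the backup output path promptly, as the slide advises.
        shutil.rmtree(backup_dir)


class FakeAdminDb:
    """Hypothetical stand-in for pymongo's client.admin (no server needed)."""

    def __init__(self):
        self.calls = []

    def command(self, name, **kwargs):
        self.calls.append((name, kwargs))
        os.makedirs(kwargs["backupDir"])
        with open(os.path.join(kwargs["backupDir"], "storage.bson"), "w") as handle:
            handle.write("placeholder")
        return {"ok": 1.0}


with tempfile.TemporaryDirectory() as workdir:
    fake_admin = FakeAdminDb()
    backup_dir = os.path.join(workdir, "backup")
    archive = hot_backup(fake_admin, backup_dir, os.path.join(workdir, "hot"))
    archive_exists = os.path.exists(archive)
    backup_dir_removed = not os.path.exists(backup_dir)

print(fake_admin.calls, archive_exists, backup_dir_removed)
```

With a real deployment, `FakeAdminDb()` would be replaced by `pymongo.MongoClient(...).admin`.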
Backup Integrity / Consistency
The “Distributed Cluster Backup Problem” ● mongodump is single-node consistent only! ● Common to most or all database techs in sharded environments ● Problems: ○ Backup tools consider single-instance integrity only ○ Backups of different shards may complete at different times ○ Changes replicate asynchronously ○ Data may be balancing / moving in the cluster ● Risks: ○ Orphaned documents / references ○ Holes in data
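One way to reason about the "different completion times" problem: if each shard's dump is accompanied by oplog entries captured until some later time, every shard can be rolled forward to a single shared timestamp. The sketch below picks that timestamp. It is an illustration of the idea, not necessarily how any particular tool implements it, and it ignores chunk migrations by the balancer. Timestamps are plain integers and the shard names are hypothetical.

```python
def consistent_point(dump_finished_at, last_oplog_ts):
    """Pick one timestamp that every shard's dump plus oplog can reach."""
    # The point must not precede any shard's dump completion time.
    earliest_valid = max(dump_finished_at.values())
    # No shard can be replayed past the last oplog entry captured for it.
    latest_common = min(last_oplog_ts.values())
    if latest_common < earliest_valid:
        raise ValueError("oplog capture stopped before all dumps finished")
    return latest_common


# Hypothetical per-shard times (seconds since the backup started).
dump_finished_at = {"shard1": 100, "shard2": 130, "shard3": 115}
last_oplog_ts = {"shard1": 140, "shard2": 135, "shard3": 150}

point = consistent_point(dump_finished_at, last_oplog_ts)
print(point)  # 135
```

Oplog entries newer than the chosen point would be discarded on each shard, so that all shards restore to the same moment.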
Backups: mongodb_consistent_backup ● Python project by Percona-Lab for consistent backups ● URL: https://github.com/Percona-Lab/mongodb_consistent_backup ● Best-effort support, not a “Percona Product” ● Created to solve limitations in MongoDB backup tools: ○ Replica Set and Sharded Cluster awareness ○ Cluster-wide Point-in-time consistency ○ In-line Oplog backup (vs post-backup) ○ Notifications of success / failure ● Extra Features ○ Remote Upload (AWS S3, Google Cloud Storage and Rsync) ○ Archiving (Tar or ZBackup deduplication and optional AES-at-rest) ○ CentOS/RHEL7 RPMs and Docker-based releases (.deb soon!)
Backups: mongodb_consistent_backup ● 1.2.0 ○ Multi-threaded Rsync Upload ○ Replica Set Tags support ○ Support for MongoDB SSL / TLS connections and client auth ○ Rotation / Expiry of old backups (locally-stored only) ● Future ○ Incremental Backups ○ Binary-level Backups (Hot Backup, Cold Backup, LVM, Cloud-based, etc) ○ More Notification Methods (PagerDuty, Email, etc) ○ Restore Helper Tool ○ Instrumentation / Metrics ○ <YOUR AWESOME IDEA HERE>: we take GitHub PRs (and it’s Python)!
Backup Architecture
Architecture: Simple Example ● Method ○ Run mongodump (with --oplog) using a plain secondary ○ Store backups with on-site remote storage (filer, rsync, etc) ● Potential Issues ○ Application Impact ■ I/O and CPU impact due to backups may affect application ■ Storage-engine and FS caches will become dirty ■ Primary Failure ● A failure of the Primary may cause the Secondary backing-up to become Primary ● This can be avoided by using a Read Preference of ‘secondary’ (supported in recent mongodump versions) ○ No Disaster Recovery
Architecture: Tag-Based Example ● Replica Set Tags ○ Allow selection of MongoDB nodes using key/value pairs ○ Represented in JSON/single document ○ Many key/value pairs are possible ● Example Backup from “west” Only ○ Specify a single node with a tag such as { location: “west” } ○ Use Read Preference Tag in mongodump/mongodb_consistent_backup to target a specific node.
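A sketch of targeting the { location: "west" } node from mongodump. It assumes a mongodump version whose `--readPreference` option accepts a JSON document with `mode` and `tagSets` keys; older versions may only accept a mode name. The replica set name, hosts, and output path are hypothetical.

```python
import json
import shlex


def tagged_read_preference(tags, mode="secondary"):
    """Build a read preference document selecting members by replica set tag."""
    return json.dumps({"mode": mode, "tagSets": [tags]})


def build_tagged_mongodump(host, out_dir, tags):
    """Return an argv list for a mongodump that reads from a tagged secondary."""
    return [
        "mongodump",
        "--host", host,
        "--out", out_dir,
        "--readPreference", tagged_read_preference(tags),
    ]


# Hypothetical replica set and tag, for illustration only.
read_pref = tagged_read_preference({"location": "west"})
command = build_tagged_mongodump(
    "rs0/mongo1.example.com:27017,mongo2.example.com:27017",
    "/backup/dump",
    {"location": "west"},
)
print(shlex.join(command))
```

The tag itself has to exist on the member first, as a `tags: { location: "west" }` entry in that member's replica set configuration.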
Architecture: Offsite Backup Example ● Example ○ Create backup within local datacenter ○ Upload completed backups to other datacenter, cloud, etc ■ mongodb_consistent_backup supports Amazon S3, Google Cloud Storage and Rsync for remote upload! ● Benefits ○ Fast backup time due to in-datacenter latency ● Drawbacks ○ A full backup's data is uploaded on every backup job
Architecture: Disaster Recovery Example ● Example ○ Place a SECONDARY node in another location ■ Dedicated node is recommended to reduce impact ■ hidden:true recommended ○ Run backup from off-site SECONDARY member ○ Optionally upload to Cloud Storage ● Benefits ○ Only changes (replication) replicated to offsite location ○ Potentially faster uploads to Cloud Storage ● Drawbacks ○ Bootstrap / Initial Sync may use high bandwidth (if not seeded by backup)
Restore and Validation “It’s not a backup system, it’s a restore system” ~ Raymond Blum, Google SRE
Restoring and Validation ● Methodology ○ Optimise restore time, not backup run time ■ Users and business care how fast their data is back, not how long it takes to back up ■ Binary-level backups are much faster to restore in MongoDB ● Validation ○ This is very application specific ○ Randomly sample restored data and validate ■ Example: Compare to Production ● Compare real Production item, user, article, etc to backup ● Ensure backup age doesn’t cause false alarms, i.e. test data older than backup ■ Example: Integration Test / QA ● Run code integration tests or QA on restored data ■ Example: Production Backup as Test Data ● Copy Production Data to Test periodically using backups
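The "Compare to Production" check could be sketched as below. Plain dictionaries stand in for the production and restored collections, and the `updated_at` field is a hypothetical last-modified marker used to skip documents changed after the backup was taken, which is what prevents the false alarms mentioned above.

```python
import random


def validate_sample(production, restored, backup_time, sample_size, seed=None):
    """Compare a random sample of production documents against a restore."""
    rng = random.Random(seed)
    # Only documents last modified before the backup can be expected to match.
    candidates = [
        doc_id for doc_id, doc in production.items()
        if doc["updated_at"] <= backup_time
    ]
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    mismatches = [
        doc_id for doc_id in sample
        if restored.get(doc_id) != production[doc_id]
    ]
    return sample, mismatches


# Hypothetical data: user:3 changed after the backup and is ignored.
production = {
    "user:1": {"name": "ann", "updated_at": 100},
    "user:2": {"name": "bob", "updated_at": 120},
    "user:3": {"name": "cy", "updated_at": 300},
}
restored = {
    "user:1": {"name": "ann", "updated_at": 100},
    "user:2": {"name": "bob", "updated_at": 120},
}

sample, mismatches = validate_sample(production, restored, backup_time=200,
                                     sample_size=10, seed=1)
print(sorted(sample), mismatches)
```

A non-empty `mismatches` list would be the signal to investigate the backup before it is needed for a real restore.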
Thank You Sponsors!
April 23-25, 2018 SAVE THE DATE! Santa Clara Convention Center CALL FOR PAPERS OPENING SOON! www.perconalive.com
Questions?