Cumulus: Filesystem Backup to the Cloud
7th USENIX Conference on File and Storage Technologies (FAST ’09)
Michael Vrable, Stefan Savage, Geoffrey M. Voelker
University of California, San Diego
February 26, 2009
Vrable, Savage, Voelker (UCSD) Cumulus: Filesystem Backup to the Cloud February 26, 2009 1 / 19

Introduction
◮ Cloud computing is an important emerging area, with a spectrum of implementations
  ◮ “Thick” cloud: Purchase a complete integrated service from a provider
    ◮ Potentially greater efficiencies
    ◮ Easier to set up
  ◮ “Thin” cloud: Customer builds application on more generic services
    ◮ More choices among service providers
    ◮ Easier to migrate between providers
    ◮ Potentially lower costs
◮ Thin cloud offers some advantages, particularly for applications such as backup
◮ How well can we do with such a simple interface?

Cumulus: Background and Requirements
◮ Network Backup: Functionality
  ◮ Implement backup over a network to provide easy off-site storage
  ◮ Store snapshots of file data at multiple points in time
  ◮ Allow recovery of selected files or entire snapshot
◮ System Requirements
  ◮ Build on a thin cloud model: simple storage interface only
  ◮ Storage layer need only support put/get of blobs of data, list, delete
  ◮ Implies that application logic must be built into client
  ◮ Focus on cloud storage, but could be FTP server, friend’s computer, P2P network, ...
◮ Goals
  ◮ Minimize resource requirements (storage, network)
  ◮ Minimize ongoing monetary costs
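
The four-operation storage interface named in the requirements can be sketched as below. This is an illustration only: the class name InMemoryStore, the method signatures, and the blob names are assumptions made for the example, not the interface Cumulus actually defines.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical thin-cloud backend: only put, get, list, and delete."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        # Store a whole blob under a name; there are no partial writes.
        self._blobs[name] = bytes(data)

    def get(self, name: str) -> bytes:
        # Fetch a whole blob; raises KeyError if the name is unknown.
        return self._blobs[name]

    def list(self) -> List[str]:
        # Enumerate stored blob names in sorted order.
        return sorted(self._blobs)

    def delete(self, name: str) -> None:
        # Remove a blob; deleting a missing name is a no-op.
        self._blobs.pop(name, None)


# Made-up blob names, for illustration only.
store = InMemoryStore()
store.put("segment-0001", b"file data")
store.put("snapshot-monday", b"root descriptor")
store.delete("segment-0001")
remaining = store.list()
```

Because the backend offers nothing beyond these calls, everything else (snapshot structure, sharing, cleaning) has to live in the client, which is the constraint the slide states.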

Cumulus Backup Format
[Figure: snapshot roots (Monday, Tuesday), metadata (photos/A, photos/B, mbox, paper, mbox', paper'), and data blocks (photoA, photoB, mbox1, paper1, mbox2, paper2); part of the structure is labeled as shared between the two snapshots]
◮ Stores filesystem snapshots at multiple points in time
◮ Data blocks shared within, between snapshots
◮ Minimizes storage, upload bandwidth needed
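
A rough sketch of how data blocks might be shared between snapshots follows. It assumes each file is a single block identified by a content hash. The choice of SHA-1, the function name take_snapshot, and the dictionary-based storage are illustrative assumptions, not a description of the Cumulus format.

```python
import hashlib
from typing import Dict, Set


def take_snapshot(files: Dict[str, bytes],
                  known: Set[str],
                  stored: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to a block hash, storing only blocks not seen before."""
    metadata: Dict[str, str] = {}
    for path, data in files.items():
        digest = hashlib.sha1(data).hexdigest()
        if digest not in known:
            # New content: this block must be uploaded and stored.
            stored[digest] = data
            known.add(digest)
        # Unchanged content reuses the block stored by an earlier snapshot.
        metadata[path] = digest
    return metadata


known: Set[str] = set()
stored: Dict[str, bytes] = {}
monday = take_snapshot(
    {"photos/A": b"photoA", "photos/B": b"photoB",
     "mbox": b"mbox1", "paper": b"paper1"},
    known, stored)
tuesday = take_snapshot(
    {"photos/A": b"photoA", "photos/B": b"photoB",
     "mbox": b"mbox2", "paper": b"paper2"},
    known, stored)
```

In this toy run the two snapshots reference eight blocks in total, but only six distinct blocks are stored, since the photos are unchanged on Tuesday.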

Aggregation: Minimizing Per-Block Costs
[Figure: the Monday and Tuesday snapshot roots, metadata, and data blocks from the backup format diagram, grouped into segments]
◮ May have per-file in addition to per-byte costs
  ◮ Protocol overhead: Slower backups from more transactions
  ◮ Per-file overhead at storage server
  ◮ May be exposed as monetary cost by provider
◮ Cumulus reduces these costs by aggregating blocks into segments before storage
◮ Aggregation follows from our constraints, but may not be needed in other systems
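
Aggregation can be illustrated with a simple greedy packer that fills a segment until the next block would push it past a target size. The function name pack_segments, the greedy policy, and the sizes are assumptions chosen for the example. The actual packing policy may differ.

```python
from typing import List


def pack_segments(blocks: List[bytes], target_size: int) -> List[List[bytes]]:
    """Greedily group blocks into segments of roughly target_size bytes."""
    segments: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for block in blocks:
        # Close the current segment if this block would exceed the target.
        if current and current_size + len(block) > target_size:
            segments.append(current)
            current, current_size = [], 0
        current.append(block)
        current_size += len(block)
    if current:
        segments.append(current)
    return segments


# Hypothetical block sizes in bytes.
blocks = [b"a" * 300, b"b" * 500, b"c" * 400, b"d" * 100]
segments = pack_segments(blocks, target_size=1000)
uploads_without_aggregation = len(blocks)
uploads_with_aggregation = len(segments)
```

Here four per-block uploads become two per-segment uploads, which is the kind of reduction in per-file cost the slide describes.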

Aggregation Challenges: Internal Fragmentation
[Figure: segments written on Day 1, Day 2, and Day 3; on Day 4, one segment holds new data and another holds repacked data]
◮ Wasted space within segments reclaimed by segment cleaning
◮ Tradeoff: space vs. upload bandwidth
◮ Contribution: Show how to tune segment size, threshold for cleaning
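
The space versus bandwidth tradeoff can be made concrete with a small sketch. It assumes that a segment's utilization is its live bytes divided by its total bytes, and that segments with utilization below the cleaning threshold are selected for repacking. That selection rule, the function name, and the byte counts are assumptions for illustration.

```python
from typing import Dict, List


def segments_to_clean(live_bytes: Dict[str, int],
                      total_bytes: Dict[str, int],
                      threshold: float) -> List[str]:
    """Select segments whose utilization (live / total) is below threshold."""
    selected: List[str] = []
    for name in sorted(total_bytes):
        utilization = live_bytes.get(name, 0) / total_bytes[name]
        if utilization < threshold:
            selected.append(name)
    return selected


# Hypothetical segments of 1000 bytes each with varying amounts of live data.
total = {"seg1": 1000, "seg2": 1000, "seg3": 1000}
live = {"seg1": 900, "seg2": 400, "seg3": 100}

chosen = segments_to_clean(live, total, threshold=0.5)
# Live data in cleaned segments must be uploaded again in new segments.
reupload_bytes = sum(live[name] for name in chosen)
# Space is reclaimed once the old segments are no longer referenced.
reclaimed_bytes = sum(total[name] for name in chosen) - reupload_bytes
```

With these numbers, cleaning two segments costs 500 bytes of extra upload and eventually frees 1500 bytes of storage. Raising the threshold would also select seg1, paying 900 bytes of upload to recover only 100.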

Cumulus Implementation
◮ Implemented as ≈ 4000 lines of C++ and Python
◮ Execution packages new data into segments, uploads to storage server
◮ Client tracks some data locally (not essential for restores):
  ◮ Block hash database
  ◮ Previous snapshot metadata (detect changed files)
◮ Other features:
  ◮ Compression/encryption
  ◮ Sub-file incremental updates
  ◮ More details in the paper
◮ In real use: I have been using it for over 18 months

Evaluation
Key Questions:
◮ What is the resource (network, storage) overhead imposed by the restricted storage interface?
◮ How do these overheads translate into monetary terms?
◮ How can aggregation and cleaning be tuned to minimize the cost?
◮ How does the prototype perform?

Evaluation Traces
                        Fileserver     User
Duration (days)         157            223
Entries                 26673083       122007
Files                   24344167       116426
File Sizes
  Median                0.996 KB       4.4 KB
  Average               153 KB         21.4 KB
  Maximum               54.1 GB        169 MB
  Total                 3.47 TB        2.37 GB
Update Rates
  New data/day          9.50 GB        10.3 MB
  Changed data/day      805 MB         29.9 MB
  Total data/day        10.3 GB        40.2 MB

Backup Simulation
◮ Compare against optimal backup performance:
  ◮ All unique data must be stored at server
  ◮ All new data must be transferred over network
◮ In simulation, compare Cumulus against these baseline values
  ◮ Consider effect of aggregation, cleaning parameters
◮ For simplicity, ignore compression and metadata
  ◮ Effects discussed in paper
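
One plausible way to express "overhead vs. optimal" is as a percentage above the baseline value. The formula and the numbers below are assumptions used to illustrate the comparison. They are not taken from the simulation results.

```python
def overhead_percent(actual: float, optimal: float) -> float:
    """Percentage by which actual exceeds the optimal baseline."""
    if optimal <= 0:
        raise ValueError("optimal baseline must be positive")
    return 100.0 * (actual - optimal) / optimal


# Hypothetical daily upload volumes in MB: baseline (new data only)
# versus a backup tool that also re-uploads repacked data.
optimal_upload_mb = 40.0
actual_upload_mb = 44.0
upload_overhead = overhead_percent(actual_upload_mb, optimal_upload_mb)
```

The same function applies to storage by passing bytes stored at the server as the actual value and the size of all unique data as the baseline.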

Is Cleaning Necessary?
[Figure: Storage Utilization (0.5 to 1) vs. Time (days, 0 to 200); series: With Cleaning, No Cleaning]
◮ Without segment cleaning, storage utilization steadily decreases
◮ Weekly cleaning keeps overhead within a narrow range
◮ Exact overhead depends on cleaning parameters

How Much Data is Transferred?
[Figure: Overhead vs. Optimal (%) and Raw Size (MB/day) vs. Cleaning Threshold (0 to 1); series: 16 MB, 4 MB, 1 MB, 512 kB, and 128 kB Segments]
◮ Aggressive cleaning, large segments increase overhead

What is the Storage Overhead?
[Figure: Overhead vs. Optimal (%) and Raw Size (GB) vs. Cleaning Threshold (0 to 1); series: 16 MB, 4 MB, 1 MB, 512 kB, and 128 kB Segments]
◮ Large segments increase overhead
◮ Too little cleaning leads to large overheads
◮ Aggressive cleaning leads to churn, storage overhead when keeping multiple snapshots