Deciding When to Forget in the Elephant File System
By Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton, and Jacob Ofir
Presented by Jon LeVitre
CS 533 Concepts of Operating Systems, March 14, 2006

Overview
● Background
● The Elephant File System
  – Goals and Ideas
  – Design
  – Policies
  – Implementation
  – Performance
● Summary

Background
● Disk space is becoming cheaper and larger
● Data is protected from most forms of failure (except user error)
● It's a good time to consider ways to make file systems protect users from themselves

Previous Work
● Cedar used copy-on-write to automatically retain recent versions
● Plan 9, AFS, and WAFL use checkpointing
● Applications maintain document history
● Trashcan prevents accidental deletion
● Users make their own copies

Goals of EFS
● Give users the ability to undo recent changes (both writes and deletes)
● To save storage space, keep only the important versions in long-term history

Observations
● The user's ability to recognize crucial file versions deteriorates over time
● Any solution that relies solely on the user to identify landmark versions is problematic (though users must still be allowed to mark landmarks themselves)

Types of Files
● Read-only
● Derived
● Cached
● Temporary*
● User-modified*
* Only these need protection

General Design
● Logical file deletion
● Copy-on-write (a version becomes official when the file is closed)
● File versions are named by combining pathname, date, and time (not unique)
● Retention policy is specified per file (or group of files)
● File system cleaner reclaims blocks
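
The design points above (versions committed on close, versions addressed by time, deletion that only records a timestamp) can be sketched as a toy in-memory model. This is an illustration only, not Elephant's on-disk format: the `VersionedFile` class, its method names, and the use of plain numbers as timestamps are all assumptions made for the example, and close times are assumed to arrive in increasing order.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedFile:
    """Hypothetical toy model of one file's version history."""
    close_times: list = field(default_factory=list)  # sorted close timestamps
    contents: list = field(default_factory=list)     # contents[i] was closed at close_times[i]
    deleted_at: Optional[float] = None                # logical deletion time, if any

    def close(self, now, data):
        # A new version becomes official only when the file is closed.
        self.close_times.append(now)
        self.contents.append(data)

    def delete(self, now):
        # Logical deletion: record when it happened, keep every version.
        self.deleted_at = now

    def read_as_of(self, when):
        # Return the version that was current at time `when`, or None
        # if the file did not exist yet or was already deleted.
        if self.deleted_at is not None and when >= self.deleted_at:
            return None
        i = bisect_right(self.close_times, when)
        return self.contents[i - 1] if i else None


f = VersionedFile()
f.close(10, "v1")
f.close(20, "v2")
f.delete(30)
print(f.read_as_of(15), f.read_as_of(25), f.read_as_of(30))
```

Because the data survives the `delete` call, an "undelete" in this model is just a read at an earlier time.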

File Retention Policies
● Keep One
● Keep Safe
  – Recent changes only
● Keep Landmarks
  – Heuristic: keep long-lived versions
  – Also let users specify landmarks
● Keep All
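
A minimal sketch of how the four policies might decide which versions to retain. The slide only says that Keep Safe covers recent changes and that Keep Landmarks keeps long-lived and user-marked versions; the `undo_window` and `min_lifetime` thresholds, the `Version` record, and the function name are assumptions for illustration and do not reproduce the paper's actual heuristic.

```python
from dataclasses import dataclass


@dataclass
class Version:
    created: float               # time this version became current
    user_landmark: bool = False  # explicitly marked by the user


def versions_to_keep(versions, now, policy, undo_window=7.0, min_lifetime=10.0):
    """Return the versions a policy retains; `versions` is sorted oldest to newest."""
    if not versions:
        return []
    if policy == "keep_all":
        return list(versions)
    if policy == "keep_one":
        return [versions[-1]]
    if policy not in ("keep_safe", "keep_landmarks"):
        raise ValueError(f"unknown policy: {policy}")

    keep = []
    last = len(versions) - 1
    for i, v in enumerate(versions):
        if i == last:
            keep.append(v)  # the current version is always kept
            continue
        superseded = versions[i + 1].created
        if now - superseded <= undo_window:
            keep.append(v)  # still needed to undo a recent change
        elif policy == "keep_landmarks":
            lifetime = superseded - v.created
            if v.user_landmark or lifetime >= min_lifetime:
                keep.append(v)  # long-lived or user-marked landmark
    return keep


history = [Version(0), Version(1), Version(1.5, user_landmark=True),
           Version(2), Version(98), Version(99)]
for policy in ("keep_one", "keep_safe", "keep_landmarks", "keep_all"):
    kept = versions_to_keep(history, now=100, policy=policy)
    print(policy, [v.created for v in kept])
```

In this sketch Keep Landmarks is a superset of Keep Safe: both protect the undo window, and only the landmark policy reaches further back.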

File Implementation
● The inumber points to an imap entry
● Temperature is a heuristic used by the cleaner
● For non-versioned files, the imap entry points to an inode
● Otherwise, the imap entry points to an inode log
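
The indirection described above can be pictured with a small in-memory sketch. The class and field names (`Inode`, `InodeLog`, `ImapEntry`, `current_inode`) are hypothetical, the imap is modeled as a plain dictionary, and `temperature` is carried as an opaque number because the slide does not say how the cleaner computes or uses it.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Inode:
    blocks: list   # block addresses for this version of the file
    mtime: float


@dataclass
class InodeLog:
    inodes: list = field(default_factory=list)  # one inode per version, oldest to newest


@dataclass
class ImapEntry:
    target: Union[Inode, InodeLog]  # a single inode, or a log for versioned files
    temperature: float = 0.0        # cleaner heuristic; semantics not modeled here


def current_inode(imap, inumber):
    """Follow inumber -> imap entry -> inode (or the newest inode in the log)."""
    target = imap[inumber].target
    if isinstance(target, InodeLog):
        return target.inodes[-1]
    return target


imap = {
    7: ImapEntry(Inode(["b1"], mtime=1.0)),                      # non-versioned file
    8: ImapEntry(InodeLog([Inode(["b2"], 1.0), Inode(["b3"], 2.0)])),  # versioned file
}
print(current_inode(imap, 7).blocks, current_inode(imap, 8).blocks)
```

The point of the extra level is that a non-versioned file pays for only one inode, while a versioned file keeps one inode per retained version.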

Directory Implementation
● Directories map names to inumbers
● Directories store versioning information explicitly
● Each directory entry stores the creation time and delete time (if any)
● Entries for deleted files can be moved to a history inode
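
A rough sketch of how per-entry creation and delete times allow a name to be resolved as of a past moment. The `DirEntry` record, the `lookup` and `split_history` helpers, and the numeric timestamps are assumptions for illustration; the real directory layout is not shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirEntry:
    name: str
    inumber: int
    created: float
    deleted: Optional[float] = None  # None while the name is still live


def lookup(entries, name, when):
    """Return the inumber that `name` referred to at time `when`, or None."""
    for e in entries:
        alive = e.created <= when and (e.deleted is None or when < e.deleted)
        if e.name == name and alive:
            return e.inumber
    return None


def split_history(entries):
    """Separate live entries from deleted ones (candidates for a history inode)."""
    active = [e for e in entries if e.deleted is None]
    history = [e for e in entries if e.deleted is not None]
    return active, history


entries = [
    DirEntry("a.txt", inumber=5, created=10, deleted=20),  # first incarnation, deleted
    DirEntry("a.txt", inumber=9, created=30),              # name reused later
]
print(lookup(entries, "a.txt", 15), lookup(entries, "a.txt", 25), lookup(entries, "a.txt", 35))
```

Moving deleted entries out of the active list keeps lookups of current names from scanning old history.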

Microbenchmark Results
● Files with the Keep One policy performed about the same as FFS (write had a bug)
● Files with the Keep All policy
  – Slightly slower open, write, and close
  – Much slower creation
  – Much faster deletion

More Microbenchmarks
● Andrew file system benchmark (creates a directory hierarchy, copies 70 source files totaling 200 KB, traverses the directories, and opens and reads each file)
  – EFS was ~5% slower (19 seconds vs 18 seconds)
  – Much more file metadata:
    ● FFS used 18 KB for inodes
    ● EFS used 444 KB for inode logs
● For a larger test, EFS was twice as fast as FFS
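
The figures on this slide can be checked with a few lines of arithmetic. The numbers are the ones quoted above; the variable names are only for illustration.

```python
# Andrew benchmark run times quoted on the slide, in seconds.
ffs_seconds = 18
efs_seconds = 19

# Metadata sizes quoted on the slide, in KB.
ffs_inode_kb = 18
efs_inode_log_kb = 444

# Relative slowdown of EFS compared with FFS, as a percentage.
slowdown_pct = (efs_seconds - ffs_seconds) / ffs_seconds * 100

# How many times more metadata EFS wrote than FFS.
metadata_ratio = efs_inode_log_kb / ffs_inode_kb

print(f"EFS slowdown: {slowdown_pct:.1f}%")      # EFS slowdown: 5.6%
print(f"Metadata ratio: {metadata_ratio:.1f}x")  # Metadata ratio: 24.7x
```

So the "~5%" run-time cost comes with roughly 25 times as much inode metadata for this workload.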

File Profiles
● Keep One: 33.6% of files, 56.3% of bytes (98.7% of bytes written)
● Keep Safe: 3.9% of files, 28.5% of bytes (only 0.6% of bytes written)
● Keep Landmarks: 62.4% of files, 15.2% of bytes (only 0.7% of bytes written)

Impact on the Buffer Cache?
● Buffer caches reduce the number of writes to disk to improve performance
● Elephant must write to disk when a file is closed, even if:
  – There is a write shortly after the close
  – The file is deleted right after the close
● This should be rare, so the impact should be minimal

Summary
● Performance similar to FFS
● Only a few files need versioning
● Robustness verified using NFS shadowing
● “...we believe that the extra storage and disk write overhead incurred by using a file system such as Elephant is of minimal cost compared to the convenience and time gains... made possible”
● More research was needed