Deciding when to forget in the Elephant file system

Doug Santry, Mike Feeley, Norm Hutchinson, Alistair Veitch*, Ross Carton, and Jacob Ofir
University of British Columbia
*Hewlett-Packard Laboratories

Protecting file system data
- System and media failure
  - Focus of file-system research for many years
- User and application failure
  - No protection
  - Delete and write cause data loss
  - Artifact of limited storage capacity

University of British Columbia, SOSP 99

Storage is no longer limiting
- Disk capacity trends
  - 25–35 GB now
  - Increasing by 60% per year
  - 250–350 GB in 5 years
- Disks are now:
  - Big enough to keep some old versions
  - Not big enough to keep everything

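The five-year projection on this slide follows from compound growth. A minimal sketch of the arithmetic, assuming the 60% rate holds every year (the function name is illustrative only):

```python
def projected_capacity_gb(current_gb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at `annual_growth` per year."""
    return current_gb * (1.0 + annual_growth) ** years


# Start from the 25-35 GB range quoted on the slide.
low = projected_capacity_gb(25, 0.60, 5)
high = projected_capacity_gb(35, 0.60, 5)
print(f"{low:.0f} to {high:.0f} GB")
```

Growth of 60% per year compounds to a factor of about 10.5 over five years, so the slide's 250–350 GB appears to be the same projection rounded to a factor of ten.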
Protecting data with big disks
- Key idea
  - Retain important old versions of files
  - System, not user, controls storage reclamation
- Key issues
  - Is versioning at granularity of file or file system?
  - How long are old versions retained?
  - How can users control retention safely?

Previous work
- File-system grain
  - Copy-on-write checkpoint of entire file system
  - Performed periodically
  - E.g., Plan-9, WAFL, AFS
- File grain
  - Copy-on-write of individual files
  - Performed continuously
  - E.g., Cedar, VMS
    - Retained last few versions
    - No protection from delete

Elephant overview
- Delete and write
  - Do not cause data loss immediately
- Storage reclamation
  - File-grain retention policies specified by users
  - Policies implemented by system cleaner
- User interface
  - Rollback to any point in the past
    - {open,cd,…} filename@yesterday:12:00

Talk outline
- Principles and retention policies
- Prototype implementation
  - Meta data
  - File and name histories
- Evaluation
  - Workload analysis
  - User experience

Protection depends on file type
- Read only
- System managed
  - Derived
  - Cached
  - Temporary
- User managed

Principles
- Near-term reversibility
  - Of every operation on valuable data
  - For a limited period of time
- Long-term history
  - Of selected files
  - Including only selected landmark versions

File-grain retention policies
- Keep One
  - Update data in place and delete immediately
- Keep All
  - Retain all versions
- Keep Safe
  - Retain all versions for second-chance interval
- Keep Landmarks
  - Retain only landmark versions

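The four policies can be read as filters over a file's version list. The following is a sketch under stated assumptions, not the Elephant cleaner: the `Version` record, the `retained` function, the seven-day default interval, and measuring a version's age from its creation time are all hypothetical choices made for illustration. The sketch also assumes the current version is always kept.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    KEEP_ONE = "Keep One"
    KEEP_ALL = "Keep All"
    KEEP_SAFE = "Keep Safe"
    KEEP_LANDMARKS = "Keep Landmarks"


@dataclass(frozen=True)
class Version:
    created: float          # seconds since an arbitrary epoch (assumption)
    landmark: bool = False  # True if this version was designated a landmark


def retained(versions, policy, now, second_chance=7 * 86400):
    """Return the versions a cleaner would keep, oldest first."""
    if not versions:
        return []
    ordered = sorted(versions, key=lambda v: v.created)
    current, older = ordered[-1], ordered[:-1]
    if policy is Policy.KEEP_ONE:
        keep = []                                   # no history at all
    elif policy is Policy.KEEP_ALL:
        keep = older                                # full history
    elif policy is Policy.KEEP_SAFE:
        # Old versions survive only inside the second-chance interval.
        keep = [v for v in older if now - v.created <= second_chance]
    elif policy is Policy.KEEP_LANDMARKS:
        keep = [v for v in older if v.landmark]     # landmarks only
    else:
        raise ValueError(f"unknown policy: {policy}")
    return keep + [current]


DAY = 86400
history = [Version(0, landmark=True), Version(5 * DAY), Version(9 * DAY), Version(10 * DAY)]
for p in Policy:
    print(p.value, [v.created // DAY for v in retained(history, p, now=10 * DAY)])
```

In this toy history, Keep Safe drops only the ten-day-old version, while Keep Landmarks keeps that version (it is a landmark) and drops the unmarked intermediate ones.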
Potential-landmark heuristic
- Key observations
  - Files are modified in barrages
  - Ability to differentiate edits degrades with time
- Strategy
  - Designate lead edit of barrage as landmark
  - Barrage “granularity” increases with time

[Figure: timeline of edits over time, with potential landmarks marked]

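One way to picture the heuristic is to split a file's edit times into barrages wherever the gap between consecutive edits exceeds a threshold, and to let that threshold grow with the age of the edit. The sketch below is an illustration only: the linear threshold, the `min_gap` and `growth` constants, and the function name are assumptions, and because the slide does not define which edit "leads" a barrage, that choice is exposed as a flag.

```python
def potential_landmarks(edit_times, now, min_gap=2.0, growth=0.1, pick_last=True):
    """Return one edit time per barrage.

    Two consecutive edits belong to the same barrage when the gap between
    them is at most max(min_gap, growth * age), where age is how long ago
    the later edit happened. Older edits therefore merge into coarser
    barrages. All constants are hypothetical.
    """
    barrages = []
    for t in sorted(edit_times):
        if barrages:
            gap = t - barrages[-1][-1]
            threshold = max(min_gap, growth * (now - t))
            if gap <= threshold:
                barrages[-1].append(t)
                continue
        barrages.append([t])
    return [b[-1] if pick_last else b[0] for b in barrages]


# Edit times in hours; "now" is hour 1000.
edits = [0, 5, 10, 500, 505, 990, 995, 999]
print(potential_landmarks(edits, now=1000))
print(potential_landmarks(edits, now=1000, pick_last=False))
```

With these constants, edits five hours apart collapse into a single barrage when they are hundreds of hours old, but stay separate when they happened within the last ten hours.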
History discontinuities
- Deleted versions
  - Discontinuity in file’s history
  - System can report all discontinuities to user
- Grouping files
  - User groups related files
  - A landmark of any file is landmark for group

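The grouping rule can be sketched as follows, under one possible reading: whenever any file in the group has a landmark at time t, every file in the group keeps the version that was current at t. The function name and the data layout (dictionaries of version times per file) are hypothetical.

```python
from bisect import bisect_right


def group_retained(file_versions, file_landmarks):
    """Map each file to the version times it keeps under group landmarks.

    file_versions:  {name: [creation times of that file's versions]}
    file_landmarks: {name: [times of that file's own landmark versions]}
    """
    # A landmark of any member is a landmark time for the whole group.
    group_times = sorted({t for marks in file_landmarks.values() for t in marks})
    keep = {}
    for name, versions in file_versions.items():
        ordered = sorted(versions)
        kept = set()
        for t in group_times:
            i = bisect_right(ordered, t)   # versions created at or before t
            if i:
                kept.add(ordered[i - 1])   # the version current at time t
        keep[name] = sorted(kept)
    return keep


versions = {"parser.c": [1, 4, 9], "parser.h": [2, 6]}
landmarks = {"parser.c": [4], "parser.h": [6]}
print(group_retained(versions, landmarks))
```

Here `parser.h` keeps its version from time 2 because that was the header in effect when `parser.c` reached its landmark at time 4, so the pair can be rolled back together without a discontinuity.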
User-implemented policies
- New policies
  - Written as user-level programs
  - Registered with kernel
  - Used in the same way as standard policies
- Cleaning
  - System cleaner execs user-policy program
  - Runs with privileges of file’s owner

Elephant prototype
- Implementation
  - New VFS in FreeBSD 2.2.8
- Interface
  - Add time to any pathname “file@time”
  - Set process’s default time
  - Set file’s policy or group files
  - Make version a landmark
  - Read a file’s history
  - Tools including: tls, tgrep, tdiff, and tview

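To make the “file@time” convention concrete, here is a small user-level sketch of how such a pathname could be split and its time resolved. It is not the prototype's parser: the slides show only the single form `yesterday:12:00`, so the `today` keyword, the function names, and the choice to split on the last `@` are assumptions.

```python
from datetime import datetime, timedelta


def split_timed_path(path):
    """Split 'name@time' on the last '@'; return (name, None) if untimed."""
    name, sep, when = path.rpartition("@")
    if not sep:
        return path, None
    return name, when


def resolve_time(spec, now):
    """Resolve a '<day>:HH:MM' spec relative to `now` (hypothetical grammar)."""
    day, _, clock = spec.partition(":")
    offsets = {"today": 0, "yesterday": 1}  # only 'yesterday' appears in the slides
    if day not in offsets:
        raise ValueError(f"unsupported day keyword: {day!r}")
    hour, minute = (int(part) for part in clock.split(":"))
    base = now - timedelta(days=offsets[day])
    return base.replace(hour=hour, minute=minute, second=0, microsecond=0)


name, when = split_timed_path("paper.tex@yesterday:12:00")
print(name, resolve_time(when, now=datetime(1999, 12, 14, 9, 30)))
```

A path without a time suffix comes back with `None`, which a caller could replace with the process's default time.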
Versioning meta data
- Inode history
  - Inode log contains file’s copy-on-write inodes
  - Inode added to log on first write after open
  - Non-versioned files stored by standard inode
- Name history
  - Directory lists name creation and deletion time
  - Name retained until all file versions are deleted
  - Old names periodically moved to history inode

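The inode-log behaviour can be modelled in a few lines. This is a toy in-memory model, not the on-disk format: the class names, the use of a tuple of block labels, and the assumption that later writes in the same open session update the newest inode in place are all illustrative.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Inode:
    created: float   # time this version came into existence
    blocks: tuple    # stand-in for the inode's block pointers


@dataclass
class InodeLog:
    entries: list = field(default_factory=list)  # oldest first

    def at(self, when):
        """Return the inode that was current at time `when`, or None."""
        times = [e.created for e in self.entries]
        i = bisect_right(times, when)
        return self.entries[i - 1] if i else None


class OpenFile:
    """One open session on a versioned file."""

    def __init__(self, log):
        self.log = log
        self.dirty = False

    def write(self, now, blocks):
        if not self.dirty:
            # First write after open: append a new copy-on-write inode.
            self.log.entries.append(Inode(now, tuple(blocks)))
            self.dirty = True
        else:
            # Assumed: later writes in this session reuse the newest inode.
            newest = self.log.entries[-1]
            self.log.entries[-1] = Inode(newest.created, tuple(blocks))


log = InodeLog()
first = OpenFile(log)
first.write(10, ["a"])
first.write(11, ["a", "b"])   # same session, no new version
second = OpenFile(log)
second.write(20, ["c"])       # new session, new version
print(len(log.entries), log.at(15).blocks, log.at(25).blocks)
```

Looking up a time before the first write returns `None`, which corresponds to the file not yet existing at that point in its history.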
Two views of history
- File (inode) history
  - All versions of a file independent of its name
  - Rename not reflected in file history
- Name history
  - Name can refer to different files at different times
  - Some applications rely on name history
    - Modify file by first renaming to backup (e.g., emacs)
- Elephant provides both views of history

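The difference between the two views shows up in the emacs-style save described above. In this sketch a directory is a list of entries carrying creation and deletion times for each name; the `DirEntry` record, the inode numbers, and modelling a rename as a name deletion plus a name creation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirEntry:
    name: str
    inode: int
    created: float
    deleted: Optional[float] = None  # None while the name is still live


def lookup(entries, name, when):
    """Return the inode that `name` referred to at time `when`, or None."""
    for e in entries:
        live = e.created <= when and (e.deleted is None or when < e.deleted)
        if e.name == name and live:
            return e.inode
    return None


# At t=10 the editor renames notes.txt to notes.txt~ and writes a new notes.txt.
directory = [
    DirEntry("notes.txt", inode=7, created=0, deleted=10),
    DirEntry("notes.txt~", inode=7, created=10),
    DirEntry("notes.txt", inode=8, created=10),
]
print(lookup(directory, "notes.txt", 5), lookup(directory, "notes.txt", 12))
```

Following the name `notes.txt` back in time crosses from inode 8 to inode 7, whereas following inode 7 forward leads to `notes.txt~`; the name history and the file history give different answers, which is why both are useful.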
Workload analysis
- Measured system
  - Workgroup server at HP Labs
  - Supporting 12 active researchers
  - Used for development, document prep., etc.
  - 15 GB, 360,000 files, 27,000 directories
- Analysis
  - File-type distribution
  - Write-traffic distribution

File-type taxonomy
- Source
  - C, C++, perl, shell scripts
- Documents
  - text, HTML, word processor, mail
- Derived
  - object, library, exec, postscript, PDF
- Archive
  - tar, compressed, data
- Temporary
  - *.tmp, web-browser caches

Allocating policies by file type
- Keep One
  - Derived
  - Temporary
- Keep Safe
  - Archive
- Keep Landmarks
  - Source
  - Documents
  - Other

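The type-to-policy assignment amounts to a two-step lookup. In the sketch below the policy table comes from this slide, but the extension table is a hypothetical sample: the talk lists file categories, not the exact rules used to classify the measured files.

```python
import os

# Hypothetical extension sample; the slides give categories, not extensions.
TYPE_BY_EXT = {
    ".c": "source", ".cc": "source", ".pl": "source", ".sh": "source",
    ".txt": "documents", ".html": "documents",
    ".o": "derived", ".a": "derived", ".ps": "derived", ".pdf": "derived",
    ".tar": "archive", ".gz": "archive",
    ".tmp": "temporary",
}

# Assignment taken from the slide.
POLICY_BY_TYPE = {
    "derived": "Keep One",
    "temporary": "Keep One",
    "archive": "Keep Safe",
    "source": "Keep Landmarks",
    "documents": "Keep Landmarks",
    "other": "Keep Landmarks",
}


def policy_for(path):
    """Pick a retention policy from the file's final extension."""
    ext = os.path.splitext(path)[1].lower()
    return POLICY_BY_TYPE[TYPE_BY_EXT.get(ext, "other")]


for path in ["main.c", "main.o", "backup.tar.gz", "scratch.tmp", "Makefile"]:
    print(path, "->", policy_for(path))
```

Unrecognized files fall into "Other" and therefore get Keep Landmarks, the most protective of the three defaults used here.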
Storage by policy

[Figure: stacked bar chart of Files (%) and Bytes (%), split by policy (Keep Landmarks, Keep Safe, Keep One). Values as extracted: 15.2, 28.5, 62.4, 3.9, 56.3, 33.6]

Write traffic
- Trace
  - Same HP-Labs workgroup server
  - Collected Aug 29 – Oct 8, used Sep 27 – Oct 1
  - Records all open, close, read, and write
  - Includes file name
- Summary
  - 112 MB / day written on average
  - 15 GB of total storage, 12 active users

Storage growth by policy

[Figure: stacked bar chart of Files (%), Bytes (%), and Writes (% bytes), split by policy (Keep Landmarks, Keep Safe, Keep One). Values as extracted: 0.7, 15.2, 0.6, 28.5, 62.4, 98.7, 3.9, 56.3, 33.6]

Importance of file-grain retention

[Figure: bar chart of 30-day history (GB). File-system checkpoint: 3.4; Elephant: 0.042]

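The checkpoint figure is consistent with the measured write traffic. A back-of-the-envelope check, assuming a file-system checkpoint retains every written byte for the full 30 days and taking 1 GB as 1000 MB (both simplifications introduced here):

```python
MB_PER_DAY = 112       # average write traffic from the trace summary
DAYS = 30              # length of the retained history
ELEPHANT_GB = 0.042    # Elephant's 30-day history, from the chart

checkpoint_gb = MB_PER_DAY * DAYS / 1000
ratio = checkpoint_gb / ELEPHANT_GB
print(f"checkpoint: {checkpoint_gb:.2f} GB, ratio: {ratio:.0f}x")
```

Under these assumptions the checkpoint history comes to about 3.4 GB, roughly 80 times the history Elephant keeps with file-grain policies.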
NFS shadowing
- Problem
  - Would you trust your data to a research FS?
- Solution
  - Elephant prototype can shadow an NFS server
    - Snoops network for NFS packets
    - Updates shadow Elephant file system
  - Users
    - Create and update files via NFS
    - Read current and historic versions via Elephant

Conclusions
- Protecting data from users and applications
  - Files require different degrees of protection
    - Reversibility: all versions for limited period
    - History: landmark versions forever
  - Important versions are small fraction of disk
- Elephant
  - File-grain retention policies specified by users
  - Retains all important older versions
  - Rollback file, directory, or fs to any point in past