
Tolerating File-System Mistakes with EnvyFS, Swaminathan Sundararaman (PowerPoint presentation)



  1. Tolerating File-System Mistakes with EnvyFS
     Swaminathan Sundararaman, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
     NetApp, Inc.; University of Wisconsin Madison

  2. File Systems in Today’s World
     • Modern file systems are complex
       – Tens of thousands of lines of code (e.g., XFS: 45K LOC)
     • The storage stack is also getting deeper
       – Hypervisor, network, logical volume manager
     • File systems need to handle a gamut of failures
       – Memory allocation, disk faults, bit flips, system crashes
     • They must preserve the integrity of their metadata and user data
     (6/18/09, Tolerating File-System Mistakes with EnvyFS, slide 2)

  3. File System Bugs
     • Bug reports for the Linux 2.6 series from Bugzilla
       – ext3: 64, JFS: 17, ReiserFS: 38
       – Some are FS corruptions causing permanent data loss
     • FS bugs fall broadly into two categories
       – “Fail-stop”: the system immediately crashes
         • Solutions: Nooks [Swift04], CuriOS [David08]
       – “Fail-silent”: the FS accidentally corrupts on-disk state
         • Many such bugs have been uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b]

  4. Bugs are inevitable in file systems
     Challenge: how do we cope with them?

  5. N-Version File Systems
     • Based on N-version programming [Avizienis77]
       – NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06]
     • EnvyFS: a simple software layer
       – Stores data in N child file systems
       – Performs operations on all children
     • Relies on a simple software layer
     • Challenge: reducing overheads while retaining reliability
       – SubSIST: a novel single-instance store
     [Figure: software stack with the application on top, the EnvyFS layer, child file systems Child 1 … Child N, an SIS layer and the disk driver, and the disk at the bottom]

  6. Results
     • Robustness
       – Traditional file systems handle few corruptions (< 4%)
       – EnvyFS3 tolerates 98.9% of single file-system mistakes
     • Performance
       – Desktop workloads: EnvyFS3 has comparable performance
       – I/O-intensive workloads:
         • Normal mode: EnvyFS3 + SubSIST has acceptable performance
         • Under memory pressure: EnvyFS3 + SubSIST has large overheads
     • Potential as a debugging tool for FS developers
       – Used to pinpoint the source of a “fail-silent” bug in ext3

  7. Outline
     • Introduction
     • Building reliable file systems
     • Reducing overheads with SubSIST
     • Evaluation
     • Conclusion

  8. N-Version Systems
     Development process:
       1. Producing the specification of the software
       2. Implementing N versions of the software
       3. Creating the N-version layer
          – Executes the different versions
          – Determines the consensus result
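The consensus step in item 3 can be sketched as a strict-majority vote over the N children's results. This is a minimal sketch, not EnvyFS's code: it assumes each child's result is a hashable value such as a (return code, data) tuple, and the names `consensus` and `NoConsensus` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


class NoConsensus(Exception):
    """Raised when no result is reported by a strict majority of the versions."""


def consensus(results: Sequence[Hashable]) -> Hashable:
    """Return the result that a strict majority (> N/2) of the N versions agree on."""
    if not results:
        raise NoConsensus("no versions were executed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise NoConsensus(f"best candidate has only {votes} of {len(results)} votes")
    return value
```

For example, `consensus([(0, b"abc"), (0, b"abc"), (-5, b"")])` returns `(0, b"abc")`: the third child's error is outvoted. Requiring a strict majority means that with N = 3 a single faulty child is tolerated, while a three-way disagreement is reported as an error rather than guessed.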

  9. 1. Producing the Specification
     • Our own specification?
       – Impractical: requires wide-scale changes to file systems
       – Specifications take years to get accepted
     • Can we leverage an existing specification?
       – Yes: we can leverage VFS, but there are some issues
     • VFS is not precise enough for N-versioning purposes
       – Need to handle cases where the specification is not precise
       – e.g., ordering of directory entries, inode number allocation

  10. Imprecise VFS Specification: Ordering Directory Entries
     • Issue:
       – No specified return order
       – Can’t blindly compare entries
     • Solution:
       – Read all entries from a directory (dir: test in our case) from all FSes
       – Match entries from the FSes
       – Return the majority result
     [Figure: Readdir on directory “test” goes through the EnvyFS layer to FS X, FS Y … FS Z; each child returns File 1, File 2 and File 3 in a different order, and EnvyFS returns File 1, File 2, File 3]
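A minimal sketch of the solution, assuming each child's full `readdir` output has already been collected into a list of entry names. Voting per entry, returning the result in sorted order, and the name `majority_readdir` are assumptions for illustration; the slide only says that entries are matched and the majority result is returned. Only names are compared, because the per-child inode numbers differ (see the next slide).

```python
from collections import Counter
from typing import Iterable, List, Sequence


def majority_readdir(child_listings: Sequence[Iterable[str]]) -> List[str]:
    """Merge the full directory listings returned by N child file systems.

    Each child may return entries in any order, so entries are matched by
    name rather than by position. An entry is kept only if a strict majority
    of children report it; the result is returned in sorted order.
    """
    n = len(child_listings)
    # set() so a child that returns a name twice still casts only one vote for it.
    votes = Counter(name for listing in child_listings for name in set(listing))
    return sorted(name for name, count in votes.items() if count > n // 2)
```

With the three orderings from the slide, every name is reported by all three children, so the result is `["File 1", "File 2", "File 3"]`. Voting per entry (rather than on whole listings) lets one child with a missing or spurious entry be outvoted without discarding the entire directory.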

  11. Imprecise VFS Specification (cont.)
     • Inode number allocation
       – Inode numbers are returned through system calls
       – Each child file system issues different inode numbers
       – Possible solution: force file systems to use the same algorithm?
       – Our solution: issue inode numbers at the EnvyFS layer
     • The inode mapping table is not persistently stored
     [Figure: stat on File 1 in directory “test”; the three children report inode numbers 10, 36 and 65, the EnvyFS inode mapping table maps virtual inode 15 to FS 1: 10, FS 2: 36, FS 3: 65, and EnvyFS returns 15]
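An in-memory inode mapping table can be sketched as below. This is a hypothetical illustration, not EnvyFS's data structure: the class name, the `first_virtual` parameter, and keying on the full tuple of child inode numbers (child i at index i) are all assumptions made to keep the sketch short.

```python
from typing import Dict, Tuple


class InodeMappingTable:
    """In-memory map from EnvyFS virtual inode numbers to per-child inode numbers.

    Nothing here is persisted, matching the slide's note that the table is not
    stored on disk. Keying on the full tuple of child inode numbers is an
    assumption of this sketch.
    """

    def __init__(self, first_virtual: int = 1) -> None:
        self._next = first_virtual
        self._virt_to_children: Dict[int, Tuple[int, ...]] = {}
        self._children_to_virt: Dict[Tuple[int, ...], int] = {}

    def virtual_for(self, child_inos: Tuple[int, ...]) -> int:
        """Return the virtual inode number for these child inodes, issuing one if new."""
        virt = self._children_to_virt.get(child_inos)
        if virt is None:
            virt = self._next
            self._next += 1
            self._children_to_virt[child_inos] = virt
            self._virt_to_children[virt] = child_inos
        return virt

    def children_of(self, virt: int) -> Tuple[int, ...]:
        """Translate a virtual inode number back to each child's own inode number."""
        return self._virt_to_children[virt]

    def forget(self, virt: int) -> None:
        """Drop a mapping, e.g. after the file has been deleted."""
        del self._children_to_virt[self._virt_to_children.pop(virt)]
```

With `first_virtual=15`, `virtual_for((10, 36, 65))` returns 15, reproducing the slide's example in which File 1 has inode numbers 10, 36 and 65 in the three children and EnvyFS reports 15. Because the table lives only in memory, a file may receive a different virtual number after a remount.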

  12. 2. Implementing N Versions of the FS
     • A painful process
       – High cost of development, long time delays
     • Lucky! The hard work has already been done for us
       – 30 different disk-based file systems in Linux 2.6
     • Which file systems to use?
       – ext3, JFS, and ReiserFS in a three-version FS
       – Others should work without modifications

  13. 3. Creating the N-Version Layer
     • N-version layer (EnvyFS)
       – Inserted beneath VFS
       – Simple design to avoid bugs
     • Example: reading a file
       – Allocate N data buffers
       – Read the data block from the disk
       – Compare: data, return code, file position
       – Return: data, return code
     • Issues:
       – Allocating memory for each read operation
       – Extra copy from the allocated buffer to the application
       – Comparison overheads
     [Figure: Read(file, 1 block) passes from the application through the VFS layer to the EnvyFS layer (inode mapping table, wrappers, comparators), which issues Read(…) to ext3, JFS and ReiserFS; each child returns data D, a return code, and file position x]
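The straightforward read path on this slide can be sketched as follows. `ChildFile` is a hypothetical in-memory stand-in for a child file system, the return code is modeled as the byte count a `read()` would return, and the comments mark where each of the three listed issues arises; this illustrates the idea rather than reproducing EnvyFS's kernel code.

```python
from collections import Counter
from typing import List, Optional, Tuple


class ChildFile:
    """Hypothetical in-memory stand-in for one child file system's copy of a file."""

    def __init__(self, contents: bytes) -> None:
        self.contents = contents

    def read(self, buf: bytearray, offset: int) -> Tuple[int, int]:
        """Fill buf starting at offset; return (bytes read, new file position)."""
        chunk = self.contents[offset:offset + len(buf)]
        buf[:len(chunk)] = chunk
        return len(chunk), offset + len(chunk)


def naive_envy_read(children: List[ChildFile], app_buf: bytearray,
                    offset: int) -> Optional[Tuple[int, int]]:
    """Read from every child into its own buffer and vote on the full results."""
    if not children:
        return None
    results = []
    for child in children:
        private = bytearray(len(app_buf))             # issue 1: a fresh allocation per child
        ret, pos = child.read(private, offset)
        results.append((bytes(private[:ret]), ret, pos))
    (data, ret, pos), votes = Counter(results).most_common(1)[0]  # issue 3: full-data comparison
    if votes <= len(children) // 2:
        return None                                   # no majority
    app_buf[:ret] = data                              # issue 2: extra copy to the application
    return ret, pos
```

If one child returns `b"hellX"` where the other two return `b"hello"`, the majority tuple wins and only `b"hello"` reaches the application buffer.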

  14. Reading a File in EnvyFS
     • Solution:
       – Same application buffer for all FSes
       – TCP-like checksums for data comparison
       – Compare: checksums, return code, file position
       – Read data until a majority agrees
     [Figure: the same read path as the previous slide, with the EnvyFS layer keeping one checksum per child (e.g., 435, 435, 436, where one child disagrees)]
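The optimized path can be sketched with the 16-bit ones'-complement checksum used by TCP/IP (RFC 1071) standing in for the "TCP-like" checksum; the exact checksum EnvyFS uses is not given here, and `ChildRead` and `envy_read` are hypothetical names. All children read into the same application buffer in turn, and reading stops as soon as a strict majority of (checksum, return code, position) keys agree.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical child read: overwrites the shared buffer in place and
# returns (return code, new file position).
ChildRead = Callable[[bytearray], Tuple[int, int]]


def inet_checksum(data) -> int:
    """16-bit ones'-complement checksum in the style of TCP/IP (RFC 1071)."""
    total = 0
    n = len(data)
    for i in range(0, n - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if n % 2:
        total += data[-1] << 8                       # pad an odd trailing byte with zero
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back in
    return ~total & 0xFFFF


def envy_read(children: List[ChildRead], app_buf: bytearray) -> Optional[Tuple[int, int]]:
    """Read into one shared buffer, stopping as soon as a strict majority agrees."""
    needed = len(children) // 2 + 1
    votes: Counter = Counter()
    for read_into in children:
        ret, pos = read_into(app_buf)                # no per-child allocation or copy
        key = (inet_checksum(app_buf), ret, pos)     # checksum, return code, position
        votes[key] += 1
        if votes[key] >= needed:
            # This child cast the deciding vote, so app_buf already holds majority data.
            return ret, pos
    return None
```

With N = 3, two agreeing children end the read after two calls. Note that a ones'-complement sum is a weak check (for example, swapping two 16-bit words leaves it unchanged), so it trades some detection strength for speed compared with comparing full buffers.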

  15. Outline
     • Introduction
     • Building reliable file systems
     • Reducing overheads with SubSIST
     • Evaluation
     • Conclusion

  16. Case for Single Instance Storage (SIS)
     • Ideal: one disk per FS
     • Practical: one disk for all FSes
     • Overheads
       – Effective storage space: 1/N
       – N times more I/O (reads/writes)
     • Challenge: maintain diversity while minimizing overheads
     [Figure: application, VFS layer and EnvyFS layer above FS 1 … FS N; in the ideal case each FS has its own disk (Disk 1 … Disk N), in the practical case the FSes use partitions Part 1 … Part N of one disk and share its request queue]
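The 1/N figure follows from every child keeping a full copy on the shared disk. A quick model shows how much merging identical data blocks could recover. It assumes every child writes identical file data plus its own metadata, and that metadata is never shared; the function and its inputs are hypothetical.

```python
def physical_blocks(n: int, data_blocks: int, meta_blocks: int, merge_data: bool) -> int:
    """Blocks used on one shared disk when N child FSes store the same files.

    Assumes every child writes identical file data and its own metadata;
    merge_data=True models storing each identical data block only once.
    """
    data = data_blocks if merge_data else n * data_blocks
    return data + n * meta_blocks   # metadata is kept separately for each child
```

With illustrative (made-up) numbers N = 3, 1,000 data blocks and 50 metadata blocks per child, the unmerged layout needs 3,150 blocks (one third of the disk is effective capacity), while merging data needs 1,150. Keeping metadata per child preserves the on-disk diversity the N-version approach relies on.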

  17. SubSIST: Single Instance Store
     • A variant of a single-instance store
       – Selectively merges data blocks
     • Block-addressable SIS
       – Exports virtual disks to the FSes
       – Manages mapping and free-space information
       – Not persistently stored on disk
     • EnvyFS writes through N file systems
       – N data blocks are merged into 1 data block
       – Content hashes are not stored persistently
       – Metadata blocks are not merged
       – Blocks are merged across FSes, not within an FS
     [Figure: FS 1 … FS N write metadata (M) and data (D) blocks to SubSIST virtual disks Vdisk 1 … Vdisk N; SubSIST contains a content-hash (CHash) layer, a read cache, and free-space management above the disk]
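The merging rules on this slide can be sketched as a toy block store. This is a hypothetical illustration, not SubSIST itself: SHA-1 as the content hash, the class and method names, and the simple set-based free-space manager are all assumptions. It merges only data blocks, only when contents match, and only across different virtual disks, and it keeps every table in memory.

```python
import hashlib
from typing import Dict, Set, Tuple


class SubSISTSketch:
    """Toy block-addressable single-instance store exporting N virtual disks.

    All state (block map, free space, content hashes) lives only in memory.
    SHA-1 as the content hash is an assumption made for this sketch.
    """

    def __init__(self, n_vdisks: int, n_physical: int) -> None:
        self.n_vdisks = n_vdisks
        self.free: Set[int] = set(range(n_physical))   # free physical blocks
        self.blocks: Dict[int, bytes] = {}             # physical block -> contents
        self.refs: Dict[int, Set[int]] = {}            # physical block -> vdisks using it
        self.vmap: Dict[Tuple[int, int], int] = {}     # (vdisk, vblock) -> physical block
        self.by_hash: Dict[bytes, int] = {}            # content hash -> mergeable data block

    def write(self, vdisk: int, vblock: int, data: bytes, is_meta: bool) -> None:
        if not 0 <= vdisk < self.n_vdisks:
            raise ValueError(f"no such virtual disk: {vdisk}")
        self._release(vdisk, vblock)                   # an overwrite drops the old mapping
        h = hashlib.sha1(data).digest()
        p = self.by_hash.get(h)
        # Merge only data blocks, only with identical contents, and only across
        # different file systems: a vdisk never shares a block with itself.
        if (not is_meta and p is not None and self.blocks[p] == data
                and vdisk not in self.refs[p]):
            self.refs[p].add(vdisk)
        else:
            p = self.free.pop()                        # raises KeyError when the disk is full
            self.blocks[p] = data
            self.refs[p] = {vdisk}
            if not is_meta and h not in self.by_hash:
                self.by_hash[h] = p
        self.vmap[(vdisk, vblock)] = p

    def read(self, vdisk: int, vblock: int) -> bytes:
        return self.blocks[self.vmap[(vdisk, vblock)]]

    def used_blocks(self) -> int:
        return len(self.blocks)

    def _release(self, vdisk: int, vblock: int) -> None:
        p = self.vmap.pop((vdisk, vblock), None)
        if p is None:
            return
        self.refs[p].discard(vdisk)
        if not self.refs[p]:                           # last user gone: free the block
            h = hashlib.sha1(self.blocks[p]).digest()
            if self.by_hash.get(h) == p:
                del self.by_hash[h]
            del self.blocks[p], self.refs[p]
            self.free.add(p)
```

Three children writing the same data block use one physical block, while their metadata blocks stay separate, and a second identical data block written by the same child gets its own physical block.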

  18. Handling Data Block Corruptions?
     • Corruption to data in a single FS
       – Due to bugs, bit flips, or the storage stack
       – The corrupt data block is not merged
       – All other N-1 data blocks are merged
       – The corrupt data block is fixed at the next read
     • Corruption to a data block inside the disk
       – Only a single copy of the data exists
       – Different code paths
       – Different on-disk structures
     [Figure: data block D flows from FS 1 … FS N through Vdisk 1 … Vdisk N and SubSIST (CHash layer, read cache, free-space management) to the disk, with one corrupted copy marked ×]
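One plausible way to "fix the corrupt block at the next read" is read-repair: vote on the copies and rewrite the minority ones. This is a minimal sketch under stated assumptions: `read_and_repair` is a hypothetical name, and the in-place overwrite stands in for a write that EnvyFS would issue back through the faulty child. Once repaired, the copy is identical to the others, so a single-instance store could merge it again.

```python
from collections import Counter
from typing import Dict, Optional


def read_and_repair(copies: Dict[str, bytearray]) -> Optional[bytes]:
    """Majority-vote one data block across children and rewrite the minority copies.

    `copies` maps a child FS name to that child's copy of the block; the
    in-place overwrite stands in for a write issued back through that child.
    """
    if not copies:
        return None
    good, votes = Counter(bytes(b) for b in copies.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        return None                      # no majority: report an error, do not guess
    for block in copies.values():
        if bytes(block) != good:
            block[:] = good              # fix the corrupt copy during this read
    return good
```

This only helps when the children hold separate copies; a corruption of the single merged copy on disk is seen identically by every child, which is why that case is listed separately on the slide.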

  19. Outline
     • Introduction
     • Building reliable file systems
     • Reducing overheads with SubSIST
     • Evaluation
       – Reliability
       – Performance
     • Conclusion

  20. Reliability Evaluation: Fault Injection
     • How robust is EnvyFS in recovering from a child file system’s mistake?
     • Corruption: bugs in the FS / storage stack
     • Types of disk blocks
       – superblock, inode, block bitmap, file data, …
     • Perform different file operations
       – mount, stat, creat, unlink, read, …
     • Report user-visible results
     • All results also apply with SubSIST, except corruption to data blocks
     • Type-aware fault injection [Prabhakaran05]
     [Figure: VFS and the EnvyFS layer above ReiserFS, ext3 and JFS; a pseudo device driver between the file systems and the block driver injects faults into typed blocks (B) on their way to and from the disk]
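Type-aware fault injection can be sketched as a pseudo device that corrupts only blocks of a chosen type. This is a hypothetical illustration: the class, its methods, and the in-memory `disk` are assumptions, the block-type map is assumed to come from knowledge of the file system's on-disk layout, and returning an all-zero block is just one simple corruption model.

```python
from typing import Dict, Optional


class TypeAwareFaultInjector:
    """Hypothetical pseudo device driver that corrupts reads of one block type.

    `block_types` must be supplied from knowledge of the file system's on-disk
    layout (that is what makes the injection "type-aware"); returning an
    all-zero block is just one simple corruption model.
    """

    def __init__(self, disk: Dict[int, bytes], block_types: Dict[int, str]) -> None:
        self.disk = disk                  # block number -> contents
        self.block_types = block_types    # block number -> e.g. "superblock", "inode"
        self.target: Optional[str] = None

    def arm(self, block_type: Optional[str]) -> None:
        """Choose which block type to corrupt (None disables injection)."""
        self.target = block_type

    def read(self, blkno: int) -> bytes:
        data = self.disk[blkno]
        if self.target is not None and self.block_types.get(blkno) == self.target:
            return bytes(len(data))       # corrupted: same size, all zeros
        return data
```

Placing such an injector under just one child means exactly one child makes a mistake, so the experiment checks whether the user-visible result through EnvyFS still matches the correct answer.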

  21. Result Matrix: ext3
     [Figure: matrix of user-visible results when ext3 encounters a corrupted block. Rows (corrupted block type): INODE, DIR, BMAP, IMAP, INDIRECT, DATA, SUPER, JSUPER, GDESC. Columns (operations): normal path traversal, SET-1 (stat, …), SET-2 (chmod), SET-3 (fsync), getdirentries, creat, link, mkdir, rename, symlink, write, truncate, rmdir, unlink, mount, umount, read, readlink. Legend: data loss, cannot mount, ops fail, data corrupt, crash, read-only, depends, N/A]
