Membrane: Operating System Support for Restartable File Systems
Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
Membrane: a layer of material which serves as a selective barrier between two phases and remains impermeable to specific particles, molecules, or substances when exposed to the action of a driving force.
[Figure labels: Kernel, File System, Membrane, Bug]

Bugs are common in any large software
  File systems contain 1,000 to 100,000 lines of code
  Recent work has uncovered hundreds of bugs [Engler OSDI ’00, Musuvathi OSDI ’02, Prabhakaran SOSP ’03, Yang OSDI ’04, Gunawi FAST ’08, Rubio-González PLDI ’09]
  Error handling code, recovery code, etc.
File systems are part of the core kernel
  A single bug could make the kernel unusable
3/2/10 Membrane: Operating System Support for Restartable File Systems (FAST '10)

FS developers are good at detecting bugs
  “Paranoid” about failures
  Lots of checks all over the file system code!

File System   assert()   BUG()   panic()
xfs               2119      18        43
ubifs              369      36         2
ocfs2              261     531         8
gfs2               156      60         0
afs                106      38         0
ext4                42     182        12
reiserfs             1     109        93
ntfs                 0     288         2
Number of calls to assert, BUG, and panic in Linux 2.6.27

Detection is easy but recovery is hard

[Figure: Apps above VFS and File System; an Inode with i_count 0x00002; an address mapping; a Crash in the file system]
File systems manage their own in-memory objects
Processes could potentially use corrupt in-memory file-system objects
Process killed on crash
Resulting problems: inconsistent kernel state, hard to free FS objects, no fault isolation
Common solution: crash the file system and hope the problem goes away after OS reboot

To develop perfect file systems
  Tools do not uncover all file-system bugs
  Bugs are still fixed manually
  Code is constantly modified due to new features
Make file systems handle all error cases
  Interacts with many external components
  ▪ VFS, memory mgmt., network, page cache, and I/O
Cope with bugs rather than hope to avoid them

Membrane: OS framework to support lightweight, stateful recovery from FS crashes
  Upon failure, transparently restart the FS
  Restore state and allow pending application requests to be serviced
  Applications oblivious to crashes
A generic solution to handle all FS crashes
  Last resort before file systems decide to give up

Implemented Membrane in Linux 2.6.15
  Evaluated with ext2, VFAT, and ext3
Evaluation
  Transparency: hide failures (~50 faults) from applications
  Performance: < 3% overhead for micro & macro benchmarks
  Recovery time: < 30 milliseconds to restart FS
  Generality: < 5 lines of code for each FS

Outline
  Motivation
  Restartable file systems
  Evaluation
  Conclusions

Membrane components
  Fault Detection: helps detect faults quickly
  Fault Anticipation: records file-system state
  Fault Recovery: executes the recovery protocol to clean up and restart the failed file system

Correct recovery requires early detection
  Membrane best handles “fail-stop” failures
Both hardware- and software-based detection
  H/W: null pointer, general protection error, ...
  S/W: assert(), BUG(), BUG_ON(), panic()
Assume transient faults during recovery
  Non-transient faults: return error to that process

[Figure: Membrane components: Fault Anticipation, Fault Detection, Fault Recovery]

Additional work done in anticipation of a failure
Issue: where to restart the file system from?
  File systems are constantly updated by applications
Possible solutions:
  Make each operation atomic
  Leverage in-built crash consistency mechanism
    Not all FS have a crash consistency mechanism
Generic mechanism to checkpoint FS state

Checkpoint: consistent state of the file system that can be safely rolled back to in the event of a crash
[Figure: Apps above VFS; file systems (ext3, VFAT) below VFS; Page Cache; Disk]
  All requests enter via the VFS layer
  Control requests to FS & dirty pages to disk
  File systems write to disk through the page cache

[Figure: three panels (Regular, During Checkpoint, After Checkpoint), each showing App, VFS, File System, Page Cache, and Disk; STOP marks appear at the VFS and page cache; consistent images #1, #2, and #3]
  Copy-on-Write
  A consistent image can be written back to disk
  On crash, roll back to the last consistent image

[Figure: two panels (On Crash, After Recovery) showing App, VFS, File System, Page Cache, and Disk]
On crash: flush dirty pages of last checkpoint
  Throw away the in-memory state
  Remount from the last checkpoint
    Consistent file-system image on disk
Issue: state after checkpoint would be lost
  Operations completed after the checkpoint have already returned to applications
Need to recreate state after checkpoint

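The checkpoint and rollback behaviour on the last two slides can be illustrated with a toy model. This is a minimal sketch, not Membrane's kernel implementation: the `PageCache` class, its method names, and the dictionary standing in for the disk are all assumptions made for illustration.

```python
class PageCache:
    """Toy page cache with copy-on-write checkpoints (hypothetical model)."""

    def __init__(self):
        self.pages = {}    # page number -> current in-memory contents
        self.dirty = set() # pages modified since the last checkpoint
        self.frozen = {}   # pages of the last checkpoint, not yet on disk
        self.disk = {}     # stand-in for stable storage

    def write(self, page_no, data):
        # Copy-on-write: the frozen checkpoint copy is never modified;
        # new data only replaces the current in-memory version.
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def flush_checkpoint(self):
        # Write the frozen (consistent) image back to disk.
        self.disk.update(self.frozen)
        self.frozen = {}

    def checkpoint(self):
        # Flush the previous image, then freeze the current dirty pages
        # as the new consistent image.
        self.flush_checkpoint()
        self.frozen = {n: self.pages[n] for n in self.dirty}
        self.dirty.clear()

    def crash_rollback(self):
        # On crash: flush dirty pages of the last checkpoint, throw away
        # later in-memory state, and restart from the on-disk image.
        self.flush_checkpoint()
        self.pages = dict(self.disk)
        self.dirty.clear()
```

In this model, writes issued after `checkpoint()` vanish on `crash_rollback()`, which is exactly the lost state that has to be recreated after the checkpoint.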
Log operations along with their return value
  Replay completed operations after checkpoint
Operations are logged at the VFS layer
  File-system independent approach
Logs are maintained in memory and not on disk
How long should we keep the log records?
  Log thrown away at checkpoint completion

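The logging rule above can be sketched in a few lines. This is an illustrative model only: `OpLog`, `ToyFS`, and `vfs_call` are hypothetical names, and the real log lives inside the kernel's VFS layer rather than in a Python list.

```python
class ToyFS:
    """Hypothetical stand-in for a file system below the VFS layer."""

    def __init__(self):
        self.files = {}

    def create(self, name):
        self.files[name] = b""
        return 0

    def write(self, name, data):
        self.files[name] += data
        return len(data)


class OpLog:
    """In-memory log of completed operations and their return values."""

    def __init__(self):
        self.records = []

    def record(self, op, args, retval):
        self.records.append((op, args, retval))

    def on_checkpoint_complete(self):
        # Everything logged so far is covered by the checkpoint,
        # so the records are no longer needed for replay.
        self.records.clear()


def vfs_call(fs, log, op, *args):
    # Logging happens above the file system, so it is FS-independent.
    retval = getattr(fs, op)(*args)
    log.record(op, args, retval)
    return retval
```

After a checkpoint completes, only operations issued later remain in the log, and those are the ones that would need to be replayed after a crash.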
[Figure: Membrane components: Fault Anticipation, Fault Detection, Fault Recovery]

Important steps in recovery:
  1. Clean up state of partially completed operations
  2. Clean up in-memory state of the file system
  3. Remount the file system from the last checkpoint
  4. Replay completed operations after the checkpoint
  5. Re-execute partially completed operations

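The ordering of these steps can be shown with a small simulation. It is a sketch under stated assumptions: `ToyFS` and `recover` are hypothetical, the checkpoint is a plain dictionary, and steps 1 and 2 are modeled simply by discarding the failed file-system object.

```python
class ToyFS:
    """Hypothetical file system whose whole state is a dict of files."""

    def __init__(self, image=None):
        self.files = dict(image or {})

    def create(self, name):
        self.files.setdefault(name, b"")
        return 0

    def append(self, name, data):
        self.files[name] = self.files.get(name, b"") + data
        return len(data)


def recover(checkpoint_image, completed_log, in_flight):
    # Steps 1 and 2: the failed instance and its in-memory state are
    # dropped; nothing from it is reused here.
    # Step 3: "remount" from the last checkpoint.
    fs = ToyFS(checkpoint_image)
    # Step 4: replay operations that completed after the checkpoint and
    # check that each produces the return value applications already saw.
    for op, args, logged_ret in completed_log:
        ret = getattr(fs, op)(*args)
        if ret != logged_ret:
            raise RuntimeError(f"replay of {op} diverged: {ret} != {logged_ret}")
    # Step 5: re-execute operations that were in flight at crash time.
    results = [getattr(fs, op)(*args) for op, args in in_flight]
    return fs, results
```

Comparing replayed return values against the logged ones is a design choice of this sketch; it makes divergence between the restarted file system and what applications observed visible immediately.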
Multiple threads inside the file system
  Intertwined execution
[Figure: Apps in user space; VFS, File System, and Page Cache in the kernel; a Crash in the file system]
FS code should not be trusted after a crash
Application threads killed?
  Processes cannot be killed after a crash: application state would be lost
Need a clean way to undo incomplete operations

Skip: file-system code
Trust: kernel code (VFS, memory mgmt., …)
  Cleans up state on error from file systems
How to prevent execution of FS code?
  Control capture mechanism: marks file-system code pages as non-executable
  Unwind stack: stores the return address (of the last kernel function) along with the expected error value

E.g., create code path in ext2
  Call path: sys_open() -> do_sys_open() -> filp_open() -> open_namei() -> vfs_create() -> ext2_create() -> ext2_addlink() -> ext2_prepare_write() -> block_prepare_write() -> ext2_get_block() (Crash)
  Unwind stack entries:
    fn vfs_create, regs (rax, rbp, rsi, rdi, rbx, rcx, rdx, r8, …), rval -ENOMEM
    fn blk..._write, regs (rax, rbp, rsi, rdi, rbx, rcx, rdx, r8, …), rval -EIO
  A membrane fault is raised when control would return into file-system code (ext2_get_block(), ext2_create())
  Kernel cleanup while unwinding:
    1. Release fd
    2. Release namei data
    3. Clear buffer, zero page, mark not dirty
  Kernel is restored to a consistent state
[Legend: Kernel, File system, Non-executable]

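The unwind behaviour in the ext2 example can be mimicked in user space. The following is a rough sketch, not the kernel mechanism: exceptions stand in for page-protection faults, and the `Membrane` class, `enter_fs`, and `return_to_fs` are hypothetical names. Only the function names and error values are taken from the slide.

```python
import errno


class MembraneFault(Exception):
    """Stands in for a fault raised when file-system code must not run."""


class Membrane:
    def __init__(self):
        self.fs_crashed = False
        self.unwind_stack = []  # (kernel function, error value to return)
        self.trace = []         # entries actually used during unwinding

    def enter_fs(self, kernel_fn, error_value, fs_fn, *args):
        # Trusted kernel code calls into the file system: remember where
        # to resume and which error to hand back if the FS fails.
        self.unwind_stack.append((kernel_fn, error_value))
        try:
            return fs_fn(*args)
        except MembraneFault:
            self.fs_crashed = True
            self.trace.append((kernel_fn, error_value))
            return error_value  # the kernel caller then cleans up
        finally:
            self.unwind_stack.pop()

    def return_to_fs(self, result):
        # Control capture: after a crash, FS code is "non-executable",
        # so returning into it faults instead of running it.
        if self.fs_crashed:
            raise MembraneFault()
        return result


m = Membrane()

def ext2_get_block():                  # file-system code: crashes
    raise MembraneFault("simulated crash")

def block_prepare_write():             # trusted kernel code
    return m.enter_fs("block_prepare_write", -errno.EIO, ext2_get_block)

def ext2_create():                     # file-system code
    return m.return_to_fs(block_prepare_write())

def vfs_create():                      # trusted kernel code
    return m.enter_fs("vfs_create", -errno.ENOMEM, ext2_create)
```

Calling `vfs_create()` in this model returns `-errno.ENOMEM`: the inner kernel function first receives `-errno.EIO`, the attempt to return into `ext2_create()` faults, and control resumes in the outer kernel function with its own recorded error value.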
[Figure: Membrane components: Fault Anticipation, Fault Detection, Fault Recovery]

[Figure: timeline (T0, T1, T2) of an application issuing open("file"), write(), read(), write(), link(), close() through the VFS to the File System, with a checkpoint and a crash. Legend: Completed, In-progress, Crash]
  1. Periodically create checkpoints
  2. File system crash
  3. Unwind in-flight processes
  4. Move to recent checkpoint
  5. Replay completed operations
  6. Re-execute unwound process

Outline
  Motivation
  Restartable file systems
  Evaluation
  Conclusions