COW File Systems (copy-on-write)
  Data and metadata are not updated in place, but written to a new location
  Transforms random writes into sequential writes

Why COW File Systems?
  Small writes are expensive
    With RAID, an update requires four disk I/Os
  Caches filter reads
    More important to make writes efficient
  Widespread adoption of flash storage
    Writing a 4KB page in place would require erasing a 512KB erasure block
    Wear leveling, which spreads writes across all cells, is important to maximize flash life
    COW techniques are used to virtualize block addresses and redirect writes to cleared erasure blocks
  Large storage capacities enable versioning
    Versioning is easy with COW!

[Figure: Adding a block to a file. Blocks shown: free space bitmap, inode, indirect block, data block; labels mark each as updated in place or written as a new block.]

The core idea
  Inodes are stored in a file, pointed to by the root inode

[Figure: Traditional vs. COW layout. Traditional: root inode, inode array, indirect blocks, data blocks. COW: root inode, inode file's indirect blocks, inode array (in inode file), indirect blocks, data blocks. Each structure is labeled "Fixed Location" or "Anywhere".]
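The copy-on-write idea above (never overwrite a block; write new blocks and redirect pointers up to the root) can be sketched with a toy model. This is a minimal illustration, not the layout of any real file system: the `Disk` class, the dictionary-based blocks, and the function names are all hypothetical. It also simplifies one thing: it returns the address of the new root instead of updating a root stored at a fixed location.

```python
# Hypothetical in-memory model: root -> inode -> indirect block -> data block.
# Blocks are never overwritten; an update allocates new blocks and a new root.

class Disk:
    def __init__(self):
        self.blocks = []                      # append-only: every write goes to a new address

    def write_new(self, content):
        """Write content to a fresh location and return its address."""
        self.blocks.append(content)
        return len(self.blocks) - 1

    def read(self, addr):
        return self.blocks[addr]


def cow_update(disk, root_addr, new_data):
    """Replace the file's data without modifying any existing block."""
    data_addr = disk.write_new(new_data)                      # new data block
    indirect_addr = disk.write_new({"data": data_addr})       # new indirect block
    inode_addr = disk.write_new({"indirect": indirect_addr})  # new inode
    return disk.write_new({"inode": inode_addr})              # new root (old root untouched)


def read_file(disk, root_addr):
    """Follow root -> inode -> indirect -> data."""
    inode = disk.read(disk.read(root_addr)["inode"])
    indirect = disk.read(inode["indirect"])
    return disk.read(indirect["data"])


disk = Disk()
d = disk.write_new("v1")
ind = disk.write_new({"data": d})
ino = disk.write_new({"indirect": ind})
root_v1 = disk.write_new({"inode": ino})

root_v2 = cow_update(disk, root_v1, "v2")

print(read_file(disk, root_v1))   # old version is still reachable
print(read_file(disk, root_v2))   # new version
```

Because the old root still points at untouched blocks, both versions remain readable, which is why versioning comes almost for free. The four new blocks also land at consecutive addresses, mirroring how random updates turn into sequential writes.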
File access in FFS
  What does it take to read /Users/lorenzo/wisdom.txt?
    Read inode for "/" (root) from a fixed location
    Read first data block for root
    Read inode for /Users
    Read first data block of /Users
    Read inode for /Users/lorenzo
    Read first data block for /Users/lorenzo
    Read inode for /Users/lorenzo/wisdom.txt
    Read data blocks for /Users/lorenzo/wisdom.txt

  "Cache is a man's best friend"
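The eight reads listed above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the inode and block numbers, the dictionaries standing in for the disk, and the file's contents are all made up for illustration, and each directory is assumed to fit in its first data block.

```python
# Hypothetical tiny disk: inodes hold a list of data block numbers;
# a directory's data block maps names to inode numbers.
ROOT_INODE = 2  # assumed fixed, well-known inode number for "/"

inodes = {
    2: {"blocks": [10]},   # /
    3: {"blocks": [11]},   # /Users
    4: {"blocks": [12]},   # /Users/lorenzo
    5: {"blocks": [13]},   # /Users/lorenzo/wisdom.txt
}
data_blocks = {
    10: {"Users": 3},
    11: {"lorenzo": 4},
    12: {"wisdom.txt": 5},
    13: "Cache is a man's best friend",   # assumed file contents
}


def read_path(path):
    """Resolve path with no cache; return (file contents, log of disk reads)."""
    log = []

    def read_inode(n):
        log.append(f"inode {n}")
        return inodes[n]

    def read_block(n):
        log.append(f"block {n}")
        return data_blocks[n]

    inode = read_inode(ROOT_INODE)
    for name in path.strip("/").split("/"):
        directory = read_block(inode["blocks"][0])   # first data block of the directory
        inode = read_inode(directory[name])
    contents = "".join(read_block(b) for b in inode["blocks"])
    return contents, log


contents, log = read_path("/Users/lorenzo/wisdom.txt")
for step in log:
    print(step)
```

Every path component costs one inode read plus one data block read, so an uncached lookup grows with path depth. That cost is what the block cache is there to absorb.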
The Abstraction Stack
  I/O systems are accessed through a series of layered abstractions

  Application
  Library
  File System
  Block Cache
    Caches recently read blocks
    Buffers recently written blocks
  Block Device Interface
    Single interface to many devices; allows data to be read/written in fixed-size blocks
  Device Driver
  MM I/O, DMA, Interrupts
  Physical Device
The Abstraction Stack (continued)
  Device Driver
    Translates between OS abstractions and hardware-specific details of I/O devices
  MM I/O, DMA, Interrupts
    Control registers, bulk data transfer, OS notifications

Caching and consistency
  File systems maintain many data structures
    Bitmap of free blocks and inodes
    Directories
    Inodes
    Data blocks
  Data structures are cached for performance
    Works great for read operations...
    ...but what about writes?
  Write-back caches delay writes: higher performance at the cost of potential inconsistencies
  Write-through caches write synchronously, but performance is poor (fsync)
    Do we get consistency at least?
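The trade-off between the two write policies can be made concrete with a toy cache. This is a minimal sketch, not how any real block cache is implemented: the `BlockCache` class, its method names, and the dictionary standing in for the disk are assumptions made for illustration.

```python
class BlockCache:
    """Toy block cache in front of a dict that stands in for the disk."""

    def __init__(self, disk, write_through):
        self.disk = disk
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()
        self.disk_writes = 0

    def write(self, block_no, data):
        self.cache[block_no] = data
        if self.write_through:
            self._flush_block(block_no)      # synchronous: one disk write per call
        else:
            self.dirty.add(block_no)         # delayed until flush()

    def _flush_block(self, block_no):
        self.disk[block_no] = self.cache[block_no]
        self.disk_writes += 1

    def flush(self):
        for block_no in sorted(self.dirty):
            self._flush_block(block_no)
        self.dirty.clear()

    def crash(self):
        """Lose everything that lives only in memory."""
        self.cache.clear()
        self.dirty.clear()


def run(write_through):
    disk = {}
    cache = BlockCache(disk, write_through)
    for i in range(3):
        cache.write(7, f"version {i}")       # same block rewritten three times
    cache.crash()                            # crash before any flush()
    return disk, cache.disk_writes


wt_disk, wt_writes = run(write_through=True)
wb_disk, wb_writes = run(write_through=False)
print("write-through:", wt_disk, wt_writes)
print("write-back:   ", wb_disk, wb_writes)
```

Write-through pays one disk write per update but the last version survives the crash. Write-back pays nothing until a flush and here loses every update. Neither policy addresses an operation that needs several blocks to change together.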
Example: a tiny ext2
  6 blocks, 6 inodes

  Before the append:
    inode bitmap:   0   1   0   0   0   0
    data bitmap:    0   0   0   0   1   0
    i-nodes:        --  Iv1 --  --  --  --
    data blocks:    --  --  --  --  D1  --

    Inode Iv1
      owner: lorenzo
      permissions: read-only
      size: 1
      pointer: 4
      pointer: null
      pointer: null
      pointer: null

  Suppose we append a data block to the file. This takes three writes:
    add new data block D2
    update inode
    update data bitmap

  After the append:
    inode bitmap:   0   1   0   0   0   0
    data bitmap:    0   0   0   0   1   1
    i-nodes:        --  Iv2 --  --  --  --
    data blocks:    --  --  --  --  D1  D2

    Inode Iv2
      owner: lorenzo
      permissions: read-only
      size: 2
      pointer: 4
      pointer: 5
      pointer: null
      pointer: null

  What if a crash or power outage occurs between writes?
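The append in this example can be written out as three separate disk writes against a small in-memory model. This is a sketch only: the dictionary layout and function names are hypothetical, while the bitmaps, inode fields, and block numbers (counted from 0) follow the slide.

```python
import copy

# State before the append, as on the slide (inode and block numbers start at 0).
fs_before = {
    "inode_bitmap": [0, 1, 0, 0, 0, 0],
    "data_bitmap":  [0, 0, 0, 0, 1, 0],
    "inodes": [None,
               {"owner": "lorenzo", "permissions": "read-only",
                "size": 1, "pointers": [4, None, None, None]},
               None, None, None, None],
    "data_blocks": [None, None, None, None, "D1", None],
}


def write_data_block(fs):      # write 1: add new data block D2
    fs["data_blocks"][5] = "D2"


def write_inode(fs):           # write 2: Iv1 becomes Iv2
    inode = fs["inodes"][1]
    inode["size"] = 2
    inode["pointers"][1] = 5


def write_data_bitmap(fs):     # write 3: mark block 5 as allocated
    fs["data_bitmap"][5] = 1


fs_after = copy.deepcopy(fs_before)
for write in (write_data_block, write_inode, write_data_bitmap):
    write(fs_after)            # a crash could happen between any two of these

print(fs_after["data_bitmap"])
print(fs_after["inodes"][1])
```

The loop makes the problem visible: the disk commits the three writes one at a time, so any prefix (or, with reordering, any subset) of them may be all that survives a crash.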
What if only a single write succeeds?
  Just the data block (D2) is written to disk
    Data is written, but there is no way to get to it; in fact, D2 still appears as a free block
    Write is lost, but FS data structures are consistent
  Just the updated inode (Iv2) is written to disk
    If we follow the pointer, we read garbage
    File system inconsistency: data bitmap says block is free, while inode says it is used. Must be fixed
  Just the updated bitmap is written to disk
    File system inconsistency: data bitmap says data block is used, but no inode points to it. The block will never be used. Must be fixed

What if two writes succeed?
  Inode and data bitmap updates succeed
    Good news: file system is consistent!
    Bad news: reading new block returns garbage
  Inode and data block updates succeed
    File system inconsistency. Must be fixed
  Data bitmap and data block succeed
    File system inconsistency
    No idea which file the data block belongs to!

The Consistent Update Problem
  Several file system operations update multiple data structures
    Create new file
      update inode bitmap and data bitmap
      write new inode
      add new file to directory file
  Would like to atomically move FS from one consistent state to another
  Even with write-through we have a problem
    Disk only commits one write at a time!

Solution 1: File System Checker
  Ethos: If it happens, I'll do something about it
  Let inconsistencies happen and fix them post facto
    during reboot
  Classic example: fsck
    Unix, 1986
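The six crash cases above can be enumerated mechanically. The sketch below models only block 5, the inode's pointer list, and the data bitmap from the tiny ext2 example. The function names and the two-part classification are assumptions for illustration, not part of any real tool.

```python
from itertools import combinations

WRITES = ("data block", "inode", "data bitmap")


def state_after(completed):
    """On-disk state of the tiny file system if only `completed` writes reached disk."""
    return {
        "block5": "D2" if "data block" in completed else "garbage",
        "inode_pointers": [4, 5] if "inode" in completed else [4],
        "data_bitmap": [0, 0, 0, 0, 1, 1 if "data bitmap" in completed else 0],
    }


def classify(state):
    """Return (metadata_consistent, reads_garbage) for block 5."""
    referenced = 5 in state["inode_pointers"]
    allocated = state["data_bitmap"][5] == 1
    consistent = referenced == allocated       # inode and bitmap must agree
    # A reader only reaches block 5 through the inode pointer.
    reads_garbage = referenced and state["block5"] != "D2"
    return consistent, reads_garbage


results = {}
for k in (1, 2):                               # one write succeeds, then two
    for completed in combinations(WRITES, k):
        results[completed] = classify(state_after(completed))

for completed, (consistent, garbage) in results.items():
    print(completed, "consistent" if consistent else "INCONSISTENT",
          "reads garbage" if garbage else "")
```

Only two of the six outcomes leave the metadata consistent, and one of those still hands garbage to a reader. A checker that compares inodes against bitmaps can catch the four inconsistent cases but not the garbage one.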
The Superblock
  One logical superblock per file system, at well-known location(s). It contains
    size of FS
    list of free blocks (today, a bitmap)
    number of free blocks and index of next free block
    size of inode list
    number of free inodes and index of next free inode
    locks for free block and free inode lists
    flag to indicate superblock has been modified

FSCK Summary
  Sanity check the superblock
  Check validity of free block and inode bitmaps
    Scan inodes, indirect blocks, etc. to understand which blocks are allocated
    On inconsistency, override free block bitmap
    Perform similar check on inodes to update inode bitmap
  Check that inodes are not corrupted
    e.g., check type (dir, regular file, symbolic link) field
    If it can't be fixed, clear inode and update inode bitmap
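The bitmap check in the summary above amounts to recomputing the allocation bitmap from the inodes and trusting that over what is on disk. A minimal sketch, assuming the simplified inode layout of the tiny ext2 example (direct pointers only, no indirect blocks); the function names are hypothetical and this is not how the real fsck is structured:

```python
def rebuild_data_bitmap(inodes, num_blocks):
    """Recompute which data blocks are allocated by scanning every in-use inode."""
    bitmap = [0] * num_blocks
    for inode in inodes:
        if inode is None:                 # unused inode slot
            continue
        for ptr in inode["pointers"]:
            if ptr is not None:
                bitmap[ptr] = 1
    return bitmap


def check_data_bitmap(on_disk_bitmap, inodes):
    """Return (repaired_bitmap, block numbers whose on-disk bit was wrong)."""
    expected = rebuild_data_bitmap(inodes, len(on_disk_bitmap))
    wrong = [i for i, (disk_bit, real_bit) in enumerate(zip(on_disk_bitmap, expected))
             if disk_bit != real_bit]
    return expected, wrong


# Crash scenario: inode Iv2 reached disk, but the data bitmap update did not.
inodes = [None, {"pointers": [4, 5, None, None]}, None, None, None, None]
stale_bitmap = [0, 0, 0, 0, 1, 0]

repaired, wrong = check_data_bitmap(stale_bitmap, inodes)
print(repaired, wrong)
```

The scan touches every inode regardless of how little was being written at crash time, so the cost of this approach grows with the size of the file system rather than with the amount of work in flight.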