CS 5460: Operating Systems
Lecture 20: File System Reliability
File System Optimizations

Era       Technique             Effect
Modern    Disk buffer cache     Eliminates the disk access on a hit
Modern    Aggregated disk I/O   Reduces seeks
Modern    Prefetching           Overlap/hide disk access
Historic  Disk head scheduling  Reduces seeks
Historic  Disk interleaving     Reduces rotational latency

Goal: Reduce or hide expensive disk operations
Buffer/Page Cache

Idea: Keep recently used disk blocks in kernel memory

Process reads from a file:
– If blocks are not in the buffer cache
  » Allocate space in the buffer cache (Q: What do we purge, and how?)
  » Initiate a disk read
  » Block the process until the disk operations complete
– Copy data from the buffer cache to process memory
– Finally, the system call returns

Usually, a process does not see the buffer cache directly
– mmap() maps buffer cache pages into the process's address space
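The read path above can be sketched as a toy cache. This is a minimal illustration, not kernel code: it assumes LRU as the answer to "what do we purge", which is only one possible policy, and `BufferCache`, `fake_disk`, and `BLOCK_SIZE` are hypothetical names introduced for the example.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size in bytes


class BufferCache:
    """Toy LRU cache of disk blocks (illustrative only)."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # hypothetical disk-read callback
        self.blocks = OrderedDict()  # block number -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # purge the least recently used block
        data = self.read_block_from_disk(block_no)  # a real process would block here
        self.blocks[block_no] = data
        return data


def fake_disk(block_no):
    """Stand-in for a disk read: returns a block filled with one byte value."""
    return bytes([block_no % 256]) * BLOCK_SIZE


cache = BufferCache(capacity=2, read_block_from_disk=fake_disk)
for n in [0, 1, 0, 2, 1]:
    cache.read(n)
```

With a capacity of two blocks, only the second read of block 0 hits; block 1 is purged to make room for block 2 and has to be fetched again.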
Buffer/Page Cache

Process writes to a file:
– If blocks are not in the buffer cache
  » Allocate pages
  » Initiate disk read
  » Block process until disk operations complete
– Copy written data from process memory to the buffer cache

Default: writes create dirty pages in the cache, then the system call returns
– Data gets written to the device in the background
– What if the file is unlinked before it goes to disk?

Optional: Synchronous writes, which go to disk before the system call returns
– Really slow!
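From user space, the difference between the default and a synchronous write can be sketched with Python's `os` module. This is an illustration under assumptions: `write_durably` and the temporary file name are hypothetical, and `os.fsync` only asks the OS to flush the file's dirty pages; whether the device itself has persisted them depends on the hardware. On POSIX systems, opening the file with `os.O_SYNC` is an alternative way to get synchronous writes.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then wait until the OS reports the file has been flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # Without this call, the data may sit in dirty cache pages for a while.
        os.fsync(fd)
    finally:
        os.close(fd)


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.dat")
write_durably(path, b"hello")
```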
Performing Large File I/Os

Idea: Try to allocate contiguous chunks of a file in large contiguous regions of the disk
– Disks have excellent bandwidth, but lousy latency!
– Amortize expensive seeks over many block reads/writes

Question: How?
– Maintain a free block bitmap (cache parts in memory)
– When you allocate blocks, use a modified “best fit” algorithm, rather than allocating a block at a time (pre-allocate, even)

Problem: Hard to do this when the disk is full/fragmented
– Solution A: Keep a reserve (e.g., 10%) available at all times
– Solution B: Run a disk “defragger” occasionally
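A plain best-fit search over a free block bitmap might look like the sketch below. It is not the "modified" algorithm any particular file system uses; the function names and the in-memory list of booleans (True = in use) are assumptions made for the example.

```python
def free_runs(bitmap):
    """Yield (start, length) for each maximal run of free blocks (False = free)."""
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i
        elif used and start is not None:
            yield (start, i - start)
            start = None
    if start is not None:
        yield (start, len(bitmap) - start)


def allocate_contiguous(bitmap, n):
    """Best fit: take the smallest free run that holds n blocks and mark it used.

    Returns the first block number, or None if no run is large enough.
    """
    candidates = [run for run in free_runs(bitmap) if run[1] >= n]
    if not candidates:
        return None
    start, _ = min(candidates, key=lambda run: (run[1], run[0]))
    for i in range(start, start + n):
        bitmap[i] = True
    return start


bitmap = [True, False, False, True, False, False, False, False,
          True, False, False, False]
first = allocate_contiguous(bitmap, 3)   # exact-size run at block 9
second = allocate_contiguous(bitmap, 3)  # falls back to the 4-block run at block 4
third = allocate_contiguous(bitmap, 3)   # only fragments remain
```

The third request fails even though three blocks are still free, which is the fragmentation problem the slide raises.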
Prefetching

Idea: Read blocks from disk ahead of the user's request

Goal: Reduce the number of seeks visible to the user
– If a block is read before it is requested → the request hits in the file buffer cache

[Figure: timeline with two columns, File System and User, each showing Read 0, Read 1, Read 2]

Problem: What blocks should we prefetch?
– Easy: Detect sequential access and prefetch ahead N blocks
– Harder: Detect periodic/predictable “random” accesses
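The "easy" case can be sketched as a small detector that watches the stream of block reads. The class name, the `trigger` threshold, and the `window` size (the N above) are assumptions for illustration; real file systems tune these and usually grow the window adaptively.

```python
class SequentialPrefetcher:
    """Suggest read-ahead blocks once accesses look sequential."""

    def __init__(self, window=4, trigger=2):
        self.window = window      # N: how many blocks to read ahead
        self.trigger = trigger    # sequential steps required before prefetching
        self.last_block = None
        self.run_length = 0

    def on_read(self, block_no):
        """Return the block numbers to prefetch after this read (possibly none)."""
        if self.last_block is not None and block_no == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0   # a jump resets the detector
        self.last_block = block_no
        if self.run_length >= self.trigger:
            return list(range(block_no + 1, block_no + 1 + self.window))
        return []


p = SequentialPrefetcher(window=3, trigger=2)
decisions = [p.on_read(b) for b in [10, 11, 12, 40, 41]]
```

Here only the read of block 12 triggers a prefetch (blocks 13 to 15); the jump to block 40 resets the detector.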
Fault Tolerance and Reliability
Fault Tolerance

What kinds of failures do we need to consider?
– OS crash, power failure
  » Data not on disk is lost; rarely, partial writes
– Disk media failure
  » Data on disk corrupted or unavailable
– Disk controller failure
  » Large swaths of data unavailable temporarily or permanently
– Network failure
  » Clients and servers cannot communicate (transient failure)
  » Only have access to stale data (if any)
– … (what else?)
Techniques to Tolerate Failure

Careful disk writes and “fsck”
– Leave the disk in a recoverable state even if not all writes finish
– Run a “disk check” program to identify/fix inconsistent disk state

RAID
– Redundant Array of Inexpensive (or Independent) Disks
– Write each block on more than one independent disk
– If a disk fails, can recover block contents from the non-failed disks

Logging
– Rather than overwrite-in-place, write changes to a log file
– Use two-phase commit to make log updates transactional

Clusters
– Replicate data at the server level
Careful Writes

Order writes so that disk state is recoverable
– Accept that disk contents may be inconsistent or stale
– Run a sanity check program to detect and fix problems

Properties that should hold at all times
– All blocks pointed to are not marked free
– All blocks not pointed to are marked free
– No block belongs to more than one file

Goal: Avoid major inconsistency
Not a goal: Never lose data
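The three properties can be checked mechanically, which is roughly what a sanity check program does. The sketch below works on a simplified in-memory model (a dict of file name to block list and a set of free block numbers); that model and the function name are assumptions, not how fsck represents a disk.

```python
def check_consistency(files, free_blocks, total_blocks):
    """Report violations of the three invariants.

    files: dict mapping file name -> list of block numbers it points to
    free_blocks: set of block numbers marked free
    total_blocks: number of blocks on the (model) disk
    """
    problems = []
    owner = {}
    for name, blocks in sorted(files.items()):
        for b in blocks:
            if b in free_blocks:
                problems.append(f"block {b} used by {name} but marked free")
            if b in owner:
                problems.append(f"block {b} shared by {owner[b]} and {name}")
            else:
                owner[b] = name
    for b in range(total_blocks):
        if b not in owner and b not in free_blocks:
            problems.append(f"block {b} unreferenced but not marked free (leak)")
    return problems


problems = check_consistency(
    files={"a": [0, 1], "b": [1, 2]},
    free_blocks={2, 4},
    total_blocks=5,
)
```

In this example, block 1 is shared, block 2 is in use but marked free, and block 3 is leaked. Only the leak is harmless to repair; the other two can lose or corrupt data.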
Careful Writes Example

To create a file, you must:
– Allocate and initialize an inode
– Allocate and initialize some data blocks
– Modify the directory file of the directory containing the file
– Modify the directory file’s inode (last modified time, size)

In what order should we do these writes?
How to add transactional (all or nothing) semantics?
How do careful writes interact with optimizations?
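One way to reason about the ordering question is to enumerate the disk state after a crash at every point in a candidate order. The sketch below is a deliberately coarse model: the step names and the "leak" versus "dangling" classification are assumptions for illustration, and the "safe" order shown is one plausible answer, not necessarily the one intended in lecture.

```python
SAFE_ORDER = ["mark_blocks_used", "init_inode", "init_data",
              "add_dir_entry", "update_dir_inode"]
UNSAFE_ORDER = ["add_dir_entry", "update_dir_inode",
                "mark_blocks_used", "init_inode", "init_data"]

ALLOCATION_STEPS = {"mark_blocks_used", "init_inode", "init_data"}


def classify(done):
    """Classify the on-disk state after a crash, given the completed writes."""
    done = set(done)
    if "add_dir_entry" in done:
        if not ALLOCATION_STEPS <= done:
            return "dangling"  # directory names an inode/blocks that are not valid yet
        return "ok"            # at worst the directory's size/mtime is stale
    if done & ALLOCATION_STEPS:
        return "leak"          # allocated but unreachable; a disk check can reclaim it
    return "ok"                # nothing visible happened


def crash_states(order):
    """State after crashing when the first k writes of the order have reached disk."""
    return [classify(order[:k]) for k in range(len(order) + 1)]


safe = crash_states(SAFE_ORDER)
unsafe = crash_states(UNSAFE_ORDER)
```

Under this model, the first order only ever leaks resources, while writing the directory entry first exposes a pointer to uninitialized data at every intermediate crash point.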
Careful Writes Exercise

To delete a file, you must:
– Deallocate the file’s inode
– Deallocate the file’s disk blocks
– Modify the directory file of the directory containing the file
– Update the directory file’s inode

In what order should we do these operations?
– Consider what intermediate states are recoverable via fsck
Soft Update Rules

– Never point to a block before initializing it
– Never reuse a block before nullifying pointers to it
– Never reset the last pointer to a live block before setting a new one
– Always mark a block as used in the free-block bitmap before making a directory entry point to it
Careful Writes: More Exercises

To write a file, you must:
– Modify (and perhaps allocate) the file’s disk blocks
– Modify the file’s inode (size and last modified time)
– Maybe, modify indirect block(s)

To move a file between directories, you must:
– Modify the source directory
– Modify the destination directory
– Modify the inodes of both directories
RAID

Goal: Organize multiple physical disks into a single high-performance, high-reliability logical disk

[Figure: CPU, I/O bus, RAID controller (ctlr.)]

Issues to consider:
– Multiple disks → higher aggregate throughput (more spindles)
– Multiple disks → (hopefully) independent failure modes
– Multiple disks → vulnerable to individual disk failures (MTTF)
– Writing to multiple disks for replication → higher write overhead
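The MTTF concern can be made concrete with a standard back-of-the-envelope estimate. It assumes disks fail independently at a constant rate, which the later discussion of correlated failures calls into question; the disk count and per-disk MTTF below are illustrative numbers, not figures from the lecture.

```latex
% Time until the first of N independent disks fails (no redundancy):
\[
  \mathrm{MTTF}_{\text{array}} \approx \frac{\mathrm{MTTF}_{\text{disk}}}{N}
\]
% Illustrative numbers: N = 100 disks, each with an MTTF of 1{,}000{,}000 hours.
\[
  \mathrm{MTTF}_{\text{array}} \approx \frac{1{,}000{,}000\ \text{h}}{100}
  = 10{,}000\ \text{h} \approx 1.1\ \text{years}
\]
```

Without redundancy, a large array loses data far sooner than any single disk would, which is why striping alone is not enough.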
Possible Uses of Multiple Disks

Striping
– Spread pieces of a single file across multiple disks
– Advantages:
  » Can service multiple independent requests in parallel
  » Can service single “large” requests in parallel
– Issues:
  » Interleave factor
  » How the data is striped across disks

Redundancy (replication)
– Store multiple copies of blocks on independent disks
– Advantages:
  » Can tolerate partial system failure → How much?
– Issues:
  » How widely do you want to spread the data?
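Round-robin striping reduces to integer arithmetic on the logical block number. In this sketch, `locate` and `stripe_unit` are hypothetical names; `stripe_unit` plays the role of the interleave factor, the number of consecutive blocks placed on one disk before moving to the next.

```python
def locate(logical_block, num_disks, stripe_unit=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    unit_no, within = divmod(logical_block, stripe_unit)
    disk = unit_no % num_disks                       # round-robin across disks
    offset = (unit_no // num_disks) * stripe_unit + within
    return disk, offset


# Four disks, one block per stripe unit: consecutive blocks hit different disks.
layout_fine = [locate(b, num_disks=4) for b in range(6)]

# Four disks, two blocks per stripe unit: pairs of blocks share a disk.
layout_coarse = [locate(b, num_disks=4, stripe_unit=2) for b in range(6)]
```

A small stripe unit lets one large request use every disk at once; a larger one keeps small requests on a single disk so that independent requests can proceed in parallel.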
Types of RAID

RAID level  Description
0           Data striping w/o redundancy
1           Disk mirroring
2           Parallel array of disks w/ error correcting disk (checksum)
3           Bit-interleaved parity
4           Block-interleaved parity
5           Block-interleaved, distributed parity
RAID Level 0

Striping
– Spread contiguous blocks of a file across multiple spindles
– Simple round-robin distribution

Non-redundant
– No fault tolerance

Advantages
– Higher throughput
– Larger storage

Disadvantages
– Lower reliability: any drive failure destroys the file system
– Added cost

[Figure: CPU, I/O bus, RAID controller (ctlr.)]
RAID Level 1

Mirroring
– Write complete copies of all blocks to multiple disks
– How many copies → how much reliability

No striping
– No added write bandwidth
– Potential for pipelined reads

Advantage:
– Can tolerate disk failures (“availability”)

Disadvantage:
– High cost (extra disks and RAID controller)

[Figure: CPU, I/O bus, RAID controller (ctlr.)]

Q: How to recover from drive failure?
RAID Level 5

Striping + distributed parity (no mirroring)
– Spread contiguous blocks of a file across multiple spindles
– Adds parity information
  » Example: XOR of other blocks

Combines the throughput of level 0 with redundancy as in level 1, using parity instead of full copies

Advantages
– Higher throughput
– Lower cost (than level 1)
– Any single disk can fail

Disadvantages
– More complexity in RAID controller
– Slower recovery time than RAID 1

RAID 6: two parity blocks per stripe (tolerates two disk failures)

[Figure: CPU, I/O bus, RAID controller (ctlr.)]
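The XOR example can be worked through in a few lines. The sketch assumes one stripe with three data blocks plus one parity block; `xor_blocks` and `parity_disk` are hypothetical helpers, and the parity rotation shown is just one possible layout, not a description of any specific controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_no, num_disks):
    """One possible rotation: parity moves one disk to the left on each stripe."""
    return (num_disks - 1 - stripe_no) % num_disks


# One stripe: three data blocks and their parity.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# The disk holding data[1] fails: XOR the survivors with the parity block.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost] + [parity]
rebuilt = xor_blocks(survivors)
```

Rebuilding requires reading every surviving disk in the stripe, which is one reason recovery is slower than copying from a mirror in RAID 1.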
RAID Tradeoffs

– Space efficiency
– Minimum number of disks
– Number of simultaneous failures tolerated
– Read performance
– Write performance
– Time to recover from a failed disk
– Complexity of controller
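The first three tradeoffs can be computed directly for the common levels. This sketch assumes equal-size disks, treats RAID 1 as an n-way mirror, and uses the usual minimum disk counts; `raid_summary` is a hypothetical helper and says nothing about performance or recovery time.

```python
def raid_summary(level, n):
    """Return (usable fraction of raw capacity, disk failures always tolerated)."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError("level not covered by this sketch")
    if n < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return 1.0, 0              # all space usable, no redundancy
    if level == 1:
        return 1.0 / n, n - 1      # n copies of every block
    if level == 5:
        return (n - 1) / n, 1      # one disk's worth of parity
    return (n - 2) / n, 2          # RAID 6: two disks' worth of parity


summaries = {level: raid_summary(level, 4) for level in (0, 1, 5, 6)}
```

With four disks, RAID 5 keeps 75% of the raw capacity while tolerating one failure, versus 25% for a four-way mirror that tolerates three.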
RAID Discussion

RAID can be implemented by hardware or software
– Hardware RAID implemented by RAID controller
  » Often supports hot swapping using hot spare disks
  » Not totally clear that cheap RAID HW is worth it
– Software RAID implemented by OS kernel (device driver)

Multiple parity disks can handle multiple errors

Nested RAID
– Can use a RAID array as a “disk” in a higher level RAID
  » RAID 1+0: RAID 0 (striping) run across RAID 1 (mirrored) arrays
  » RAID 0+1: RAID 1 (mirroring) run across RAID 0 (striped) arrays
RAID Discussion

– What are the risks due to purchasing a large number of disks at the same time for use in a RAID?
– Hot spares can be useful
– What does a RAID look like to the file system code?

RAID summary
– Tolerates failed disks
– May not deal well with correlated failure modes
– Can improve sustained transfer rate
– Does not improve individual seek latencies