CS5460: Operating Systems Lecture 20: File System Reliability


  1. CS5460: Operating Systems Lecture 20: File System Reliability CS 5460: Operating Systems

  2. File System Optimizations
     • Goal: Reduce or hide expensive disk operations
     Era       Technique              Effect
     Modern    Disk buffer cache      Eliminates the disk access (on a cache hit)
     Modern    Aggregated disk I/O    Reduces seeks
     Modern    Prefetching            Overlaps/hides disk access
     Historic  Disk head scheduling   Reduces seeks
     Historic  Disk interleaving      Reduces rotational latency

  3. Buffer/Page Cache
     • Idea: Keep recently used disk blocks in kernel memory
     • Process reads from a file:
       – If blocks are not in buffer cache
         » Allocate space in buffer cache (Q: What do we purge and how?)
         » Initiate a disk read
         » Block the process until disk operations complete
       – Copy data from buffer cache to process memory
       – Finally, system call returns
     • Usually, a process does not see the buffer cache directly
     • mmap() maps buffer cache pages into process RAM
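
The read path above can be sketched as a toy cache. This is a minimal illustration, not how any particular kernel does it: the slide leaves the purge policy as an open question, and least-recently-used (LRU) eviction is assumed here. `BufferCache`, `fake_disk`, and `BLOCK_SIZE` are hypothetical names.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size in bytes


class BufferCache:
    """Toy LRU cache of disk blocks, keyed by block number."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # hypothetical device read
        self.blocks = OrderedDict()  # block number -> bytes, least recent first
        self.disk_reads = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # hit: mark as most recently used
            return self.blocks[block_no]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # miss with full cache: evict LRU block
        data = self.read_block_from_disk(block_no)  # a real process would block here
        self.disk_reads += 1
        self.blocks[block_no] = data
        return data


def fake_disk(block_no):
    """Stand-in for the device: returns a block filled with one byte value."""
    return bytes([block_no % 256]) * BLOCK_SIZE


cache = BufferCache(capacity=2, read_block_from_disk=fake_disk)
for n in (0, 1, 0, 2, 1):
    cache.read(n)
print(cache.disk_reads)  # the second read of block 0 is the only hit
```

With a capacity of two blocks, only the repeated read of block 0 is served from memory; every other access goes to the (fake) disk.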

  4. Buffer/Page Cache
     • Process writes to a file:
       – If blocks are not in the buffer cache
         » Allocate pages
         » Initiate disk read
         » Block process until disk operations complete
       – Copy written data from process RAM to buffer cache
     • Default: writes create dirty pages in the cache, then the system call returns
       – Data gets written to device in the background
       – What if the file is unlinked before it goes to disk?
     • Optional: Synchronous writes, which go to disk before the system call returns
       – Really slow!
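
From user space, the difference between the default and the synchronous behavior can be sketched with Python's standard `os.fsync`. The helper names `write_default` and `write_durable` are made up for this example, and how durable the result really is depends on the OS and the device.

```python
import os
import tempfile


def write_default(path, data):
    """Buffered write: returns once the data sits in the OS cache as dirty pages."""
    with open(path, "wb") as f:
        f.write(data)


def write_durable(path, data):
    """Write, then wait until the kernel reports the data has reached the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's user-space buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the dirty pages out now


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.dat")
    write_durable(path, b"hello")
    with open(path, "rb") as f:
        result = f.read()
```

Both helpers leave the same bytes in the file; they differ only in when the call returns relative to the device write, which is why the synchronous version is so much slower.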

  5. Performing Large File I/Os
     • Idea: Try to allocate contiguous chunks of a file in large contiguous regions of the disk
       – Disks have excellent bandwidth, but lousy latency!
       – Amortize expensive seeks over many block reads/writes
     • Question: How?
       – Maintain a free block bitmap (cache parts in memory)
       – When you allocate blocks, use a modified “best fit” algorithm, rather than allocating a block at a time (pre-allocate even)
     • Problem: Hard to do this when the disk is full or fragmented
       – Solution A: Keep a reserve (e.g., 10%) available at all times
       – Solution B: Run a disk “defragger” occasionally
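
As a rough sketch of allocating from a free block bitmap, the code below does plain best fit: it picks the smallest free run that still holds the request. The slide mentions a modified best fit without giving details, so this is only the textbook baseline; `free_runs` and `allocate_best_fit` are hypothetical names.

```python
def free_runs(bitmap):
    """Yield (start, length) for each maximal run of free blocks (False = free)."""
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i
        elif used and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(bitmap) - start


def allocate_best_fit(bitmap, count):
    """Mark `count` contiguous blocks as used; return the start block or None."""
    candidates = [(length, start) for start, length in free_runs(bitmap)
                  if length >= count]
    if not candidates:
        return None  # no contiguous run is large enough (disk full or fragmented)
    _, start = min(candidates)  # smallest run that still fits
    for i in range(start, start + count):
        bitmap[i] = True
    return start


# True = used, False = free. Free runs: blocks 1-4, 6-7, and 9-11.
bitmap = [True, False, False, False, False, True,
          False, False, True, False, False, False]
first = allocate_best_fit(bitmap, 2)   # fits exactly in the run at block 6
second = allocate_best_fit(bitmap, 3)  # fits exactly in the run at block 9
third = allocate_best_fit(bitmap, 5)   # no run of 5 free blocks exists
```

The failed third request shows the fragmentation problem from the slide: six blocks are still free in total before it, but never five in a row.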

  6. Prefetching
     • Idea: Read blocks from disk ahead of user request
     • Goal: Reduce number of seeks visible to user
       – If block read before request → hits in file buffer cache
     [Figure: File System and User timelines, each showing Read 0, Read 1, Read 2]
     • Problem: What blocks should we prefetch?
       – Easy: Detect sequential access and prefetch ahead N blocks
       – Harder: Detect periodic/predictable “random” accesses
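
The easy case, detecting sequential access and reading ahead N blocks, might look like the sketch below. The class name, the `trigger` threshold, and the default window size are assumptions made for illustration; a real implementation would also avoid re-requesting blocks it has already prefetched.

```python
class SequentialPrefetcher:
    """Suggests read-ahead blocks once a run of sequential reads is seen."""

    def __init__(self, window=4, trigger=2):
        self.window = window    # N blocks to read ahead (assumed value)
        self.trigger = trigger  # sequential steps required before prefetching
        self.last_block = None
        self.run_length = 0

    def on_read(self, block_no):
        """Record a read and return the block numbers to prefetch (maybe none)."""
        if self.last_block is not None and block_no == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0  # pattern broken: start counting again
        self.last_block = block_no
        if self.run_length >= self.trigger:
            return list(range(block_no + 1, block_no + 1 + self.window))
        return []


p = SequentialPrefetcher(window=3, trigger=2)
history = [p.on_read(b) for b in (10, 11, 12, 40)]
print(history)  # [[], [], [13, 14, 15], []]
```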

  7. Fault Tolerance and Reliability

  8. Fault Tolerance
     • What kinds of failures do we need to consider?
       – OS crash, power failure
         » Data not on disk is lost; rarely, partial writes
       – Disk media failure
         » Data on disk corrupted or unavailable
       – Disk controller failure
         » Large swaths of data unavailable temporarily or permanently
       – Network failure
         » Clients and servers cannot communicate (transient failure)
         » Only have access to stale data (if any)
       – … (what else?)

  9. Techniques to Tolerate Failure
     • Careful disk writes and “fsck”
       – Leave disk in recoverable state even if not all writes finish
       – Run “disk check” program to identify/fix inconsistent disk state
     • RAID:
       – Redundant Array of Inexpensive (or Independent) Disks
       – Write each block on more than one independent disk
       – If a disk fails, can recover block contents from non-failed disks
     • Logging
       – Rather than overwrite-in-place, write changes to log file
       – Use two-phase commit to make log updates transactional
     • Clusters
       – Replicate data at the server level

  10. Careful Writes
     • Order writes so that disk state is recoverable
       – Accept that disk contents may be inconsistent or stale
       – Run sanity check program to detect and fix problems
     • Properties that should hold at all times
       – All blocks pointed to are not marked free
       – All blocks not pointed to are marked free
       – No block belongs to more than one file
     • Goal: Avoid major inconsistency
     • Not a goal: Never lose data
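
The three properties above are the kind of thing a sanity check program verifies. Below is a minimal sketch over an in-memory model. Representing the file system as a dict of block lists plus a free bitmap is an assumption made for brevity, and `check_consistency` is a hypothetical name, not part of any real fsck.

```python
def check_consistency(files, free_bitmap):
    """Return a list of violations.

    files: dict mapping file name -> list of block numbers it points to.
    free_bitmap: free_bitmap[b] is True when block b is marked free.
    """
    problems = []
    owner = {}
    for name, blocks in files.items():
        for b in blocks:
            if free_bitmap[b]:
                problems.append(f"block {b} used by {name} but marked free")
            if b in owner:
                problems.append(f"block {b} shared by {owner[b]} and {name}")
            else:
                owner[b] = name
    for b, is_free in enumerate(free_bitmap):
        if not is_free and b not in owner:
            problems.append(f"block {b} marked used but not referenced")
    return problems


files = {"a": [0, 1], "b": [1, 2]}
free_bitmap = [False, False, True, False]
for line in check_consistency(files, free_bitmap):
    print(line)
```

In this example block 1 is shared, block 2 is in use but marked free, and block 3 is leaked. Only the leak is harmless: it wastes space but corrupts nothing, which matches the stated goal of avoiding major inconsistency rather than avoiding all loss.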

  11. Careful Writes Example
     • To create a file, you must:
       – Allocate and initialize an inode
       – Allocate and initialize some data blocks
       – Modify the directory file of the directory containing the file
       – Modify the directory file's inode (last modified time, size)
     • In what order should we do these writes?
     • How to add transactional (all or nothing) semantics?
     • How do careful writes interact with optimizations?

  12. Careful Writes Exercise
     • To delete a file, you must:
       – Deallocate the file's inode
       – Deallocate the file's disk blocks
       – Modify the directory file of the directory containing the file
       – Update the directory file's inode
     • In what order should we do these operations?
       – Consider what intermediate states are recoverable via fsck

  13. Soft Update Rules
     • Never point to a block before initializing it
     • Never reuse a block before nullifying pointers to it
     • Never reset the last pointer to a live block before setting a new one
     • Always mark a block as used in the free-block bitmap before making a directory entry point to it
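
One way to see why ordering matters is to enumerate crash points for a file creation. The sketch below is a deliberately simplified model: it tracks three writes, assumes they reach disk in the order issued, and treats a crash after any prefix as possible. The step names and functions are hypothetical.

```python
SAFE_ORDER = ["mark_blocks_used", "init_inode", "add_dir_entry"]
UNSAFE_ORDER = ["add_dir_entry", "init_inode", "mark_blocks_used"]


def state_after(order, completed):
    """Set of writes that reached disk if we crash after `completed` steps."""
    return set(order[:completed])


def is_dangerous(on_disk):
    """A directory entry must never name an uninitialized inode or free blocks."""
    if "add_dir_entry" not in on_disk:
        # At worst an inode or some blocks are leaked; a disk check can reclaim them.
        return False
    return not {"mark_blocks_used", "init_inode"} <= on_disk


def crash_points_that_corrupt(order):
    """Crash points (number of completed steps) that leave a dangerous state."""
    return [k for k in range(len(order) + 1)
            if is_dangerous(state_after(order, k))]


print(crash_points_that_corrupt(SAFE_ORDER))    # []
print(crash_points_that_corrupt(UNSAFE_ORDER))  # [1, 2]
```

The safe order follows the rules above: the directory entry, which makes the file reachable, is written last. The model's assumption that writes land in issue order is exactly what a write-back cache or disk scheduler can break.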

  14. Careful Writes: More Exercises
     • To write a file, you must:
       – Modify (and perhaps allocate) the file's disk blocks
       – Modify the file's inode (size and last modified time)
       – Maybe, modify indirect block(s)
     • To move a file between directories, you must:
       – Modify the source directory
       – Modify the destination directory
       – Modify the inodes of both directories

  15. RAID
     • Goal: Organize multiple physical disks into a single high-performance, high-reliability logical disk
     [Figure: CPU, I/O bus, RAID controller]
     • Issues to consider:
       – Multiple disks → higher aggregate throughput (more spindles)
       – Multiple disks → (hopefully) independent failure modes
       – Multiple disks → more vulnerable to individual disk failures (MTTF)
       – Writing to multiple disks for replication → higher write overhead
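
The MTTF point above can be made concrete with a back-of-the-envelope estimate. Assuming disk failures are independent and occur at a constant rate (assumptions not stated on the slide), an array of N disks with no redundancy fails as soon as its first disk fails, so:

```latex
\mathrm{MTTF}_{\text{array}} \approx \frac{\mathrm{MTTF}_{\text{disk}}}{N}
```

For example, with an illustrative per-disk MTTF of 100,000 hours and N = 10, the array MTTF is roughly 10,000 hours, a little over a year. Redundancy is what buys the reliability back.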

  16. Possible Uses of Multiple Disks
     • Striping
       – Spread pieces of a single file across multiple disks
       – Advantages:
         » Can service multiple independent requests in parallel
         » Can service single “large” requests in parallel
       – Issues:
         » Interleave factor
         » How the data is striped across disks
     • Redundancy (replication)
       – Store multiple copies of blocks on independent disks
       – Advantages:
         » Can tolerate partial system failure → How much?
       – Issues:
         » How widely do you want to spread the data?

  17. Types of RAID
     RAID level  Description
     0           Data striping w/o redundancy
     1           Disk mirroring
     2           Parallel array of disks w/ error correcting disk (checksum)
     3           Bit-interleaved parity
     4           Block-interleaved parity
     5           Block-interleaved, distributed parity

  18. RAID Level 0
     • Striping
       – Spread contiguous blocks of a file across multiple spindles
       – Simple round-robin distribution
     • Non-redundant
       – No fault tolerance
     • Advantages
       – Higher throughput
       – Larger storage
     • Disadvantages
       – Lower reliability: any drive failure destroys the file system
       – Added cost
     [Figure: CPU, I/O bus, RAID controller]
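
Round-robin striping reduces to a modulo and a division. The sketch below assumes a stripe unit of exactly one block, which is a simplification (the interleave factor is a tunable in practice); `raid0_locate` is a hypothetical name.

```python
def raid0_locate(logical_block, num_disks):
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


layout = [raid0_locate(b, 3) for b in range(6)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Consecutive logical blocks land on different disks, so one large sequential request can be serviced by all spindles in parallel.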

  19. RAID Level 1
     • Mirroring
       – Write complete copies of all blocks to multiple disks
       – How many copies → how much reliability
     • No striping
       – No added write bandwidth
       – Potential for pipelined reads
     • Advantage:
       – Can tolerate disk failures (“availability”)
     • Disadvantage:
       – High cost (extra disks and RAID controller)
     • Q: How to recover from drive failure?
     [Figure: CPU, I/O bus, RAID controller]

  20. RAID Level 5
     • Striping + distributed parity
       – Spread contiguous blocks of a file across multiple spindles
       – Adds parity information
         » Example: XOR of other blocks
     • Combines features of 0 & 1 (striping plus redundancy, without full mirroring)
     • Advantages
       – Higher throughput
       – Lower cost (than level 1)
       – Any single disk can fail
     • Disadvantages
       – More complexity in RAID controller
       – Slower recovery time than RAID 1
     • RAID 6: two parity blocks per stripe, so any two disks can fail
     [Figure: CPU, I/O bus, RAID controller]
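
The XOR parity example can be worked through in a few lines. This sketch computes the parity block for one stripe and rebuilds a lost data block from the survivors. The three-byte blocks are purely illustrative, and `parity_disk` shows just one possible rotation scheme; real arrays differ in how they place parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, num_disks):
    """One possible rotation: parity moves to the next disk on each stripe."""
    return stripe % num_disks


data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] failed; rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Rebuilding one block requires reading every surviving block in the stripe, which is one reason recovery is slower than with a mirror.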

  21. RAID Tradeoffs
     • Space efficiency
     • Minimum number of disks
     • Number of simultaneous failures tolerated
     • Read performance
     • Write performance
     • Time to recover from a failed disk
     • Complexity of controller
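
Two of these tradeoffs, space efficiency and the number of failures tolerated, follow directly from the layout. The sketch below uses the standard figures for levels 0, 1, 5, and 6 with n identical disks, treating RAID 1 as an n-way mirror; `raid_summary` is a hypothetical helper.

```python
def raid_summary(level, n):
    """Return (usable fraction of raw capacity, disk failures always survivable)."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return 1.0, 0            # all space usable, no redundancy
    if level == 1:
        return 1.0 / n, n - 1    # n copies of every block
    if level == 5:
        return (n - 1) / n, 1    # one disk's worth of parity
    return (n - 2) / n, 2        # RAID 6: two disks' worth of parity


for level in (0, 1, 5, 6):
    print(level, raid_summary(level, 4))
```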

  22. RAID Discussion
     • RAID can be implemented by hardware or software
       – Hardware RAID implemented by RAID controller
         » Often supports hot swapping using hot spare disks
         » Not totally clear that cheap RAID HW is worth it
       – Software RAID implemented by OS kernel (device driver)
     • Multiple parity disks can handle multiple errors
     • Nested RAID
       – Can use a RAID array as a “disk” in a higher level RAID
         » RAID 1+0: RAID 0 (striping) run across RAID 1 (mirrored) arrays
         » RAID 0+1: RAID 1 (mirroring) run across RAID 0 (striped) arrays

  23. RAID Discussion
     • What are the risks due to purchasing a large number of disks at the same time for use in a RAID?
     • Hot spares can be useful
     • What does a RAID look like to the file system code?
     • RAID summary
       – Tolerates failed disks
       – May not deal well with correlated failure modes
       – Can improve sustained transfer rate
       – Does not improve individual seek latencies
