
Roadmap for Section 10.1 The Notion of Fault-Tolerance


  1. Unit OS10: Fault Tolerance
     Windows Operating System Internals, by David A. Solomon and Mark E. Russinovich with Andreas Polze

     Roadmap for Section 10.1
     • The Notion of Fault-Tolerance
     • Fault-Tolerance Support in NTFS
     • Volume Management - Striped and Spanned Volumes
     • Distributed File System (DFS) and File Replication Service (FRS)
     • Network Load Balancing (NLB)
     • Windows Clustering (MSCS)

  2. Fault-tolerant Systems
     • Fault-tolerance is the property of a system that continues operating properly in the event of failure of some of its parts
     • If its operating quality decreases at all, the decrease is proportional to the severity of the failure
     • Fault-tolerance is particularly sought-after in high-availability or life-critical systems
     • Fault-tolerance is not just a property of individual machines; it may also characterize the rules by which they interact

     Fault Models and Protocols
     • Need to specify the fault model when discussing fault-tolerant (FT) systems
     • All FT mechanisms in Windows deal with crash faults of computers or applications only
     • Crash faults can be handled by replication in space or time

  3. Fault-tolerance (FT) by duplication
     Three approaches toward FT systems:
     • Replication: multiple identical system instances; tasks or requests are directed to all of them in parallel, and the correct result is chosen on the basis of a quorum
     • Redundancy: multiple identical system instances with fail-over to a fall-back or backup instance
     • Diversity: multiple different implementations of the same specification, used like replicated systems to cope with errors in a specific implementation

     Fault-tolerance in NTFS - Increasing System Availability
     • Transaction-based logging scheme
     • Fast, even for large disks
     • Recovery is limited to file system data
     • Use transaction processing like SQL Server for user data
     • Tradeoff: performance versus a fully fault-tolerant file system (FS)
     • Design options for file I/O and caching:
       • Careful write: VAX/VMS FS, other proprietary OS FS
       • Lazy write: most UNIX FS, OS/2 HPFS
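The replication and redundancy approaches on the first slide above can be illustrated with a minimal Python sketch. The function names `quorum_result` and `failover_call` are hypothetical, and modeling a crash fault as a raised exception is an assumption made only for this illustration.

```python
from collections import Counter


def quorum_result(results):
    """Replication: return the value reported by a strict majority, or None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A quorum here means more than half of all replicas agree.
    return value if count > len(results) // 2 else None


def failover_call(instances, request):
    """Redundancy: try instances in order, falling back when one crashes."""
    for instance in instances:
        try:
            return instance(request)
        except Exception:
            # Assumption: a crash fault shows up as an exception.
            continue
    raise RuntimeError("all instances failed")
```

Diversity would reuse `quorum_result`, with the replicas being different implementations of the same specification rather than identical copies.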

  4. Recoverable File System (Journaling File System)
     • Safety of careful write FS / performance of lazy write FS
     • Log file + fast recovery procedure
       • Log file imposes some overhead
       • Optimization over lazy write: distance between cache flushes increased
     • NTFS supports cache write-through and cache flushing triggered by applications
     • No extra disk I/O to update FS data structures necessary: all changes to FS structure are recorded in the log file, which can be written in a single operation
     • In the future, NTFS may support logging for user files (hooks in place)

     Recovery - Principles
     • NTFS performs automatic recovery
       • Based on update records and checkpoints in the log file
       • Update records store suboperations that change file system structure
     • NTFS writes a checkpoint every 5 sec.
       • Includes a copy of the transaction table and the dirty page table
       • Checkpoint includes the LSNs of the log records containing the tables; it is really a series of records, interleaved with update records
     • Recovery depends on two NTFS in-memory tables:
       • Transaction table: keeps track of active (not completed) transactions; suboperations of these transactions must be removed from disk
       • Dirty page table: records which pages in cache contain modifications to file system structure that have not yet been written to disk

     [Figure: log records written during a checkpoint operation, in order: dirty page table, update record, transaction table, checkpoint record, update record, update record; markers show the begin and end of the checkpoint operation]
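As a rough illustration of the two in-memory tables and the checkpoint records described above, here is a small Python sketch. The class names, record fields, and the use of a list index as the LSN are all assumptions made for the example; they are not the NTFS on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    records: list = field(default_factory=list)

    def append(self, record):
        """Append a record and return its LSN (here simply the list index)."""
        self.records.append(record)
        return len(self.records) - 1


@dataclass
class Volume:
    log: Log = field(default_factory=Log)
    # Transaction id -> LSN of that transaction's latest update record.
    transaction_table: dict = field(default_factory=dict)
    # Page name -> LSN of the oldest update not yet written to disk.
    dirty_page_table: dict = field(default_factory=dict)

    def update(self, txn, page, redo, undo):
        lsn = self.log.append(
            {"kind": "update", "txn": txn, "page": page, "redo": redo, "undo": undo}
        )
        self.transaction_table[txn] = lsn
        self.dirty_page_table.setdefault(page, lsn)
        return lsn

    def commit(self, txn):
        self.log.append({"kind": "commit", "txn": txn})
        del self.transaction_table[txn]  # no longer an active transaction

    def checkpoint(self):
        """Log copies of both tables, then a record holding their LSNs."""
        tt_lsn = self.log.append(
            {"kind": "transaction_table", "copy": dict(self.transaction_table)}
        )
        dp_lsn = self.log.append(
            {"kind": "dirty_page_table", "copy": dict(self.dirty_page_table)}
        )
        return self.log.append(
            {"kind": "checkpoint",
             "transaction_table_lsn": tt_lsn,
             "dirty_page_table_lsn": dp_lsn}
        )
```

In this sketch a checkpoint is three separate records, which mirrors the slide's remark that a checkpoint is really a series of records; in a concurrent system, update records could be interleaved between them.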

  5. Recovery - Passes
     1. Analysis pass
        • NTFS scans forward in the log file from the beginning of the last checkpoint
        • Updates the transaction/dirty page tables it copied into memory
        • NTFS scans the tables for the oldest update record of a non-committed transaction
     2. Redo pass
        • NTFS looks for "page update" records which contain volume modifications that might not have been flushed to disk
        • NTFS redoes these updates in the cache until it reaches the end of the log file
        • Cache manager "lazy writer thread" begins to flush the cache to disk
     3. Undo pass
        • Roll back any transactions that were not committed when the system failed
        • After the undo pass, the volume is in a consistent state
        • Write empty LFS restart area; no recovery is needed if the system fails now

     Undo Pass - Example
     [Figure: log records LSN 4044 through LSN 4049, including a "transaction committed" record, followed by the power failure]
     Logged operations:
     • Redo: Add the filename to the index / Undo: Remove the filename from the index
     • Redo: Allocate/Initialize an MFT file record / Undo: Deallocate the file record
     • Redo: Set bits 3-9 in the bitmap / Undo: Clear bits 3-9 in the bitmap
     Notes:
     • Transaction 1 was committed before power failure
     • Transaction 2 was still active
     • NTFS must log undo operations in the log file! Power might fail again during recovery; NTFS would have to redo its undo operations
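The three passes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the NTFS implementation: the log is a plain list scanned from its start rather than from the last checkpoint, `disk` is a dict standing in for cached volume pages, and the record fields and the `recover` function name are hypothetical.

```python
def recover(log, disk):
    """Replay `log` (a list of dict records) against `disk` (page -> value)."""
    # 1. Analysis pass: find transactions that have no commit record.
    active = set()
    for rec in log:
        if rec["kind"] == "update":
            active.add(rec["txn"])
        elif rec["kind"] == "commit":
            active.discard(rec["txn"])

    # 2. Redo pass: reapply every logged change in log order, including
    #    undo operations logged by an earlier, interrupted recovery.
    for rec in log:
        if rec["kind"] == "update":
            disk[rec["page"]] = rec["redo"]
        elif rec["kind"] == "undo":
            disk[rec["page"]] = rec["value"]

    # 3. Undo pass: roll back uncommitted updates, newest first, and log
    #    each undo so that a second crash during recovery can redo it.
    for rec in reversed(list(log)):  # iterate a snapshot; we append below
        if rec["kind"] == "update" and rec["txn"] in active:
            disk[rec["page"]] = rec["undo"]
            log.append({"kind": "undo", "txn": rec["txn"],
                        "page": rec["page"], "value": rec["undo"]})
    return active
```

A usage example loosely modeled on the slide, where the assignment of operations to transactions 1 and 2 is assumed for illustration:

```python
log = [
    {"kind": "update", "txn": 1, "page": "index", "redo": "filename added", "undo": "filename removed"},
    {"kind": "commit", "txn": 1},
    {"kind": "update", "txn": 2, "page": "mft", "redo": "record allocated", "undo": "record free"},
    {"kind": "update", "txn": 2, "page": "bitmap", "redo": "bits 3-9 set", "undo": "bits 3-9 clear"},
]
disk = {}
recover(log, disk)  # transaction 1 is redone, transaction 2 is rolled back
```

Running `recover` a second time on the extended log leaves `disk` unchanged, which is the property the last bullet of the slide asks for.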

  6. NTFS Recovery - Conclusions
     • Recovery will return the volume to some preexisting consistent state (not necessarily the state before the crash)
     • Lazy commit algorithm: the log file is not immediately flushed when a "transaction committed" record is written
       • Log File Service batches records
       • Flush when the cache manager calls or a checkpoint record is written (once every 5 sec); also when the log is full
       • Several parallel transactions might have been active before the crash
     • NTFS uses log file mechanisms for error handling
       • Most I/O errors are not file system errors
       • NTFS might create an MFT record and detect that the disk is full when allocating space for a file in the bitmap
       • NTFS uses log info to undo changes and returns a "disk full" error to the caller

     Fault Tolerance Support - using multiple disks
     • NTFS' capabilities are enhanced by the fault-tolerant volume managers FtDisk/DMIO
       • Lie above the hard disk drivers in the I/O system's layered driver scheme
       • FtDisk: for basic disks
       • DMIO: for dynamic disks
     • Volume management capabilities:
       • Redundant data storage
       • Dynamic data recovery from bad sectors on SCSI disks
     • NTFS itself implements bad-sector recovery for non-SCSI disks
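The lazy commit idea on the first slide above can be shown with a toy Python model. The `LazyLog` class, its `capacity` parameter, and the string records are hypothetical; the sketch only models the "flush when the log buffer is full" and "flush on request" triggers, not the 5-second checkpoint timer.

```python
class LazyLog:
    """Toy log that batches records in memory before writing them out."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # hypothetical buffer size, in records
        self.buffer = []          # records still only in memory
        self.on_disk = []         # records that would survive a crash

    def write(self, record):
        # Writing a record, even a commit record, does not force a flush.
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        """Called when the buffer is full, or explicitly by a caller."""
        self.on_disk.extend(self.buffer)
        self.buffer.clear()
```

If the system crashes while a "transaction committed" record is still in `buffer`, recovery sees the transaction as uncommitted and rolls it back. That is why the volume returns to some earlier consistent state rather than necessarily the state just before the crash.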

  7. Terminology
     • Disks are physical storage devices such as a hard disk, a 3.5-inch floppy disk, or a CD-ROM
       • A disk is divided into sectors, addressable blocks of fixed size
       • Sector sizes are determined by hardware
       • All current x86-processor hard disk sectors are 512 bytes, and CD-ROM sectors are typically 2048 bytes
       • Future x86 systems might support larger hard disk sector sizes
     • Partitions are collections of contiguous sectors on a disk
       • A partition table or other disk-management database stores a partition's starting sector, size, and other characteristics
     • Simple volumes are objects that represent sectors from a single partition that file system drivers manage as a single unit
     • Multipartition volumes are objects that represent sectors from multiple partitions and that file system drivers manage as a single unit
       • Multipartition volumes offer performance, reliability, and sizing features that simple volumes do not

     Basic vs Dynamic Disks
     • Two disk partitioning schemes used by Windows:
       • Basic disk partitioning
       • Dynamic disk partitioning
     • Basic disks rely on MS-DOS-style disk partitioning
       • Are really Windows legacy disks
       • Partition information for each disk is stored on disk
       • Multipartition information is not stored on disk and can be lost when the disk is moved or the OS is reinstalled
     • Dynamic disks implement a more flexible partitioning scheme
       • Configuration of multipartition volumes is on disk and mirrored across the dynamic disks of the same computer. This allows for easy migration and minimizes chances of disk configuration loss.
       • Disadvantage is that the partitioning is not understood by other OSs
       • Laptops only support basic disks (usually only one disk, and disks not removable)
     • All disks are basic disks unless created new as dynamic disks or converted
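To make the sector, partition, and multipartition-volume terms concrete, the following Python sketch maps a volume-relative sector number to a location on disk. It assumes a spanned layout, in which the partitions are simply concatenated; the `Partition` class and `locate` function are hypothetical names, and the 512-byte sector size is the hard disk value quoted on the slide.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes per hard disk sector, as stated on the slide


@dataclass(frozen=True)
class Partition:
    disk: int           # index of the physical disk
    start_sector: int   # first sector of the partition on that disk
    sector_count: int   # number of contiguous sectors


def locate(volume, volume_sector):
    """Map a volume-relative sector to (disk, disk sector, byte offset).

    `volume` is a list of partitions; one entry models a simple volume,
    several entries model a spanned multipartition volume.
    """
    if volume_sector < 0:
        raise ValueError("sector number must be non-negative")
    remaining = volume_sector
    for part in volume:
        if remaining < part.sector_count:
            sector = part.start_sector + remaining
            return part.disk, sector, sector * SECTOR_SIZE
        remaining -= part.sector_count
    raise ValueError("sector beyond end of volume")
```

A file system driver sees one contiguous range of sectors, while a layer below it performs a mapping of this kind onto the underlying partitions.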

  8. Basic Disk Partitioning
     • A disk has a sector called a Master Boot Record (MBR) as its first sector, which defines the first level of partitioning with its partition table

     [Figure: basic disk layout. Labels: MBR with boot code and partition table (entries 1-4), boot sector, extended partition boot record, partitions within an extended partition, boot partition, Partition 1, Partition 2, Partition 3, Partition 4 (extended)]

     Basic Disk Partitioning
     • The MBR describes up to 4 primary partitions
       • The first record of each primary partition is a boot record
       • One primary partition can be marked "bootable"
       • Each partition has a partition type (FAT, FAT32, NTFS, ...)
     • To overcome the 4-partition limit, basic disks define a special type of partition called an extended partition
       • Like a subdisk, complete with its own MBR
     • In NT 4, configuration for multipartition volumes is stored in the Registry's HKLM\System\Disk subkey
       • Lost if the system is reinstalled or the disk is moved to another system
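The partition table described above can be read with a short Python sketch. It relies on the conventional MBR layout, which the slides do not spell out: four 16-byte entries starting at byte offset 446 and a 0x55 0xAA signature in the last two bytes of the sector. The `parse_mbr` function name and the small `TYPE_NAMES` table are illustrative choices, not a complete list of partition types.

```python
import struct

PARTITION_TABLE_OFFSET = 446  # conventional start of the partition table
ENTRY_SIZE = 16               # bytes per partition table entry
TYPE_NAMES = {                # a few well-known partition type bytes
    0x05: "Extended",
    0x06: "FAT16",
    0x07: "NTFS/HPFS",
    0x0B: "FAT32",
    0x0C: "FAT32 (LBA)",
    0x0F: "Extended (LBA)",
}


def parse_mbr(sector):
    """Return the non-empty primary partition entries of a 512-byte MBR."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):  # the MBR describes up to 4 primary partitions
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        # Status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes
        # (skipped), then little-endian start sector and sector count.
        status, ptype, start, count = struct.unpack_from("<B3xB3xII", sector, offset)
        if ptype == 0:
            continue  # unused entry
        partitions.append({
            "index": index + 1,
            "bootable": status == 0x80,
            "type": TYPE_NAMES.get(ptype, hex(ptype)),
            "start_sector": start,
            "sector_count": count,
        })
    return partitions
```

To try it on a real disk one would read the first 512 bytes of the raw device, which typically requires administrator rights; an entry whose type is "Extended" points to a further chain of boot records that this sketch does not follow.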
