File System Project Seminar: On-Disk Layout (Prof. Andreas Polze)

  1. File System Project Seminar: On-Disk Layout. Prof. Andreas Polze; Andreas Grapentin, Sven Köhler, Max Plauth, Jossekin Beilharz, Felix Eberhardt. Hasso Plattner Institute

  2. File System Seminar Overview
     Diagram: program → open, readdir → Virtual File System → proc, ext4, btrfs, fs → Block Buffer → disk, with a marker labelled "today".
     File System Seminar, On-Disk Layout. Polze, Grapentin, Köhler, Plauth, Beilharz, Eberhardt. 14.11.2017. Chart 2

  3. Tasks of a File System (Simplified)
     A file system needs to be …
     Searchable
     □ resolve filename to metadata
     □ resolve filename to data (streams, forks)
     □ find the corresponding block to a given file position
     Modifiable
     □ find space to add new data
     □ find space to add new metadata
     □ mark bad blocks
     □ query existing free space

  4. Section 1: Block Devices And Physics (figure: disk)

  5. 1. Block Devices: Cylinder-Head-Sector
     For several decades, continuously spinning magnetic disks were the gold standard for secondary storage.
     Data was originally addressed block-wise by Cylinder-Head-Sector (CHS) tuples.
     To reduce movements of the head (arm), data is kept along cylinders first.
     Modern buses allow Logical Block Addressing (LBA) by linear numbers.
     (figure label: block)
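The relation between CHS tuples and linear block numbers can be sketched as follows. This is a minimal illustration using the conventional mapping (cylinders and heads counted from 0, sectors from 1); the geometry values `HPC` and `SPT` are assumptions chosen for the example, not values from the slides.

```python
def chs_to_lba(c, h, s, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) tuple to a linear block address.

    Cylinders and heads start at 0, sectors start at 1.
    """
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: linear block address back to a CHS tuple."""
    c, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    h, s0 = divmod(rest, sectors_per_track)
    return c, h, s0 + 1


HPC, SPT = 16, 63  # assumed geometry, for illustration only

print(chs_to_lba(0, 0, 1, HPC, SPT))  # 0
print(chs_to_lba(1, 0, 1, HPC, SPT))  # 1008
print(lba_to_chs(1008, HPC, SPT))     # (1, 0, 1)
```

Consecutive LBAs first walk through all sectors of a track, then all heads of a cylinder, and only then move to the next cylinder, which matches the "cylinders first" placement described above.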

  6. 1. Block Devices: Sector Vs. Block Vs. Cluster
     Main overhead factors when accessing data:
     ■ latency (seek + rotational): How long to wait for the first byte?
     ■ throughput (transfer rate): How many bytes per second once started?
     In the latency time for one byte, several others can be transferred (within a track).
     Insight: Group bytes into blocks, and into even bigger units on FS level.
     Multiple names: sector, cluster, physical block, logical block, device block, file system block
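The effect of grouping bytes into larger blocks can be made concrete with a small calculation. The sketch below assumes a simple cost model (one latency penalty per request, then a constant transfer rate); the 10 ms latency and 100 MB/s rate are illustrative assumptions, not measurements from the slides.

```python
def effective_throughput(block_bytes, latency_s, transfer_bytes_per_s):
    """Bytes per second actually achieved when every request pays the
    latency once and then transfers block_bytes at the raw rate."""
    return block_bytes / (latency_s + block_bytes / transfer_bytes_per_s)


LATENCY = 0.010   # assumed: 10 ms seek + rotational latency
RATE = 100e6      # assumed: 100 MB/s raw transfer rate

for size in (512, 4096, 1 << 20):
    rate = effective_throughput(size, LATENCY, RATE)
    print(f"{size:>8} bytes per request: {rate / 1e6:.3f} MB/s")
```

Under this model the achieved rate grows with the request size and approaches the raw transfer rate only for large units, which is the motivation for blocks and clusters.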

  7. 1. Block Devices: Cylinder Groups
     Many file systems (UFS, ext, NTFS) use block groups to keep semantically connected data within one cylinder.
     This reduces head seeks and limits fragmentation within the partition.
     Figure: three block groups, each laid out as SB | Meta | Files; the repeated SBs are redundant superblock backups.

  8. 1. Block Devices: Tracking Available (Free) Space
     ■ Blocks can be occupied.
     ■ Blocks can become bad (bit rot).
     ■ Likewise, available free inodes need to be tracked.
     ■ Use a linked list:
       (figure: entries 0x001, 0x003, 0x040 and 0xa00, 0xab0, 0xab1, chained by "next" pointers)
     ■ Use a bitmap:
       1110001010001010
       0100010111010011
       0101111011101001
       0011100101110010
       (block #0 is occupied, block #54 is free)
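The bitmap variant can be sketched in a few lines. The example assumes that bits are numbered left to right and row by row, and that 1 means occupied and 0 means free; this reading is consistent with the two annotations on the slide (block #0 occupied, block #54 free). The helper names are hypothetical.

```python
# The four 16-bit rows from the slide, read as blocks 0..63.
ROWS = [
    "1110001010001010",
    "0100010111010011",
    "0101111011101001",
    "0011100101110010",
]
BITMAP = "".join(ROWS)


def is_free(block):
    """Assumption: '0' marks a free block, '1' an occupied one."""
    return BITMAP[block] == "0"


def first_free(start=0):
    """Index of the first free block at or after start, or -1 if none."""
    return BITMAP.find("0", start)


print(is_free(0))    # False
print(is_free(54))   # True
print(first_free())  # 3
```

A real implementation would pack the bits into bytes or machine words; a string keeps the sketch readable.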

  9. Section 2: Files And Directories

  10. 2. Files And Directories: Storing Files – Overview
      How to find a file's data once the inode/file descriptor is already at hand.
      Different methods to store files exist, each with different use cases:
      □ continuous allocation
      □ linked list – separated linked list (FAT)
      □ indexed block references – direct data
      □ extents

  11. 2. Files And Directories: Storing Files – Continuous Allocation
      The file system keeps track of free blocks. For a new file, the necessary number of blocks is reserved and only a (start_block, size) tuple is stored.
      Advantages
      □ Simple implementation
      □ No file fragmentation
      □ Very few seeks
      Disadvantages
      □ Growing files need an expensive move
      □ High external fragmentation
      (figure: File 1, File 2, File 3, free)
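A contiguous allocator and its external-fragmentation problem can be illustrated with a first-fit search over free runs. First-fit is one possible placement policy picked for the sketch, not one named on the slides; the free list is represented as hypothetical (start, length) tuples.

```python
def first_fit(free_runs, n_blocks):
    """Reserve n_blocks contiguous blocks from the first free run that is
    large enough. free_runs is a list of (start, length) tuples and is
    updated in place. Returns the start block, or None if no run fits."""
    for i, (start, length) in enumerate(free_runs):
        if length >= n_blocks:
            if length == n_blocks:
                del free_runs[i]
            else:
                free_runs[i] = (start + n_blocks, length - n_blocks)
            return start
    return None


free = [(5, 3), (20, 4)]      # 7 free blocks in total, split into two holes
print(first_fit(free, 5))     # None: no single hole holds 5 blocks
print(first_fit(free, 4))     # 20
print(free)                   # [(5, 3)]
```

The first call fails although seven blocks are free in total, which is exactly the external fragmentation listed as a disadvantage.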

  12. 2. Files And Directories: Storing Files – Linked List
      The inode/file descriptor/metadata points to the first block. Each block contains data and a pointer to the next block in this file.
      Advantages
      □ Data can be distributed across the device
      □ Files can be resized
      □ No external fragmentation
      Disadvantages
      □ "Odd" wasted space per data block
      □ High file fragmentation risk
      □ No random access
      □ High seek times
      (figure: Inode → [data | next] → [data | next] → [data | end])

  13. 2. Files And Directories: Storing Files – Separated Linked List
      The inode/file descriptor/metadata points to the first block. Data fills entire blocks.
      A separate table records, for each block, either its successor or that it is the last block of its file. Free and bad blocks can be tracked in the same table.
      Example: FAT
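Following a file through such a table might look like the sketch below. The marker values `END`, `FREE` and `BAD` and the table contents are hypothetical and do not reproduce the encoding of any real FAT variant.

```python
END, FREE, BAD = -1, -2, -3  # hypothetical marker values

# table[i] holds the successor of block i, or one of the markers above.
table = [FREE, FREE, 5, BAD, FREE, 9, FREE, FREE, FREE, END]


def chain(table, first):
    """Return all blocks of the file starting at block `first`, in order."""
    blocks = []
    block = first
    while block != END:
        if table[block] in (FREE, BAD):
            raise ValueError(f"block {block} is not part of a file")
        blocks.append(block)
        block = table[block]
    return blocks


print(chain(table, 2))  # [2, 5, 9]
```

Reaching the n-th block still requires walking n table entries, but the table is small enough to cache in memory, so the walk does not touch the data blocks themselves.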

  14. 2. Files And Directories: Storing Files – Indexed Block References
      The inode contains a fixed number of block references. For files exceeding that number, an additional block may be pointed at, containing additional block references, and so forth.
      Advantages
      □ Small files require small overhead
      □ Random access is efficient for small files
      □ Sparse files are possible
      Disadvantages
      □ Limits file size
      □ High file fragmentation risk
      □ Many seeks for random access on large files
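How a file position is resolved through such references can be sketched as follows. The layout is an assumption made for the example: 11 direct references followed by one single-, one double- and one triple-indirect reference, and 256 references per index block (1 KiB blocks with 32-bit references). The function name `locate` is hypothetical.

```python
def locate(block_index, n_direct=11, refs_per_block=256):
    """Return (kind, path) for the given logical block index of a file.

    path lists the slot to follow at each level, outermost first.
    """
    if block_index < n_direct:
        return "direct", [block_index]
    block_index -= n_direct
    span = refs_per_block  # number of data blocks reachable at this level
    for level, kind in enumerate(("single", "double", "triple"), start=1):
        if block_index < span:
            path = []
            for _ in range(level):
                path.append(block_index % refs_per_block)
                block_index //= refs_per_block
            return kind, path[::-1]
        block_index -= span
        span *= refs_per_block
    raise ValueError("position exceeds the maximum file size")


BLOCK_SIZE = 1024  # assumed
print(locate(5000 // BLOCK_SIZE))         # ('direct', [4])
print(locate(11))                         # ('single', [0])
print(locate(11 + 256 + 257))             # ('double', [1, 1])
```

The length of the returned path is the number of extra index blocks that must be read before the data block, which is why random access on large files costs several seeks.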

  15. 2. Files And Directories: Storing Files – Indexed Block References (Direct Data)
      The inode repurposes the block references to store actual data.
      Commonly used for lock files, pid files and symbolic links. No more seeks required.
      E.g. for 32-bit block references: (11 + 3) * 32 bit = 14 * 4 byte = 56 bytes available

  16. 2. Files And Directories: Storing Files – Extents
      Multiple large, contiguous areas are reserved for file ranges. Each extent is addressed by only a (first block, length) tuple. Each file's extents can be stored as (linked) lists or trees.
      Advantages
      □ Very little overhead data required
      □ Limits file fragmentation
      Disadvantages
      □ Difficult to add extents on fragmented systems
      □ Copy-on-write for small changes is very expensive
      □ Requires a block buffer to allocate large areas on flush
      (figure: Inode pointing to Extent 1, Extent 2, Extent 3)
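Resolving a logical block through a list of (first block, length) extents can be sketched with a binary search over cumulative lengths. The extent values are made up for the example, and storing the extents in a flat sorted list is only one of the options mentioned above.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical extents of one file, in file order: (first block, length).
extents = [(100, 8), (400, 4), (250, 16)]

# ends[i] is the number of logical blocks covered by extents[0..i].
ends = list(accumulate(length for _, length in extents))  # [8, 12, 28]


def physical_block(logical):
    """Translate a logical block index of the file to a device block."""
    if logical < 0:
        raise IndexError("negative block index")
    i = bisect_right(ends, logical)
    if i == len(extents):
        raise IndexError("block index beyond end of file")
    first, length = extents[i]
    return first + (logical - (ends[i] - length))


print(physical_block(0))   # 100
print(physical_block(8))   # 400
print(physical_block(27))  # 265
```

Three tuples describe 28 blocks here, which illustrates the small metadata overhead compared to one reference per block.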

  17. 2. Files And Directories: Directory Entry Structure
      Figure: a directory with the entries bin, home, usr and var, shown in two variants:
      □ attributes in directory entry (FAT): each name is stored together with its attributes
      □ separated attributes (Unix): each name is stored with an inode reference

  18. 2. Files And Directories: Directory Structure (Canonical)
      □ Table of fixed-length entries (Minix 1, FAT16): each entry is (ino, name[MAX]).
      □ Linked list of variable-sized entries (ext2, VFAT): each entry is (ino, rec_len, name). Can still be contiguous on disk.
      Fast unlink (ext2) marks ino as 0; the entry's space becomes unused.
      <complexity find()?> O(n)
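The variable-sized layout and the O(n) lookup can be sketched with a simplified record format. The header used here (32-bit ino, 16-bit rec_len, 8-bit name_len) is an assumption for illustration and is not a byte-exact ext2 entry; all function names are hypothetical.

```python
import struct

HEADER = struct.Struct("<IHB")  # ino (u32), rec_len (u16), name_len (u8)


def pack_entry(ino, name, rec_len=None):
    """Build one record; rec_len may be larger than needed (padding)."""
    raw = name.encode()
    size = HEADER.size + len(raw)
    if rec_len is None:
        rec_len = size
    return HEADER.pack(ino, rec_len, len(raw)) + raw + b"\0" * (rec_len - size)


def entries(block):
    """Yield (offset, ino, name) for every record, hopping by rec_len."""
    offset = 0
    while offset < len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError("corrupt rec_len")
        start = offset + HEADER.size
        yield offset, ino, bytes(block[start:start + name_len]).decode()
        offset += rec_len


def find(block, wanted):
    """Linear scan, O(n) in the number of records; skips unused ones."""
    for _, ino, name in entries(block):
        if ino != 0 and name == wanted:
            return ino
    return None


def unlink(block, wanted):
    """Fast unlink: overwrite ino with 0, leave the record in place."""
    for offset, ino, name in entries(block):
        if ino != 0 and name == wanted:
            struct.pack_into("<I", block, offset, 0)
            return True
    return False


block = bytearray(
    pack_entry(2, "bin") + pack_entry(5, "home", rec_len=16) + pack_entry(7, "usr")
)
print(find(block, "home"))  # 5
unlink(block, "home")
print(find(block, "home"))  # None
print(find(block, "usr"))   # 7
```

Because rec_len still covers the unlinked record, later entries remain reachable without moving any data.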

  19. Section 3: B-Trees (figure: search tree with the keys 7, 10, 14, 19, 21, 23, 25)

  20. 3. B-Trees: Recap: Glossary Search Trees
      ■ node, root, leaf
      ■ parent, child, sibling
      ■ degree, depth, height
      ■ pre-order, in-order, post-order, level-order
      ■ full, complete, balanced
      ■ value vs. key, value-tuple
      ■ AVL-tree, Red-Black-Trees
      <any term unfamiliar?>
      (figure: binary search tree with root 19, children 10 and 23, leaves 7, 14, 21)

  21. 3. B-Trees: Definition
      ■ A B-Tree is a search tree for keys
      ■ All leaves have the same depth
      ■ Classified by a parameter B:
        □ B ≤ #children < 2 · B
        □ B - 1 ≤ #keys < 2 · B - 1
      ■ (Keys within a node are sorted)
      <valid #keys for B=4?>
      Figure: example tree with root (14, 21) and leaves (7, 10), (19), (23, 25); labels B = 2 and B = 3.
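The two inequalities can be evaluated mechanically. The sketch below simply enumerates the counts allowed by the definition as stated on this slide; other textbooks parameterise B-Trees differently, and the function name is hypothetical.

```python
def valid_counts(B):
    """Allowed numbers of children and keys per node for parameter B,
    using B <= #children < 2*B and B-1 <= #keys < 2*B-1."""
    children = list(range(B, 2 * B))
    keys = list(range(B - 1, 2 * B - 1))
    return children, keys


print(valid_counts(2))  # ([2, 3], [1, 2])
print(valid_counts(4))  # ([4, 5, 6, 7], [3, 4, 5, 6])

# The example tree from the figure: every node holds 1 or 2 keys, so B = 2 fits.
example_nodes = [(14, 21), (7, 10), (19,), (23, 25)]
print(all(len(node) in valid_counts(2)[1] for node in example_nodes))  # True
```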

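  22. 3. B-Trees: Theory Vs. Reality: Complexity
      Figure: the same keys stored as a tree with up to three children per node (root 14, 21; leaves 7, 10 | 19 | 23, 25) and as a binary tree (root 19; children 10, 23; leaves 7, 14, 21).
      Question: $O(\log_3 n)$ ? $O(\log_2 n)$, with the candidates $=$, $\supseteq$, $\subseteq$.
      $O(f) = \{\, g : \mathbb{N} \to \mathbb{R} \mid \exists c > 0 \;\exists n_0 \in \mathbb{N} \;\forall n \ge n_0 : g(n) \le c \cdot f(n) \,\}$
      $\log_3 n = \frac{\log_2 n}{\log_2 3} = c \cdot \log_2 n$
      The two functions differ only by the constant factor $c = 1 / \log_2 3$, so $O(\log_3 n) = O(\log_2 n)$.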

  23. 3. B-Trees: Theory Vs. Reality: Number of Block Seek Operations
      Figure: the same two trees as before, $O(\log_3 n)$ and $O(\log_2 n)$, stored in level-order.
      k = 1: $\log_2(2^{30}) = 30$ seeks
      k = 2: $\log_3(2^{30}) \approx 18.9$, so about 19 seeks
      k = 1024: $\log_{1024}(2^{30}) = 3$ seeks
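The seek counts can be recomputed directly. The sketch assumes one seek per tree level and takes the base of the logarithm (here called `fanout`, a hypothetical parameter name) as given on the slide.

```python
import math


def seeks(n_keys, fanout):
    """Approximate tree height, i.e. block seeks per lookup, as
    log_fanout(n_keys), assuming one seek per level."""
    return math.log2(n_keys) / math.log2(fanout)


N = 2 ** 30
for fanout in (2, 3, 1024):
    print(fanout, round(seeks(N, fanout), 1))
# 2 30.0
# 3 18.9
# 1024 3.0
```

Although all three cases fall into the same complexity class, the constant factor decides whether a lookup costs 30 seeks or 3, which is why on-disk trees use nodes as wide as a block allows.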
