The Berkeley File System: The Original File System
(PowerPoint presentation)


  1. The Berkeley File System: The Original File System

     Background
     • The original UNIX file system was implemented on a PDP-11.
     • All data transports used 512 byte blocks.
     • File system I/O was buffered by the kernel.
     • When UNIX was ported to faster machines like the VAX-11, the original file system bandwidth (typically 20 KByte/s) was too low.
     • There is nothing in the file system interface that makes it inherently slow. Thus it is possible to keep the file system interface and only change the implementation to make it faster.

     Why is the bandwidth low?
     • The file system used a 512 byte block size. This block size is too small with a 10 ms disk seek time.
     • All inodes are located in the first blocks of the file system. This creates long seeks between the inode area and the data blocks on the disk. Commands that alternately read inodes and data blocks (like ls -l) become especially inefficient.
     • The data blocks in a file may be randomly located on the disk (at least in a file system that has been in use for a long time).

  2. The Berkeley File System: A First Attempt / Transfer Time for a Page

     A first attempt
     • In the first attempt to improve the file system bandwidth, the block size was increased to 1024 bytes. The result was that the bandwidth more than doubled compared to the original file system:
       - Every file system operation can transport twice as much data.
       - The number of indirect blocks was reduced by the bigger block size.
     • Even after this change, the file system could not use more than 4% of the disk bandwidth.
     • The bandwidth was higher for a new file system but degenerated after some time (especially for read operations). The reason for this is that the list of free blocks is sorted in optimal order when the file system is created, but as new files are created and removed the free list becomes increasingly random.

     Transfer time for a page
     How long does it take to transfer a page between primary storage and disk memory?

     Notation:
       T_page       total transfer time for the page
       T_transport  transport time between primary storage and disk storage
       T_wait       average rotational latency + seek time
       V            transport speed for the page transport
       L            page size

     Transfer time:
       T_page = T_transport + T_wait = L/V + T_wait

     Typical values: V = 10 Mbit/s, L = 10000 bits, T_wait = 10 ms. This gives:
       T_page = 1 ms + 10 ms

     Thus, for page sizes of 1 KByte or less, the wait time is totally dominating, making the transfer time almost independent of page size.
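The transfer-time formula above can be evaluated directly. This small sketch plugs in the slide's typical values; the function name and millisecond units are my choices, not from the slide:

```python
# Transfer-time model from the slide: T_page = T_transport + T_wait = L/V + T_wait.

def page_transfer_time_ms(page_bits, speed_bits_per_s, wait_ms):
    """Total time to move one page between primary storage and disk, in ms."""
    transport_ms = page_bits / speed_bits_per_s * 1000.0
    return transport_ms + wait_ms

# Typical values from the slide: V = 10 Mbit/s, L = 10000 bits, T_wait = 10 ms.
print(page_transfer_time_ms(10_000, 10_000_000, 10.0))  # 11.0 ms: 1 ms transport + 10 ms wait
```

Even an 8x larger page (80000 bits) only raises the total from 11 ms to 18 ms, which is the slide's point: for small pages the wait time dominates.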

  3. The Berkeley File System: Methods to Increase Bandwidth / File System Organization

     Methods to increase bandwidth:
     • Use a big block size.
     • Place related blocks close to each other.

     Problem with block size
     Big block sizes create big fragmentation losses.
     • Use a variable block size.
     • This requires an allocation strategy to ensure that a file contains only one block of less than maximum size.

     Locality
     Locating related data together requires that there are free blocks at the wanted locations.
     • Not everything can be located locally.

     The Fast Berkeley File System: file system organization
     In order to improve locality, the file system is organized in cylinder groups. A cylinder group consists of a number of consecutive cylinders on the disk.

     A cylinder group contains:
     • A copy of the super block.
     • Inodes (statically allocated when the file system is created).
     • A bitmap to keep track of free blocks in the cylinder group.
     • Data blocks.

     The super block is stored in the cylinder groups in order to have redundant copies in case of a file system crash.
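As a rough illustration, the per-cylinder-group contents listed above might be modelled like this. This is a simplified sketch; the field names are invented for readability and do not reflect the actual on-disk layout:

```python
from dataclasses import dataclass, field

@dataclass
class CylinderGroup:
    """Simplified sketch of what one FFS cylinder group holds (names are mine)."""
    superblock_copy: bytes       # redundant copy of the super block
    inodes: list                 # statically allocated when the file system is created
    free_fragment_map: list      # bitmap of free space, kept at fragment granularity
    data_blocks: list = field(default_factory=list)

    def free_fragments(self):
        """Count free fragments, as the global allocation routines would need."""
        return sum(1 for free in self.free_fragment_map if free)
```

A usage example: `CylinderGroup(b"", [], [True, False, True, True]).free_fragments()` yields 3.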

  4. The Fast Berkeley File System: Block Size / Allocation of Data Blocks and Fragments

     Block size
     • To be able to use a big block size without getting too large fragmentation losses, the big blocks are divided into smaller fragments.
     • The block size and fragment size are selected (within certain limits) when the file system is created.
     • In order to be able to describe a 2^32 byte file with only two indirect levels, the minimum block size is 4096 bytes.
     • The fragment size cannot be smaller than the disk sector size (usually 512 bytes).
     • A block may consist of 2, 4 or 8 fragments.
     • A bitmap in the cylinder group keeps track of free blocks at fragment level.

     Allocation of data blocks and fragments
     • New data blocks in files are allocated in write operations.
     • In order to keep the bandwidth that the big block size gives, only the last block in a file is allowed to contain fragments.
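The 4096-byte minimum can be checked with a quick calculation. Assuming 4-byte block pointers (typical for the era, though not stated on the slide), the bytes reachable through a double-indirect block are (block_size / 4)^2 * block_size:

```python
# Check the slide's claim: with 4-byte block pointers, 4096 bytes is the
# smallest power-of-two block size whose double-indirect block alone can
# address a 2**32 byte file.
POINTER_SIZE = 4  # bytes per block pointer (assumption, typical for the era)

def double_indirect_capacity(block_size):
    """Bytes addressable through one double-indirect block."""
    pointers_per_block = block_size // POINTER_SIZE
    return pointers_per_block ** 2 * block_size

print(double_indirect_capacity(4096))            # 4294967296, i.e. exactly 2**32
print(double_indirect_capacity(2048) < 2**32)    # True: 2048-byte blocks fall short
```

With 4096-byte blocks, one block holds 1024 pointers, and 1024 * 1024 blocks of 4096 bytes is exactly 2^32 bytes; any smaller block size would need a third indirect level.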

  5. Allocating New Blocks and Fragments

     Possibilities when writing new data to a file:
     1. There is enough space left in an already allocated block or fragment to hold the new data.
        • The new data are written into the available space.
     2. The file contains no fragmented blocks (and the last block in the file contains insufficient space to hold the new data).
        • If space exists in a block already allocated, that space is filled with new data. If the remainder of the new data amounts to more than a full block, new full blocks are allocated until less than a full block remains. For the last part, a block with the necessary fragments is used.
     3. The file contains one or more fragments (and the fragments contain insufficient space to hold the data).
        • If the size of the new data + the size of the data already in the fragments > the block size: a new block is allocated, and the fragments are copied to the beginning of the new block. Continue as in point 2.
        • Otherwise: a block with the necessary fragments, or a full block, is allocated. Copy the old fragments + new data into the allocated space.

     • The problem with expanding a file one fragment at a time is that a file may be copied many times as a fragmented block expands to a full block.
     • To reduce the number of copy operations, data should be written in units of full blocks when this is possible. This method is used by the C standard I/O library.
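The case analysis above can be sketched as a small classifier over sizes only. This is a simplification: real FFS manipulates actual blocks and fragment runs, and the function name and size-only model are mine:

```python
def write_case(tail_used, tail_allocated, new_bytes, block=4096):
    """Classify an append per the slide's three cases.

    tail_used:      bytes of data in the file's last (partial) block
    tail_allocated: fragment space allocated for it (0 if the file ends
                    exactly on a block boundary)
    """
    if new_bytes <= tail_allocated - tail_used:
        return "case 1"   # fits in already allocated space, just write it
    if tail_allocated in (0, block):
        return "case 2"   # no fragmented last block: full blocks + tail fragments
    if tail_used + new_bytes > block:
        return "case 3, expand to full block"   # copy fragments into a new block
    return "case 3, reallocate fragments"       # copy into a bigger fragment run
```

For example, appending 300 bytes when 100 of 512 allocated fragment bytes are used is case 1, while appending 4000 bytes onto a 1000-byte fragment run forces the fragments to be promoted to a full block.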

  6. Placement of Data Blocks / Strategies for Placement of Data Blocks

     Placement of data blocks
     • The main strategy is to place data blocks so as to give the best possible locality.
     • Data blocks in a single file should preferably be placed in the same cylinder group at rotationally optimal distance.
     • However, not everything can be placed locally, because a big file could fill up an entire cylinder group and make it impossible to find blocks at the wanted location in the future.
     • Unrelated data should be placed in a way that gives an equal amount of free space in all cylinder groups.

     Strategies for placement of data blocks
     • Data blocks belonging to the same file should preferably be placed in the same cylinder group at rotationally optimal distance.
     • If the file grows bigger than 48 KByte, the block allocation is redirected to another cylinder group. Thereafter, redirection is done for every MByte of allocated data.
     • The new cylinder groups are chosen among the cylinder groups with more than the average number of free blocks.
     • In order for the locality strategy to work, there should always be some free blocks in every cylinder group.
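One plausible reading of the redirection rule (first 48 KByte in the initial group, then a new group per MByte) is the following; the exact boundary arithmetic is my interpretation, not taken from the slide:

```python
KB = 1024
MB = 1024 * 1024

def redirection_count(offset):
    """How many times allocation has been redirected to a new cylinder
    group by the time a file reaches this byte offset (a sketch: 0 means
    the block still goes in the file's initial cylinder group)."""
    if offset < 48 * KB:
        return 0
    return 1 + (offset - 48 * KB) // MB
```

Under this reading, blocks below 48 KByte stay in the initial group, the 48 KByte boundary triggers the first redirection, and each further MByte of data triggers the next.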

  7. Strategies for Placement of Inodes / Global and Local Allocation Routines

     Directory inodes
     • A new directory is placed in a cylinder group which has more free inodes than average and as few directories as possible.

     File inodes
     • The inodes for all files in a directory should, if possible, be placed in the same cylinder group.
     • A reason for this is the commonly used command "ls -l", which has to read all file inodes in the directory.

     Global and local allocation routines
     • There are two levels of block allocation routines.
     • The global allocation routines keep information about the number of free blocks and inodes in the different cylinder groups. They are used, for example, to locate the cylinder group with the maximum number of free blocks.
     • The local allocation routines use the bitmap in the cylinder group to allocate a specific block.
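The directory-inode policy can be sketched as a selection over the per-group statistics the global routines maintain. The tuple representation and the tie-breaking order (above-average free inodes first, then fewest directories) are assumptions about how the two criteria combine:

```python
def pick_group_for_new_directory(groups):
    """Pick a cylinder group for a new directory inode (sketch).

    groups: list of (free_inodes, directory_count) per cylinder group.
    Policy per the slide: among groups with at least the average number of
    free inodes, choose the one with the fewest directories.
    """
    avg_free = sum(free for free, _ in groups) / len(groups)
    candidates = [i for i, (free, _) in enumerate(groups) if free >= avg_free]
    return min(candidates, key=lambda i: groups[i][1])

groups = [(10, 3), (40, 5), (30, 1)]          # average free inodes is about 26.7
print(pick_group_for_new_directory(groups))   # 2: above average and fewest directories
```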

  8. Local Allocation Routines / Performance Evaluation

     Local allocation routines
     When calling the local allocation routines, it may happen that the requested block is already in use. If the requested block is not available, the following strategy is used:
     1. Use the next available block rotationally closest to the requested block on the same cylinder.
     2. If there are no blocks available in the same cylinder, use a block in the same cylinder group.
     3. If the cylinder group is full, quadratically rehash the cylinder group number to get a new cylinder group.
     4. Finally, if the hash fails, apply an exhaustive search to all cylinder groups.

     File systems that are parameterized to maintain at least 10 percent free space rarely use strategies 3 and 4.

     Performance evaluation
     • Both read and write operations are faster in the new file system.
     • The transfer speed in the new file system does not change with time (if at least 10 percent free space is maintained).
     • In the new file system, read operations are always as fast as or (usually) faster than write operations. The reason is that the write operations run the block allocation routines.
     • In the old file system, write operations were about 50 percent faster than read operations. This is because write operations are asynchronous and the disk driver uses a SCAN algorithm to sort them. However, when the file is read, the read operations must always be processed immediately.
     • Read operations are synchronous also in the new file system, but here the blocks are better ordered on the disk.
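The four-step fallback might look like the sketch below. Steps 1 and 2 are collapsed into a per-group try_alloc callback, since the slide gives no rotational details, and the i*i probe sequence is one common form of quadratic rehash, assumed here rather than taken from the slide:

```python
def allocate_block(preferred_group, ngroups, try_alloc):
    """Fallback search over cylinder groups (sketch of the slide's steps).

    try_alloc(g) stands in for steps 1-2: it returns a block from group g
    (rotationally closest first, then any block in the group) or None.
    """
    # Steps 1-2: try the preferred cylinder group first.
    blk = try_alloc(preferred_group)
    if blk is not None:
        return blk
    # Step 3: quadratic rehash of the cylinder group number.
    for i in range(1, ngroups):
        blk = try_alloc((preferred_group + i * i) % ngroups)
        if blk is not None:
            return blk
    # Step 4: the rehash probe sequence can miss some groups, so fall back
    # to an exhaustive search of all cylinder groups.
    for g in range(ngroups):
        blk = try_alloc(g)
        if blk is not None:
            return blk
    return None  # file system is completely full
```

With 8 groups and preferred group 0, the quadratic probes visit only groups 1, 4, and 0 again, so a lone free block in group 5 is found only by the exhaustive step, which is exactly why step 4 exists.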
