Unusual Disk Optimization Techniques
Andrew Kane, PhD Candidate, University of Waterloo (arkane@cs.uwaterloo.ca)
October 28th, 2009
1. Motivation
- Disk I/O is a scarce resource and often a bottleneck
- Optimization types:
  - Disk efficiency (usage rate)
  - Low-latency writes (logging) or reads (cache)
  - Workload smoothing (prefetching, speculative execution)
http://blogs.msdn.com/e7/archive/2009/01/25/disk-defragmentation-background-and-engineering-the-windows-7-improvements.aspx
Outline of Talk
1. Motivation
2. History
3. Modern I/O Stack
   - File Systems: Traditional, Journaling, Log-structured
4. Common Optimization Techniques
5. Unusual Optimization Techniques
   5.2 Freeblock scheduling
   5.3 Eager writing
   5.4 Low Latency Write-Ahead Log
   5.5 Virtual logs
   5.6 Dual-actuator disks
   5.7 Track-based logging
6. Conclusions
2. History: 2.1 Magnetic Drum Memory
- Widely used in the 1950s and 60s as the main working memory.
- Figure: a 16-inch-long drum from the IBM 650 computer, with 40 tracks, 1 head per track, 10 kB of storage space, and 12,500 RPM.
2. History: 2.1 Magnetic Drum Memory
- Acting as main memory means the CPU waits on every read, so we need low latency
- Stride operations around the drum so that the next operation is under the read head when the CPU needs it
- Fixed heads, so there is no seek time
- This is memory, but random access is not a fixed cost
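The striding idea can be illustrated with a small calculation. This is only a sketch: the 12,500 RPM figure comes from the slide, while the 50 slots per track, the function name, and the timing model (the head has just finished reading the current slot when execution starts) are assumptions made for illustration.

```python
import math

def next_slot(current_slot, exec_time_ms, rpm=12_500, slots_per_track=50):
    """Pick the slot that reaches the fixed head just as the CPU becomes ready.

    Assumed model: the head has just finished `current_slot`, and the CPU
    then needs `exec_time_ms` before it can use the next word.
    """
    rotation_ms = 60_000 / rpm                 # one revolution (4.8 ms at 12,500 RPM)
    slot_ms = rotation_ms / slots_per_track    # time for one slot to pass the head
    # Slots that rotate past the head while the CPU is still busy.
    skipped = math.ceil(exec_time_ms / slot_ms)
    return (current_slot + 1 + skipped) % slots_per_track
```

Placing the next operation in the adjacent slot instead would miss it whenever execution takes any time at all, costing almost a full revolution of waiting.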
2. History: 2.2 Hard Disk Drives
- The first hard disk drive was the IBM Model 350 Disk File in 1956.
- It had 50 24-inch discs with a total storage capacity of 5 MB.
2. History: 2.2 Hard Disk Drives
- Movable heads
  - Seek and rotational latency
  - So, don't use this for main memory
- Read by block and cache the results in memory, so the disk is not part of the CPU execution cycle
- Much larger storage sizes
- Combine Drum and Hard Disk…
2. History: 2.3 Combine Fixed & Movable Heads
- Fixed and moving heads within one hard disk
- IMS/VS 1.3 writes to a Write Ahead Data Set (WADS) (1982)
  - One forced write to each track of the fixed-head portion, i.e. write wherever the head is currently located
  - In parallel, block writes of all data to the movable-head portion
  - Reads are handled by the disk cache and the movable-head portion
[1] Strickland, J. P., Uhrowczik, P. P., Watts, V. L. IMS/VS: An evolving system. IBM Systems Journal, 21, 4 (1982).
[2] Peterson, R. J., Strickland, J. P. Log write-ahead protocols and IMS/VS logging. In Proceedings 2nd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (Atlanta, Ga., March 1983).
[3] US Patent 4507751 - Method and apparatus for logging journal data using a log write ahead data set. 1985.
3. Modern I/O Stack
[Diagram: layers of the I/O stack, top to bottom, with the interfaces between them]
- Application (cache)
- Read/Write through the FS API
- OS / File System (cache)
- Read/Write by LBA, plus Flush
- Disk drive: embedded controller (cache), write-through
- Disk drive: physical media
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.
3. Modern I/O Stack: 3.1 Disk Drive
- Access the physical media via (Cylinder, Track, Sector) = CTS
- Remap damaged sectors
- Costs: seek (2-6 ms, minimum 0.6 ms), rotational (4-8 ms), head switch, and transfer latencies, plus queuing delay
  - Seek cost varies non-linearly
- Cache for reading and writing
  - Up to a 30 second delay before a write to the cache is executed on the physical media
  - Reorder operations to reduce latencies
- Zoned-bit recording varies the number of sectors per track across the disk
  - Fastest throughput on the outermost tracks
  - Partitions are assigned from the outermost track inwards
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.
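To make the cost list concrete, here is a rough service-time estimate for a single random access. It is a simplified sketch: the function names are hypothetical, the model ignores head switches and queuing, and the spindle speeds in the usage note (7,200 and 15,000 RPM) are assumed values rather than figures from the slide.

```python
def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm

def avg_access_ms(seek_ms, rpm, transfer_ms=0.0):
    """Seek + expected rotational wait (half a revolution) + transfer time."""
    return seek_ms + rotation_ms(rpm) / 2 + transfer_ms
```

Under these assumptions a full revolution takes about 8.3 ms at 7,200 RPM and 4 ms at 15,000 RPM, so a 4 ms seek on the faster drive gives roughly 6 ms per random access before any transfer or queuing.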
3. Modern I/O Stack: 3.2 File System Interface
- The file system keeps track of files organized into a directory structure
  - Traditionally for one disk partition
  - Metadata (file structure, data location and other information) + data (what's in the file)
- Deals with the disk drive via Logical Block Addressing (LBA), a single flat address space of blocks
  - This makes optimizations harder at this level
  - Allows the disk to do its own optimizations
  - Allows the disk to be more reliable via remapping
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.
3. Modern I/O Stack: 3.3 Traditional File Systems
- Idea: Store metadata in a tree of directory nodes and inodes, where the leaves are the blocks of data for the files
- Try to allocate blocks to a file sequentially so that reading is faster
- Writes to existing blocks of a file are made in place, at the same location on disk
- Delayed writes can cause corruption on failure
- Example: ext2
[5] McKusick, M. K., Joy, W. N., Leffler, S. J., Fabry, R. S. A fast file system for UNIX. ACM Transactions on Computer Systems (TOCS), v.2 n.3, p.181-197, Aug. 1984.
3. Modern I/O Stack: 3.3 Traditional File Systems
[Figure; source: http://www.zimbio.com/Linux/articles/738/Part+II+Object+File+Systems+Legacy+Unix+Linux]
3. Modern I/O Stack: 3.4 Journaling File Systems
- Idea: Add a journal (log) of the changes you are going to make to the file system before you make them
- Better recovery and fault tolerance
- Reads use the normal file system
- Writes happen twice (journal + normal file system), but the journal is sequential and batched for group commit
- Could journal only the metadata (common), which is small
- Example: ext3
[6] Tweedie, S. C. Journaling the Linux ext2fs File System. In the Fourth Annual Linux Expo, Durham, North Carolina, May 1998.
3. Modern I/O Stack: 3.4 Journaling File Systems
[Figure; source: http://www.ibm.com/developerworks/linux/library/l-journaling-filesystems/]
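The journaling idea (record the change first, then apply it in place) can be sketched with an in-memory toy. This is not how ext3 is implemented; the class, its method names, and the `crash_after` parameter used to simulate a failure are all hypothetical.

```python
class JournaledStore:
    """Toy model: a sequential journal in front of in-place block storage."""

    def __init__(self):
        self.journal = []   # sequential log of committed but unapplied batches
        self.blocks = {}    # "home" locations: block number -> data

    def commit(self, updates):
        """Append a whole batch to the journal as one record (group commit)."""
        self.journal.append(dict(updates))

    def checkpoint(self, crash_after=None):
        """Copy journaled updates to their home locations, then trim the journal."""
        applied = 0
        for record in self.journal:
            for block, data in record.items():
                if crash_after is not None and applied == crash_after:
                    return False   # simulated crash: the journal is left intact
                self.blocks[block] = data
                applied += 1
        self.journal.clear()
        return True

    def recover(self):
        """Replaying is idempotent, so rerunning the checkpoint repairs a partial one."""
        return self.checkpoint()
```

Because the journal is only trimmed after every update has reached its home location, a crash in the middle of a checkpoint leaves enough information to finish the job on restart.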
3. Modern I/O Stack: 3.5 Log-Structured File Systems
- Idea: Treat the entire disk as one log and put writes to files at the end of the log
- Needs cleanup and compaction to allow the log to loop around
- Fast writes because of batching and group commit to the end of the log
- Files become fragmented, which hurts reads (a cache may solve this)
[7] Rosenblum, M. and Ousterhout, J. K. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10, 1 (1992), 26-52.
3. Modern I/O Stack: 3.5 Log-Structured File Systems
[Figure with two panels: "Normal File System" and "Log-Structured File System"; source: http://www.outflux.net/projects/lfs/what_lfs_is.html]
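A toy append-only store shows the two halves of the log-structured idea: every write goes to the end of the log, and a cleaner later discards superseded copies. The class and method names are hypothetical, and the "disk" is just a Python list, so this is a sketch of the concept rather than of any real LFS.

```python
class LogStructuredStore:
    """Toy model: all writes append to one log; an index finds the newest copy."""

    def __init__(self):
        self.log = []     # the "disk": append-only list of (block_id, data)
        self.index = {}   # block_id -> position of the newest copy in the log

    def write(self, block_id, data):
        """Never overwrite in place: append and repoint the index."""
        self.index[block_id] = len(self.log)
        self.log.append((block_id, data))

    def read(self, block_id):
        return self.log[self.index[block_id]][1]

    def clean(self):
        """Compact the log by keeping only live entries, in log order."""
        live = [(b, d) for pos, (b, d) in enumerate(self.log)
                if self.index.get(b) == pos]
        self.log = live
        self.index = {b: pos for pos, (b, _) in enumerate(live)}
        return len(live)
```

Note how rewriting a block moves it to the end of the log, away from its neighbours; that is the read-side fragmentation the slide mentions.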
4. Common Optimization Techniques
- Caching reads
  - Removes or postpones lots of issues with fragmentation
  - Do different levels of cache work well together?
- Reorder operations
- Prefetching
- Replicas of data (even on a single disk)
- Buffering/batching writes
  - Potential data loss on failure
  - If writes are transactional, then you're trading latency for throughput
- Short-stroking the disk
  - Use only the outer tracks of the disk to reduce seek time
  - Aligning with zoned-bit recording increases throughput
  - Usually implemented using partitions
- Use non-volatile memory (most commonly flash)
  - Solid state drives (SSD)
  - Hybrid drives = flash + hard disk
- Use multiple disks
  - NAS/SAN/RAID includes extra cache memory
[8] Hsu, W. and Smith, A. J. The performance impact of I/O optimizations and disk improvements. IBM Journal of Research and Development, March 2004, Volume 48, Issue 2, 255-289.
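As one illustration of "reorder operations", the sketch below uses elevator-style (LOOK) ordering. The slide does not name a specific policy, so this choice is an assumption, as are the function names and the use of plain block numbers as a stand-in for head position.

```python
def look_order(head, requests):
    """Serve requests at or beyond the head in ascending order, then sweep back."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down

def travel(head, order):
    """Total distance the head moves when serving `order` starting from `head`."""
    total = 0
    for r in order:
        total += abs(r - head)
        head = r
    return total
```

For a head at 50 and requests `[95, 10, 60, 20, 80]`, arrival order moves the head 280 units while the reordered schedule moves it 130, under this simplified distance model.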
5. Unusual Optimization Techniques: 5.1 Modeling the Disk in Software
- Need to know how the disk is laid out
  - Go from LBA to CTS addressing
  - Include remapping of sectors
- Need to know where the disk head is located
  - Can be done in software
  - When a read/write returns, you know where the head is (+ processing time)
  - Keep this accurate by issuing new reads/writes as needed
- Model the scheduling algorithm
  - Predict the order of execution of operations sent to the disk
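Both parts of the model can be sketched briefly: translating an LBA into (cylinder, track, sector), and estimating which sector is under the head some time after the last transfer finished. This is a sketch under strong assumptions: the zone table is hypothetical, sector remapping and track skew are ignored, and all names are invented for illustration.

```python
from typing import NamedTuple

class Zone(NamedTuple):
    cylinders: int           # number of cylinders in this zone
    sectors_per_track: int   # constant within a zone (zoned-bit recording)

def lba_to_cts(lba, zones, heads):
    """Map a logical block address to (cylinder, track, sector), outermost zone first."""
    first_cyl = 0
    for zone in zones:
        per_cyl = heads * zone.sectors_per_track
        zone_blocks = zone.cylinders * per_cyl
        if lba < zone_blocks:
            cyl, rest = divmod(lba, per_cyl)
            track, sector = divmod(rest, zone.sectors_per_track)
            return first_cyl + cyl, track, sector
        lba -= zone_blocks
        first_cyl += zone.cylinders
    raise ValueError("LBA beyond the end of the modeled disk")

def sector_under_head(last_sector, elapsed_ms, rpm, sectors_per_track):
    """Estimate the sector now under the head, given where the last transfer ended."""
    passed = int(elapsed_ms * rpm * sectors_per_track / 60_000)  # whole sectors rotated past
    return (last_sector + 1 + passed) % sectors_per_track
```

A real model would need to be calibrated against the drive and would drift between operations, which is why the slide suggests issuing new reads or writes to keep the estimate accurate.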
5. Unusual Optimization Techniques: 5.2 Freeblock Scheduling
- Idea: Replace a disk drive's rotational latency delays with useful background media transfers
[9] Lumb, C. R., Schindler, J., Ganger, G. R., Nagle, D. F. and Riedel, E. Towards higher disk head utilization: extracting free bandwidth from busy disk drives. In OSDI'00: Proceedings of the 4th Conference on Operating System Design & Implementation (San Diego, California), 87-102. 2000.
[10] Lumb, C. R., Schindler, J., Ganger, G. R. Freeblock Scheduling Outside of Disk Firmware. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST'02), Monterey, CA, January 2002.
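A deliberately simplified sketch of the idea: once the head arrives on the destination track, the sectors that rotate past before the foreground target arrives can be read for free if a background task wants them. The function name and parameters are hypothetical, and the model considers only the destination track, which is a narrower case than the schedulers described in [9] and [10].

```python
def free_sectors(arrival, target, sectors_per_track, wanted):
    """Sectors passing under the head while it waits for `target`, filtered to `wanted`.

    `arrival` is the sector under the head when the seek completes.
    """
    gap = (target - arrival) % sectors_per_track   # sectors of rotational wait
    passing = [(arrival + i) % sectors_per_track for i in range(gap)]
    return [s for s in passing if s in wanted]
```

With 8 sectors per track, arriving at sector 6 while the foreground request targets sector 2 leaves sectors 6, 7, 0 and 1 passing under the head; a background task wanting `{0, 3, 7}` would pick up 7 and 0 without delaying the foreground request.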
5. Unusual Optimization Techniques: 5.2 Freeblock Scheduling
- Applications
  - Segment cleaning (e.g. LFS)
  - Data mining (e.g. indexing for search)
- In firmware (OSDI 2000)
  - 20-50% of the disk's bandwidth can be provided to background applications
  - 47 full disk scans per day on an active 9 GB disk (the last 5% takes 30% of the time)
- In software (FAST 2002)
  - 15% of the disk's potential bandwidth can be provided to background applications
  - 37 full disk scans per day on an active 9 GB disk
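The scans-per-day figures can be converted into an average background transfer rate with simple arithmetic. The helper below is hypothetical and assumes decimal units (1 GB = 1000 MB) and a uniform rate across the day, neither of which is stated on the slide.

```python
def background_mb_per_s(scans_per_day, disk_gb):
    """Average background throughput implied by N full scans of the disk per day."""
    seconds_per_day = 24 * 60 * 60
    return scans_per_day * disk_gb * 1000 / seconds_per_day
```

Under those assumptions, 47 scans per day of a 9 GB disk corresponds to about 4.9 MB/s of background transfer, and 37 scans per day to about 3.9 MB/s.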