Unusual Disk Optimization Techniques

Unusual Disk Optimization Techniques
Andrew Kane
University of Waterloo, PhD Candidate, arkane@cs.uwaterloo.ca
October 28th, 2009

1. Motivation
- Disk I/O is a scarce resource and often a bottleneck
- Optimization types:
  - Disk efficiency (usage rate)
  - Low-latency writes (logging) or reads (cache)
  - Workload smoothing (prefetching, speculative execution)
http://blogs.msdn.com/e7/archive/2009/01/25/disk-defragmentation-background-and-engineering-the-windows-7-improvements.aspx

Outline of Talk
1. Motivation
2. History
3. Modern I/O Stack
   - File systems: traditional, journaling, log-structured
4. Common Optimization Techniques
5. Unusual Optimization Techniques
   5.1 Modeling the disk in software
   5.2 Freeblock scheduling
   5.3 Eager writing
   5.4 Low-latency write-ahead log
   5.5 Virtual logs
   5.6 Dual-actuator disks
   5.7 Track-based logging
6. Conclusions

2. History
2.1 Magnetic Drum Memory
Widely used in the 1950s and 60s as the main working memory.
Image (above left): a 16-inch-long drum from the IBM 650 computer, with 40 tracks, 1 head per track, 10 kB of storage space, and 12,500 RPM.

2. History
2.1 Magnetic Drum Memory
- Acting as main memory means the CPU waits on reads, so we need low latency
- Stride operations on the drum so that the next operation is under the read head when the CPU needs it
- Fixed heads, so there is no seek time
- This is memory, but random access is not a fixed cost

2. History
2.2 Hard Disk Drives
The first hard disk drive was the IBM Model 350 Disk File in 1956. It had 50 24-inch discs with a total storage capacity of 5 million characters (about 3.75 MB at 6 bits per character).

2. History
2.2 Hard Disk Drives
- Movable heads
- Seek and rotational latency
- So, don't use this for main memory
- Read by block and cache the results in memory so the disk is not part of the CPU execution cycle
- Much larger storage sizes
- Combine drum and hard disk…

2. History
2.3 Combine Fixed & Movable Heads
- Fixed and moving heads within one hard disk
- IMS/VS 1.3 writes to a Write Ahead Data Set (WADS) (1982)
  - Each forced write goes to its own track in the fixed-head portion, starting wherever the head currently is, so there is neither a seek nor a wait for rotation
  - In parallel, all data is block-written to the movable-head portion
  - Reads are handled by the disk cache and the movable-head portion
[1] Strickland, J. P., Uhrowczik, P. P., Watts, V. L. IMS/VS: An evolving system. IBM Systems Journal, 21(4), 1982.
[2] Peterson, R. J., Strickland, J. P. Log write-ahead protocols and IMS/VS logging. In Proceedings of the 2nd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Atlanta, GA, March 1983.
[3] US Patent 4507751. Method and apparatus for logging journal data using a log write ahead data set. 1985.

3. Modern I/O Stack
[Diagram of the I/O stack, top to bottom: Application (cache); FS API (read/write); OS / file system (cache); LBA interface (read/write, flush, write-through); embedded controller (cache); disk drive physical media]
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.

3. Modern I/O Stack
3.1 Disk Drive
[Image]

3. Modern I/O Stack
3.1 Disk Drive
- Physical media is accessed via (Cylinder, Track, Sector) = CTS addressing
- The drive remaps damaged sectors
- Costs: seek (2-6 ms, minimum 0.6 ms), rotational (4-8 ms), head switch, and transfer latencies, plus queuing delay
  - Seek cost varies non-linearly with distance
- Cache for reading and writing
  - A write accepted into the cache may wait up to 30 seconds before it is executed on the physical media
  - The drive reorders operations to reduce latencies
- Zoned-bit recording places more sectors on outer tracks than on inner ones
  - Fastest throughput on the outermost tracks
  - Partitions are assigned from the outermost track inwards
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.
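The cost list above can be turned into a rough per-request estimate. The sketch below is a simplified model rather than anything from the talk: the drive parameters are assumed values, head-switch time is ignored, and seek time is a single average rather than the non-linear curve the slide mentions. It adds queuing delay, average seek, half a revolution of rotational latency, and transfer time.

```python
from dataclasses import dataclass


@dataclass
class DiskParams:
    """Illustrative drive parameters (assumed values, not from any datasheet)."""
    avg_seek_ms: float    # average seek time
    rpm: int              # spindle speed in revolutions per minute
    transfer_mb_s: float  # sustained media rate, 1 MB = 10**6 bytes


def rotation_ms(rpm: int) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def random_access_ms(d: DiskParams, request_bytes: int, queue_ms: float = 0.0) -> float:
    """Expected service time of one random request: queuing delay + average seek
    + average rotational latency (half a revolution) + transfer time.
    Head-switch time and the non-linear seek curve are ignored."""
    rotational = rotation_ms(d.rpm) / 2
    transfer = request_bytes / (d.transfer_mb_s * 1e6) * 1000.0
    return queue_ms + d.avg_seek_ms + rotational + transfer


if __name__ == "__main__":
    disk = DiskParams(avg_seek_ms=4.0, rpm=7200, transfer_mb_s=100.0)
    t = random_access_ms(disk, 4096)
    print(f"{t:.2f} ms per random 4 KiB read")              # 8.21 ms
    print(f"about {1000 / t:.0f} random reads per second")  # about 122
```

With these assumed numbers, transfer takes about 0.04 ms of the roughly 8.2 ms total, so positioning (seek plus rotation) dominates a small random read; that positioning delay is what the techniques in Section 5 try to hide or exploit.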

3. Modern I/O Stack
3.2 File System Interface
- The file system keeps track of files organized into a directory structure
  - Traditionally for one disk partition
  - Metadata (file structure, data location, and other information) + data (what's in the file)
- Deals with the disk drive via Logical Block Addressing (LBA), a single flat address space of blocks
  - This makes optimizations harder at this level
  - Allows the disk to do its own optimizations
  - Allows the disk to be more reliable via remapping
[4] Farley, M. Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and Filing Systems. Chapter 4. Cisco Press, 2004.

3. Modern I/O Stack
3.3 Traditional File Systems
- Idea: store metadata in a tree of directory nodes and inodes whose leaves are the files' data blocks
- Try to allocate a file's blocks sequentially so that reading is faster
- Writes to existing blocks of a file go to that exact location on disk
- Delayed writes can cause corruption on failure
- Example: ext2
[5] McKusick, M. K., Joy, W. N., Leffler, S. J., Fabry, R. S. A fast file system for UNIX. ACM Transactions on Computer Systems (TOCS), 2(3), 181-197, August 1984.
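To make the inode tree concrete, here is a sketch of how an ext2-style inode reaches a file's data blocks. It assumes the ext2 layout of 12 direct block pointers followed by single, double, and triple indirect pointers, with 4 KiB blocks and 4-byte block numbers; the function name `pointer_path` is hypothetical.

```python
N_DIRECT = 12  # ext2 inodes hold 12 direct block pointers


def pointer_path(logical_block: int, block_size: int = 4096) -> tuple:
    """Return which inode pointer reaches file block `logical_block`, and the
    index path through the indirect blocks (assumes 4-byte block numbers)."""
    per_block = block_size // 4  # block numbers stored in one indirect block
    if logical_block < 0:
        raise ValueError("negative block index")
    b = logical_block
    if b < N_DIRECT:
        return ("direct", (b,))
    b -= N_DIRECT
    if b < per_block:
        return ("single", (b,))
    b -= per_block
    if b < per_block ** 2:
        return ("double", (b // per_block, b % per_block))
    b -= per_block ** 2
    if b < per_block ** 3:
        return ("triple", (b // per_block ** 2, (b // per_block) % per_block, b % per_block))
    raise ValueError("beyond the largest file this pointer scheme can address")
```

Each indirect level is one more block that must be read (unless it is cached) before the data block itself, which is one reason sequential allocation and caching matter so much for traditional file systems.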

3. Modern I/O Stack
3.3 Traditional File Systems
[Figure]
http://www.zimbio.com/Linux/articles/738/Part+II+Object+File+Systems+Legacy+Unix+Linux

3. Modern I/O Stack
3.4 Journaling File Systems
- Idea: add a journal (log) of the changes you are going to make to the file system, written before you make them
- Better recovery and fault tolerance
- Reads use the normal file system
- Writes happen twice (journal + normal file system), but journal writes are sequential and batched for group commit
- Can journal only the metadata, which is small (the common choice)
- Example: ext3
[6] Tweedie, S. C. Journaling the Linux ext2fs File System. In the Fourth Annual Linux Expo, Durham, North Carolina, May 1998.
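The journaling protocol can be illustrated with a toy in-memory model. `JournaledStore` is a hypothetical class: a dict stands in for the blocks' home locations and a list for the sequential journal. Transactions are batched and appended together with commit records (group commit), checkpointing performs the second write to the home locations, and recovery replays only transactions whose commit record made it into the journal. It ignores the buffer cache that would serve reads of not-yet-checkpointed data in a real system, and it journals full block contents rather than metadata only.

```python
class JournaledStore:
    """Hypothetical toy model of a write-ahead journal."""

    def __init__(self):
        self.disk = {}      # home locations: block number -> contents
        self.journal = []   # ("data", tid, block, contents) and ("commit", tid) records
        self.pending = []   # transactions waiting for the next group commit
        self._next_tid = 1

    def write(self, updates):
        """Queue one transaction (block -> new contents); nothing is durable yet."""
        tid = self._next_tid
        self._next_tid += 1
        self.pending.append((tid, dict(updates)))
        return tid

    def group_commit(self):
        """First write: append every pending transaction to the journal in one
        sequential batch, each followed by its commit record."""
        for tid, updates in self.pending:
            for block, data in updates.items():
                self.journal.append(("data", tid, block, data))
            self.journal.append(("commit", tid))
        self.pending.clear()

    def checkpoint(self):
        """Second write: copy committed blocks to their home locations, then free the journal."""
        self.disk = self.recover(self.disk, self.journal)
        self.journal.clear()

    @staticmethod
    def recover(disk, journal):
        """Replay, in log order, only transactions whose commit record is in the journal."""
        committed = {rec[1] for rec in journal if rec[0] == "commit"}
        result = dict(disk)
        for rec in journal:
            if rec[0] == "data" and rec[1] in committed:
                _, _, block, data = rec
                result[block] = data
        return result
```

A transaction whose data records reached the journal but whose commit record did not (for example, a crash mid-append) is simply skipped during recovery, which is what makes the double write safe.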

3. Modern I/O Stack
3.4 Journaling File Systems
[Figure]
http://www.ibm.com/developerworks/linux/library/l-journaling-filesystems/

3. Modern I/O Stack
3.5 Log-Structured File Systems
- Idea: treat the entire disk as one log and append all file writes at the end of the log
- Cleanup and compaction are needed so that the log can loop around
- Fast writes, because of batching and group commit at the end of the log
- Files become fragmented for reads (caching may solve this)
[7] Rosenblum, M., Ousterhout, J. K. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1), 26-52, 1992.
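A toy model of the log-structured idea follows. `ToyLFS` is hypothetical and heavily simplified: every write appends to the segment at the log head, a dict plays the role of the inode map, and the cleaner frees the oldest segment by copying its still-live blocks to the head. Real LFS also writes inodes and inode-map blocks into the log, records segment summaries, and chooses cleaning victims by policy rather than strictly oldest-first.

```python
class ToyLFS:
    """Hypothetical, simplified log-structured store."""

    def __init__(self, segment_size: int = 4):
        self.segment_size = segment_size
        self.segments = {}   # segment id -> list of ((file, block), data)
        self.head = -1       # id of the segment currently being filled
        self.imap = {}       # (file, block) -> (segment id, offset) of the newest copy

    def write(self, file, block, data):
        """Append to the log head; any older copy of this block becomes dead."""
        seg = self.segments.get(self.head)
        if seg is None or len(seg) == self.segment_size:
            self.head += 1
            seg = self.segments[self.head] = []
        seg.append(((file, block), data))
        self.imap[(file, block)] = (self.head, len(seg) - 1)

    def read(self, file, block):
        seg_id, offset = self.imap[(file, block)]
        return self.segments[seg_id][offset][1]

    def clean_oldest(self):
        """Free the oldest segment by re-appending its live blocks at the head.
        Returns how many live blocks were copied."""
        if not self.segments:
            return 0
        victim = min(self.segments)
        if victim == self.head:
            return 0  # never clean the segment still being filled
        entries = self.segments.pop(victim)
        live = [(key, data) for off, (key, data) in enumerate(entries)
                if self.imap.get(key) == (victim, off)]
        for (file, block), data in live:
            self.write(file, block, data)
        return len(live)
```

After a few overwrites and cleanings, a file's blocks end up spread across different segments; this is the read-side fragmentation the slide mentions.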

3. Modern I/O Stack
3.5 Log-Structured File Systems
[Figure: normal file system vs. log-structured file system]
http://www.outflux.net/projects/lfs/what_lfs_is.html

4. Common Optimization Techniques
- Caching reads
  - Removes or postpones many of the issues with fragmentation
  - Do different levels of cache work well together?
- Reorder operations
- Prefetching
- Replicas of data (even on a single disk)
- Buffering/batching writes
  - Potential data loss on failure
  - If writes are transactional, then you are trading latency for throughput
- Short-stroking the disk
  - Use only the outer tracks of the disk to reduce seek time
  - Aligning with zoned-bit recording also increases throughput
  - Usually implemented using partitions
- Use non-volatile memory (most commonly flash)
  - Solid state drives (SSD)
  - Hybrid drives = flash + hard disk
- Use multiple disks
  - NAS/SAN/RAID includes extra cache memory
[8] Hsu, W., Smith, A. J. The performance impact of I/O optimizations and disk improvements. IBM Journal of Research and Development, 48(2), 255-289, March 2004.
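Reordering is the easiest of these to show in a few lines. The sketch compares first-come-first-served order with the elevator (LOOK) variant of SCAN on a made-up request queue, using cylinders travelled as the cost. That is only a proxy: as noted in Section 3.1, seek time is non-linear in distance, and schedulers inside the drive can also take rotational position into account.

```python
def total_seek_distance(start, order):
    """Cylinders travelled when servicing requests in the given order."""
    distance, pos = 0, start
    for cylinder in order:
        distance += abs(cylinder - pos)
        pos = cylinder
    return distance


def look_order(start, requests, moving_up=True):
    """Elevator order (LOOK variant of SCAN): sweep in the current direction to the
    last pending request, then reverse and sweep back."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down if moving_up else down + up


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # made-up pending cylinders
    head = 53
    print(total_seek_distance(head, queue))                    # FCFS: 640
    print(total_seek_distance(head, look_order(head, queue)))  # LOOK: 299
```

On this queue, reordering roughly halves the head travel without changing which requests are served.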

5. Unusual Optimization Techniques
5.1 Modeling the Disk in Software
- Need to know how the disk is laid out
  - Go from LBA to CTS addressing
  - Include remapping of sectors
- Need to know where the disk head is located
  - Can be done in software
  - When a read/write returns, you know where the head is (plus processing time)
  - Keep this accurate by issuing new reads/writes as needed
- Model the drive's scheduling algorithm
  - Predict the order in which operations sent to the disk will execute
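The first two needs on this slide can be sketched as follows, under loudly stated assumptions: the zone table, head count, and remap table are hypothetical inputs (real drives do not publish them, so they would have to be measured or extracted), LBAs are laid out cylinder by cylinder with track skew ignored, and spare sectors are modeled as ordinary LBAs. `sector_under_head` predicts rotational position from the completion time of the last request, which is the "when a read/write returns, you know where the head is" observation.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Zone:
    first_cylinder: int
    cylinders: int
    sectors_per_track: int  # outer zones hold more sectors per track


class DiskModel:
    """Hypothetical software model of a drive's layout and head position."""

    def __init__(self, zones, heads, rpm, remap=None):
        self.zones = zones
        self.heads = heads               # tracks per cylinder
        self.rev_ms = 60_000.0 / rpm     # one revolution, in ms
        self.remap = remap or {}         # damaged LBA -> spare LBA
        self.zone_start = []             # first LBA of each zone
        lba = 0
        for z in zones:
            self.zone_start.append(lba)
            lba += z.cylinders * heads * z.sectors_per_track
        self.capacity = lba

    def lba_to_cts(self, lba):
        """Map an LBA to (cylinder, track, sector), applying remapping first."""
        lba = self.remap.get(lba, lba)
        if not 0 <= lba < self.capacity:
            raise ValueError(f"LBA {lba} out of range")
        i = bisect_right(self.zone_start, lba) - 1  # zone containing this LBA
        z = self.zones[i]
        offset = lba - self.zone_start[i]
        per_cylinder = self.heads * z.sectors_per_track
        cylinder = z.first_cylinder + offset // per_cylinder
        track, sector = divmod(offset % per_cylinder, z.sectors_per_track)
        return cylinder, track, sector

    def sector_under_head(self, spt, last_sector, done_ms, now_ms):
        """Predict the sector under the head at now_ms on a track with spt sectors,
        given that the last request finished reading last_sector at done_ms."""
        revolutions = (now_ms - done_ms) / self.rev_ms
        return int(last_sector + 1 + revolutions * spt) % spt
```

Predicting the drive's own scheduling (the third bullet) would sit on top of this: with positions for every queued request, a model can estimate which request the firmware is likely to serve next.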

5. Unusual Optimization Techniques
5.2 Freeblock Scheduling
- Idea: replace a disk drive's rotational latency delays with useful background media transfers
[9] Lumb, C. R., Schindler, J., Ganger, G. R., Nagle, D. F., Riedel, E. Towards higher disk head utilization: extracting free bandwidth from busy disk drives. In OSDI '00: Proceedings of the 4th Conference on Operating System Design & Implementation, San Diego, California, 87-102, 2000.
[10] Lumb, C. R., Schindler, J., Ganger, G. R. Freeblock Scheduling Outside of Disk Firmware. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST '02), Monterey, CA, January 2002.
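A simplified sketch of the core computation: given the head's current rotational position, the seek time to the next foreground request's track, and that request's first sector, list the sectors a background task wants that pass under the head before the target arrives. The function `free_sectors` and its parameters are hypothetical; it only considers the destination track, whereas the papers' schedulers may also use other tracks, and it ignores settle time and the firmware's own request ordering.

```python
import math


def free_sectors(spt, head_pos, seek_ms, rev_ms, target_sector, wanted):
    """Hypothetical freeblock helper.

    spt:           sectors per track on the destination track
    head_pos:      rotational position now, in (fractional) sector units
    seek_ms:       time to seek to the destination track
    rev_ms:        time for one revolution
    target_sector: first sector of the foreground request
    wanted:        set of sectors a background task would like to read

    Returns the wanted sectors that pass fully under the head between the end of
    the seek and the start of the target sector, in the order they are read.
    """
    if not 0 <= target_sector < spt:
        raise ValueError("target sector out of range")
    # The platter keeps turning during the seek.
    arrival = head_pos + seek_ms / rev_ms * spt
    # A sector can only be read if the head is there before it starts.
    s = math.ceil(arrival) % spt
    free = []
    while s != target_sector:  # this wait is the rotational latency otherwise wasted
        if s in wanted:
            free.append(s)
        s = (s + 1) % spt
    return free
```

If the seek arrives just after the target sector has passed, the loop walks almost a full revolution, which is exactly the case where the most free bandwidth is available.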

5. Unusual Optimization Techniques
5.2 Freeblock Scheduling
- Applications
  - Segment cleaning (e.g., LFS)
  - Data mining (e.g., indexing for search)
- In firmware (OSDI 2000)
  - 20-50% of the disk's bandwidth can be provided to background applications
  - 47 full disk scans per day on an active 9 GB disk (the last 5% takes 30% of the time)
- In software (FAST 2002)
  - 15% of the disk's potential bandwidth can be provided to background applications
  - 37 full disk scans per day on an active 9 GB disk
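To put the scan rates in bandwidth terms, here is a rough conversion, assuming 1 GB = 10^9 bytes and averaging over a full 24-hour day (the scans themselves are not uniform, as the 5%/30% note shows):

```latex
\[
\frac{47 \times 9 \times 10^{9}\ \text{B}}{86\,400\ \text{s}} \approx 4.9\ \text{MB/s}
\qquad
\frac{37 \times 9 \times 10^{9}\ \text{B}}{86\,400\ \text{s}} \approx 3.9\ \text{MB/s}
\]
```

So the firmware and software implementations deliver, on average, a few MB/s of background reads on a disk that is simultaneously serving a foreground workload.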

