  1. Emulating Goliath Storage Systems with David. Nitin Agrawal (NEC Labs); Leo Arulraj, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau (ADSL Lab, UW Madison)

  2. The Storage Researchers’ Dilemma
     • Innovate: create the future of storage
     • Measure: quantify the improvement obtained
     • Dilemma: how to measure the future of storage with devices from the present?

  3. David: A Storage Emulator
     • Emulates large, fast, multiple disks using a small, slow, single device
     • Huge disks: emulate a ~1 TB disk using an 80 GB disk
     • Multiple disks: emulate a RAID of multiple disks using RAM

  4. Key Idea behind David
     • Store metadata; throw away data (and generate fake data on reads)
     • Why is this OK? Benchmarks measure performance: many benchmarks don’t care about file content, and some expect valid but not exact content

  5. Outline: Intro, Overview, Design, Results, Conclusion

  6. Overview of how David works. [Diagram: the benchmark runs in userspace on top of the filesystem; David sits below the filesystem in kernelspace as a pseudo block device driver, containing the storage model and the backing store.]

  7. Illustrative Benchmark (a minimal C version follows)
     • Create a file
     • Write a block of data
     • Close the file
     • Open the file in read mode
     • Read back the data
     • Close the file
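
     A concrete stand-in for this benchmark (my sketch, not code from the talk; the file name a.txt and the 4 KB block size follow the later slides):

         #include <stdio.h>
         #include <string.h>

         int main(void) {
             char out[4096], in[4096];
             memset(out, 'x', sizeof out);          /* one block of dummy data */

             FILE *f = fopen("a.txt", "w");         /* create the file */
             if (!f) { perror("fopen"); return 1; }
             fwrite(out, sizeof out, 1, f);         /* write a block of data */
             fclose(f);                             /* close the file */

             f = fopen("a.txt", "r");               /* open in read mode */
             if (!f) { perror("fopen"); return 1; }
             size_t n = fread(in, sizeof in, 1, f); /* read back the data */
             (void)n;   /* content is not verified: under David the bytes
                           read back are generated, not the bytes written */
             fclose(f);                             /* close the file */
             return 0;
         }

     Note that this benchmark times operations rather than verifying content, which is exactly the class of workload David's "throw away data" idea targets.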

  8. How does David handle a metadata write? The benchmark calls F = fopen("a.txt", "w"), and the filesystem allocates an inode in block 100.

  9. How does David handle a metadata write? The filesystem issues a write of the inode block at LBA 100.

  10. How does David handle a metadata write? The write to LBA 100 arrives at David's storage model.

  11. How does David handle a metadata write? The storage model calculates the response time for the write to LBA 100; the metadata block at LBA 100 is remapped to LBA 1 on the backing store, and the remap table records 100 -> 1.

  12. How does David handle a metadata write? David responds to the filesystem after the modeled 6 ms. (A userspace sketch of this path follows.)
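
     A small userspace sketch of the metadata write path (my illustration; David is a kernel block driver, and the linear-scan table here is only for clarity):

         #include <stdint.h>
         #include <string.h>

         #define BLOCK_SIZE 4096
         #define MAX_REMAPS 1024                    /* toy capacity, no overflow check */

         static struct { uint64_t lba, backing; } remap[MAX_REMAPS];
         static size_t nremaps;
         static uint8_t backing_store[MAX_REMAPS][BLOCK_SIZE];

         /* Return the compact backing location for an LBA, allocating
            slots sequentially on first use, as in the slide's 100 -> 1
            example. */
         static uint64_t remap_lookup(uint64_t lba) {
             for (size_t i = 0; i < nremaps; i++)
                 if (remap[i].lba == lba)
                     return remap[i].backing;
             remap[nremaps].lba = lba;
             remap[nremaps].backing = nremaps;
             return remap[nremaps++].backing;
         }

         /* Metadata write: remap and persist to the backing store; the
            filesystem is answered after the modeled time (6 ms above). */
         void metadata_write(uint64_t lba, const void *block) {
             memcpy(backing_store[remap_lookup(lba)], block, BLOCK_SIZE);
         }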

  13. How does David handle a data write? The benchmark calls fwrite(buffer, 4096, 1, F), and the filesystem issues a write of the data block at LBA 800.

  14. How does David handle a data write? The write to LBA 800 arrives at David's storage model.

  15. How does David handle a data write? The storage model calculates the response time for the write to LBA 800; the data block itself is THROWN AWAY.

  16. How does David handle a data write? David responds to the filesystem after the modeled 8 ms. Space savings so far: 50%, since one of the two blocks written was discarded. (A sketch of the squash path follows.)
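
     Continuing the sketch above, the data write path is even simpler: nothing is stored, only modeled (model_service_time is a stub of mine standing in for the storage model):

         #include <stdint.h>
         #include <stddef.h>

         /* Stub for the storage model; a real model computes seek,
            rotation, and transfer time (see the disk model sketch later). */
         static double model_service_time(uint64_t lba) { (void)lba; return 8.0; }

         /* Data write: the payload is squashed (thrown away); the caller
            is answered after the modeled time, and the backing store is
            never touched, which is where the space savings come from. */
         double data_write(uint64_t lba, const void *block, size_t len) {
             (void)block; (void)len;               /* contents never stored */
             return model_service_time(lba);
         }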

  17. How does David handle a metadata read? The benchmark calls fclose(F) and then F = fopen("a.txt", "r"), which requires the inode to be read back.

  18. How does David handle a metadata read? The filesystem issues a read of the inode block at LBA 100.

  19. How does David handle a metadata read? The read of LBA 100 arrives at David's storage model.

  20. How does David handle a metadata read? The storage model calculates the response time for the read of LBA 100; the remap table maps 100 -> 1, so the block at backing LBA 1 is read and returned.

  21. How does David handle a metadata read? David returns the block to the filesystem after the modeled 3 ms.

  22. How does David handle a data read? The benchmark calls fread(buffer, 4096, 1, F), and the filesystem issues a read of the data block at LBA 800.

  23. How does David handle a data read? The read of LBA 800 arrives at David's storage model.

  24. How does David handle a data read? The storage model calculates the response time for the read of LBA 800; since the data block was thrown away, David fills the block with fake content.

  25. How does David handle a data read? David returns the generated block to the filesystem after the modeled 8 ms. (A combined sketch of the read path follows.)
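
     Both read cases in one userspace sketch, continuing the pieces above (again my illustration; rand() is a placeholder for whatever generator David actually uses for fake content):

         #include <stdlib.h>
         #include <string.h>
         #include <stdint.h>

         /* Read path: metadata comes back from the backing store through
            the remap table; squashed data blocks are refilled on the fly. */
         double device_read(uint64_t lba, void *block, int is_metadata) {
             if (is_metadata) {
                 memcpy(block, backing_store[remap_lookup(lba)], BLOCK_SIZE);
             } else {
                 uint8_t *p = block;
                 for (size_t i = 0; i < BLOCK_SIZE; i++)
                     p[i] = (uint8_t)rand();       /* fake content for reads */
             }
             return model_service_time(lba);       /* 3 ms or 8 ms above */
         }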

  26. Outline: Intro, Overview, Design, Results, Conclusion

  27. Design Goals for David
     • Accurate: the emulated disk should perform like the real disk
     • Scalable: should be able to emulate large disks
     • Lightweight: emulation overhead should not affect accuracy
     • Flexible: should be able to emulate a variety of storage devices
     • Adoptable: easy to install and use for benchmarking

  28. Components within David: Block Classifier, Data Generator, Data Squasher, Metadata Remapper, Storage Model, Backing Store

  29. Block Classification: Data or Metadata?
     • Distinguish data blocks from metadata blocks in order to throw the data blocks away
     • Why is this difficult? David is a block-level emulator
     • Two approaches: implicit block classification (David automatically infers block classification) and explicit block classification (the operating system passes down block classification)

  30. Implicit Block Classification
     • Parse metadata writes using filesystem knowledge to infer data blocks
     • Implementation for ext3: identify inode blocks using the ext3 block layout; parse inode blocks to infer direct/indirect blocks; parse direct/indirect blocks to infer data blocks (a sketch of the first step follows)
     • Problem: delay in classification
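
     A hedged sketch of the first step, identifying inode blocks from the ext3 layout (my illustration; a real implementation reads the geometry from the superblock and per-group descriptors, and the fixed per-group inode-table placement here is a simplifying assumption):

         #include <stdint.h>
         #include <stdbool.h>

         /* Simplified ext3 geometry for this sketch. */
         struct ext3_layout {
             uint64_t first_data_block;     /* 0 or 1, depending on block size */
             uint64_t blocks_per_group;
             uint64_t inode_table_offset;   /* inode table start within a group */
             uint64_t inode_table_blocks;   /* length of each group's inode table */
         };

         /* Step 1 of implicit classification: is this an inode block? */
         bool is_inode_block(const struct ext3_layout *fs, uint64_t block) {
             if (block < fs->first_data_block)
                 return false;
             uint64_t rel = (block - fs->first_data_block) % fs->blocks_per_group;
             return rel >= fs->inode_table_offset &&
                    rel <  fs->inode_table_offset + fs->inode_table_blocks;
         }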

  31. Ext3 ordered journaling mode (without David). [Diagram: metadata blocks (M) are committed to the journal, while data blocks (D) are written directly to their disk locations.]

  32. Ext3 ordered journaling mode (with David). [Diagram: David holds incoming blocks in an unclassified block store, alongside the journal and disk, until it can classify them.]

  33. Memory Pressure in the Unclassified Block Store
     • Too many unclassified blocks exhaust memory
     • Technique: journal snooping, i.e., parse metadata writes to the journal to infer classification much earlier than usual

  34. Effect of Journal Snooping. [Chart: memory used (MB, 0-2000) vs. time (1-16 seconds), with and without journal snooping; without journal snooping, the unclassified block store grows until the system runs out of memory.]

  35. Block Classification: Data or Metadata? (Recap of slide 29's two approaches, implicit and explicit; next, explicit block classification.)

  36. Explicit Block Classification
     • Capture page pointers to data blocks in the write() system call and pass the classification information down to David
     • Data blocks and metadata blocks arriving from the filesystem are thus already labeled for David (a sketch follows)
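
     A toy stand-in for the kernel-side bookkeeping (my sketch; in the real system this state lives in the kernel's write path and is consulted when the pages reach the block layer):

         #include <stdint.h>
         #include <stddef.h>
         #include <stdbool.h>

         /* Hash set of page addresses captured in the write() path.
            Collisions simply overwrite here; a real implementation
            needs a proper map. */
         #define SET_SLOTS 4096
         static const void *data_pages[SET_SLOTS];

         static size_t page_slot(const void *page) {
             return ((uintptr_t)page >> 12) % SET_SLOTS;  /* 4 KB pages assumed */
         }

         /* Called from the write() system call: this page holds file data. */
         void mark_data_page(const void *page) {
             data_pages[page_slot(page)] = page;
         }

         /* Called when a block I/O reaches David: was the page marked as data? */
         bool is_data_page(const void *page) {
             return data_pages[page_slot(page)] == page;
         }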

  37. Block Classification Summary
     • Implicit: no change to the filesystem, benchmark, or operating system; requires filesystem knowledge; results shown with ext3
     • Explicit: minimal change to the operating system; works for all filesystems; results shown with btrfs

  38. Components within David (recap): Block Classifier, Data Generator, Data Squasher, Metadata Remapper, Storage Model, Backing Store; next, the Storage Model.

  39. David's Storage Model. [Diagram: in the emulated system, the benchmark and filesystem sit on top of David's storage model; in the actual system, the same benchmark and filesystem sit on top of an I/O request queue and a disk.]

  40. I/O Queue Model
     • Merge sequential I/O requests to improve performance
     • When the I/O queue is empty: wait 3 ms, anticipating merges
     • When the I/O queue is full: the process is made to sleep, is woken up once empty slots open up, and is given a bonus for the wait period
     • I/O queue modeling is critical for accuracy (a sketch of the merge logic follows)
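
     A userspace sketch of the merge half of this policy (my illustration; the queue depth and the sleep/bonus mechanics are simplified):

         #include <stdint.h>
         #include <stdbool.h>

         #define QUEUE_DEPTH 32

         struct io_req { uint64_t lba; uint32_t nblocks; };

         static struct io_req queue[QUEUE_DEPTH];
         static int queue_len;

         /* Merge a new request into the tail if it is sequential with it. */
         static bool try_merge(const struct io_req *req) {
             if (queue_len == 0)
                 return false;
             struct io_req *tail = &queue[queue_len - 1];
             if (tail->lba + tail->nblocks == req->lba) {
                 tail->nblocks += req->nblocks;
                 return true;
             }
             return false;
         }

         bool enqueue(const struct io_req *req) {
             if (try_merge(req))
                 return true;
             if (queue_len == QUEUE_DEPTH)
                 return false;   /* caller sleeps until a slot opens, then
                                    receives a timing bonus for the wait */
             queue[queue_len++] = *req;
             return true;
         }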

  41. Disk Model
     • A simple in-kernel disk model based on the Ruemmler and Wilkes disk model; current models are an 80 GB and a 1 TB Hitachi Deskstar
     • The focus of our work is not disk modeling (more accurate models are possible)
     • Parameters: disk properties (rotational speed, head seek profile, etc.) and current disk state (head position, on-disk cache state, etc.); a sketch follows
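
     A sketch of a Ruemmler/Wilkes-style service-time estimate (my illustration; the square-root seek curve is a common approximation, and none of the constants are the paper's):

         #include <math.h>
         #include <stdint.h>

         struct disk {
             double rpm;           /* rotational speed */
             double min_seek_ms;   /* single-track seek */
             double max_seek_ms;   /* full-stroke seek */
             double ms_per_block;  /* media transfer time per block */
             uint64_t ncyls;
             uint64_t head_cyl;    /* current disk state: head position */
         };

         double service_time_ms(struct disk *d, uint64_t cyl, uint32_t nblocks) {
             uint64_t dist = cyl > d->head_cyl ? cyl - d->head_cyl
                                               : d->head_cyl - cyl;

             /* Seek: grows with the square root of distance for short
                seeks, saturating toward the full-stroke time. */
             double seek = dist == 0 ? 0.0
                 : d->min_seek_ms + (d->max_seek_ms - d->min_seek_ms) *
                   sqrt((double)dist / (double)d->ncyls);

             double rot  = 0.5 * (60000.0 / d->rpm);  /* expected half rotation */
             double xfer = nblocks * d->ms_per_block;

             d->head_cyl = cyl;                       /* update disk state */
             return seek + rot + xfer;
         }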

  42. David's Storage Model: Accuracy. Reasonable accuracy across many workloads; many more results in the paper.

  43. Components within David (recap): Block Classifier, Data Generator, Data Squasher, Metadata Remapper, Storage Model, Backing Store; next, the Backing Store and Metadata Remapper.

  44. Backing Store
     • Storage space for the metadata blocks; any physical storage can be used
     • Must be large enough to hold all metadata blocks and fast enough to match the emulated disk
     • Two implementations: memory as the backing store, and a compressed disk as the backing store

  45. Metadata Remapper: remaps metadata blocks into compressed form. [Diagram: the emulated disk interleaves inode and data blocks; the compressed disk holds only the inode blocks, packed together, which also gives better performance.]

  46. Components within David (recap): Block Classifier, Data Generator, Data Squasher, Metadata Remapper, Storage Model, Backing Store; next, the Data Squasher and Data Generator.

  47. Data Squasher and Data Generator
     • Data squasher: throws away writes to data blocks
     • Data generator: generates content for reads to data blocks (currently generates random content)

  48. Outline: Intro, Overview, Design, Results, Conclusion

  49. Experiments
     • Emulation accuracy: test emulation accuracy across benchmarks
     • Emulation scalability: test space savings for large-device emulation
     • Multiple-disk emulation: test accuracy of multiple-device emulation

  50. Emulation Accuracy Experiment
     • Emulated a ~1 TB disk with an 80 GB disk
     • Ran a variety of benchmarks
     • Validated against a real 1 TB disk

  51. Emulation Accuracy Results (ext3 with implicit block classification). [Chart: runtime in seconds, 0-400, per benchmark, comparing the real disk against the emulated disk.]

  52. Emulation Accuracy Results (btrfs with explicit block classification). [Chart: runtime in seconds, 0-350, per benchmark, comparing the real disk against the emulated disk.]
