
HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System

HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System. Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki, Aniruddha Bohra. Feb 26, 2010.


  1. HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System. Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki, Aniruddha Bohra. Feb 26, 2010.

  2. HYDRAstor: De-duplicated Scalable Storage
     • Scale-out storage
     • With global de-duplication
     • Using Content-Defined Chunking
     • Resilient to multiple failures
     • Easy to manage (self-healing, …)
     • High throughput for streaming access
     • Std. interfaces (NFS/CIFS, VTL, …)
     Diagram labels: FAST’10: HydraFS: a High Throughput Filesystem; Bimodal CDC for Backup Streams. Access Layer: standard protocols; chunking; high throughput. Content-addressable API. FAST’09: HYDRAstor: a Scalable Secondary Storage. Content-addressable Block Store: scalable; easy to manage; resilient; high throughput.
     FAST 2010 – HydraFS: a High Throughput Filesystem for CAS

  3. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks. Diagram labels: B1; File System; Block Store.

  4. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store. Diagram labels: File System; CA 1; B1; Block Store.

  5. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store. Diagram labels: B2; File System; CA 1; B1; Block Store.

  6. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store. Diagram labels: File System; CA 1; CA 2; B1; B2; Block Store.

  7. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store. Diagram labels: B3; File System; CA 1; CA 2; B1; B2; Block Store.

  8. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store. Diagram labels: File System; CA 1; CA 1; CA 2; B3; B1; B2; Block Store.

  9. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store. Diagram labels: CA 1; File System; CA 1; CA 2; B3; B1; B2; Block Store.

  10. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store. Diagram labels: CA 1; CA 2; File System; CA 1; B3; B1; B2; Block Store.

  11. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store. Diagram labels: CA 1; CA 2; CA 1; File System; B1; B2; Block Store.

  12. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store • Configurable block resilience. Diagram labels: B4; CA 1; CA 2; CA 1; File System; B1; B2; Block Store.

  13. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store • Configurable block resilience. Diagram labels: File System; CA 3; B4; CA 1; CA 2; CA 1; B1; B2; Block Store.

  14. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store • Configurable block resilience. Diagram labels: CA 3; File System; B4; CA 1; CA 2; CA 1; B1; B2; Block Store.

  15. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store • Configurable block resilience • Garbage collection. Diagram labels: Root 1; CA 3; File System; B4; CA 1; CA 2; CA 1; B1; B2; Block Store.

  16. HYDRAstor Usage Example. Block Store (CAS) API: • Variable-size blocks • Content-addressable • Address decided by the store • Duplicates eliminated by store • Configurable block resilience • Garbage collection. Diagram labels: File System; Root 1; CA 3; B4; CA 1; CA 2; CA 1; B1; B2; Block Store.
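The usage example on the preceding slides can be approximated with a toy in-memory store. This is only a sketch under stated assumptions: the class `BlockStore`, its `put`/`get` methods, and the use of SHA-256 as the address are hypothetical choices for illustration, not the HYDRAstor API, and B3 is assumed to carry the same content as B1, as the diagram appears to show. Block resilience and garbage collection are left out.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressable block store (illustrative only)."""

    def __init__(self):
        self._blocks = {}  # content address -> block bytes

    def put(self, data: bytes) -> str:
        # The store, not the caller, decides the address.
        # Assumption: the address is the SHA-256 digest of the content.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def block_count(self) -> int:
        return len(self._blocks)


store = BlockStore()
ca1 = store.put(b"contents of B1")
ca2 = store.put(b"contents of B2")
ca_b3 = store.put(b"contents of B1")  # B3 duplicates B1 (assumed)

# B4 is a parent block whose payload is the addresses of its children.
ca3 = store.put("\n".join([ca1, ca2, ca_b3]).encode())
```

In this sketch the duplicate write returns the address already issued for B1, and the parent block B4 can only be built once the addresses of its children are known.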

  17. Outline: • HYDRAstor content-addressable API • Challenges posed to the filesystem • Filesystem architecture • Techniques used to overcome the challenges • Conclusions and future work

  18. Challenges
      • Content-addressable blocks
        – A change in a block’s contents also changes the block’s address
          • All metadata has to change, recursively up to the filesystem root
          • Parent can only be written after the children writes are successful
      • Variable-sized chunking (splitting file data into blocks)
        – Block boundaries change when content is changed
        – Overwrites cause read-rechunk-rewrite
      • High-latency block store operations
        – Why? Hashing, compression, erasure coding, fragment distribution, …
        – Exacerbates the above two challenges
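The first challenge, that a change to one block forces new addresses all the way up to the root, can be illustrated with a small sketch. The tree shape, the helper names `address` and `write_tree`, and the SHA-256 address are assumptions made for illustration; the slides do not specify how HydraFS computes addresses.

```python
import hashlib


def address(payload: bytes) -> str:
    """Hypothetical content address: SHA-256 of the block payload."""
    return hashlib.sha256(payload).hexdigest()


def write_tree(node) -> str:
    """Return the address of `node`, handling children before their parent.

    A node is either bytes (a data block) or a list of child nodes
    (a metadata block whose payload is the list of child addresses).
    """
    if isinstance(node, bytes):
        return address(node)
    # A parent's payload contains its children's addresses, so the
    # children must be addressed (written) first.
    child_addresses = [write_tree(child) for child in node]
    return address("\n".join(child_addresses).encode())


file_a = [b"chunk 0", b"chunk 1"]
file_b = [b"chunk 2"]
root_before = write_tree([file_a, file_b])

# Edit a single data block of file_a and rewrite the tree.
file_a_edited = [b"chunk 0", b"chunk 1 (edited)"]
root_after = write_tree([file_a_edited, file_b])
```

Here the edit changes the address of the data block, of `file_a`'s metadata block, and of the root, while the untouched subtree `file_b` keeps its address.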

  19. Persistent Layout. Diagram labels: Filesystem superblock (root block); Inode map root; Inode map B-tree; Inode map (segmented array); Directory inode; File inode; Inode B-tree; Directory B-tree; Directory contents; File contents.

  20. HydraFS Architecture. Diagram labels: User operations; Control messages; File Server; Commit Server; Filesystem; File System; TS=1; TS=20; Block Store; Root; Metadata; Data; Update log (TS=1; op 1, op 2, ... TS=2; … TS=3; …).

  21. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      Diagram labels: Write Buffer (dirty data).

  22. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      Diagram labels: Write Buffer (dirty data); Chunker.

  23. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      • Metadata modification records (file, directory, inode map)
        – Dirty metadata annotated with time-stamp (for cleaning)
        – Written out to log
      → Requires efficient cleaning
        – Large amount of dirty metadata! → Resource management issues
      Diagram labels: Write Buffer (dirty data); Chunker; Metadata Modification Records (dirty metadata): File offset_range → CA; Directory additions/removals; Inode map de/allocations.

  24. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      • Metadata modification records (file, directory, inode map)
        – Dirty metadata annotated with time-stamp (for cleaning)
        – Written out to log
      • Block cache
        – Clean data and metadata (not de-serialized)
      Diagram labels: Write Buffer (dirty data); Chunker; Metadata Modification Records (dirty metadata): File offset_range → CA; Directory additions/removals; Inode map de/allocations; Block Cache (clean data & metadata): CA → block data.
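The chunker's role, deciding block boundaries from the data content, can be sketched as follows. This is not the HydraFS chunker: the window size, the polynomial hash, the boundary mask, and the function names are all assumptions chosen to keep the example small, and the window hash is recomputed at every position for clarity rather than maintained as a true rolling hash. Minimum and maximum chunk sizes are also omitted.

```python
WINDOW = 4   # bytes of context examined at each position (assumed)
BASE = 3     # multiplier of the toy polynomial hash (assumed)
MASK = 0x3F  # cut when the low 6 bits of the window hash are zero (assumed)


def window_hash(window: bytes) -> int:
    """Toy polynomial hash of one window of bytes."""
    h = 0
    for byte in window:
        h = h * BASE + byte
    return h


def chunk(data: bytes) -> list:
    """Split data wherever the hash of the last WINDOW bytes matches MASK."""
    chunks = []
    start = 0
    for end in range(WINDOW, len(data) + 1):
        if window_hash(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes with no boundary yet
    return chunks


original = b"a" * 10 + bytes(4) + b"b" * 10 + bytes(4) + b"c" * 5
edited = b"zz" + original  # insert two bytes at the front of the file

chunks_original = chunk(original)
chunks_edited = chunk(edited)
```

With these inputs only the first chunk differs between the two results; the later boundaries depend on nearby content rather than on absolute offsets, so the remaining chunks are byte-identical and would deduplicate in a content-addressable store.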

  25. Write Processing. Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; Chunker; Block Store.

  26. Write Processing. Diagram labels: [0, 8 KB); Write Buffer; Metadata Modification Records; Block Cache; Chunker; Block Store.

  27. Write Processing. Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [0, 8 KB); Chunker; Block Store.

  28. Write Processing. Diagram labels: [8 KB, 16 KB); Write Buffer; Metadata Modification Records; Block Cache; [0, 8 KB); Chunker; Block Store.

  29. Write Processing. Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [8 KB, 16 KB); [0, 8 KB); Chunker; Block Store.

  30. Write Processing. Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [12 KB, 16 KB); Chunker; 12 KB of data; Block Store.

  31. Write Processing. Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [12 KB, 16 KB); Chunker; CA 1; Data blocks; 12 KB of data; Block Store.
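The write buffer's job of accumulating data and re-ordering writes that arrive out of order, before handing a contiguous run to the chunker, might look roughly like the sketch below. The class `WriteBuffer` and its methods are hypothetical, and the sketch assumes writes do not overlap; the slides do not describe the real data structure.

```python
class WriteBuffer:
    """Toy write buffer that reassembles out-of-order writes (illustrative)."""

    def __init__(self):
        self._pending = {}  # file offset -> bytes not yet handed onward
        self._next = 0      # offset of the first byte not yet consumed

    def write(self, offset: int, data: bytes) -> None:
        # Assumption: writes never overlap, so the offset is a unique key.
        self._pending[offset] = data

    def take_contiguous(self) -> bytes:
        """Remove and return the longest in-order run starting at _next."""
        run = bytearray()
        while self._next in self._pending:
            piece = self._pending.pop(self._next)
            run += piece
            self._next += len(piece)
        return bytes(run)


buf = WriteBuffer()
buf.write(8192, b"B" * 8192)      # [8 KB, 16 KB) arrives first
early = buf.take_contiguous()     # nothing usable yet: [0, 8 KB) is missing
buf.write(0, b"A" * 8192)         # [0, 8 KB) arrives late
ready = buf.take_contiguous()     # both ranges, now in file order
```

In the slides the chunker then consumes only part of the buffered data (12 KB of the 16 KB), leaving [12 KB, 16 KB) in the buffer; this sketch hands over the whole contiguous run and leaves that split to the chunker.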
