HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System
Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki, Aniruddha Bohra
Feb 26, 2010
HYDRAstor: De-duplicated Scalable Storage
– Scale-out storage
– With global de-duplication
– Using Content-Defined Chunking
– Resilient to multiple failures
– Easy to manage (self-healing,…)
– High throughput for streaming access
– Std. interfaces (NFS/CIFS, VTL,…)
– Content-addressable API
Access Layer (FAST’10)
• HydraFS: a High Throughput Filesystem
• Bimodal CDC for Backup Streams
• Standard protocols
• Chunking
• High throughput
Content-addressable Block Store (FAST’09)
• HYDRAstor: a Scalable Secondary Storage
• Scalable
• Easy to manage
• Resilient
• High throughput
FAST 2010 – HydraFS: a High Throughput Filesystem for CAS
HYDRAstor Usage Example
Block Store (CAS) API
– Variable-size blocks
– Content-addressable
– Address decided by the store
– Duplicates eliminated by store
– Configurable block resilience
– Garbage collection
[Diagram: File System above Block Store; animation steps]
1. The file system writes block B1; the store returns address CA 1.
2. The file system writes block B2; the store returns address CA 2.
3. The file system writes block B3, a duplicate of B1; the store returns CA 1 again and keeps only B1 and B2.
4. The file system writes block B4, which holds the addresses CA 1, CA 2, CA 1; the store returns CA 3.
5. Root 1 is written, pointing to CA 3.
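The usage example can be mimicked with a toy in-memory store. This is a minimal sketch, not the HYDRAstor API: the names BlockStore, write, set_root and collect_garbage are hypothetical, and SHA-256 is an assumed address function (the slides only say that the store decides the address). Block resilience is not modeled.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self.blocks = {}  # address -> (data, tuple of child addresses)
        self.roots = {}   # root name -> address

    def write(self, data: bytes, pointers=()) -> str:
        # The store, not the caller, decides the address. Here it is a
        # SHA-256 over the payload and the embedded pointers (an assumption).
        h = hashlib.sha256(data)
        for p in pointers:
            h.update(p.encode())
        addr = h.hexdigest()
        # Writing identical content again stores nothing new.
        self.blocks.setdefault(addr, (data, tuple(pointers)))
        return addr

    def set_root(self, name: str, addr: str) -> None:
        self.roots[name] = addr

    def collect_garbage(self) -> int:
        # Mark every block reachable from a root, then sweep the rest.
        live, stack = set(), list(self.roots.values())
        while stack:
            addr = stack.pop()
            if addr not in live:
                live.add(addr)
                stack.extend(self.blocks[addr][1])
        dead = [a for a in self.blocks if a not in live]
        for a in dead:
            del self.blocks[a]
        return len(dead)


store = BlockStore()
ca1 = store.write(b"contents of B1")
ca2 = store.write(b"contents of B2")
ca1_again = store.write(b"contents of B1")         # B3 duplicates B1
ca3 = store.write(b"", pointers=[ca1, ca2, ca1])   # B4 holds three pointers
store.set_root("Root 1", ca3)
orphan = store.write(b"unreferenced block")
removed = store.collect_garbage()
```

After the last call, only the three blocks reachable from "Root 1" remain; the unreferenced block is swept.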
Outline
– HYDRAstor content-addressable API
– Challenges posed to the filesystem
– Filesystem architecture
– Techniques used to overcome the challenges
– Conclusions and future work
Challenges
Content-addressable blocks
– A change in a block’s contents also changes the block’s address
• All metadata has to change, recursively up to the filesystem root
• Parent can only be written after the children writes are successful
Variable-sized chunking (splitting file data into blocks)
– Block boundaries change when content is changed
– Overwrites cause read-rechunk-rewrite
High-latency block store operations
– Why? Hashing, compression, erasure coding, fragment distribution …
– Exacerbates the above two challenges
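The first challenge can be illustrated with a small hash tree. This is a sketch under stated assumptions: address() stands in for the store-assigned address (SHA-256 is an assumption), and write_tree is a hypothetical helper that builds a binary tree of pointers. It shows that editing one leaf changes the root address, and that each parent can only be formed once its children's addresses are known.

```python
import hashlib


def address(payload: bytes) -> str:
    # Stand-in for the store-assigned address (SHA-256 is an assumption).
    return hashlib.sha256(payload).hexdigest()


def write_tree(leaves):
    """Write leaves first, then each parent level; return the root address."""
    level = [address(leaf) for leaf in leaves]  # children first
    while len(level) > 1:
        # A parent embeds its children's addresses, so it can only be
        # built once those addresses are known.
        level = [address("".join(level[i:i + 2]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


before = write_tree([b"block 0", b"block 1", b"block 2", b"block 3"])
after = write_tree([b"block 0", b"block 1", b"block 2 edited", b"block 3"])
unchanged = write_tree([b"block 0", b"block 1", b"block 2", b"block 3"])
```

Here `before` and `unchanged` are equal, while `after` differs: the one-leaf edit propagates through every ancestor up to the root.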
Persistent Layout
[Diagram: tree of blocks, labels from root to leaves]
– Filesystem superblock (root block)
– Inode map root
– Inode map B-tree
– Inode map (segmented array)
– Directory inode; File inode
– Directory B-tree; Inode B-tree
– Directory contents; File contents
HydraFS Architecture
[Diagram labels]
– Inputs: User operations; Control messages
– Components: File Server; Commit Server
– Filesystem versions: TS=1, TS=20
– Block Store: Root, Metadata, Data
– Update log: TS=1; op1, op2, ... TS=2; … TS=3; …
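One plausible reading of the diagram is that the file server appends time-stamped operation records to an update log, and the commit server later applies them to produce a new filesystem version tagged with the last time-stamp it covered. The sketch below illustrates only that reading. All class, field and method names are hypothetical, and the cleaning rule (drop dirty metadata no newer than the committed version) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FileServer:
    clock: int = 0
    log: list = field(default_factory=list)    # (timestamp, ops) records
    dirty: dict = field(default_factory=dict)  # timestamp -> ops still in memory

    def record(self, *ops):
        # Each batch of operations gets the next time-stamp and is logged.
        self.clock += 1
        self.log.append((self.clock, ops))
        self.dirty[self.clock] = ops
        return self.clock

    def clean(self, committed_ts):
        # Assumed rule: metadata covered by a committed version can be dropped.
        for ts in [t for t in self.dirty if t <= committed_ts]:
            del self.dirty[ts]


@dataclass
class CommitServer:
    version_ts: int = 0
    applied: list = field(default_factory=list)

    def commit(self, log):
        # Apply log records newer than the current version, in log order.
        for ts, ops in log:
            if ts > self.version_ts:
                self.applied.extend(ops)
                self.version_ts = ts
        return self.version_ts


fs, cs = FileServer(), CommitServer()
fs.record("op 1", "op 2")
fs.record("op 3")
ts = cs.commit(fs.log)   # new filesystem version covers both records
fs.record("op 4")        # arrives after the commit
fs.clean(ts)             # only the uncommitted record stays dirty
```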
File Server
Write buffer
– Accumulates written data; flushed on sync
– Helps re-order NFS packets arriving out-of-order
Chunker
– Decides block boundaries (based on data content)
Metadata modification records (file, directory, inode map)
– Dirty metadata annotated with time-stamp (for cleaning)
– Written out to log
– Requires efficient cleaning: large amount of dirty metadata! Resource management issues
Block cache
– Clean data and metadata (not de-serialized)
[Diagram boxes]
– Write Buffer (dirty data)
– Chunker
– Metadata Modification Records (dirty metadata)
• File offset_range → CA
• Directory additions/removals
• Inode map de/allocations
– Block Cache (clean data & metadata)
• CA → block data
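To make "decides block boundaries (based on data content)" concrete, here is a minimal content-defined chunking sketch. It is not the HydraFS chunker: the function name, window size, mask, and size limits are all assumptions, and it recomputes a CRC-32 over the trailing window at every position instead of using a true rolling hash, purely to keep the example short.

```python
import random
import zlib


def chunk(data: bytes, window=16, mask=0x3F, min_size=32, max_size=512):
    """Split data where a hash of the last `window` bytes matches a pattern."""
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        size = end - start
        if size < min_size:
            continue
        # The boundary test looks only at nearby content, not at file offsets.
        at_anchor = (zlib.crc32(data[end - window:end]) & mask) == 0
        if at_anchor or size >= max_size:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(8192))
edited = b"inserted!" + original       # shift every byte by 9 positions

a, b = chunk(original), chunk(edited)
shared = len(set(a) & set(b))          # chunks that a store could de-duplicate
```

Because boundaries follow the content, the two chunkings normally fall back into step shortly after the insertion point, so most chunks of `edited` are identical to chunks of `original`. A fixed-size split would shift every block instead.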
Write Processing
[Diagram: Write Buffer, Metadata Modification Records, Block Cache, Chunker, Block Store; animation steps]
1. A write for [0, 8 KB) arrives and is placed in the write buffer.
2. A write for [8 KB, 16 KB) arrives and is placed in the write buffer.
3. The chunker emits a block of 12 KB of data; [12 KB, 16 KB) stays in the write buffer.
4. The 12 KB data block goes to the block store, which returns CA 1.
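The write-processing steps can be sketched with a toy write buffer. The class and method names are hypothetical, overlapping writes are not handled, and the 12 KB boundary is simply passed in, where the real chunker would choose it from the data content.

```python
class WriteBuffer:
    """Holds dirty byte ranges until the chunker consumes them (toy model)."""

    def __init__(self):
        self.pending = {}  # file offset -> bytes, possibly received out of order
        self.flushed = 0   # everything below this offset went to the chunker

    def write(self, offset: int, data: bytes) -> None:
        self.pending[offset] = data

    def contiguous(self) -> bytes:
        # Stitch ranges together in offset order, stopping at the first gap.
        out, pos = bytearray(), self.flushed
        while pos in self.pending:
            out += self.pending[pos]
            pos += len(self.pending[pos])
        return bytes(out)

    def consume(self, n: int) -> bytes:
        """Hand the first n contiguous bytes to the chunker; keep the rest."""
        data = self.contiguous()
        assert n <= len(data)
        block, rest = data[:n], data[n:]
        pos = self.flushed
        while pos in self.pending:          # drop the ranges just stitched
            pos += len(self.pending.pop(pos))
        self.flushed += n
        if rest:
            self.pending[self.flushed] = rest
        return block


KB = 1024
buf = WriteBuffer()
buf.write(8 * KB, b"b" * (8 * KB))  # second write arrives first
buf.write(0, b"a" * (8 * KB))       # first write arrives late
block = buf.consume(12 * KB)        # boundary at 12 KB (assumed, as on the slide)
left_over = buf.contiguous()        # the [12 KB, 16 KB) range stays buffered
```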