HYDRAstor: a Scalable Secondary Storage
7th USENIX Conference on File and Storage Technologies (FAST '09)
February 26th, 2009
C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki, C. Ungureanu

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC

Scalable secondary storage

Characteristics                    | Requirements
Huge amount of data                | - Scalability (dynamic)
                                   | - Low cost per TB
Small backup windows               | - Very high write performance
Duplication between backup streams | - Global deduplication
Reliable, on-line retrieval        | - Failure tolerance
                                   | - High restore performance
Varying value of data              | - Adjust resilience overhead
                                   | - Data deletion

Challenges
● High-performance, decentralized global deduplication
  ... in a dynamic, distributed system
  ... with deletion and failures
● Combination introduces complexity
● Tension between:
  ● Deduplication and dynamic scalability
  ● Deduplication and on-demand deletion
  ● Failure tolerance and deletion

HYDRAstor
● Satisfies the scalable secondary storage requirements
● Started as a research project at NEC Laboratories America, in Princeton, NJ
● Successfully commercialized
● Today: real-world, commercial system
● Sold by NEC in the US and Japan
● Development of the back-end continues at 9LivesData, LLC in Warsaw, Poland
● Spinoff from NEC Laboratories

HYDRAstor functionality
● Content addressable storage (CAS)
● Vast data repository
● Storing and extracting streams of blocks
● Single system image built of independent nodes
● Support for standard access methods
  ● Filesystem, VTL
● Dynamic capacity sharing
● Self-recovery from failures
● On-demand deletion

Programming Model
● Repository of blocks
  ● Content-addressed
  ● Immutable
  ● Variable-sized
  ● Exposed pointers to other blocks
● Trees of blocks
● DAGs due to deduplication
  ● No cycles possible
● Deletion of whole trees
[Figure: trees of blocks rooted at Root1 and Root2, with blocks labeled hash=010..1, hash=110..0 and hash=011..0; an exposed pointer to a block carries that block's hash (e.g. 011..0).]

Programming Model (deletion of whole trees)
[Figure: the same repository once the tree rooted at Root1 is gone; Root2 remains, together with the blocks labeled hash=110..0 and hash=011..0.]
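
The programming model above can be sketched as a toy, single-process block store. This is only an illustration under stated assumptions, not the HYDRAstor API: the names `BlockStore`, `write` and `delete_root` are hypothetical, SHA-256 stands in for whatever hash the system uses, and a simple mark-and-sweep pass stands in for the distributed deletion the slides mention. It shows why identical blocks are stored once, why pointers can only form a DAG (a block's address depends on the addresses it points to, so those blocks must exist first), and how deleting one tree leaves shared blocks intact.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed block store (hypothetical, illustrative only)."""

    def __init__(self):
        self.blocks = {}    # address -> (data, pointers)
        self.roots = set()  # addresses of trees that must be kept

    def write(self, data: bytes, pointers=(), root=False) -> str:
        # A pointer must name a block that already exists, so a block can
        # only reference older blocks and cycles cannot form.
        for p in pointers:
            if p not in self.blocks:
                raise KeyError(f"unknown pointer {p}")
        h = hashlib.sha256()
        h.update(len(data).to_bytes(8, "big"))  # separate data from pointers
        h.update(data)
        for p in pointers:
            h.update(p.encode("ascii"))
        address = h.hexdigest()
        # Identical content yields the same address and is stored only once.
        self.blocks.setdefault(address, (data, tuple(pointers)))
        if root:
            self.roots.add(address)
        return address

    def delete_root(self, address: str) -> None:
        # Forget the root, then drop every block no remaining root can reach.
        self.roots.discard(address)
        live, stack = set(), list(self.roots)
        while stack:
            a = stack.pop()
            if a not in live:
                live.add(a)
                stack.extend(self.blocks[a][1])
        self.blocks = {a: b for a, b in self.blocks.items() if a in live}


store = BlockStore()
shared = store.write(b"data present in both backups")
only1 = store.write(b"data present only in backup 1")
root1 = store.write(b"backup 1", [only1, shared], root=True)
root2 = store.write(b"backup 2", [shared], root=True)
duplicate = store.write(b"data present in both backups")  # deduplicated
blocks_before = len(store.blocks)
store.delete_root(root1)
blocks_after = sorted(store.blocks)
```

After `delete_root(root1)`, only `root2` and the shared block remain: the block that was reachable solely from the first tree is reclaimed, while the deduplicated block survives because the second tree still points to it.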

Architecture overview
● Standard server-grade hardware running Linux
● Scalability on data-center level
[Figure: NFS / CIFS access to the Front-end (Access Nodes), connected over the Internal Network to the Back-end, or CAS Layer (Storage Nodes).]

Data organization: selected requirements

Requirements on scalable storage | Required internal data services
Failure tolerance                | ● Identify data resilience reduction
                                 | ● Fast data rebuilding
High performance                 | ● Preserve locality of data streams
                                 | ● Prefetching
Dynamic scalability              | ● Decentralized data management
                                 | ● Load balancing
                                 | ● Fast data transfer to new location
Deduplication                    | ● Location of potential duplicates
                                 | ● Availability & resiliency verification
On-demand deletion               | ● Failure-tolerant, distributed deletion
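
The slide above lists the required services without saying how they are provided. As a hedged illustration only, one way to combine "decentralized data management", "load balancing" and "location of potential duplicates" is to route each block by a prefix of its content hash, so that any duplicate of a block can only live on one node. The sketch below assumes this hash-prefix partitioning; `PrefixRouter`, `Cluster` and the node names are hypothetical, and SHA-256 is a stand-in hash.

```python
import hashlib


def block_address(data: bytes) -> str:
    """Content address of a block (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


class PrefixRouter:
    """Maps the leading bits of a block address to a storage node (hypothetical)."""

    def __init__(self, nodes, prefix_bits=8):
        self.nodes = list(nodes)
        self.prefix_bits = prefix_bits
        # Round-robin assignment of prefixes to nodes spreads the load evenly
        # because hash prefixes are uniformly distributed.
        self.owner = {
            p: self.nodes[p % len(self.nodes)] for p in range(1 << prefix_bits)
        }

    def node_for(self, address: str) -> str:
        # A SHA-256 address is 256 bits; keep only the leading prefix_bits.
        prefix = int(address, 16) >> (256 - self.prefix_bits)
        return self.owner[prefix]


class Cluster:
    """Toy cluster: each node holds the blocks whose prefixes it owns."""

    def __init__(self, nodes):
        self.router = PrefixRouter(nodes)
        self.storage = {n: {} for n in nodes}

    def write(self, data: bytes):
        address = block_address(data)
        node = self.router.node_for(address)
        # Only the owning node has to be asked whether the block is a duplicate.
        is_duplicate = address in self.storage[node]
        if not is_duplicate:
            self.storage[node][address] = data
        return address, node, is_duplicate


cluster = Cluster(["sn1", "sn2", "sn3", "sn4"])
first = cluster.write(b"chunk of a backup stream")
second = cluster.write(b"chunk of a backup stream")
stored_copies = sum(len(blocks) for blocks in cluster.storage.values())
```

Both writes resolve to the same address and node, the second is reported as a duplicate, and only one copy is stored. Adding a node in such a scheme means reassigning some prefixes and moving their blocks, which is where "fast data transfer to new location" would come in.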