OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash
Zhuan Chen and Kai Shen
University of Rochester

Background
• I/O deduplication
  • Eliminate I/O writes with redundant content
  • Reduce the storage space usage
  • Write reduction: reduce the Flash wear, improve performance
• Broad usage in data centers, personal computers, data-driven sensing

Motivation
• I/O deduplication is not free: metadata maintenance
  (1) Logical-physical block mapping
  (2) Physical block fingerprints
  (3) Physical block reference counts
• Need to maintain failure-consistency for data and metadata
[Figure: logical blocks L1 and L2 each receive "Write: ABC"; through the L2P mapping both point to a single physical block P on Flash holding ABC (ref. ctr. = 2).]
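The three kinds of metadata listed above can be sketched as a toy in-memory store. This is only an illustration, not the authors' implementation: the class name `ToyDedupStore`, the use of SHA-256 as the fingerprint, and the convention that a reference count equals the number of logical blocks mapped to a physical block are all assumptions made for the sketch.

```python
import hashlib


class ToyDedupStore:
    """Hypothetical in-memory deduplicating block store (illustration only)."""

    def __init__(self):
        self.l2p = {}           # (1) logical block -> physical block
        self.fingerprints = {}  # (2) content fingerprint -> physical block
        self.refcount = {}      # (3) physical block -> reference count
        self.blocks = {}        # physical block -> data (stands in for Flash)
        self.next_pbn = 0

    def write(self, lbn, data):
        fp = hashlib.sha256(data).hexdigest()
        old = self.l2p.get(lbn)
        pbn = self.fingerprints.get(fp)
        if pbn is None:
            # No duplicate: the data really has to be written.
            pbn = self.next_pbn
            self.next_pbn += 1
            self.blocks[pbn] = data
            self.fingerprints[fp] = pbn
            self.refcount[pbn] = 0
        if pbn == old:
            return pbn  # same content rewritten to the same logical block
        self.refcount[pbn] += 1
        self.l2p[lbn] = pbn
        if old is not None:
            self.refcount[old] -= 1  # overwrite: old block loses a reference
        return pbn


store = ToyDedupStore()
p1 = store.write("L1", b"ABC")
p2 = store.write("L2", b"ABC")  # duplicate content, no new physical block
```

Here both logical writes end up on one physical block, mirroring the figure: only the first write stores data, and the second only touches metadata. Every one of those metadata updates is an extra write that must survive a failure, which is the cost the slide points at.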

Challenge
• Existing approaches for failure-consistency
  • Rely on non-volatile RAM or supercapacitors/batteries [Srinivasan et al. 2012; Chen et al. 2011; Gupta et al. 2011]
  • Checking/repair tools [Quinlan et al. 2002]
  • Redo logging [Meister et al. 2010] (additional I/O for logging writes)
  • Shadowing [Tarasov et al. 2014] (additional I/O for index block writes)
• Challenge: metadata and failure-consistency-induced I/O cost shouldn’t significantly diminish the deduplication I/O saving
• We look into soft updates-style I/O ordering

I/O Ordering for Failure-Consistency
• Define an order for data/metadata writes
• Ordered writes are committed one by one
• A failure still keeps a deduplication system consistent
• A failure can only leave garbage (which can be reclaimed asynchronously)
Example: new write (duplicated content)
  Incr. P1 ref. → L2P mapping
  ✗ A failure between the two writes leaves a higher-than-actual ref. count (garbage only)
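The example above can be checked with a small failure simulation. This is a minimal sketch under stated assumptions: the helper names `state_after_failure` and `is_consistent`, the tuple encoding of writes, and the consistency rule (a reference count must never be lower than the number of logical blocks mapped to the block) are choices made here for illustration, not taken from the slides.

```python
def state_after_failure(steps, committed, refcount, l2p):
    """Persistent state if a failure hits after `committed` ordered writes."""
    rc, mapping = dict(refcount), dict(l2p)
    for op, target, value in steps[:committed]:
        if op == "incr_ref":
            rc[target] = rc.get(target, 0) + 1
        elif op == "map":
            mapping[target] = value
    return rc, mapping


def is_consistent(rc, mapping):
    """True if no ref. count is lower than the actual number of mappings."""
    actual = {}
    for pbn in mapping.values():
        actual[pbn] = actual.get(pbn, 0) + 1
    return all(rc.get(pbn, 0) >= count for pbn, count in actual.items())


# Before the new write: L1 -> P1, so P1 has one reference.
refcount = {"P1": 1}
l2p = {"L1": "P1"}

# New write to L2 whose content duplicates P1.
ordered = [("incr_ref", "P1", None), ("map", "L2", "P1")]
wrong_order = [("map", "L2", "P1"), ("incr_ref", "P1", None)]

ordered_ok = [is_consistent(*state_after_failure(ordered, k, refcount, l2p))
              for k in range(3)]
wrong_ok = [is_consistent(*state_after_failure(wrong_order, k, refcount, l2p))
            for k in range(3)]
```

With the increment first, every failure point yields a consistent state; the worst case is a count that is too high, which only delays reclaiming the block. With the mapping first, a failure after one write leaves two logical blocks pointing at a block whose count is 1, so a later overwrite could free data that is still referenced.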

I/O Ordering for Failure-Consistency
• I/O efficiency
  • No consistency-induced additional I/O
  • We can merge metadata writes residing on the same metadata block as long as they are not subject to any ordering constraint
Example: new write (duplicated content)
  Incr. P1 ref. → L2P mapping
  Incr. P2 ref. → L2P mapping
  [Diagram labels: Metadata block 1, Metadata block 2]
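One way to picture the merging rule is to lift the per-update ordering to the metadata blocks and write each block once. The sketch below is an illustration only: the function name `merge_metadata_writes` is hypothetical, and it assumes that both reference counts live on metadata block 1 and both L2P entries on metadata block 2. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def merge_metadata_writes(chains):
    """Merge ordered update chains into one write per metadata block.

    chains: list of chains; each chain is an ordered list of
            (update description, metadata block) pairs.
    Returns a list of (block, merged updates) in a valid write order.
    """
    deps = {}     # block -> blocks that must be written before it
    updates = {}  # block -> updates merged into that block's single write
    for chain in chains:
        prev = None
        for desc, block in chain:
            deps.setdefault(block, set())
            updates.setdefault(block, []).append(desc)
            if prev is not None and prev != block:
                deps[block].add(prev)
            prev = block
    # static_order() raises graphlib.CycleError if the blocks depend on
    # each other cyclically, in which case they cannot be merged this way.
    return [(b, updates[b]) for b in TopologicalSorter(deps).static_order()]


# Two new writes with duplicated content (block placement is assumed).
chains = [
    [("incr P1 ref", 1), ("map La->P1", 2)],
    [("incr P2 ref", 1), ("map Lb->P2", 2)],
]
writes = merge_metadata_writes(chains)
```

Four metadata updates collapse into two block writes here: block 1 carries both increments, and block 2 follows with both mappings, so each request still sees its increment persisted before its mapping.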

I/O Ordering for Failure-Consistency
• Cyclic dependencies
  • Prevent metadata I/O merging & complicate the implementation
  • Make soft updates costly for file systems [Seltzer et al. 2000]
Example: (1) overwrite (duplicated content); (2) new write (duplicated content)
  (1) Incr. P1 ref. → L2P mapping → Decr. P2 ref.
  (2) Incr. P3 ref. → L2P mapping
  [Diagram labels: Metadata block 1, Metadata block 2]

I/O Ordering for Failure-Consistency
• Resolve cyclic dependencies
  • We carefully design all deduplication I/O paths
  • Delay non-critical metadata I/O (I/O that the completion signal doesn’t depend on)
Example: (1) overwrite (duplicated content); (2) new write (duplicated content)
  (1) Incr. P1 ref. → L2P mapping → Completion to client; Decr. P2 ref. is delayed
  (2) Incr. P3 ref. → L2P mapping
  [Diagram labels: Metadata block 1, Metadata block 2]
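The overwrite example can be replayed as a dependency check between metadata blocks. This is a sketch, not the authors' code: `block_write_order` is a hypothetical helper, and the placement of all reference counts on metadata block 1 and all L2P entries on metadata block 2 is an assumption. It relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def block_write_order(chains):
    """Return an order in which each metadata block is written once."""
    deps = {}  # block -> blocks that must be written before it
    for chain in chains:
        prev = None
        for _update, block in chain:
            deps.setdefault(block, set())
            if prev is not None and prev != block:
                deps[block].add(prev)
            prev = block
    return list(TopologicalSorter(deps).static_order())


# (1) overwrite and (2) new write, both with duplicated content.
overwrite = [("incr P1 ref", 1), ("map L1->P1", 2), ("decr P2 ref", 1)]
new_write = [("incr P3 ref", 1), ("map L2->P3", 2)]

# Fully ordered: block 1 must precede block 2 (increment before mapping)
# and follow it (decrement after mapping), which is a cycle.
try:
    block_write_order([overwrite, new_write])
    has_cycle = False
except CycleError:
    has_cycle = True

# Delay the decrement: the completion signal does not depend on it.
critical, delayed = overwrite[:2], overwrite[2:]
order_now = block_write_order([critical, new_write])
```

Once the decrement is taken out of the ordered chain, the remaining updates need block 1 and then block 2, each written once. The delayed decrement can ride along with a later write of block 1; if a failure happens first, P2 merely keeps a higher-than-actual count.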

I/O Ordering for Failure-Consistency
1. Write new block L; duplicating existing physical block P
   inc. P’s ref. ctr. → map L to P → completion to client
2. Write new block L; no duplicate
   write to new physical block P → map L to P → completion to client
   Remaining metadata writes in the diagram: set P’s ref. ctr. to 2; add P’s fingerprint
3. Overwrite block L mapped to physical block P_old; duplicating physical block P_dup
   inc. P_dup’s ref. ctr. → map L to P_dup → completion to client
   Remaining metadata write in the diagram: dec. P_old’s ref. ctr.
4. Overwrite block L mapped to physical block P_old; no duplicate
   write to physical block P_new → map L to P_new → completion to client
   Remaining metadata writes in the diagram: dec. P_old’s ref. ctr.; set P_new’s ref. ctr. to 2; add P_new’s fingerprint

Metadata I/O Merging for Efficiency
• Anticipatory I/O delay and merging
  • Delay a metadata write in anticipation of near-future merging opportunities
  • Limited delay duration (e.g., 1 millisecond), slight performance impact
• We name our approach OrderMergeDedup
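The delay-and-merge idea can be sketched with a small buffer keyed by metadata block. Everything here is an assumption made for illustration: the class `AnticipatoryMerger`, its methods, and the explicit `now` timestamps (used instead of a real timer so the example is deterministic). Ordering constraints between blocks are deliberately left out of this sketch.

```python
class AnticipatoryMerger:
    """Hold each metadata block write briefly so later updates can join it."""

    def __init__(self, delay=0.001):  # 1 millisecond, as in the slide's example
        self.delay = delay
        self.pending = {}  # block -> (flush deadline, buffered updates)
        self.flushed = []  # (block, updates) for each write actually issued

    def submit(self, now, block, update):
        self.flush_due(now)
        if block in self.pending:
            # A write to this block is already waiting: merge into it.
            self.pending[block][1].append(update)
        else:
            # First update for this block: start the bounded delay window.
            self.pending[block] = (now + self.delay, [update])

    def flush_due(self, now):
        due = [b for b, (deadline, _) in self.pending.items() if deadline <= now]
        for block in due:
            _, updates = self.pending.pop(block)
            self.flushed.append((block, updates))


merger = AnticipatoryMerger()
merger.submit(0.0, 1, "incr P1 ref")
merger.submit(0.0004, 1, "incr P2 ref")  # arrives within the window: merged
merger.submit(0.002, 1, "incr P3 ref")   # window expired: starts a new write
merger.flush_due(0.01)
```

Three updates to the same block result in two writes. The deadline is fixed when the first update arrives, so no update waits longer than the configured delay, which keeps the latency cost bounded.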

Evaluation Setup
• Prototype of OrderMergeDedup
  • A custom device mapper target of Linux 3.14.29
• Mobile system workloads (Atom-based tablet)
  • Ubuntu package update & installation
  • BBench web browsing
  • Vehicle counting for intelligent traffic sensing
• Server system workloads (Xeon-based server machine)
  • Hadoop
  • YCSB/MongoDB

Evaluation
[Figure: Deduplicated physical block writes. Y-axis: normalized I/O volume (0 to 1). Series: Original; Dmdedup; Failure-consistent write ordering; + Anticipatory I/O delay/merging. Workloads: package update, package install, BBench web browsing, vehicle counting, Hadoop, YCSB/MongoDB.]
• We save 18-63% I/O writes (on workloads with 23-73% write duplication)

Evaluation (Strong Persistence Model)
[Figure: Deduplicated physical block writes. Y-axis: normalized I/O volume (0 to 2). Series: Original; Dmdedup; Failure-consistent write ordering; + Anticipatory I/O delay/merging. Workloads: package update, package install, BBench web browsing, vehicle counting, Hadoop, YCSB/MongoDB. Off-scale bar labels in the original plot: 12×, 12×, 5×, 10×, 11×, 11×.]
• We save 15-51% I/O writes (on workloads with 23-73% write duplication)
• Anticipatory I/O delay/merging is particularly effective

Conclusion
• OrderMergeDedup
  • Efficient, failure-consistent I/O deduplication on Flash
  • A soft updates-style data/metadata write ordering for failure-consistency (in particular, we resolve all possible cyclic dependencies with carefully designed I/O ordering and by delaying non-critical metadata writes)
  • Anticipatory I/O delay and merging to further reduce metadata I/O writes
• We save 18-63% I/O writes (on workloads with 23-73% write duplication)
• Anticipatory I/O delay/merging is particularly effective under the strong persistence model