Reducing Replication Bandwidth for Distributed Document Databases
Lianghong Xu¹, Andy Pavlo¹, Sudipta Sengupta², Jin Li², Greg Ganger¹
¹Carnegie Mellon University, ²Microsoft Research
Document-oriented Databases

Original document:
{
  "_id": "55ca4cf7bad4f75b8eb5c25c",
  "pageId": "46780",
  "revId": "41173",
  "timestamp": "2002-03-30T20:06:22",
  "sha1": "6i81h1zt22u1w4sfxoofyzmxd",
  "text": "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicting,…The fairy Queen, however, appears to … all live happily ever after."
}

Updated document:
{
  "_id": "55ca4cf7bad4f75b8eb5c25d",
  "pageId": "46780",
  "revId": "128520",
  "timestamp": "2002-03-30T20:11:12",
  "sha1": "q08x58kbjmyljj4bow3e903uz",
  "text": "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicted, …The fairy Queen, on the other hand, is ''not'' happy, and appears to … all live happily ever after."
}

Update: reading a recent document and writing back a similar one.
Replication Bandwidth

[Figure] The primary database ships operation logs (carrying documents like the pair on the previous slide) over the WAN to secondary replicas for geo-replication.

Goal: reduce the replication bandwidth consumed by oplog shipping over the WAN.
Why Deduplication?
• Why not just compress?
  – Oplog batches are small and do not contain enough internal overlap
• Why not just use diff?
  – Diff needs application guidance to identify the source document
• Dedup finds and removes redundancies
  – Across the entire data corpus
Traditional Dedup: Ideal
[Figure] Legend: chunk boundary, modified region, duplicate region. The incoming byte stream is split into chunks 1 to 5; the chunk boundaries isolate the modified region, so chunks 1, 2, 4, and 5 are identified as duplicates and only the dedup'ed data for the modified chunk is sent to the replicas.
Traditional Dedup: Reality
[Figure] In practice the modified region does not line up with the chunk boundaries: only chunk 4 is identified as a duplicate, so almost the entire document has to be sent.
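To make the traditional approach concrete, here is a minimal Python sketch of chunk-based exact-match dedup. The rolling-hash boundary test and the in-memory set used as a chunk index are simplified stand-ins, and all names and parameters are illustrative rather than taken from the talk:

```python
import hashlib

def content_defined_chunks(data: bytes, min_size: int = 64, mask: int = 0x3F):
    """Cut the byte stream where a toy rolling hash matches a boundary
    pattern, so chunk boundaries follow content rather than fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) ^ byte) & 0xFFFFFFFF
        if i - start + 1 >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedup_document(doc: bytes, chunk_index: set) -> list:
    """Exact-match dedup: hash every chunk and keep only the chunks whose
    hash is not already in the index. With small, scattered edits few
    chunks match exactly, so most of the document is sent anyway."""
    unique_chunks = []
    for chunk in content_defined_chunks(doc):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in chunk_index:
            chunk_index.add(digest)
            unique_chunks.append(chunk)
    return unique_chunks
```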
Similarity Dedup
[Figure] Instead of looking for exact duplicate chunks, find a similar stored document and send only a delta encoding of the incoming document against it.
Compress vs. Dedup
[Chart] 20GB sampled Wikipedia dataset; MongoDB v2.7; 4MB oplog batches.
sDedup: Similarity Dedup
[Architecture] On the primary node, client insertions and updates go to the database and the oplog. An oplog syncer picks up unsynchronized oplog entries and hands them to the sDedup encoder, which uses source documents from the database and a source document cache to produce dedup'ed oplog entries. These are sent to the secondary node, where the sDedup decoder uses its local copy of the source documents to re-construct the original oplog entries, which are then replayed.
sDedup Encoding Steps
• Identify Similar Documents
• Select the Best Match
• Delta Compression
Identify Similar Documents

Target document → Rabin chunking → chunk features: 32 17 25 41 12
Consistent sampling keeps the top features → similarity sketch: 41 32

Feature index table:
  32 → Doc #1, Doc #2, Doc #3
  41 → Doc #2, Doc #3

Candidate documents and similarity scores (number of sketch features shared):
  Doc #1 (39 32 22 15): score 1
  Doc #2 (32 25 38 41 12): score 2
  Doc #3 (32 17 38 41 12): score 2
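A rough Python sketch of this step, assuming fixed-size chunks and 32-bit hash prefixes in place of the Rabin chunking and feature hashing used by sDedup; function names and parameters are illustrative:

```python
import hashlib
from collections import defaultdict

def similarity_sketch(doc: bytes, k: int = 2, chunk_size: int = 64) -> list:
    """Hash every chunk to a 32-bit feature and keep the k largest values
    (one form of consistent sampling), giving a small sketch that similar
    documents are likely to share."""
    features = {
        int.from_bytes(hashlib.sha1(doc[i:i + chunk_size]).digest()[:4], "big")
        for i in range(0, len(doc), chunk_size)
    }
    return sorted(features, reverse=True)[:k]

def index_document(doc_id, sketch, feature_index) -> None:
    """Register a stored document under each of its sketch features."""
    for feature in sketch:
        feature_index[feature].add(doc_id)

def find_candidates(sketch, feature_index) -> dict:
    """Look up each sketch feature; a candidate's similarity score is the
    number of features it shares with the target (in the example above,
    Doc #2 and Doc #3 score 2 and Doc #1 scores 1)."""
    scores = defaultdict(int)
    for feature in sketch:
        for doc_id in feature_index.get(feature, ()):
            scores[doc_id] += 1
    return dict(scores)

# feature_index = defaultdict(set)   # maps feature -> set of document ids
```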
Select the Best Match

Initial Ranking:
  Rank | Candidate | Score
  1    | Doc #2    | 2
  1    | Doc #3    | 2
  2    | Doc #1    | 1

Final Ranking:
  Rank | Candidate | Cached? | Score
  1    | Doc #3    | Yes     | 4
  2    | Doc #1    | Yes     | 3
  3    | Doc #2    | No      | 2

Is the document in the Source Document Cache? If yes, reward +2.
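A small Python sketch of the re-ranking, assuming the candidate scores from the previous step and a set of document ids currently held in the source document cache; the +2 reward is the value shown above, the function name is illustrative:

```python
def select_best_match(scores: dict, cached_ids: set, cache_reward: int = 2):
    """Pick the candidate with the highest final score, where cached source
    documents get a reward because they can be delta-encoded against
    without an extra database fetch."""
    if not scores:
        return None  # no similar document found
    def final_score(doc_id):
        return scores[doc_id] + (cache_reward if doc_id in cached_ids else 0)
    return max(scores, key=final_score)

# Example matching the slide:
# select_best_match({"Doc #1": 1, "Doc #2": 2, "Doc #3": 2},
#                   cached_ids={"Doc #1", "Doc #3"})   # -> "Doc #3" (score 4)
```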
Evaluation
• MongoDB setup (v2.7)
  – 1 primary node, 1 secondary node, 1 client
  – Node config: 4 cores, 8GB RAM, 100GB HDD storage
• Datasets
  – Wikipedia dump (20GB sampled out of ~12TB)
  – Additional datasets evaluated in the paper
Compression
[Bar chart] Compression ratio vs. chunk size (4KB, 1KB, 256B, 64B) for sDedup and trad-dedup. sDedup reaches roughly 38.9x, far higher than trad-dedup at the same chunk size; trad-dedup improves only as chunks shrink. 20GB sampled Wikipedia dataset.
Memory
[Bar chart] Index memory (MB) vs. chunk size (4KB, 1KB, 256B, 64B) for sDedup and trad-dedup. trad-dedup's memory grows sharply as chunks shrink, reaching 780.5MB at 64B, while sDedup stays around or below 80MB across all chunk sizes. 20GB sampled Wikipedia dataset.
Other Results (See Paper)
• Negligible client performance overhead
• Failure recovery is quick and easy
• Sharding does not hurt compression ratio
• More datasets: Microsoft Exchange, Stack Exchange
Conclusion & Future Work
• sDedup: similarity-based deduplication for replicated document databases
  – Much greater data reduction than traditional dedup
  – Up to 38x compression ratio for Wikipedia
  – Resource-efficient design with negligible overhead
• Future work
  – More diverse datasets
  – Dedup for local database storage
  – Different similarity search schemes (e.g., super-fingerprints)
Backup Slides
Compression: StackExchange
[Bar chart] Compression ratio vs. chunk size (4KB, 1KB, 256B, 64B) for sDedup and trad-dedup. Compression is much lower on this dataset: all configurations fall between 1.0x and 1.8x. 10GB sampled StackExchange dataset.
Memory: StackExchange
[Bar chart] Index memory (MB) vs. chunk size (4KB, 1KB, 256B, 64B) for sDedup and trad-dedup. The largest configuration, trad-dedup at 64B chunks, reaches 3,082.5MB. 10GB sampled StackExchange dataset.
Throughput Overhead
Failure Recovery
[Chart] Replication behavior around the marked failure point. 20GB sampled Wikipedia dataset.
Dedup + Sharding

  Number of shards | Compression ratio
  1                | 38.4
  3                | 38.2
  5                | 38.1
  9                | 37.9

20GB sampled Wikipedia dataset
Delta Compression
• Byte-level diff between source and target documents
  – Based on the xDelta algorithm
  – Improved speed with minimal loss of compression
• Encoding: descriptors for duplicate/unique regions + the unique bytes
• Decoding: use the source document + the encoded output; concatenate the byte regions in order
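The following Python sketch illustrates this style of encoding, using difflib.SequenceMatcher as a stand-in for the xDelta-based matcher described above: the encoder emits copy descriptors for duplicate regions and literal bytes for unique regions, and the decoder concatenates the regions in order using the source document. Function names and the example strings are illustrative only.

```python
from difflib import SequenceMatcher

def delta_encode(source: bytes, target: bytes) -> list:
    """Describe target as COPY(offset, length) regions taken from source
    plus INSERT(literal bytes) for the unique regions."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, s1, s2, t1, t2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", s1, s2 - s1))      # duplicate region descriptor
        elif t2 > t1:
            ops.append(("insert", target[t1:t2]))  # unique bytes
    return ops

def delta_decode(source: bytes, ops: list) -> bytes:
    """Rebuild the target by concatenating the described byte regions in
    order, pulling duplicate regions out of the source document."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

source = b"just as predicting, The fairy Queen, however, appears to"
target = b"just as predicted, The fairy Queen, on the other hand, appears to"
assert delta_decode(source, delta_encode(source, target)) == target
```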