Reducing Replication Bandwidth for Distributed Document Databases
Lianghong Xu¹, Andy Pavlo¹, Sudipta Sengupta², Jin Li², Greg Ganger¹
¹Carnegie Mellon University  ²Microsoft Research
Document-oriented Databases

An update reads a recent document and writes back a similar one:

Before the update:
{
  "_id": "55ca4cf7bad4f75b8eb5c25c",
  "pageId": "46780",
  "revId": "41173",
  "timestamp": "2002-03-30T20:06:22",
  "sha1": "6i81h1zt22u1w4sfxoofyzmxd",
  "text": "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicting,…The fairy Queen, however, appears to … all live happily ever after."
}

After the update:
{
  "_id": "55ca4cf7bad4f75b8eb5c25d",
  "pageId": "46780",
  "revId": "128520",
  "timestamp": "2002-03-30T20:11:12",
  "sha1": "q08x58kbjmyljj4bow3e903uz",
  "text": "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicted, …The fairy Queen, on the other hand, is ''not'' happy, and appears to … all live happily ever after."
}
Replication Bandwidth

[Figure: the primary database appends each insert and update to its operation log (oplog); oplog entries, carrying nearly identical new document versions like the two shown on the previous slide, are shipped over the WAN to the secondary replicas.]

Goal: reduce WAN bandwidth for geo-replication.
Why Deduplication?
• Why not just compress?
  – Oplog batches are small and do not contain enough internal overlap
• Why not just use diff?
  – Needs application guidance to identify the source document
• Dedup finds and removes redundancy across the entire data corpus
Traditional Dedup: Ideal
[Figure: an incoming byte stream is split at chunk boundaries into chunks 1–5; the modified region falls entirely inside chunk 3. Chunks 1, 2, 4 and 5 are found as duplicates, so only chunk 3 is sent to the replicas.]
Traditional Dedup: Reality
[Figure: the same incoming stream, but the modified regions are scattered across chunks 1, 2, 3 and 5; only chunk 4 is found as a duplicate.] Almost the entire document must be sent.
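The chunk-matching failure above is easy to reproduce. The sketch below is not the paper's implementation; it uses deliberately naive fixed-size chunking to show how a single inserted byte shifts every later chunk boundary, so almost no chunk of the new version matches the old one:

```python
import hashlib

CHUNK = 64  # fixed-size chunking, as in a naive dedup scheme


def chunk_hashes(data: bytes) -> list[bytes]:
    """Split data into fixed-size chunks and hash each chunk."""
    return [hashlib.sha1(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)]


def dup_ratio(new: bytes, old: bytes) -> float:
    """Fraction of new's chunks found in the index built from old."""
    index = set(chunk_hashes(old))
    hashes = chunk_hashes(new)
    return sum(h in index for h in hashes) / len(hashes)


old = bytes(range(256)) * 16          # a 4 KB "document"
new = old[:100] + b"!" + old[100:]    # one byte inserted near the front

# The insertion shifts every later chunk boundary, so although the two
# versions are nearly identical, almost no chunks dedup.
print(f"duplicate chunks: {dup_ratio(new, old):.0%}")
```

Content-defined (e.g., Rabin) chunking resynchronizes boundaries after an edit, but as the slide shows, scattered small edits still dirty most chunks at realistic chunk sizes.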
Similarity Dedup (sDedup)
[Figure: instead of looking for identical chunks, sDedup finds a similar source document and delta-encodes the incoming document against it.] Only the delta encoding is sent.
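The delta-encoding idea can be sketched with Python's standard difflib; a real system would use a compact binary encoding (e.g., xDelta-style), but the principle is the same: copy ranges from the source document and ship only the literal bytes that changed.

```python
import difflib


def delta_encode(source: str, target: str) -> list:
    """Encode target as edit ops against source: 'copy' ops reference
    source ranges, 'insert' ops carry the literal changed bytes."""
    sm = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reference source bytes
        else:
            ops.append(("insert", target[j1:j2]))  # ship literal bytes
    return ops


def delta_decode(source: str, ops: list) -> str:
    """Rebuild the target from the source document plus the delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, i1, i2 = op
            out.append(source[i1:i2])
        else:
            out.append(op[1])
    return "".join(out)


old = "The Peer and the Peri is a comic operetta in two acts."
new = "The Peer and the Peri is a comic operetta in two short acts."
ops = delta_encode(old, new)
assert delta_decode(old, ops) == new
literal = sum(len(op[1]) for op in ops if op[0] == "insert")
print(f"literal bytes shipped: {literal} of {len(new)}")
```

Because the secondary already stores the source document, it only needs the small delta to reconstruct the new version.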
Compress vs. Dedup
[Figure: compression ratios on a 20GB sampled Wikipedia dataset; MongoDB v2.7, 4MB oplog batches.]
sDedup Integration
Primary node: client insertions and updates go to the database and the oplog. The oplog syncer hands unsynchronized oplog entries to the sDedup encoder, which looks up source documents in the database and a source document cache, then ships dedup'ed oplog entries to the secondary.
Secondary node: the sDedup decoder fetches source documents from its database to reconstruct the oplog entries, which are then replayed.
sDedup Encoding Steps
1. Identify similar documents
2. Select the best match
3. Delta compression
Identify Similar Documents
The target document is split with Rabin chunking, yielding chunk hashes (e.g., 32, 17, 25, 41, 12). Consistent sampling keeps the top-k hashes (here 41 and 32) as the document's similarity sketch. Each sketch feature is looked up in a feature index table that maps features to candidate documents, and each candidate's similarity score counts how many sketch features it shares with the target:
• Doc #1 (39, 32, 22, 15): score 1
• Doc #2 (32, 25, 38, 41, 12): score 2
• Doc #3 (32, 17, 38, 41, 12): score 2
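A minimal sketch of this step, with assumptions: SHA-1 over sliding text windows stands in for Rabin chunk fingerprints, and "keep the k largest hash values" is one simple form of consistent sampling; the function and variable names are illustrative, not the paper's.

```python
import hashlib
from collections import defaultdict


def features(text: str, k: int = 2, width: int = 8) -> list[int]:
    """Similarity sketch: hash sliding windows (a stand-in for Rabin
    chunk hashes) and keep the k largest values (consistent sampling)."""
    hs = {int.from_bytes(
              hashlib.sha1(text[i:i + width].encode()).digest()[:4], "big")
          for i in range(0, max(1, len(text) - width))}
    return sorted(hs, reverse=True)[:k]


index = defaultdict(set)  # feature -> documents containing that feature


def index_doc(doc_id: str, text: str) -> None:
    for f in features(text):
        index[f].add(doc_id)


def candidates(text: str) -> list[tuple[str, int]]:
    """Score candidates by the number of sketch features they share."""
    score = defaultdict(int)
    for f in features(text):
        for doc_id in index[f]:
            score[doc_id] += 1
    return sorted(score.items(), key=lambda kv: -kv[1])


index_doc("doc1", "the quick brown fox jumps over the lazy dog")
index_doc("doc2", "completely unrelated text about databases")
# A target identical to doc1 shares both sketch features with it.
print(candidates("the quick brown fox jumps over the lazy dog"))
```

Because only the top-k features are indexed, the feature index table stays small even for a large corpus.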
Select the Best Match
Initial ranking by similarity score:
  1. Doc #2 — score 2
  1. Doc #3 — score 2
  2. Doc #1 — score 1
Candidates resident in the source document cache are rewarded (+2). Final ranking:
  1. Doc #3 — cached, 2 + 2 = 4
  2. Doc #1 — cached, 1 + 2 = 3
  3. Doc #2 — not cached, 2
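The cache-aware re-ranking above can be sketched as follows (the +2 reward matches the slide's example; the function shape is an assumption, not the paper's code):

```python
CACHE_REWARD = 2  # bonus for candidates resident in the source document cache


def best_match(scores: dict[str, int], cache: set[str]) -> str:
    """Re-rank candidates: similarity score plus a reward if the document
    is already in the source document cache (avoiding a database fetch)."""
    ranked = sorted(
        scores.items(),
        key=lambda kv: kv[1] + (CACHE_REWARD if kv[0] in cache else 0),
        reverse=True)
    return ranked[0][0]


scores = {"doc1": 1, "doc2": 2, "doc3": 2}   # initial similarity scores
cache = {"doc1", "doc3"}                     # documents currently cached
print(best_match(scores, cache))             # doc3: 2 + 2 = 4 wins
```

Preferring cached candidates trades a slightly worse match for a much cheaper source lookup during delta encoding.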
Evaluation
• MongoDB setup (v2.7)
  – 1 primary node, 1 secondary node, 1 client
  – Node configuration: 4 cores, 8GB RAM, 100GB HDD storage
• Datasets
  – Wikipedia dump (20GB sampled out of ~12TB)
  – Additional datasets evaluated in the paper
Compression
[Bar chart: compression ratio of sDedup vs. trad-dedup at chunk sizes 4KB, 1KB, 256B and 64B on the 20GB sampled Wikipedia dataset; reported values include 38.9, 38.4, 26.3, 15.2, 9.9, 9.1, 4.6 and 2.3, with sDedup achieving much higher ratios than trad-dedup.]
Memory
[Bar chart: memory usage (MB) of sDedup vs. trad-dedup at chunk sizes 4KB, 1KB, 256B and 64B on the 20GB sampled Wikipedia dataset; reported values include 780.5, 272.5, 133.0, 80.2, 61.0, 57.3, 47.9 and 34.1, with sDedup the more memory-efficient of the two.]
Other Results (See Paper)
• Negligible client performance overhead
• Failure recovery is quick and easy
• Sharding does not hurt compression rate
• More datasets
  – Microsoft Exchange, Stack Exchange
Conclusion & Future Work
• sDedup: similarity-based deduplication for replicated document databases
  – Much greater data reduction than traditional dedup
  – Up to 38x compression ratio for Wikipedia
  – Resource-efficient design with negligible overhead
• Future work
  – More diverse datasets
  – Dedup for local database storage
  – Different similarity search schemes (e.g., super-fingerprints)