
Practical Web-based Delta Sync for Cloud - PowerPoint PPT Presentation

Practical Web-based Delta Sync for Cloud Storage Services. He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu. xiaoh16@gmail.com. July 10, 2017. HotStorage'17.


  1. Practical Web-based Delta Sync for Cloud Storage Services
     He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu
     xiaoh16@gmail.com
     July 10, 2017, HotStorage'17

  2. Network Traffic is Overwhelming in Cloud Storage
     Cloud traffic has a 30% CAGR (Compound Annual Growth Rate).
     [Diagram labels: file sync vendors; users; network traffic between client and server.]

  3. Delta Sync Improves Network Efficiency
     [Table: delta sync support in nine state-of-the-art cloud storage services.]
     Full sync: old file to new file, the full 10 MB file is transferred.
     Delta sync: old file to new file, only the delta data (1 B) is transferred.
     Delta sync is crucial for reducing cloud storage network traffic.

  4. No Web-based Delta Sync
     Web-based delta sync is essential for cloud storage web clients and web apps.
     The web is the most pervasive and OS-independent cloud storage access method.
     Web apps with local storage or log files need web-based delta sync.
     Why is web-based delta sync not supported by today's cloud storage services?

  5. Contribution
     • We quantitatively study why web-based delta sync is not offered by today's cloud storage services.
     • We build a practical web-based delta sync solution for cloud storage services.
     • By reversing the traditional delta sync process, we make the overhead affordable at the web client side.
     • By exploiting the locality of users' edits and trading off hash algorithms, we make the computation overhead affordable at the server side.

  6. WebRsync: Implement Delta Sync on Web
     • Implement rsync on real cloud storage with native web tech: JavaScript + HTML5 + WebSocket
     • rsync is the de facto solution for delta sync in cloud storage
     [Architecture diagram: web browser (JavaScript implementation of rsync, HTML5 File API, local storage) connected over WebSocket to the web server (C implementation of rsync), which reaches the backend file system (Aliyun OSS / OpenStack Swift) over a high-speed internal network.]

  7. WebRsync vs. rsync
     [Figures: average client CPU utilization; sync time of WebRsync vs. rsync.]

  8. Stagnation due to JavaScript's Single-thread Event Loop Model
     StagMeter
     // print a timestamp every 100 ms (pass a callback, not the result of a call)
     setInterval(() => print(Date.now()), 100)
     // print the timestamp of every keystone (start or end of a task)
     on_start(task): print(task.id, timestamp)
     on_finish(task): print(task.id, timestamp)
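
The heartbeat idea behind StagMeter can be sketched as below. This is an illustrative sketch rather than the authors' tool: `findStagnations`, `startHeartbeat`, and the `toleranceMs` parameter are hypothetical names. A timer is asked to fire every 100 ms, and any gap noticeably longer than that suggests the event loop was blocked by a long-running task.

```javascript
// Hypothetical helper: given heartbeat timestamps (in ms), report every gap
// that exceeds the expected interval by more than `toleranceMs`.
function findStagnations(timestamps, intervalMs = 100, toleranceMs = 50) {
  const stalls = [];
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    if (gap > intervalMs + toleranceMs) {
      stalls.push({ start: timestamps[i - 1], end: timestamps[i], gap });
    }
  }
  return stalls;
}

// Hypothetical recorder: pushes a timestamp every `intervalMs`.
// Returns a function that stops the timer and yields the recorded timestamps.
function startHeartbeat(intervalMs = 100) {
  const timestamps = [Date.now()];
  const timer = setInterval(() => timestamps.push(Date.now()), intervalMs);
  return () => {
    clearInterval(timer);
    return timestamps;
  };
}
```

In use, one would call `startHeartbeat()` before a sync, call the returned function afterwards, and pass the timestamps to `findStagnations` to list the periods during which the page could not respond.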

  9. StagMeter on WebRsync
     Sync phases: 1. Send metadata; 2. Checksum search and comparison; 3. Send tokens and literal bytes.
     [Figure labels: wait for server; high CPU utilization when computing; timestamp printing is suspended; the web page is in a stagnation state. X-axis: sync process (second).]

  10. WebR2sync: Client-side Optimization
      Reverse Computation Process
      [Diagram (WebRsync), client and server columns: request for syncing file f'; segmentation; fingerprinting; checksum list of f; searching; comparing; generate tokens and literal bytes; construct new file f; ACK.]
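
The final "construct new file" step can be illustrated with a small sketch. It assumes the delta is an ordered list of operations, where a token refers to a block of the old file by index and a literal carries raw bytes. The name `applyDelta` and the operation shape are hypothetical and are not the wire format used by WebRsync.

```javascript
// Rebuild a file from the old copy plus a delta.
// oldFile: Uint8Array; ops: array of { type: 'token', index } or
// { type: 'literal', bytes: Uint8Array }; blockSize: bytes per block.
function applyDelta(oldFile, ops, blockSize) {
  const parts = [];
  let total = 0;
  for (const op of ops) {
    let chunk;
    if (op.type === 'token') {
      // A token reuses a block that the receiver already has.
      const start = op.index * blockSize;
      chunk = oldFile.subarray(start, Math.min(start + blockSize, oldFile.length));
    } else {
      // A literal carries bytes that matched no existing block.
      chunk = op.bytes;
    }
    parts.push(chunk);
    total += chunk.length;
  }
  // Concatenate all chunks into the new file.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```

Only the literal bytes and the small tokens travel over the network, which is why a tiny edit to a large file produces a tiny transfer.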

  11. WebR2sync: Client-side Optimization
      Reverse Computation Process
      [Diagram (WebR2sync), client and server columns: request for syncing file f'; segmentation; fingerprinting; checksum list of f; searching; comparing; generate tokens and literal bytes; construct new file f; ACK.]
      • Web Reverse Rsync: reverse the process so that the complicated computation moves from the client to the server.

  12. Performance of WebR2sync
      [Figures: sync time (second) vs. edit size (byte).]
      Issue: the server bears severely heavy overhead.

  13. Server-side Overhead Profiling
      Checksum searching and block comparison occupy 80% of the computing time.
      [Figure labels: MD5 computing; checksum search.]
      • Use faster hash functions to replace MD5
      • Reduce checksum searching overhead

  14. Replacing MD5 with SipHash in Chunk Comparison
      A comparison of pseudorandom hash functions:

      Hash Function   Collision Probability   Cycles per Byte
      MD5             Low                     5.58
      Murmur3         High                    0.33
      Spooky          High                    0.14
      SipHash         Low                     1.13

      SipHash keeps the collision probability low at a much faster speed.

  15. Solve Possible Hash Collision
      • Replacing MD5 with SipHash may cause collisions (probability p), but so may MD5.
      • Our solution: additionally use Spooky (the fastest method, collision probability p').
      • The probability of collisions is then p*p'.
      • Alternative: use MD5 or another strong hash function as a global verification.
      • Computing MD5 over the whole file is expensive.

  16. Reduce Chunk Searching by Exploiting Locality of File Edits
      95% of synchronized files have fewer than 10 edits.
      [Diagram: checksum search in a hash table of weak checksums (Adler32-1 to Adler32-4), then comparison against the strong checksums (MD5-1 to MD5-4) of Block1 to Block4.]
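
One way to exploit edit locality is sketched below, under assumptions: if the previous block matched block j of the old file, the next candidate is very likely block j+1, so the search can test that block first and fall back to the hash table only on a miss. The names `buildTable`, `findBlock`, and `lastMatch` are hypothetical, and the slide does not spell out the exact procedure used in WebR2sync+.

```javascript
// blocks: ordered array of { weak, strong } checksums for the old file.
// Returns a Map from weak checksum to the indices of blocks that carry it.
function buildTable(blocks) {
  const table = new Map();
  blocks.forEach((b, i) => {
    if (!table.has(b.weak)) table.set(b.weak, []);
    table.get(b.weak).push(i);
  });
  return table;
}

// Look up a block by its checksums. `lastMatch` is the index of the
// previously matched block, or -1 if there is none.
function findBlock(weak, strong, blocks, table, lastMatch) {
  // Fast path: try the block right after the previous match.
  const next = lastMatch + 1;
  if (lastMatch >= 0 && next < blocks.length &&
      blocks[next].weak === weak && blocks[next].strong === strong) {
    return { index: next, usedTable: false };
  }
  // Slow path: weak-checksum lookup, then strong-checksum comparison.
  const candidates = table.get(weak) || [];
  for (const i of candidates) {
    if (blocks[i].strong === strong) return { index: i, usedTable: true };
  }
  return { index: -1, usedTable: true };
}
```

When most files contain only a handful of edits, long runs of consecutive blocks match in order, so the fast path would handle most lookups without touching the hash table.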

  17. Evaluation Setup
      [Figure: basic experiment setup visualized on a map of China.]

  18. Sync Time
      [Figure: sync time (second, log scale) vs. edit size (byte, 1 to 100K) for WebRsync, WebR2sync, WebR2sync+, and rsync.]
      WebR2sync+ is 2-3 times faster than WebR2sync and 15-20 times faster than WebRsync.

  19. Throughput
      [Figure: number of concurrent users (0 to 8000) for rsync, WebR2sync+, WebR2sync, WebRsync, and NoWebRsync.]
      The throughput of WebR2sync+ is 4 times that of WebR2sync/rsync and 9 times that of NoWebRsync.

  20. Future Work
      • Evaluate our approach under different edit modes
        • delete, insert, append
      • Evaluate traffic efficiency
        • all the methods should have similar traffic efficiency
      • Understand the effects of the three optimizations
        • evaluate them separately

  21. Discussion
      • Probability of collisions of file checksums
      • Characteristics of file operations in real-world scenarios from the perspective of sync
      • Locality measure for deciding whether to apply the locality-based optimization

  22. Conclusion
      • WebR2sync+ is a practical solution for web-based delta sync
        • lightweight computation at the client side
        • optimized overhead at the server side
      • The server-side optimizations can be adopted in the traditional cloud storage architecture

  23. Thanks! Discussion

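  24. WebRsync Detailed Description
      [Flowchart, client and server columns: each block (Block1, Block2, Block3, ...) has an Adler32 weak checksum and an MD5 strong checksum. Weak checksum search: on NO, move by a 1-byte offset; on YES, strong checksum compare. Strong checksum compare: on NO, move by a 1-byte offset; on YES, move by a 1-block offset and emit a matched token. Rolling Adler32 is O(1): Adler(i) => Adler(i+1). Matched tokens and literal bytes are used to construct the new file.]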
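
The O(1) rolling update noted on this slide can be sketched as follows. This is a minimal sketch of an rsync-style weak checksum (Adler-32-like, using modulo 2^16 rather than the exact Adler-32 definition), not code from WebRsync. The function names are hypothetical.

```javascript
const MOD = 65536; // assumption: rsync-style modulus, not Adler-32's 65521

// Compute the weak checksum of data[start .. start+len-1] from scratch: O(len).
function weakChecksum(data, start, len) {
  let a = 0;
  let b = 0;
  for (let i = 0; i < len; i++) {
    a = (a + data[start + i]) % MOD;
    b = (b + (len - i) * data[start + i]) % MOD;
  }
  return { a, b };
}

// Slide the window one byte to the right in O(1):
// outByte leaves on the left, inByte enters on the right.
function roll(sum, outByte, inByte, len) {
  const a = (((sum.a - outByte + inByte) % MOD) + MOD) % MOD;
  const b = (((sum.b - len * outByte + a) % MOD) + MOD) % MOD;
  return { a, b };
}

// Pack both halves into a single number usable as a hash-table key.
function digest(sum) {
  return sum.b * MOD + sum.a;
}
```

Because each 1-byte shift costs constant time, the searching side can test every byte offset of a file against the checksum table without recomputing a full block checksum each time.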

  25. WebR2sync: Flowchart and Data Structure
      [Flowchart, client and server columns, over Block 1 to Block 4: weak checksum search; on NO, move by a 1-byte offset; on YES, strong checksum compare; on NO, move by a 1-byte offset; on YES, no further operation. When a match is found, record the associated index. Construct new files.]

  26. Sync Time Decomposed
      [Figure: sync time (second, 0 to 0.2) decomposed into server, network, and client time vs. edit size (byte, 1 to 100K).]
      The WebR2sync+ client takes a stable and shorter time. Because of the server-side optimization, computing time is much shorter on both the client and the server.
