
Knockoff: Cheap versions in the cloud


  1. Knockoff: Cheap versions in the cloud. Xianzheng Dou, Peter M. Chen, Jason Flinn

  2. Cloud-based storage (Google Drive, Dropbox, Microsoft OneDrive) • Pros – Ease of management – Reliability

  3. Cloud-based storage (Google Drive, Dropbox, Microsoft OneDrive) • Challenges – Storage costs – Communication costs

  4. Versioning increases costs (Google Drive, Dropbox, Microsoft OneDrive) • Pros of versioning – Recovery of lost data – Auditing – Troubleshooting

  5. Reducing costs: a new direction • Established methods exploit similarities in data – Chunk-based deduplication – Delta compression – Greater work for incremental gains • Our goal: explore an orthogonal new dimension – Deterministically recompute data in lieu of communication and storage
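To make the baseline concrete, here is a minimal sketch of chunk-based deduplication, the established method the slide contrasts with recomputation. It assumes fixed-size 4 KiB chunks keyed by SHA-256; the names `dedup_store` and `restore`, the chunk size, and the in-memory `store` dictionary are illustrative choices, not details from the talk.

```python
import hashlib

def dedup_store(data: bytes, store: dict, chunk_size: int = 4096) -> list:
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns the "recipe": the ordered list of chunk digests for this version.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a chunk already present costs nothing extra
        recipe.append(digest)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild a version from its recipe."""
    return b"".join(store[digest] for digest in recipe)

# Two versions that share their first chunk (hypothetical data).
store = {}
v1 = b"A" * 4096 + b"B" * 4096
v2 = b"A" * 4096 + b"C" * 4096
r1 = dedup_store(v1, store)
r2 = dedup_store(v2, store)
```

Here the two versions span four chunks in total, but only three are stored because the shared first chunk is kept once. The savings depend entirely on how similar the versions are, which is the limitation the slide's "orthogonal new dimension" tries to sidestep.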

  6. File: data or computation? [Diagram labels: Computation, File data]

  12. File: data or computation? [Diagram labels: Computation, Different output data, File data] • How can we address non-determinism?

  13. File: data or computation? • Deterministic record and replay – Record: logs of nondeterminism

  16. File: data or computation? • Deterministic record and replay – Record: logs of nondeterminism – Replay
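The record-and-replay idea can be illustrated with a toy sketch. It assumes the computation obtains every nondeterministic value through a single hook, `env.nondet`; the `Recorder` and `Replayer` classes and the `build_file` function are hypothetical names for illustration. A real system would intercept system calls, signals, and thread scheduling rather than rely on an application-level hook.

```python
import random
import time

class Recorder:
    """Runs a computation normally and logs every nondeterministic value."""
    def __init__(self):
        self.log = []

    def nondet(self, name, produce):
        value = produce()
        self.log.append((name, value))
        return value

class Replayer:
    """Feeds logged values back in order instead of producing fresh ones."""
    def __init__(self, log):
        self._entries = iter(log)

    def nondet(self, name, produce):
        logged_name, value = next(self._entries)
        if logged_name != name:
            raise RuntimeError(f"replay diverged: expected {logged_name}, got {name}")
        return value

def build_file(env) -> bytes:
    """A computation whose output depends on the clock and on randomness."""
    stamp = env.nondet("time", time.time)
    salt = env.nondet("random", lambda: random.getrandbits(32))
    return f"built at {stamp} with salt {salt}\n".encode()

recorder = Recorder()
original = build_file(recorder)                    # record
regenerated = build_file(Replayer(recorder.log))   # replay
```

Replaying with the recorded log yields the same bytes as the original run, so the log can stand in for the file data.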

  17. Knockoff • Selectively substitutes computation for data • Benefits – Reduction compared to chunk-based deduplication: communication costs 21%, storage costs 19% – Benefits increase as we retain versions more frequently – A new fine-grained versioning policy

  18. Outline • Introduction • Writing files • Storing files • Evaluation

  19. Knockoff • Knockoff selectively represents a file as either: – Normal file data (by value), or – Logs of the nondeterminism needed to recompute the file (by operation)
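The two representations can be sketched as a small data structure. The types `ByValue` and `ByOperation`, the `materialize` function, and in particular the selection rule in `choose_representation` (pick whichever is fewer bytes) are assumptions for illustration; the slides say only that Knockoff chooses selectively.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class ByValue:
    data: bytes                          # the file contents themselves

@dataclass
class ByOperation:
    log: bytes                           # logged nondeterminism
    recompute: Callable[[bytes], bytes]  # deterministic replay driven by the log

FileVersion = Union[ByValue, ByOperation]

def materialize(version: FileVersion) -> bytes:
    """Return the file contents, replaying the computation if necessary."""
    if isinstance(version, ByValue):
        return version.data
    return version.recompute(version.log)

def choose_representation(data: bytes, log: bytes,
                          recompute: Callable[[bytes], bytes]) -> FileVersion:
    """Assumed rule: keep whichever representation is smaller."""
    if len(log) < len(data):
        return ByOperation(log, recompute)
    return ByValue(data)

# Large output from a tiny log: cheaper by operation.
big = choose_representation(b"x" * 10_000, b"x", lambda log: log * 10_000)
# Tiny output from a long log: cheaper by value.
small = choose_representation(b"key", b"a long log of nondeterminism", lambda log: b"key")
```

Either way, `materialize` returns the same bytes, so readers of the file need not know which representation was stored.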

  23. An example log for compilation (log entry: values) • 1 open: rc=3 • 2 mmap: rc=<addr>, file=<id,version> • 3 gettimeofday: rc=0, time=<time> • 4 pthread_lock: rc=0 • 5 SIGCHLD • …

  24. An example log for compilation: what is logged • Return values from syscalls (open, mmap, gettimeofday) • Ordering of thread synchronization (pthread_lock) • Signals (SIGCHLD)

  27. Writing files – By operation – By value

  30. Writing files • Example: photo editing – By operation – By value

  31. Writing files • Example: cryptographic key generation – By value – By operation

  32. Outline • Introduction • Writing files • Storing files • Evaluation

  33. Storing files • Store files by value or by operation? • A tradeoff between latency and costs – Current versions: by value – Past versions: by value or by operation

  34. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Example: regeneration time = 20s < materialization delay = 60s

  36. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Example: regeneration time = 100s > materialization delay = 60s

  38. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Longest path > materialization delay • Example path: 20s; total regeneration time = 20s < materialization delay = 60s

  39. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Longest path > materialization delay • Example path: 20s + 30s; total regeneration time = 50s < materialization delay = 60s

  40. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Longest path > materialization delay • Example path: 20s + 30s + 30s; total regeneration time = 80s > materialization delay = 60s

  43. Storing past versions • Maximum materialization delay – Time bound for reconstructing any version • Longest path > materialization delay – A greedy algorithm
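The slides do not spell out which greedy rule Knockoff uses, so the following is only one plausible sketch: walk the versions in dependency order, track the longest regeneration path ending at each one, and store a version by value as soon as that path would exceed the bound. The function `plan_storage` and its dictionary inputs are hypothetical.

```python
def plan_storage(replay_time: dict, inputs: dict, max_delay: float):
    """Decide which past versions to store by value.

    replay_time: version -> seconds to replay its own computation,
                 listed in dependency order (inputs before outputs).
    inputs:      version -> list of versions its computation reads.
    Returns (by_value, regen), where regen[v] is the time needed to
    regenerate v under the resulting plan.
    """
    by_value = set()
    regen = {}
    for version, cost in replay_time.items():
        # Longest path: the slowest input must be regenerated first.
        upstream = max((regen[u] for u in inputs.get(version, [])), default=0)
        total = upstream + cost
        if total > max_delay:
            by_value.add(version)  # cut the path here
            total = 0              # stored data needs no regeneration
        regen[version] = total
    return by_value, regen

# The chain from the slides: 20s, then 30s, then 30s, with a 60s bound.
by_value, regen = plan_storage(
    {"v1": 20, "v2": 30, "v3": 30},
    {"v2": ["v1"], "v3": ["v2"]},
    max_delay=60,
)
```

On the slide's example, the first two versions stay by operation (20s and 50s), while the third would take 80s and is therefore stored by value.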

  44. Storing past versions: versioning policies • Frequency of versioning

  45. Storing past versions: versioning policies • Frequency of versioning – No versioning – Version on close – Version on write – Eidetic versioning

  46. Storing past versions: versioning policies • Frequency of versioning – No versioning – Version on close – Version on write – Eidetic versioning • Memory-mapped files • Any past transient state in memory?

  47. Optimization: log compression • Chunk-based deduplication is effective for file data – Executions of the same application have similar patterns – Can it also be applied to computation (logs of nondeterminism)? • Delta compression

  48. Optimization: log compression • Problem: a smattering of values differ in each log

  49. Optimization: log compression • Problem: a smattering of values differ in each log • Delta compression: 42% reduction
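As a rough illustration of why delta compression suits logs that differ in only a smattering of values, the sketch below compresses a new log against a reference log from an earlier execution. It uses zlib's preset-dictionary feature as a stand-in for a true delta encoder, which is a substitution made here and not the method from the talk; the log contents and the helper names are made up, and the 42% figure on the slide is not something this toy reproduces.

```python
import zlib

def delta_compress(log: bytes, reference: bytes) -> bytes:
    """Compress log using an earlier, similar log as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(log) + compressor.flush()

def delta_decompress(blob: bytes, reference: bytes) -> bytes:
    """Invert delta_compress; the same reference log must be supplied."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()

def make_log(start_time: int) -> bytes:
    """Hypothetical log text: same syscall pattern, different timestamps."""
    lines = [
        f"{i} open rc={i % 7 + 3} path=/src/file{i}.c time={start_time + i * 37}"
        for i in range(200)
    ]
    return "\n".join(lines).encode()

reference_log = make_log(1000)
new_log = make_log(1000).replace(b"time=1370", b"time=1999")  # one value differs

plain = zlib.compress(new_log, 9)
delta = delta_compress(new_log, reference_log)
```

Because almost every byte of the new log also appears in the reference, the delta-compressed form is much smaller than compressing the log on its own, at the price of needing the reference log to decompress.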

  50. Outline • Introduction • Writing files • Storing files • Evaluation

  51. Evaluation • How much does Knockoff reduce bandwidth usage? • How much does Knockoff reduce storage costs? • What is Knockoff’s performance overhead? • For more experimental results, please refer to our paper

  52. Experimental setup • User study – 8 participants performed several simple tasks in one hour • 20-day study – A single-user longitudinal study • A variety of programs used – Various Linux utilities, text editors, and programming languages

  53. Bandwidth usage: user study

  54. Bandwidth usage: user study [Bar chart: data sent to the server (MB), y-axis 0 to 500; x-axis: No versioning, Version on close, Version on write, Eidetic; series: Chunk-based deduplication, Knockoff. Annotation: "Already achieve 80%-85% reduction"]

  56. Bandwidth usage: user study [Same bar chart. Annotation: "24%"]
