
Toward Eidetic Distributed File Systems
Xianzheng Dou, Jason Flinn, Peter M. Chen


  1. Toward Eidetic Distributed File Systems Xianzheng Dou, Jason Flinn, Peter M. Chen

  2. Rich file system features • Modern file systems store more than just data – Versioning: retention of past state – Provenance-aware: connections between file data • Problem: – High costs for providing these rich features

  3. Versioning FS tradeoffs • Frequency of versioning: less frequent (lower storage cost) to more frequent (higher storage cost) • Systems shown along this spectrum, in order of appearance: Ext4; Versionfs, WAFL; Elephant FS; CVFS, Wayback • Any past user-level state? • Any past file system state and any transient state

  10. Provenance FS tradeoffs • Details of connection information: lower granularity (lower storage cost) to higher granularity (higher storage cost) • Systems shown along this spectrum, in order of appearance: Ext4; Connections; PASS • Complete byte-level provenance?

  15. Background: eidetic systems [OSDI’14] • Recall any past user-level state – By pervasive deterministic record and replay (figure: recording produces logs of non-deterministic events, which are later replayed) • Provenance at the byte granularity – Intra-process lineage: dynamic information tracking – Inter-process lineage: data flow dependency graph

  17. A clean-sheet design of FS • Eidetic systems prototype – Graft eidetic support onto an existing FS – Consider only local storage • An eidetic distributed file system – A small number of personal devices + cloud servers • New design choices – Fundamental unit of persistent storage – File transfer

  18. Traditional distributed FS (figure not preserved)

  22. Eidetic distributed file systems (figure not preserved)

  24. Fundamental unit • What is the fundamental unit of persistent storage? – Fundamental unit: logs of non-determinism – Files are only considered as caches

  30. File persistency • When is a file considered persistent on the server? – As long as the logs generating the data are persistent
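
The two preceding ideas (logs as the fundamental unit, files as caches, and persistence defined by the logs) can be illustrated with a toy model. This is a minimal sketch, not the authors' design: `ReplayLog`, `EideticStore`, and the version name `report@v1` are hypothetical, and "replay" here is just re-running a deterministic Python function over recorded inputs, whereas a real eidetic system replays recorded process executions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReplayLog:
    """Hypothetical stand-in for a log of non-determinism."""
    inputs: List[bytes]                      # recorded non-deterministic inputs
    compute: Callable[[List[bytes]], bytes]  # deterministic computation to re-run

    def replay(self) -> bytes:
        # Re-running the same computation on the same inputs yields the same bytes.
        return self.compute(self.inputs)


@dataclass
class EideticStore:
    """Toy server: logs are the persistent unit, file data is only a cache."""
    logs: Dict[str, ReplayLog] = field(default_factory=dict)
    cache: Dict[str, bytes] = field(default_factory=dict)

    def commit(self, version: str, log: ReplayLog) -> None:
        # Persisting the log is what makes the file version persistent.
        self.logs[version] = log

    def is_persistent(self, version: str) -> bool:
        return version in self.logs

    def read(self, version: str) -> bytes:
        # Serve from the cache if present, otherwise regenerate by replay.
        if version not in self.cache:
            self.cache[version] = self.logs[version].replay()
        return self.cache[version]

    def evict(self, version: str) -> None:
        # Dropping cached data loses nothing: the log can regenerate it.
        self.cache.pop(version, None)


store = EideticStore()
store.commit(
    "report@v1",
    ReplayLog([b"hello ", b"world"], lambda xs: b"".join(xs).upper()),
)
data = store.read("report@v1")
store.evict("report@v1")
regenerated = store.read("report@v1")
```

In this sketch the version stays persistent after eviction because only the log, never the cached bytes, determines persistence.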

  33. Updating server cache • Should the server cache the file version? – Probability of future access – Costs for regeneration
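
One way to combine the two factors named on this slide is an expected-cost comparison. The sketch below is an assumption for illustration, not the policy from the talk: the function `should_cache`, the `storage_cost` term, and the idea of expressing everything in one common cost unit are all hypothetical.

```python
def should_cache(p_access: float, regen_cost: float, storage_cost: float) -> bool:
    """Cache a file version when the expected regeneration cost it avoids
    exceeds the cost of keeping it (all costs in the same, assumed unit)."""
    if not 0.0 <= p_access <= 1.0:
        raise ValueError("p_access must be a probability in [0, 1]")
    return p_access * regen_cost > storage_cost


# Likely to be read again and expensive to replay: worth caching.
hot = should_cache(p_access=0.9, regen_cost=30.0, storage_cost=1.0)

# Rarely read and cheap to replay: regenerate on demand instead.
cold = should_cache(p_access=0.01, regen_cost=2.0, storage_cost=1.0)
```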

  35. File transfer methods • How are files transferred to the server? – Compare computation costs with communication costs – By value (file data) – Or by replay
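
The comparison between computation and communication costs can be sketched as a time estimate for each option. This is a simplified illustration under stated assumptions: `transfer_method` is a hypothetical helper, cost is measured only as elapsed time, transferring by replay is assumed to mean shipping the log and re-executing it at the receiver, and all sizes and timings in the example are invented.

```python
def transfer_method(file_bytes: int, log_bytes: int,
                    bandwidth_bps: float, replay_seconds: float) -> str:
    """Pick the cheaper way to get a file version to the server.

    by value:  ship the file data itself (communication only)
    by replay: ship the smaller log, then recompute the data (communication + computation)
    """
    by_value_s = file_bytes * 8 / bandwidth_bps
    by_replay_s = log_bytes * 8 / bandwidth_bps + replay_seconds
    return "by value" if by_value_s <= by_replay_s else "by replay"


# Large generated output, small log, 10 Mbit/s uplink: replay wins.
big_output = transfer_method(500_000_000, 1_000_000, 10_000_000, replay_seconds=20.0)

# Small text edit whose log is larger than the file: sending the data wins.
small_edit = transfer_method(10_000, 200_000, 10_000_000, replay_seconds=1.0)
```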

  40. Read path • How should a client read a particular version?

  42. Available transfer methods • How should a client read a particular version? – By value – By replay on the client – By replay on the server – From peers

  50. Choosing the right method • How should a client read a particular version? • Server has the most complete knowledge • Metrics – User waiting time – Monetary cost – Client energy consumption
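
A possible way for the server to choose among the read methods is to score each one on the three metrics and take the minimum. The sketch below is only an illustration: the `Estimate` type, the linear weighting in `choose_read_method`, and every number in `estimates` are assumptions, not measurements or the selection policy from the talk.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-method estimates the server might hold."""
    wait_s: float           # user waiting time
    money_usd: float        # monetary cost
    client_energy_j: float  # client energy consumption


def choose_read_method(estimates: Dict[str, Estimate],
                       w_wait: float = 1.0,
                       w_money: float = 0.0,
                       w_energy: float = 0.0) -> str:
    """Return the method with the lowest weighted cost (weights are assumed)."""
    def score(e: Estimate) -> float:
        return w_wait * e.wait_s + w_money * e.money_usd + w_energy * e.client_energy_j
    return min(estimates, key=lambda name: score(estimates[name]))


# Invented numbers, for illustration only.
estimates = {
    "by value": Estimate(wait_s=40.0, money_usd=0.02, client_energy_j=30.0),
    "by replay on the client": Estimate(wait_s=12.0, money_usd=0.001, client_energy_j=200.0),
    "by replay on the server": Estimate(wait_s=8.0, money_usd=0.05, client_energy_j=5.0),
    "from peers": Estimate(wait_s=15.0, money_usd=0.0, client_energy_j=20.0),
}

fastest = choose_read_method(estimates)                               # waiting time only
cheapest = choose_read_method(estimates, w_wait=0.0, w_money=1.0)     # monetary cost only
```

Different weightings select different methods here, which is why the choice depends on which metric matters for a given read.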

  51. Feasibility • Eidetic system overheads – Record 4 years of workstation data on a 4TB hard disk – Under 8% performance overhead on most benchmarks • Applications (log size vs. diff size) – Logs are smaller: image/audio editing, LaTeX, make, slides editing – Diffs are smaller: text editing • File sharing – Most files are not shared
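
The storage figure on this slide implies a rough upper bound on average log growth. The arithmetic below assumes a decimal terabyte (10^12 bytes), 365-day years, and a disk that is exactly full after 4 years; none of these assumptions are stated on the slide.

```python
DISK_BYTES = 4 * 10**12   # 4 TB, assuming decimal terabytes
YEARS = 4

bytes_per_year = DISK_BYTES / YEARS
bytes_per_day = bytes_per_year / 365  # assuming 365-day years

per_year_tb = bytes_per_year / 10**12  # about 1 TB per year
per_day_gb = bytes_per_day / 10**9     # about 2.7 GB per day

print(f"{per_year_tb:.1f} TB/year, {per_day_gb:.2f} GB/day")
```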

  52. Conclusions • A new point in the design space of – Versioning file systems – Provenance-aware file systems • Hypothesis – More effective in versioning and provenance – Achieving reasonable overheads • Under implementation

  53. Thank you!
