

  1. NFSv4 Replication for Grid Storage Middleware
     Peter Honeyman
     Center for Information Technology Integration
     University of Michigan, Ann Arbor

  2. Acknowledgements
    - Joint work with Jiaying Zhang
    - Partially supported by
      - NSF Middleware Initiative grant SCI-0438298
      - Network Appliance, Inc.
    November 27, 2006, Workshop on Middleware for Grid Computing

  3. Outline
    - Motivation
    - Design
    - Evaluation

  4. Motivation
    - Emerging global scientific collaborations
      - Access to widely distributed data must be reliable, efficient, and convenient
    - Current solution: GridFTP
      - Shared data sets are synchronized manually
    - Our solution: NFSv4.r
      - Replicated file system
      - Excellent performance
      - Conventional file system semantics

  5. Usage Scenario
    (Diagram: a scientist, a massive cluster of high performance computers, a personal cluster of high performance computers, and a visualization center, all reaching a file server across a WAN.)

  6. Usage Scenario
    (Diagram: the same sites, each with a local file replication server holding /nfs/user/bob/exp1, connected across the WAN.)

  7. Outline
    - Motivation
    - Design
      - Global name space
      - Consistent mutable replication
    - Evaluation

  8. Global Name Space
    - /nfs is the global root of all NFS file systems
    - Entries under /nfs are mounted on demand
    - The format of reference names under /nfs follows DNS conventions
      - E.g.: /nfs/umich.edu/lib/file1

  9. Extended Use of DNS
    - DNS SRV resource records carry NFS server location information
    - The corresponding name server maps a logical name to some NFS servers
    - A client-side utility enables transparent access to the global name space
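The path-to-DNS mapping above can be sketched as a small helper that splits a global /nfs path into the DNS owner name an SRV lookup would target and the remaining path under the mapped server. This is a minimal sketch: the `_nfs4._tcp` owner-name convention and the function name are illustrative assumptions, not the actual CITI client utility.

```python
def srv_query_name(path):
    """Split a global-name-space path such as /nfs/umich.edu/lib/file1
    into an SRV owner name (here assumed to be _nfs4._tcp.<domain>)
    and the path remainder served by whichever NFS server the
    SRV records point at."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "nfs":
        raise ValueError("not a global /nfs path: %s" % path)
    domain = parts[1]                      # DNS-style reference name
    remainder = "/" + "/".join(parts[2:])  # path under the mapped server
    return "_nfs4._tcp." + domain, remainder

# The slide's example name:
name, rest = srv_query_name("/nfs/umich.edu/lib/file1")
```

A real client would then resolve `name` via a DNS SRV query and mount the returned server on demand under /nfs.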

  10. Outline
    - Motivation
    - Current work
      - Global Name Space
      - Consistent Mutable Replication
    - Evaluation

  11. Why Replication?
    - Performance
      - Access distributed data from nearby or lightly loaded servers
    - Failure resilience
      - Users and applications can switch from a failed replication server to a working one

  12. Replication in Practice
    - Read-only replication
      - E.g., AFS
      - Does not support complex data sharing, e.g., concurrent writes
      - Lacks network transparency for writes
    - Optimistic replication
      - E.g., Coda
      - Focus is availability, not consistency

  13. Consistent Mutable Replication
    - Problem: the state of the practice in file system replication does not satisfy the requirements of global scientific collaborations
    - Solution: consistent mutable replication
    - Question: can it provide Grid applications efficient and reliable data access?

  14. Requirements
    - A server-to-server replication protocol
    - Optimal read-only behavior
      - Performance identical to an unreplicated system
    - Consistent write behavior
      - Dynamically elect a primary server to coordinate concurrent writes
    - Close-to-open semantics
      - An application opening a file sees the data written by the last application that wrote and closed the file

  15. Replication Control: open
    When a client opens a file for writing, the other replication servers are instructed to forward writes; the selected server temporarily becomes the primary for that file.
    (Diagram: a client issues wopen to one replication server.)

  16. Replication Control: write
    The primary server asynchronously distributes updates to the other servers during file modification.
    (Diagram: the client issues write requests to the primary server.)

  17. Replication Control: close
    After the active replication servers are synchronized, the primary server distributes the active view and withdraws as the primary server for the file.
    (Diagram: the client issues close to the primary server.)
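The open/write/close control flow of the last three slides can be modeled in a few lines. This is a minimal sketch, not the actual NFSv4.r implementation: the class, its method names, and the message tuples are illustrative assumptions, and real message passing is reduced to appending entries to each peer's log.

```python
class ReplicationServer:
    """Toy model of per-file primary election and update distribution."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = list(peers)   # the other replication servers
        self.primary_for = set()   # files this server currently coordinates
        self.log = []              # messages received from other servers

    def wopen(self, path):
        # open-for-write: tell every peer to forward writes for this
        # file; this server temporarily becomes the file's primary.
        for p in self.peers:
            p.log.append(("forward-to", self.name, path))
        self.primary_for.add(path)

    def write(self, path, data):
        # Only the primary accepts writes; it distributes the update
        # to the other servers asynchronously (modeled as a log entry).
        assert path in self.primary_for
        for p in self.peers:
            p.log.append(("update", path, data))

    def close(self, path):
        # After the active servers are synchronized, distribute the
        # active view and withdraw as the primary for this file.
        view = sorted([self.name] + [p.name for p in self.peers])
        for p in self.peers:
            p.log.append(("view", path, view))
        self.primary_for.discard(path)

# One writer (A) and one passive replica (B):
b = ReplicationServer("B", [])
a = ReplicationServer("A", [b])
a.wopen("/nfs/user/bob/exp1")
a.write("/nfs/user/bob/exp1", "results")
a.close("/nfs/user/bob/exp1")
```

After the close, B's log holds the forward-writes instruction, the update, and the final active view, and A is no longer primary for the file.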

  18. Consistency
    - View-based control (El Abbadi, Skeen, and Cristian) guarantees sequential consistency
    - A server becomes primary server after collecting acknowledgements from a majority of replication servers
    - A primary server must ensure that every active replication server has acknowledged its role when a written file is closed
      - Guarantees close-to-open semantics
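The two acknowledgement rules above reduce to simple set checks. A minimal sketch, with assumed function names; the real protocol of course involves timeouts and message exchange, not a single predicate.

```python
def can_become_primary(acks, n_servers):
    # A candidate may assume the primary role only after a strict
    # majority of the n_servers replication servers acknowledge it.
    return len(set(acks)) > n_servers // 2

def can_close(acks, active_view):
    # Close-to-open semantics: every server in the active view must
    # have acknowledged the primary's role before a written file
    # can be closed.
    return set(active_view) <= set(acks)
```

With five replicas, three acknowledgements elect a primary; a file close additionally requires acknowledgements from all servers still in the active view, not merely a majority.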

  19. Replication Server Failure
    - Every server keeps track of the per-file liveness of other servers (active view)
    - Primary server removes from the active view any server that fails to respond
    - Primary server sends other servers the active view before releasing its role
    - Active servers refuse any request that comes from a server not in the active view
    - A failed replication server can rejoin the active group only after it synchronizes
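The active-view bookkeeping on this slide can be sketched as a small class; the class and method names are illustrative assumptions, and failure detection and state synchronization are reduced to boolean flags.

```python
class ActiveView:
    """Per-file liveness tracking kept by each replication server."""

    def __init__(self, servers):
        self.view = set(servers)

    def mark_failed(self, server):
        # The primary removes any server that fails to respond.
        self.view.discard(server)

    def accept_request(self, sender):
        # Active servers refuse requests from servers outside the view.
        return sender in self.view

    def rejoin(self, server, synchronized):
        # A failed server rejoins the active group only after it has
        # synchronized its state with the current replicas.
        if synchronized:
            self.view.add(server)
        return server in self.view

# C fails, is excluded, and can only rejoin once synchronized:
v = ActiveView(["A", "B", "C"])
v.mark_failed("C")
```

The refuse-unknown-senders rule is what keeps a partitioned or stale server from injecting writes after it has been dropped from the view.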

  20. Primary Server Failure
    - File becomes inaccessible
    - Modification to El Abbadi et al. to allow asynchronous update
      - Ensures durability of data written by a client and acknowledged by the server
    - Clients can continue to access objects that are outside the control of the failed server
    - Applications decide whether to wait for the failed server to recover or to reproduce the computation results

  21. Hierarchical Replication Control
    - Primary server election is costly over a WAN
    - Heuristic: hierarchical replication control
      - A primary server can assert control at different granularities
      - Reduces costly elections when there is locality of reference

  22. Shallow Control
    A server with shallow control on a file or directory is the primary server for that single object.
    (Diagram: a directory tree with /usr, bin, and local.)

  23. Deep Control
    A server with deep control on a directory is the primary server for everything in the subtree rooted at that directory.
    (Diagram: a directory tree with /usr, bin, and local.)
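Resolving which primary, if any, controls a given object under the two granularities can be sketched as a lookup: check for a shallow grant on the object itself, then walk up the directory tree looking for a deep grant. The dictionary-based representation and function name are illustrative assumptions, not the NFSv4.r data structures.

```python
import posixpath

def controlling_primary(path, shallow, deep):
    """shallow: {object_path: server} grants on single objects;
    deep: {dir_path: server} grants covering a whole subtree.
    Returns the server controlling `path`, or None."""
    if path in shallow:
        return shallow[path]
    # Walk from the object up to the root, looking for a deep grant.
    cur = path
    while True:
        if cur in deep:
            return deep[cur]
        parent = posixpath.dirname(cur)
        if parent == cur:      # reached the root without a grant
            return None
        cur = parent

# A deep grant on /usr/local covers everything beneath it:
owner = controlling_primary("/usr/local/lib/x", {}, {"/usr/local": "S1"})
```

This ordering is why deep control reduces elections under locality of reference: once a server holds a deep grant on a working directory, every object beneath it resolves without a new election.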

  24. Outline
    - Motivation
    - Current work
    - Evaluation

  25. NAS Grid Benchmarks
    - An evaluation tool released by NASA for Grid computing
    - An instance of NGB specifies:
      - class (mesh size, number of iterations)
      - source(s) of input data
      - consumer(s) of solution values

  26. Four NGB Problems
    (Diagram: data-flow graphs for the four problems, each running from Launch to Report.)
    - Embarrassingly Distributed (ED): nine independent SP tasks
    - Helical Chain (HC): a chain of BT, SP, and LU tasks
    - Visualization Pipe (VP): a pipeline of BT, MG, and FT tasks
    - Mixed Bag (MB): interdependent LU, MG, and FT tasks

  27. Experiment Setup

  28. Helical Chain (Small)

  29. Helical Chain (Medium)

  30. Helical Chain (Large)

  31. Helical Chain (Huge)

  32. Visualization Pipe (Small)

  33. Visualization Pipe (Medium)

  34. Visualization Pipe (Large)

  35. Visualization Pipe (Huge)

  36. Conclusion
    - Conventional wisdom: consistent mutable replication in large-scale distributed storage systems is too expensive to consider
    - Our experiments prove otherwise: consistent mutable replication in large-scale distributed storage systems is feasible and practical
      - Superior performance
      - Rigorous adherence to ordinary semantics

  37. Thank you for your attention! Questions?
     www.citi.umich.edu
