NFSv4 Replication for Grid Storage Middleware

Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor
Acknowledgements

- Joint work with Jiaying Zhang
- Partially supported by NSF Middleware Initiative grant SCI-0438298 and by Network Appliance, Inc.

Workshop on Middleware for Grid Computing, November 27, 2006
Outline

- Motivation
- Design
- Evaluation
Motivation

- Emerging global scientific collaborations: access to widely distributed data must be reliable, efficient, and convenient
- Current solution: GridFTP; shared data sets are synchronized manually
- Our solution: NFSv4.r, a replicated file system with excellent performance and conventional file system semantics
Usage Scenario

[Diagram: a scientist, a personal cluster of high performance computers, a visualization center, and a massive cluster of high performance computers, all reaching a single file server over the WAN]
Usage Scenario (replicated)

[Diagram: the same scenario with a file replication server at each site; /nfs/user/bob/exp1 is replicated across all of them]
Outline

- Motivation
- Design
  - Global name space
  - Consistent mutable replication
- Evaluation
Global Name Space

- /nfs is the global root of all NFS file systems
- Entries under /nfs are mounted on demand
- The format of reference names under /nfs follows DNS conventions, e.g., /nfs/umich.edu/lib/file1
Extended Use of DNS

- DNS SRV resource records carry NFS server location information
- The corresponding name server maps a logical name to one or more NFS servers
- A client-side utility enables transparent access to the global name space (sketched below)
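As a concrete illustration, here is a minimal sketch of the SRV lookup such a client-side utility might perform. The `_nfs4._tcp` record name follows the RFC 2782 convention but is an assumption, as is the dnspython dependency; the slides do not specify the record format.

```python
# Sketch: resolving a global-namespace component such as /nfs/umich.edu/...
# to candidate NFS servers via DNS SRV records. The "_nfs4._tcp" record
# name and the use of dnspython are assumptions for illustration.
import dns.resolver  # pip install dnspython

def locate_nfs_servers(domain):
    """Return (host, port) pairs serving `domain`, best candidates first."""
    answers = dns.resolver.resolve(f"_nfs4._tcp.{domain}", "SRV")
    # Lower priority wins; among equal priorities, higher weight is preferred.
    ordered = sorted(answers, key=lambda r: (r.priority, -r.weight))
    return [(str(r.target).rstrip("."), r.port) for r in ordered]

# A mount helper could then try servers in order:
#   for host, port in locate_nfs_servers("umich.edu"):
#       attempt to mount the export from (host, port)
```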
Outline

- Motivation
- Design
  - Global name space
  - Consistent mutable replication
- Evaluation
Why Replication?

- Performance: access distributed data from nearby or lightly loaded servers
- Failure resilience: users and applications can switch from a failed replication server to a working one
Replication in Practice

- Read-only replication (e.g., AFS)
  - Does not support complex data sharing, e.g., concurrent writes
  - Lacks network transparency for writes
- Optimistic replication (e.g., Coda)
  - Focus is availability, not consistency
Consistent Mutable Replication

- Problem: the state of the practice in file system replication does not satisfy the requirements of global scientific collaborations
- Solution: consistent mutable replication
- Question: can it provide Grid applications with efficient and reliable data access?
Requirements

- A server-to-server replication protocol
- Optimal read-only behavior: performance identical to an unreplicated system
- Consistent write behavior: dynamically elect a primary server to coordinate concurrent writes
- Close-to-open semantics: an application opening a file sees the data written by the last application that wrote and closed the file
Replication Control: open

When a client opens a file for writing, the other replication servers are instructed to forward writes to the selected server, which temporarily becomes the primary for that file.

[Diagram: client issues a write-open to one replication server]
Replication Control: write

While the file is being modified, the primary server distributes updates to the other servers asynchronously.

[Diagram: client writes through the primary server]
Replication Control: close

After the active replication servers are synchronized, the primary server distributes the active view and withdraws as the primary for the file. The three steps fit together as in the sketch below.

[Diagram: client closes the file at the primary server]
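The following toy in-memory model shows how open, write, and close interact. It is not the NFSv4.r implementation: the class, the method names, and the synchronous peer calls standing in for RPCs are all assumptions for illustration.

```python
# Toy model of per-file replication control (illustrative names only).
class ReplicationServer:
    def __init__(self):
        self.peers = []            # the other replication servers
        self.files = {}            # path -> bytearray, standing in for disk
        self.forward_to = {}       # path -> current primary, if any
        self.primary_for = set()   # files this server currently controls

    def open_for_write(self, path):
        # open: the other servers are told to forward writes here, and
        # this server temporarily becomes the primary for the file.
        for peer in self.peers:
            peer.forward_to[path] = self
        self.primary_for.add(path)

    def write(self, path, data):
        # write: apply locally, then distribute to the replicas. A real
        # primary does this asynchronously, without waiting for replies.
        self.files.setdefault(path, bytearray()).extend(data)
        for peer in self.peers:
            peer.apply_update(path, data)

    def apply_update(self, path, data):
        self.files.setdefault(path, bytearray()).extend(data)

    def close(self, path):
        # close: once the active replicas are synchronized (trivially true
        # in this synchronous toy), withdraw as primary so the next open
        # can elect any server.
        for peer in self.peers:
            peer.forward_to.pop(path, None)
        self.primary_for.discard(path)
```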
Consistency

- View-based replication control (El Abbadi, Skeen, and Cristian) guarantees sequential consistency
- A server becomes the primary server only after collecting acknowledgements from a majority of the replication servers (see the sketch below)
- A primary server must ensure that every active replication server has acknowledged its role when a written file is closed, which guarantees close-to-open semantics
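A minimal sketch of the majority rule, with made-up names (`request_ack` stands in for whatever acknowledgement RPC the real protocol uses). Because two disjoint majorities cannot exist, two servers cannot both become primary for the same file.

```python
def try_become_primary(path, peers):
    """Become primary for `path` only with acknowledgements from a
    strict majority of all replication servers (peers plus ourselves)."""
    votes = 1 + sum(1 for p in peers if p.request_ack(path))  # 1 = own vote
    return 2 * votes > len(peers) + 1
```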
Replication Server Failure

- Every server tracks the per-file liveness of the other servers (the active view)
- The primary server removes from the active view any server that fails to respond
- The primary server sends the other servers the active view before releasing its role (sketched below)
- Active servers refuse any request that comes from a server not in the active view
- A failed replication server can rejoin the active group only after it synchronizes
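Under the same toy assumptions as before, the next sketch shows how a primary might prune and distribute the active view at close time, and how servers reject requests from peers outside it. `active_view`, `sync`, and `install_view` are hypothetical stand-ins for the real bookkeeping and synchronization messages.

```python
def close_with_view(primary, path):
    # Prune unresponsive servers, then distribute the surviving view
    # before the primary releases its role for the file.
    view = set(primary.active_view[path])
    for server in list(view):
        if not server.sync(path):       # failed to respond in time
            view.discard(server)        # drop it from the active view
    for server in view:
        server.install_view(path, view)
    primary.primary_for.discard(path)

def check_sender(server, path, sender):
    # A failed server stays outside the view until it resynchronizes,
    # so its requests are refused.
    if sender not in server.active_view[path]:
        raise PermissionError("not in active view: resynchronize first")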
Primary Server Failure

- The file becomes inaccessible
- A modification to El Abbadi et al. allows asynchronous update while ensuring durability of data written by a client and acknowledged by the server
- Clients can continue to access objects that are outside the control of the failed server
- Applications decide whether to wait for the failed server to recover or to reproduce the computation results
Hierarchical Replication Control

- Primary server election is costly over the WAN
- Heuristic: hierarchical replication control, in which a primary server can assert control at different granularities
- Reduces costly elections when there is locality of reference
Shallow Control

A server with shallow control on a file or directory is the primary server for that single object.

[Diagram: /usr tree with bin and local subdirectories; shallow control covers a single node]
Deep Control

A server with deep control on a directory is the primary server for everything in the subtree rooted at that directory (see the lookup sketch below).

[Diagram: /usr tree with bin and local subdirectories; deep control covers the whole subtree]
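A small sketch of the granularity check the two slides imply, with assumed data structures (the slides do not describe how control is recorded): a write needs no new election if the object itself, or any ancestor under deep control, already has a primary.

```python
import posixpath

def controlling_primary(path, shallow, deep):
    """shallow maps single objects to their primary; deep maps a
    directory to a primary controlling its entire subtree."""
    if path in shallow:
        return shallow[path]
    d = posixpath.dirname(path)
    while True:
        if d in deep:
            return deep[d]          # an ancestor holds deep control
        if d == "/":
            return None             # no primary yet: election needed
        d = posixpath.dirname(d)

# With deep = {"/usr/local": s1}, a write to /usr/local/bin/tool finds
# s1 immediately and avoids a new WAN election.
```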
Outline

- Motivation
- Design
- Evaluation
NAS Grid Benchmarks

- An evaluation tool released by NASA for Grid computing
- An instance of NGB is defined by its class (mesh size, number of iterations), the source(s) of its input data, and the consumer(s) of its solution values
Four NGB Problems

- Embarrassingly Distributed (ED): independent SP tasks between Launch and Report
- Helical Chain (HC): a chain of BT, SP, and LU tasks, each feeding the next
- Visualization Pipe (VP): a pipeline of BT, MG, and FT tasks
- Mixed Bag (MB): LU, MG, and FT tasks with mixed dependencies

[Diagrams: the dataflow graph of each problem, from Launch to Report]
Experiment Setup

[Diagram: experimental testbed; details not recoverable from the slide text]

Results

[Charts: Helical Chain benchmark results for the small, medium, large, and huge classes]
[Charts: Visualization Pipe benchmark results for the small, medium, large, and huge classes]
Conclusion

- Conventional wisdom: consistent mutable replication in large-scale distributed storage systems is too expensive to consider
- Our experiments show otherwise: consistent mutable replication in large-scale distributed storage systems is feasible and practical, with superior performance and rigorous adherence to ordinary file system semantics
Thank you for your attention! Questions?

www.citi.umich.edu