DISTRIBUTED SYSTEMS [COMP9243]
Lecture 8b: Distributed File Systems

➀ Introduction
➁ NFS (Network File System)
➂ AFS (Andrew File System) & Coda
➃ GFS (Google File System)

INTRODUCTION

Distributed File System Paradigm:
➜ File system that is shared by many distributed clients
➜ Communication through shared files
➜ Shared data remains available for a long time
➜ Basic layer for many distributed systems and applications

Clients and Servers:
➜ Clients access files and directories
➜ Servers provide files and directories
➜ Servers allow clients to perform operations on the files and directories
➜ Operations: add/remove, read/write
➜ Servers may provide different views to different clients

CHALLENGES

Transparency:
➜ Location: a client cannot tell where a file is located
➜ Migration: a file can transparently move to another server
➜ Replication: multiple copies of a file may exist
➜ Concurrency: multiple clients access the same file

Flexibility:
➜ Servers may be added or replaced
➜ Support for multiple file system types

Dependability:
➜ Consistency: conflicts with replication & concurrency
➜ Security: users may have different access rights on clients sharing files & network transmission
➜ Fault tolerance: server crash, availability of files

Performance:
➜ Requests may be distributed across servers
➜ Multiple servers allow higher storage capacity

Scalability:
➜ Handle increasing numbers of files and users
➜ Growth over geographic and administrative areas
➜ Growth of storage space
➜ No central naming service
➜ No centralised locking
➜ No central file store
THE CLIENT'S PERSPECTIVE: FILE SERVICES

Ideally, the client would perceive remote files like local ones.

File Service Interface:
➜ File: uninterpreted sequence of bytes
➜ Attributes: owner, size, creation date, permissions, etc.
➜ Protection: access control lists or capabilities
➜ Immutable files: simplify caching and replication
➜ Upload/download model versus remote access model

FILE ACCESS SEMANTICS

UNIX semantics:
➜ A READ after a WRITE returns the value just written
➜ When two WRITEs follow in quick succession, the second persists
➜ Caches are needed for performance & write-through is expensive
➜ UNIX semantics is too strong for a distributed file system

Session semantics:
➜ Changes to an open file are only locally visible
➜ When a file is closed, changes are propagated to the server (and other clients)
➜ But it also has problems:
  • What happens if two clients modify the same file simultaneously?
  • Parent and child processes cannot share file pointers if running on different machines.
(A sketch of session semantics and the concurrent-update problem appears at the end of this section.)

Immutable files:
➜ Files allow only CREATE and READ
➜ Directories can be updated
➜ Instead of overwriting the contents of a file, a new one is created and replaces the old one
  • Race condition when two clients replace the same file
  • How to handle readers of a file when it is replaced?

Atomic transactions:
➜ A sequence of file manipulations is executed indivisibly
➜ Two transactions can never interfere
➜ Standard for databases
➜ Expensive to implement

THE SERVER'S PERSPECTIVE: IMPLEMENTATION

Design Depends On the Use (Satyanarayanan, 1980s university UNIX use):
➜ Most files are small (less than 10k)
➜ Reading is much more common than writing
➜ Usually access is sequential; random access is rare
➜ Most files have a short lifetime
➜ File sharing is unusual; most processes use only a few files
➜ Distinct classes of files with different properties exist

Is this still valid?

There are also varying reasons for using a DFS:
➜ Big file system, many users, inherent distribution
➜ High performance
➜ Fault tolerance
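To make the session semantics described above concrete, the following is a minimal sketch. The class names, the single-server model, and the whole-file upload on close are assumptions made purely for illustration; this is not the code of any real distributed file system.

    # Minimal sketch of session semantics: each client works on a private copy
    # of the file from open() until close(); only close() publishes the changes.
    # All names here (Server, SessionClient) are invented for illustration.

    class Server:
        def __init__(self):
            self.files = {}                 # filename -> bytes stored on the server

        def fetch(self, name):
            return self.files.get(name, b"")

        def store(self, name, data):
            self.files[name] = data         # whole-file upload on close

    class SessionClient:
        def __init__(self, server):
            self.server = server
            self.open_files = {}            # filename -> locally cached copy

        def open(self, name):
            self.open_files[name] = self.server.fetch(name)

        def write(self, name, data):
            self.open_files[name] += data   # visible only to this client

        def read(self, name):
            return self.open_files[name]

        def close(self, name):
            self.server.store(name, self.open_files.pop(name))

    server = Server()
    a, b = SessionClient(server), SessionClient(server)
    a.open("notes"); b.open("notes")
    a.write("notes", b"from A")
    print(b.read("notes"))                  # b"" -- A's write is not yet visible to B
    a.close("notes")                        # now A's changes reach the server
    b.close("notes")                        # B's unchanged (empty) copy wins: last close counts
    print(server.files["notes"])            # b"" -- the concurrent-update problem in action

The final print shows exactly the problem noted above: when two clients have the same file open, the last close silently overwrites the other client's changes.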
STATELESS VERSUS STATEFUL SERVERS

Advantages of stateless servers:
➜ Fault tolerance
➜ No OPEN/CLOSE calls needed
➜ No server space needed for tables
➜ No limits on number of open files
➜ No problems if server crashes
➜ No problems if client crashes
(A sketch of a self-contained, stateless read request appears at the end of this section.)

Advantages of stateful servers:
➜ Shorter request messages
➜ Better performance
➜ Read ahead easier
➜ File locking possible

CACHING

We can cache in three locations:
➀ Main memory of the server: easy & transparent
➁ Disk of the client
➂ Main memory of the client (process local, kernel, or dedicated cache process)

Cache consistency:
➜ Obvious parallels to shared-memory systems, but with different trade-offs
➜ No UNIX semantics without centralised control
➜ Plain write-through is too expensive; alternatives: delay WRITEs and agglomerate multiple WRITEs
➜ Write-on-close; possibly with delay (file may be deleted)
➜ Invalid cache entries may be accessed if the server is not contacted whenever a file is opened

REPLICATION

Multiple copies of files on different servers:
➜ Prevent data loss
➜ Protect system against down time of a single server
➜ Distribute workload

Three designs:
➜ Explicit replication: the client explicitly writes files to multiple servers (not transparent)
➜ Lazy file replication: the server automatically copies files to other servers after a file is written
➜ Group file replication: WRITEs simultaneously go to a group of servers

CASE STUDIES
➜ Network File System (NFS)
➜ Andrew File System (AFS) & Coda
➜ Google File System (GFS)
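A stateless server works because every request is self-contained: it names the file and the byte range explicitly, so the server needs no per-client open-file table and can crash and restart between any two requests. NFS, the first case study below, takes this approach. The sketch is illustrative only; the handle table and request format are invented and are not the real NFS protocol.

    # Minimal sketch of a stateless READ: each request carries the full context
    # (file handle, offset, count), so the server keeps no notion of "open" files.
    # The handle-to-path mapping and request format are invented for illustration.

    from collections import namedtuple

    ReadRequest = namedtuple("ReadRequest", "handle offset count")

    HANDLES = {42: "/srv/files/report.txt"}   # hypothetical handle -> path mapping

    def serve_read(req: ReadRequest) -> bytes:
        path = HANDLES[req.handle]            # resolve the handle on every call
        with open(path, "rb") as f:           # open, read, close per request:
            f.seek(req.offset)                # no state survives between requests
            return f.read(req.count)

    # A client that wants bytes 4096..8191 simply asks for them; it keeps track
    # of its own file position instead of relying on a server-side cursor.
    # data = serve_read(ReadRequest(handle=42, offset=4096, count=4096))

The price is visible in the request: offset and count must travel with every call, which is why stateful servers can get away with shorter messages and can support read-ahead and locking more easily.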
NETWORK FILE SYSTEM (NFS)

Properties:
➜ Introduced by Sun
➜ Fits nicely into UNIX's idea of mount points, but does not implement UNIX semantics
➜ Multiple clients & servers (a single machine can be both a client and a server)
➜ Stateless servers (no OPEN & CLOSE; changed in v4)
➜ Transparent caching
➜ File locking through a separate server
➜ No replication
➜ ONC RPC for communication

Server side:
➜ NFS protocol independent of the underlying FS
➜ NFS server runs as a daemon
➜ /etc/exports: specifies which directories are exported to whom under which policy

[Figure: NFS architecture. On the client, the system call layer sits on top of the virtual file system (VFS) layer, which passes local requests to the local file system interface and remote requests to the NFS client; the NFS client sends requests through an RPC client stub across the network to the server's RPC server stub, where the NFS server uses the server's VFS layer and local file system.]
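As a concrete example of the export policy and the matching client mount, an /etc/exports entry and mount command might look like the following. The paths, host names and option choices are illustrative assumptions, not taken from the lecture.

    # On the server, /etc/exports: export /srv/projects read-write to one host,
    # read-only to the rest of the subnet (hypothetical paths and hosts)
    /srv/projects   client1.example.com(rw,sync)   192.168.1.0/24(ro,sync)

    # On the client: mount the exported directory at /mnt/projects
    mount -t nfs server.example.com:/srv/projects /mnt/projects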
Client side:
➜ Explicit mounting versus automounting
➜ Hard mounts versus soft mounts
➜ Supports diskless workstations
➜ Caching of file attributes and file data

Caching:
➜ Implementation specific
➜ Caches the results of read, write, getattr, lookup, readdir
➜ Consistency through polling and timestamps
➜ Cache entries are discarded after a fixed period of time
➜ Modified files are sent to the server asynchronously (when closed, or when the client performs a sync)
➜ Read-ahead and delayed write possible

ANDREW FILE SYSTEM (AFS) & CODA

Properties:
➜ From Carnegie Mellon University (CMU) in the 1980s
➜ Developed as a campus-wide file system: scalability
➜ Global name space for the file system (divided into cells, e.g. /afs/cs.cmu.edu, /afs/ethz.ch)
➜ API same as for UNIX
➜ UNIX semantics for processes on one machine, but globally write-on-close

System Architecture:
➜ Client: user-level process Venus (AFS daemon)
➜ Cache on local disk
➜ Trusted servers collectively called Vice

[Figure: Transparent access to a Vice file server. A Virtue client machine runs user processes and the Venus process on top of the virtual file system layer, RPC client stub, local file system interface and local OS; requests travel over the network to Vice file servers.]
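The division of labour between Venus and Vice can be sketched as follows: on open(), Venus returns the locally cached whole-file copy if it still holds a callback promise for it, otherwise it fetches the whole file from a Vice server; on close(), modified files are shipped back (the whole-file caching, callbacks and write-on-close are spelled out in the design notes that follow). This is a minimal illustrative sketch; the class and method names are invented and do not correspond to the real AFS code.

    # Illustrative sketch of AFS-style whole-file caching in the Venus client.
    # Names (ViceServer, Venus, fetch, store) are invented for this sketch.

    class ViceServer:
        def __init__(self):
            self.files = {}            # path -> file contents
            self.callbacks = {}        # path -> clients holding a callback promise

        def fetch(self, client, path):
            self.callbacks.setdefault(path, set()).add(client)   # register callback
            return self.files.get(path, b"")

        def store(self, client, path, data):
            self.files[path] = data
            # break the callback promise of every *other* client caching this file
            for other in self.callbacks.get(path, set()) - {client}:
                other.invalidate(path)

    class Venus:
        def __init__(self, server):
            self.server = server
            self.cache = {}            # path -> whole-file copy on local disk
            self.valid = set()         # paths for which we still hold a callback

        def invalidate(self, path):
            self.valid.discard(path)   # server broke our callback promise

        def open(self, path):
            if path not in self.valid:                 # no promise: re-fetch whole file
                self.cache[path] = self.server.fetch(self, path)
                self.valid.add(path)
            return self.cache[path]

        def close(self, path, new_data=None):
            if new_data is not None:                   # write-on-close
                self.cache[path] = new_data
                self.server.store(self, path, new_data)

All reads and writes between open() and close() go to the local copy, which is what gives AFS its very low cache-validation traffic.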
DESIGN & ARCHITECTURE

Scalability:
➜ Server serves whole files
➜ Clients cache whole files
➜ Server invalidates cached files with callbacks (stateful servers)
➜ Clients do not validate the cache (except on first use after booting)
➜ Modified files are written back to the server on close()
➜ Result: very little cache validation traffic
➜ Flexible volume per user (resize, move to another server)
➜ Read-only volumes for software
➜ Read-only replicas
➜ Conflict resolution for temporarily disconnected servers

CODA
➜ Successor of the Andrew File System (AFS)
➜ System architecture quite similar to AFS
➜ Supports disconnected, mobile operation of clients
➜ Supports replication

Disconnected operation:
➜ All client updates are logged in a Client Modification Log (CML)
➜ On re-connection, the operations registered in the CML are replayed on the server
➜ The CML is optimised (e.g. file creation and removal cancel out); a sketch appears at the end of this section
➜ On a weak connection, the CML is reintegrated on the server by trickle reintegration
➜ Trickle reintegration tradeoff: immediate reintegration of log entries reduces the chance for optimisation; late reintegration increases the risk of conflicts
➜ File hoarding: the system (or user) can build a user hoard database, which it uses to update frequently used files in a hoard walk
➜ Conflicts: automatically resolved where possible; otherwise manual correction is necessary

Servers:
➜ Read/write replication servers are supported
➜ Replication is organised on a per-volume basis
➜ Group file replication (multicast RPCs); read from any server
➜ Version stamps are used to recognise servers with out-of-date files (due to disconnection or failure)
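The CML optimisation mentioned under disconnected operation can be made concrete with a small sketch: a CREATE that is later cancelled by a REMOVE of the same file, together with any intervening writes to it, can be dropped from the log before reintegration. The log record format and operation names below are invented for illustration and are not Coda's actual CML encoding.

    # Illustrative sketch of CML optimisation before reintegration: a CREATE that
    # is later cancelled by a REMOVE of the same file is dropped from the log, so
    # neither operation (nor the writes in between) needs replaying on the server.

    def optimise_cml(log):
        """log is a list of (operation, path) records, oldest first."""
        optimised = []
        for op, path in log:
            if op == "REMOVE":
                # If the file was created while disconnected, the create and all
                # later writes to it have no lasting effect: cancel them instead.
                created_here = any(o == "CREATE" and p == path for o, p in optimised)
                if created_here:
                    optimised = [(o, p) for o, p in optimised if p != path]
                    continue
            optimised.append((op, path))
        return optimised

    cml = [("CREATE", "/tmp/scratch"), ("WRITE", "/tmp/scratch"),
           ("WRITE", "/doc/report"), ("REMOVE", "/tmp/scratch")]
    print(optimise_cml(cml))   # only [("WRITE", "/doc/report")] needs replaying

This is exactly the trickle-reintegration tradeoff noted above: the longer the log is held back, the more such cancellations become possible, but the higher the risk of conflicting with updates made elsewhere.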