Scale and Performance in a Distributed File System (AFS) • Howard et al., CMU, ACM TOCS 1988 • Presenter: Dhirendra Singh Kholia
Outline • What is AFS? • The Prototype implementation • Changes for Performance • Effect of Changes for Performance • Comparison with NFS • Conclusion • Q&A
AFS (Andrew File System) • AFS is a distributed file system that enables efficient sharing of storage resources across both local area and wide area networks • Development started at CMU around 1983 • Goal: 5,000-10,000 nodes (very high scalability!) • Scale while maintaining performance and simple administration
AFS • Client-server architecture • Vice: set of trusted servers • Clients run a user-level process called Venus • Venus caches files from Vice • Caching is based on an upload/download (whole-file) transfer model • Venus contacts Vice only on open and close operations (reads and writes operate on the cached copy)
The Prototype Implementation • A dedicated server process was spawned for every client • Each server contained a directory hierarchy mirroring the structure of the Vice files (.admin directories held Vice file status information; stub directories embedded the location database in the file tree) • Pathname resolution was done by Vice (the servers) • Venus verified the timestamp with Vice before using a cached file (so every open() and stat() forced contact with Vice!) • Coarse-grained read-only replication • Dedicated lock-server process
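The prototype's validate-before-use rule can be sketched as below. This is a minimal model under stated assumptions: the class names `PrototypeVice` and `PrototypeVenus`, the `get_timestamp`/`fetch` calls, and the in-memory storage are hypothetical stand-ins, not the real Vice/Venus interfaces. It only counts how many server requests a series of opens costs when every open must check the timestamp with Vice.

```python
class PrototypeVice:
    """Hypothetical server: stores whole files with a modification timestamp."""

    def __init__(self):
        self.files = {}  # name -> (timestamp, data)
        self.calls = 0   # number of requests received from clients

    def get_timestamp(self, name):
        self.calls += 1
        return self.files[name][0]

    def fetch(self, name):
        self.calls += 1
        return self.files[name]


class PrototypeVenus:
    """Hypothetical client cache that validates with Vice on every open."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}  # name -> (timestamp, data)

    def open(self, name):
        # Validation happens on every open, even when the cached copy is current.
        remote_ts = self.vice.get_timestamp(name)
        cached = self.cache.get(name)
        if cached is None or cached[0] != remote_ts:
            self.cache[name] = self.vice.fetch(name)  # whole-file transfer
        return self.cache[name][1]


vice = PrototypeVice()
vice.files["notes.txt"] = (1, b"hello")
venus = PrototypeVenus(vice)
for _ in range(10):
    venus.open("notes.txt")
print(vice.calls)  # 11: one fetch plus ten timestamp checks
```

Even with a warm cache, the server sees one request per open, which is the kind of validation traffic the revised design set out to remove.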
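Benchmark Details • A script operates on a collection of source code files • 70 files totaling about 200 KB • 5 phases: MakeDir (create the target directory tree), Copy (copy every file into it), ScanDir (examine the status of every file without reading it), ReadAll (read every byte of every file), Make (compile and link all the files)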
Local FS performance • The benchmark took ~1000 seconds on a stand-alone Sun2 workstation
Prototype Performance • 1 load unit ≈ 5 Andrew users • ~70% slower than the local FS • Doesn't scale well beyond 5-8 load units
Call Distribution • Two calls accounted for almost 90% of all server calls: TestAuth (validates cache entries) and GetFileStat (gets status information for files not in the cache) • open() and stat() force contact with Vice • Caching works (> 80% hit ratio) • “Cache validation driven totally by Venus” is not a good idea • Source: http://dcslab.snu.ac.kr/courses/dip2009f/presentation_old/3.ppt
Prototype Resource Usage • 75% CPU utilization averaged over a 5-minute period; 40% averaged over an 8-hour period! • The CPU is the performance bottleneck! • Causes: pathname resolution on the servers and excessive context switching (one process per client)
Problems with the Prototype • High virtual memory paging demands (process-per-client fork model) • High CPU usage • Critical resource limits, especially network-related ones, were frequently exceeded • High frequency of cache validation checks (too many stat calls) • Difficult to move directories between servers (and thus to balance load) • Despite all these problems, the prototype was robust and simple, and it worked • “… our users willingly suffered!”
Changes for Performance • Cache Management • Name Resolution + Low-Level Storage Representation • Process structure Target: Handle at least 50 clients per server.
Cache Management • Status cache (in virtual memory, for fast stat() performance) • Data cache (on local disk) • Directory contents and symbolic links are now cached too! • Venus now assumes that cache entries are valid unless notified otherwise by Vice • Callback: the server promises to notify the client before allowing a modification by another client + Reduces cache validation traffic and server load - Servers must maintain callback state information - There is a potential for inconsistency (how?)
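A minimal sketch of the callback idea, assuming a single server and in-process method calls in place of RPCs. `CallbackVice`, `CallbackVenus`, `close_modified` and `break_callback` are hypothetical names chosen for illustration. The server remembers which clients hold a callback on each file and notifies them when another client stores a new version; a client trusts its cached copy, without contacting the server, until that happens.

```python
class CallbackVice:
    """Hypothetical server that records callback promises per file."""

    def __init__(self):
        self.files = {}      # name -> data
        self.callbacks = {}  # name -> set of clients holding a callback
        self.calls = 0

    def fetch(self, client, name):
        self.calls += 1
        self.callbacks.setdefault(name, set()).add(client)  # promise to notify
        return self.files[name]

    def store(self, client, name, data):
        self.calls += 1
        self.files[name] = data
        # Break the callback of every other client caching this file.
        for other in self.callbacks.get(name, set()) - {client}:
            other.break_callback(name)
        self.callbacks[name] = {client}


class CallbackVenus:
    """Hypothetical client: a cached file is trusted while its callback holds."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}  # name -> data

    def open(self, name):
        if name not in self.cache:  # no server contact on a cache hit
            self.cache[name] = self.vice.fetch(self, name)
        return self.cache[name]

    def close_modified(self, name, data):
        self.cache[name] = data
        self.vice.store(self, name, data)

    def break_callback(self, name):
        self.cache.pop(name, None)  # the next open will refetch


vice = CallbackVice()
vice.files["notes.txt"] = b"v1"
a, b = CallbackVenus(vice), CallbackVenus(vice)
for _ in range(10):
    a.open("notes.txt")
b.open("notes.txt")
a.close_modified("notes.txt", b"v2")
print(b.open("notes.txt"), vice.calls)  # b'v2' 4
```

Ten opens by client `a` cost a single server call; the price is the per-file callback table the server must keep.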
Name Resolution + Low-Level Storage Representation • Earlier, pathname resolution was done by Vice (the costly implicit namei operation caused server load) • Now, Venus maps Vice pathnames to Fids and passes Fids to Vice • 96-bit Fid = 32-bit Volume number + 32-bit Vnode number + 32-bit Uniquifier • Key idea: eliminate pathname lookups on the servers (use Fids on servers and inodes on clients directly) • The Volume number identifies a volume; the volume's location is stored in the Volume Location Database
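The 96-bit Fid layout can be illustrated with a small value type. This is a sketch only: the field widths follow the slide, but the `Fid` class, its `pack`/`unpack` methods and the big-endian byte order are assumptions for illustration, not AFS's actual wire format.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Fid:
    """96-bit file identifier: three 32-bit unsigned fields."""

    volume: int      # identifies the volume (located via the VLDB)
    vnode: int       # index of the file within its volume
    uniquifier: int  # lets vnode numbers be reused safely

    def __post_init__(self):
        for field_name in ("volume", "vnode", "uniquifier"):
            value = getattr(self, field_name)
            if not 0 <= value < 2**32:
                raise ValueError(f"{field_name} must fit in 32 bits: {value}")

    def pack(self) -> bytes:
        # Big-endian layout is an assumption for illustration only.
        return struct.pack(">III", self.volume, self.vnode, self.uniquifier)

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack(">III", raw))


fid = Fid(volume=7, vnode=42, uniquifier=1)
raw = fid.pack()
print(len(raw) * 8, Fid.unpack(raw) == fid)  # 96 True
```

Because a Fid contains neither a pathname nor a server name, the server can go straight to the file without a namei walk, and the Fid stays valid if its volume is later moved to another server.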
Process Structure • Use a fixed number of lightweight processes (LWPs) within a single server process • An LWP is bound to a particular client only for the duration of a single server operation • RPC implemented in user space
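The "fixed pool, bound per operation" structure resembles a thread pool. In this sketch, Python threads from `concurrent.futures.ThreadPoolExecutor` stand in for the paper's LWPs (their scheduling differs), and `handle`/`serve` are hypothetical names. Fifty simulated clients are served by at most four workers living in one process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(request):
    """Serve one client operation; the worker is tied to the client only for this call."""
    client_id, operation = request
    return threading.current_thread().name, client_id, operation


def serve(requests, num_workers=4):
    # A fixed pool of workers inside one process, instead of one process per client.
    with ThreadPoolExecutor(max_workers=num_workers, thread_name_prefix="lwp") as pool:
        return list(pool.map(handle, requests))


requests = [(client, "fetch") for client in range(50)]
results = serve(requests)
workers_used = {worker for worker, _, _ in results}
print(len(results), len(workers_used) <= 4)  # 50 True
```

Any free worker can pick up any client's next request, so the number of workers no longer grows with the number of clients.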
AFS Consistency Semantics • Writes to an open file by a process on a workstation are visible only to processes on that workstation • Write-on-close: once a file is closed, its changes are visible to new opens; already-open instances do not see them • All other file operations are visible everywhere immediately • No implicit locking: multiple clients can perform the same operation on a file concurrently • Applications have to cooperate and manage synchronization themselves
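A minimal sketch of write-on-close visibility, assuming each `OpenFile` instance lives on a different workstation. `Vice` and `OpenFile` are hypothetical classes; the sketch ignores callbacks and same-workstation sharing and shows only when a write becomes visible elsewhere.

```python
class Vice:
    """Hypothetical server holding the committed contents of each file."""

    def __init__(self):
        self.files = {}


class OpenFile:
    """One open instance on a workstation, working on a whole-file copy."""

    def __init__(self, vice, name):
        self.vice = vice
        self.name = name
        self.data = vice.files.get(name, b"")  # whole file copied at open
        self.dirty = False

    def read(self):
        return self.data

    def write(self, data):
        self.data = data  # local only until close
        self.dirty = True

    def close(self):
        if self.dirty:
            self.vice.files[self.name] = self.data  # commit on close


vice = Vice()
vice.files["log"] = b"old"
writer = OpenFile(vice, "log")
reader = OpenFile(vice, "log")       # opened before the writer closes
writer.write(b"new")
print(vice.files["log"])             # b'old': not visible before close
writer.close()
print(reader.read())                 # b'old': existing open instance unchanged
print(OpenFile(vice, "log").read())  # b'new': a new open sees the commit
```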
Effect of Changes for Performance
Effect of Changes for Performance • Only 19% slower than a stand-alone workstation • ScanDir and ReadAll phase times are almost independent of load! • Scales well: the target of 50 clients per server is easily met!
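Comparison with NFS (Remote Open) • File data is not fetched in one go; NFS fetches blocks on demand as they are read • Advantage of the remote-open model: low latency (the first read does not wait for the whole file)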
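The latency difference can be seen in a deliberately simple model: time to first byte equals one round trip plus the bytes that must arrive before the first read returns. The function `first_byte_latency` and all numbers below (1 MB file, 1 MB/s link, 10 ms round trip, 8 KB blocks) are hypothetical, chosen only to make the arithmetic visible.

```python
def first_byte_latency(file_size, bandwidth, round_trip, block_size=None):
    """Seconds until the first read() can return, in a simple transfer model.

    block_size=None models whole-file transfer on open (AFS style);
    otherwise only the first block is fetched (NFS-style remote open).
    """
    transferred = file_size if block_size is None else min(block_size, file_size)
    return round_trip + transferred / bandwidth


# Hypothetical numbers: 1 MB file, 1 MB/s link, 10 ms round trip, 8 KB blocks.
whole_file = first_byte_latency(1_000_000, 1_000_000, 0.010)
remote_open = first_byte_latency(1_000_000, 1_000_000, 0.010, block_size=8_000)
print(f"{whole_file:.3f} s vs {remote_open:.3f} s")  # 1.010 s vs 0.018 s
```

The model covers only the first byte: after a whole-file fetch, later reads are local, while remote open keeps paying round trips for subsequent blocks.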
Comparison with NFS (Time)
Comparison with NFS (CPU)
Comparison with NFS (Disk)
Comparison Report • NFS failed to work properly at high loads! • At 1 load unit, NFS generated ~3 times as many packets as AFS • NFS's performance degrades rapidly with load • NFS saturated the server's CPU and disk and still couldn't keep up (despite operating entirely in the kernel!) • NFS doesn't scale well (actually, it doesn't seem to scale at all)
Changes for Operability • A volume is a collection of files forming a partial subtree of the Vice name space; each user is assigned a volume • A volume is like a mini-filesystem in itself and can grow or shrink in size • Volumes allow quotas, consistent backups, read-only replication and painless live migration of data • Volumes keep the size of the Volume Location Database (VLDB) manageable, since it maps whole volumes rather than individual files to servers • The volume abstraction is indispensable!
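Why volume migration is "painless" can be sketched with a toy cell in which clients locate data only through the VLDB. Everything here is a hypothetical simplification: `Cell` and its methods are invented names, the Fid is reduced to a `(volume, vnode)` pair, and the move copies the volume in one step, ignoring clients that use it during the move.

```python
class Cell:
    """Hypothetical set of servers plus a Volume Location Database (VLDB)."""

    def __init__(self):
        self.vldb = {}     # volume number -> server name
        self.servers = {}  # server name -> {volume number: {vnode: data}}

    def create_volume(self, volume, server):
        self.servers.setdefault(server, {})[volume] = {}
        self.vldb[volume] = server

    def write(self, fid, data):
        volume, vnode = fid  # simplified fid: (volume, vnode)
        self.servers[self.vldb[volume]][volume][vnode] = data

    def read(self, fid):
        volume, vnode = fid
        return self.servers[self.vldb[volume]][volume][vnode]

    def move_volume(self, volume, new_server):
        old_server = self.vldb[volume]
        contents = self.servers[old_server].pop(volume)
        self.servers.setdefault(new_server, {})[volume] = contents
        self.vldb[volume] = new_server  # one entry changes; fids do not


cell = Cell()
cell.create_volume(7, "server-a")
fid = (7, 42)
cell.write(fid, b"thesis draft")
cell.move_volume(7, "server-b")
print(cell.vldb[7], cell.read(fid))  # server-b b'thesis draft'
```

Only one VLDB entry changes per move, which is also why keeping the VLDB at volume granularity keeps it small.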
Conclusion • The only problems I see are: A) the limit of 64K files per directory; B) whole-file caching (which makes it slow for big files) • Overall, AFS is awesome
Questions - Scaling • Did they ever reach their goal of 5,000 workstations? Or are distributed file systems fundamentally flawed and unable to scale indefinitely? • Yes, AFS should be able to manage that magical number (http://www.openafs.org/success.html).
Questions - Locking • Isn't the lack of any form of synchronization among files dangerous? • 4.2BSD doesn’t lock files implicitly, and AFS conforms to these semantics. Yes, it seems dangerous, but even on modern *NIX systems locks are advisory by default, which again requires applications to behave “correctly”. • Couldn't a single badly written program corrupt a whole lot of important data? • Blame the program then.
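What "advisory" means on a local Unix file system can be demonstrated with Python's `fcntl.flock` (Unix-only). This shows only the local *NIX behavior the answer refers to, not AFS's lock server; `advisory_lock_demo` is a hypothetical helper name. A process that asks for the lock is refused, while one that never asks writes anyway.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo():
    """Show that an exclusive flock() stops cooperating openers but not others."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+") as holder:
            fcntl.flock(holder, fcntl.LOCK_EX)  # take the exclusive lock
            with open(path, "r+") as cooperative:
                try:
                    fcntl.flock(cooperative, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    cooperative_blocked = False
                except OSError:  # lock is held through another open file
                    cooperative_blocked = True
            with open(path, "a") as careless:
                careless.write("written without asking for the lock\n")
        with open(path) as f:
            contents = f.read()
        return cooperative_blocked, contents
    finally:
        os.unlink(path)


blocked, contents = advisory_lock_demo()
print(blocked, contents.strip())  # True written without asking for the lock
```

The two `open()` calls create separate open file descriptions, so `flock` treats them as independent lockers even inside one process.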
Questions - Caching • Is caching the entire file a good idea, given the huge size of files these days? • Latency is a big problem with the whole-file transfer model: even for a 24 KB file the latency was ~0.5 seconds (quite noticeable!). For huge files it gets much worse (though only linearly). However, whole-file transfer is the key to AFS scaling! Let's discuss this.
Questions - Caching • Do servers remove callbacks for items that clients have flushed from their caches? If so, how does a server know what a workstation has cached and which items have expired? Does the workstation notify the server about expired cache items? • Yes, Venus executes RemoveCallBack while flushing an item out of its cache, which tells the server which file to drop the callback for.
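The flush-then-notify behavior can be sketched with a bounded client cache. Assumptions, stated loudly: least-recently-used replacement is chosen here for illustration, there is a single client, and `RecordingVice`, `VenusCache` and `remove_callback` are hypothetical names standing in for the RemoveCallBack exchange.

```python
from collections import OrderedDict


class RecordingVice:
    """Hypothetical server tracking which files this client holds callbacks on."""

    def __init__(self, files):
        self.files = files
        self.callbacks = set()

    def fetch(self, name):
        self.callbacks.add(name)
        return self.files[name]

    def remove_callback(self, name):
        self.callbacks.discard(name)


class VenusCache:
    """Bounded client cache that drops the server's callback when it evicts."""

    def __init__(self, capacity, vice):
        self.capacity = capacity
        self.vice = vice
        self.entries = OrderedDict()  # name -> data, least recently used first

    def get(self, name):
        if name in self.entries:
            self.entries.move_to_end(name)  # mark as most recently used
            return self.entries[name]
        data = self.vice.fetch(name)
        self.entries[name] = data
        if len(self.entries) > self.capacity:
            victim, _ = self.entries.popitem(last=False)
            self.vice.remove_callback(victim)  # tell the server to forget it
        return data


vice = RecordingVice({"a": b"1", "b": b"2", "c": b"3"})
cache = VenusCache(capacity=2, vice=vice)
for name in ["a", "b", "a", "c"]:
    cache.get(name)
print(sorted(vice.callbacks))  # ['a', 'c']
```

The server never has to guess what a client still caches: its callback set shrinks exactly when the client evicts.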
Questions - Locking • The authors state that user-level file locking was implemented by a dedicated lock server process. How does this centralized locking mechanism affect scalability? • Locking is not done implicitly, so only the applications that explicitly request locks ever use the lock server.
Questions • Embedding file location information in the file storage structure made moving files across servers difficult, because it required "structural modifications to storage on the servers"... what structural modifications does this mean? • My guess: moving part of the namespace would require a new partition on the new server, since only entire disk partitions could be mounted and an existing partition could not serve as another mount point.
Questions – Cache Size • Diskless operation is possible but slow, and files that are larger than the local disk cache cannot be accessed at all. Why couldn't they be accessed using the same slow method as diskless operation? • Because of the whole-file transfer model, a file always has to fit entirely in the cache (memory or disk)!
Questions - Consistency • This paper does not mention file conflicts (i.e., users modifying stale copies of files). Are file conflicts possible? • What happens when Client A and Client B open and begin modifying the same file? If Client A closes the file first and B closes second, are the changes made by Client A lost? Can the server refuse B's close() because it knows that B's callback is missing/broken?
Questions • It seems that AFS performance will be quite low for small updates to huge files. How can we overcome this problem? Will the performance of the system be hampered if small updates to huge files happen very often? • Conceptually, something similar to rsync could be used: transfer only the parts of the file that changed.
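The rsync-like idea can be sketched with fixed-size block hashes: compare per-block digests and ship only the blocks that differ. This is a simplified stand-in, not rsync's algorithm: rsync also uses a rolling checksum to find data that has shifted position, which this sketch does not handle. For brevity, both copies are assumed to be in one place; in a real exchange only the digests would cross the network. `changed_blocks`, `apply_delta` and `BLOCK_SIZE` are hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size


def _digest(block):
    return hashlib.sha256(block).digest()


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Map block index -> new bytes for every block of `new` that differs from `old`."""
    old_digests = [_digest(old[i:i + block_size]) for i in range(0, len(old), block_size)]
    delta = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        if index >= len(old_digests) or _digest(block) != old_digests[index]:
            delta[index] = block
    return delta


def apply_delta(old, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new file from the old copy plus the changed blocks."""
    out = bytearray()
    for index, start in enumerate(range(0, new_length, block_size)):
        out += delta.get(index, old[start:start + block_size])
    return bytes(out[:new_length])


old = bytes(10 * BLOCK_SIZE)  # a 40 KB file of zeros
new = bytearray(old)
new[5000] = 1                 # a one-byte update inside block 1
delta = changed_blocks(old, bytes(new))
print(sorted(delta))                                  # [1]
print(apply_delta(old, delta, len(new)) == bytes(new))  # True
```

A one-byte update then costs one 4 KB block on the wire instead of the whole 40 KB file.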