Interposed Request Routing for Scalable Network Storage
Darrell Anderson, Jeff Chase, and Amin Vahdat
Department of Computer Science
Duke University

Goals
Devise a highly scalable network storage architecture
• Interpose on a standard file system protocol.
  – Prototype supports NFS version 3.
• Distribute responsibilities and data.
  – Divide functions (e.g., data vs. metadata).
  – Scale functions by aggregating servers.
This talk:
• Request routing to scale functions.

In the Beginning...
Client sends and receives standard NFS packets.
Server sends and receives standard NFS packets.
[Diagram: NFS Client, Network, NFS Server]

Interposed Routing
Client sends and receives standard NFS packets.
Slice µProxy intercepts and redirects NFS packets to specialized servers.
[Diagram: NFS Client, µProxy, five specialized servers (*Server)]

Outline
Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O
• Small-file
• Metadata
Request routing
Performance

Slice Architecture
[Diagram labels: client with µproxy; name routing policy, name space requests, directory servers; striping policy, bulk I/O, network storage array; file placement policy, small file read/write, small-file servers]

Duke University • Department of Computer Science
Functional Decomposition
[Diagram labels: client with µproxy; name routing policy, name space requests, directory servers; striping policy, bulk I/O, network storage array; file placement policy, small file read/write, small-file servers]

Data Decomposition
[Diagram labels: client with µproxy; name routing policy, name space requests, directory servers; striping policy, bulk I/O, network storage array; file placement policy, small file read/write, small-file servers]

Outline
Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O Storage Nodes
• Small-file Servers
• Directory Servers
Request routing
Performance

Block-I/O Storage Nodes
Network storage nodes provide all storage in Slice.
• Prototype uses a simple object-based model.
  – Read, write, remove, truncate.
• Clients access storage nodes directly.
  – Static striping, or flexible block-maps.
  – Optional RAID “10” mirrored striping.
[Diagram labels: client with µproxy, striping policy, bulk I/O, network storage array]

Small-File Servers
Handle read and write operations on small files.
• All I/O requests below threshold (e.g., 64 KB).
  – Also the initial “small” segments of large files.
• Absorb and aggregate I/O on small files.
  – Data backed by storage array.
• Storage nodes need not handle small files well.
[Diagram labels: client with µproxy, file placement policy, small file read/write, small-file servers, network storage array]

Directory Servers
Handle name space operations.
• Associate name with attributes (lookup, getattr).
• Manage directory contents (create, readdir).
  – Preserve dependencies between objects.
  • Create affects new object and its parent directory.
[Diagram labels: client with µproxy, name routing policy, name space requests, directory servers, network storage array]
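The functional decomposition on these slides (name space operations to directory servers, I/O below the threshold to small-file servers, bulk I/O striped over storage nodes) might be sketched as follows. This is an illustrative sketch, not the Slice µProxy code: the `Request` type, `route_class`, `stripe_node`, the 32 KB stripe unit, and the choice to classify an I/O request by its starting offset are all assumptions. Only the 64 KB example threshold and the operation names come from the slides.

```python
from dataclasses import dataclass

# Example threshold from the Small-File Servers slide ("e.g., 64 KB").
SMALL_FILE_THRESHOLD = 64 * 1024

# Operations named on the Directory Servers slide.
NAME_SPACE_OPS = {"lookup", "getattr", "create", "readdir"}
IO_OPS = {"read", "write"}


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified view of one NFS request."""
    op: str
    offset: int = 0
    length: int = 0


def route_class(req: Request) -> str:
    """Pick the server class for a request (functional decomposition)."""
    if req.op in NAME_SPACE_OPS:
        return "directory"
    if req.op in IO_OPS:
        # Assumption: classify by starting offset, so the initial "small"
        # segment of a large file also goes to a small-file server.
        if req.offset < SMALL_FILE_THRESHOLD:
            return "small-file"
        return "storage"
    raise ValueError(f"unhandled operation: {req.op}")


def stripe_node(offset: int, num_nodes: int, stripe_unit: int = 32 * 1024) -> int:
    """Static striping: map a byte offset to a storage node index.

    The stripe unit is an assumed value, not taken from the slides.
    """
    return (offset // stripe_unit) % num_nodes
```

For example, `route_class(Request("read", offset=0, length=8192))` selects the small-file class, while a write at a 1 MB offset selects the storage class and `stripe_node` then chooses one node within it (data decomposition).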
Request Routing Goals
Focus on name space.
• Spread name space across multiple servers.
  – Balance capacity and load.
• (Maybe) keep entries on same server as parent.
  – Some name space ops involve multiple sites.
  • Create entry, update parent modify time.

Request Routing
Three policies for name space request routing:
• Volume Partitioning:
  – Divide the name space into volumes.
  – Volumes have well defined mount points.
• Mkdir Switching:
  – Items on same server as parent directory.
  – Some mkdirs redirect to another server.
• Name Hashing:
  – Name space is a distributed hash table.
  – Requests hash by name, parent dir.

Experiment Configuration
Hardware
• Client: 450 MHz P3 with 32-bit, 33 MHz PCI.
• Server: 733 MHz P3 with 64-bit, 66 MHz PCI.
• Server: 8x 18 GB Seagate Ultra-2 Cheetah disks.
• Gigabit Ethernet with 9 KB “jumbo” frames.
Software
• FreeBSD 4.0-release.
• Modified NFS stack and firmware for zero-copy.
• NFS uses UDP/IP with 32 KB MTU.
• Slice kernel modules; µProxy is IP filter on client.

Block-I/O Scaling
[Figure: two plots against number of storage nodes (0 to 9), one of single-client bandwidth (MB/s) and one of aggregate bandwidth (MB/s); series: read, write, mirror-read, mirror-write]
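Two of the name space routing policies from the Request Routing slide, name hashing and mkdir switching, can be illustrated with a short sketch. Everything below is an assumption made for illustration: the function names, the use of MD5 as the hash, the byte-string parent handle, and the idea of redirecting a mkdir with probability `1 - affinity` are not stated on the slides, which say only that requests hash by name and parent directory and that some mkdirs redirect to another server.

```python
import hashlib
import random


def name_hash_server(parent_handle: bytes, name: str, num_servers: int) -> int:
    """Name hashing: choose a directory server from (parent dir, name).

    MD5 is an arbitrary choice here; any well-mixed hash would do.
    """
    digest = hashlib.md5(parent_handle + b"/" + name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def mkdir_switch_server(parent_server: int, num_servers: int,
                        affinity: float, rng: random.Random) -> int:
    """Mkdir switching: place a new directory on a server.

    With probability `affinity` the directory stays on its parent's
    server; otherwise the mkdir is redirected to a different server.
    The probabilistic rule is an assumption for this sketch.
    """
    if num_servers == 1 or rng.random() < affinity:
        return parent_server
    # Pick uniformly among the other servers, skipping the parent's.
    other = rng.randrange(num_servers - 1)
    return other if other < parent_server else other + 1
```

Under mkdir switching, only mkdir consults `mkdir_switch_server`; other entries stay on the parent directory's server. Under name hashing, every name space request is routed by `name_hash_server`, so entries of one directory spread across servers.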
Name Space Scaling
[Figure: average time (s) vs. number of clients (0 to 25); series: N-UFS, N-MFS, Slice-1, Slice-2, Slice-4, Slice-8]

Mkdir Switching Affinity
[Figure: average time (s) vs. directory affinity (%, 0 to 100); series: 1 Client, 4 Clients, 8 Clients, 16 Clients]

SPECsfs97 Throughput
[Figure: delivered load (IOPS) vs. offered load (IOPS); series: NFS, Slice-1, Slice-2, Slice-4, Slice-6, Slice-8, Ideal]

SPECsfs97 Latency
[Figure: average latency (msec/op) vs. delivered load (IOPS); series: NFS, Slice-1, Slice-2, Slice-4, Slice-6, Slice-8, Celerra 506]

Summary
Slice interposes between NFS client and server.
• Simple redirection of NFS version 3 packets.
  – Slice µProxy inspects and rewrites packets.
• Separates functions normally handled by a central server.
  – Functional decomposition for request stream.
  – Data decomposition to scale each function.
• Prototype shows performance and scalability.
http://www.cs.duke.edu/ari/slice

EOF
Handling Failures
Approach: write-ahead logging.
• µProxy logs intentions for “dangerous” operations to coordinator.
  – Also logs when finished.
• Coordinator completes or aborts aging operations.
  – Roll forward, or back.
• Independent of client, server, and storage nodes.
[Diagram: NFS Client, µProxy, Coordinator; steps: 1. Request, 2. Danger!, 3. (do it), 4. Safe again, 5. Response]
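The five-step flow on the Handling Failures slide (request, log the intention, do it, log completion, respond) can be sketched roughly as below. This is a toy model under stated assumptions: the `Coordinator` class, `proxy_forward`, the in-memory dictionary standing in for a durable log, and the fixed timeout that defines an "aging" operation are all hypothetical, and the actual roll-forward or roll-back logic is reduced to a placeholder.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinator holding logged intentions in memory."""
    timeout: float = 5.0                           # assumed aging limit, seconds
    pending: dict = field(default_factory=dict)    # op_id -> (description, logged_at)
    recovered: list = field(default_factory=list)  # operations handed to recovery

    def log_intent(self, op_id, description, now=None):
        """Record that a dangerous operation is about to start."""
        stamp = time.monotonic() if now is None else now
        self.pending[op_id] = (description, stamp)

    def log_done(self, op_id):
        """Record that the operation finished safely."""
        self.pending.pop(op_id, None)

    def sweep(self, now=None):
        """Find operations that have aged past the timeout.

        A real coordinator would roll each one forward or back; this
        sketch only moves them to `recovered`.
        """
        now = time.monotonic() if now is None else now
        aged = [op for op, (_, t) in self.pending.items() if now - t > self.timeout]
        for op in aged:
            description, _ = self.pending.pop(op)
            self.recovered.append((op, description))
        return aged


def proxy_forward(coordinator, op_id, description, do_request):
    """µProxy-side wrapper for one dangerous operation."""
    coordinator.log_intent(op_id, description)  # 2. Danger!
    response = do_request()                     # 3. (do it)
    coordinator.log_done(op_id)                 # 4. Safe again
    return response                             # 5. Response
```

If the µProxy fails between steps 2 and 4, the intention stays in `pending` and a later `sweep` picks it up, which is why recovery does not depend on the client, server, or storage nodes.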