1/29/2016

File System Design for an NFS File Server Appliance
Dave Hitz, James Lau, and Michael Malcolm
Technical Report TR3002
NetApp
2002
http://www.netapp.com/us/library/white-papers/wp_3002.html
(At WPI: http://www.wpi.edu/Academics/CCC/Help/Unix/snapshots.html)

Introduction
• In general, an appliance is a device designed to perform a specific function
• Distributed systems trend has been to use appliances instead of general purpose computers. Examples:
  – routers from Cisco and Avici
  – network terminals
  – network printers
• For files, not just another computer with your files, but a new type of network appliance: Network File System (NFS) file server

Introduction: NFS Appliance
• NFS File Server Appliances have different requirements than those of a general purpose file system
  – NFS access patterns are different than local file access patterns
  – Large client-side caches result in fewer reads than writes
• Network Appliance Corporation uses Write Anywhere File Layout (WAFL) file system

Introduction: WAFL
• WAFL has 4 requirements
  – Fast NFS service
  – Support large file systems (10s of GB) that can grow (can add disks later)
  – Provide high performance writes and support Redundant Arrays of Inexpensive Disks (RAID)
  – Restart quickly, even after unclean shutdown
• NFS and RAID both strain write performance:
  – NFS server must respond after data is written
  – RAID must write parity bits also

WPI File System
• CCC machines have central Network File System (NFS)
  – Have same home directory for cccwork2, cccwork3 …
  – /home has 10,113 directories!
• Previously, Network File System support from NetApp WAFL
• Switched to EMC Celerra NS-120: similar features and protocol support
• Provide notion of “snapshot” of file system (next)

Outline
• Introduction (done)
• Snapshots: User Level (next)
• WAFL Implementation
• Snapshots: System Level
• Performance
• Conclusions
Introduction to Snapshots
• Snapshots are copy of file system at given point in time
• WAFL creates and deletes snapshots automatically at preset times
  – Up to 255 snapshots stored at once
• Uses copy-on-write to avoid duplicating blocks in the active file system
• Snapshot uses:
  – Users can recover accidentally deleted files
  – Sys admins can create backups from running system
  – System can restart quickly after unclean shutdown
    • Roll back to previous snapshot

User Access to Snapshots
• Example, suppose accidentally removed file named “todo”:
    CCCWORK3% ls -lut .snapshot/*/todo
    -rw-rw---- 1 claypool claypool 4319 Oct 24 18:42 .snapshot/2011_10_26_18.15.29/todo
    -rw-rw---- 1 claypool claypool 4319 Oct 24 18:42 .snapshot/2011_10_26_19.27.40/todo
    -rw-rw---- 1 claypool claypool 4319 Oct 24 18:42 .snapshot/2011_10_26_19.37.10/todo
• Can then recover most recent version:
    CCCWORK3% cp .snapshot/2011_10_26_19.37.10/todo todo
• Note, snapshot directories (.snapshot) are hidden in that they don’t show up with ls (even ls -a) unless specifically requested

Snapshot Administration
• WAFL server allows sys admins to create and delete snapshots, but usually automatic
• At WPI, snapshots of /home. Says:
  – 3am, 6am, 9am, noon, 3pm, 6pm, 9pm, midnight
  – Nightly snapshot at midnight every day
  – Weekly snapshot is made on Saturday at midnight every week
• But looks like every 1 hour (fewer copies kept for older periods and 1 week ago max)
    claypool 168 CCCWORK3% cd .snapshot
    claypool 169 CCCWORK3% ls -1
    home-20160121-00:00/
    home-20160122-00:00/
    home-20160122-22:00/
    home-20160123-00:00/
    home-20160123-02:00/
    home-20160123-04:00/
    home-20160123-06:00/
    home-20160123-08:00/
    home-20160123-10:00/
    home-20160123-12:00/
    …
    home-20160127-16:00/
    home-20160127-17:00/
    home-20160127-18:00/
    home-20160127-19:00/
    home-20160127-20:00/
    home-latest/

Snapshots at WPI (Windows)
• Mount UNIX space (\\storage.wpi.edu\home), add \.snapshot to end
• Can also right-click on file and choose “restore previous version”
• Note, files in .snapshot do not count against quota

Outline
• Introduction (done)
• Snapshots: User Level (done)
• WAFL Implementation (next)
• Snapshots: System Level
• Performance
• Conclusions

WAFL File Descriptors
• Inode based system with 4 KB blocks
• Inode has 16 pointers, which vary in type depending upon file size
  – For files smaller than 64 KB:
    • Each pointer points to data block
  – For files larger than 64 KB:
    • Each pointer points to indirect block
  – For really large files:
    • Each pointer points to doubly-indirect block
• For very small files (less than 64 bytes), data kept in inode itself, instead of using pointers to blocks
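
The pointer rules on the WAFL File Descriptors slide can be sketched as a small function that reports how the 16 inode pointers would be used for a given file size. This is only an illustration: the function name is hypothetical, the 4-byte pointer size is an assumption (the slides do not state it), and so is the choice to treat a file of exactly 64 KB as fitting in direct pointers.

```python
BLOCK_SIZE = 4 * 1024   # 4 KB blocks (from the slide)
INODE_POINTERS = 16     # pointers per inode (from the slide)
INLINE_LIMIT = 64       # files under 64 bytes are stored in the inode itself
POINTER_SIZE = 4        # ASSUMPTION: 4-byte block pointers, not stated in the slides


def indirection_level(file_size: int) -> str:
    """Report how the 16 inode pointers are used for a file of this size."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    if file_size < INLINE_LIMIT:
        return "inline"                       # data kept in the inode

    direct_capacity = INODE_POINTERS * BLOCK_SIZE          # 16 * 4 KB = 64 KB
    if file_size <= direct_capacity:          # ASSUMPTION: exactly 64 KB still fits
        return "direct"                       # each pointer -> data block

    pointers_per_block = BLOCK_SIZE // POINTER_SIZE         # 1024 under the assumption
    single_capacity = direct_capacity * pointers_per_block  # 64 MB under the assumption
    if file_size <= single_capacity:
        return "single-indirect"              # each pointer -> indirect block
    return "double-indirect"                  # each pointer -> doubly-indirect block


if __name__ == "__main__":
    for size in (10, 2_000, 100_000, 200 * 1024 * 1024):
        print(size, indirection_level(size))
```

Under the assumed pointer size, the switch to doubly-indirect blocks happens at 64 MB; a different pointer size would move that threshold but not the overall scheme.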
WAFL Meta-Data
• Meta-data stored in files
  – Inode file: stores inodes
  – Block-map file: stores free blocks
  – Inode-map file: identifies free inodes

Zoom of WAFL Meta-Data (Tree of Blocks)
• Root inode must be in fixed location
• Other blocks can be written anywhere

Snapshots (1 of 2)
• Copy root inode only, copy on write for changed data blocks
• Over time, old snapshot references more and more data blocks that are no longer used by the active file system
• Rate of file change determines how many snapshots can be stored on system

Snapshots (2 of 2)
• When disk block modified, must modify meta-data (indirect pointers) as well
• Batch, to improve I/O performance

Consistency Points (1 of 2)
• In order to avoid consistency checks after unclean shutdown, WAFL creates special snapshot called consistency point every few seconds
  – Not accessible via NFS
• Batched operations are written to disk at each consistency point
  – Like journal
• In between consistency points, data only written to RAM

Consistency Points (2 of 2)
• WAFL uses NVRAM (NV = Non-Volatile):
  – (NVRAM is DRAM with batteries to avoid losing data during unexpected power-off; some servers now just solid-state or hybrid)
  – NFS requests are logged to NVRAM
  – Upon unclean shutdown, re-apply NFS requests to last consistency point
  – Upon clean shutdown, create consistency point and turn off NVRAM until needed (to save power/batteries)
• Note, typical FS uses NVRAM for metadata write cache instead of just logs
  – Uses more NVRAM space (WAFL logs are smaller)
    • Ex: “rename” needs 32 KB, WAFL needs 150 bytes
    • Ex: write 8 KB needs 3 blocks (data, inode, indirect pointer), WAFL needs 1 block (data) plus 120 bytes for log
  – Slower response time for typical FS than for WAFL (although WAFL may be a bit slower upon restart)
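
The interplay between consistency points and the NVRAM request log described on the two Consistency Points slides can be sketched with a toy model. Everything here is hypothetical (the class `ToyWafl`, its method names, and the use of dictionaries for RAM and disk); it only mirrors the slide's sequence: log each request to NVRAM, apply it in RAM, write to disk at a consistency point, and after an unclean shutdown re-apply the logged requests to the last consistency point.

```python
class ToyWafl:
    """Toy model: RAM state, an on-disk consistency point, and an NVRAM log."""

    def __init__(self):
        self.disk = {}    # state as of the last consistency point
        self.ram = {}     # current state, lost on an unclean shutdown
        self.nvram = []   # requests logged since the last consistency point

    def handle_request(self, name, data):
        # Log the request to NVRAM, then apply it in RAM only.
        self.nvram.append((name, data))
        self.ram[name] = data

    def consistency_point(self):
        # Write the batched changes to disk; the log entries they cover
        # are no longer needed.
        self.disk = dict(self.ram)
        self.nvram.clear()

    def crash(self):
        # Unclean shutdown: RAM is lost, disk and NVRAM survive.
        self.ram = {}

    def recover(self):
        # Start from the last consistency point, then re-apply logged requests.
        self.ram = dict(self.disk)
        for name, data in self.nvram:
            self.ram[name] = data


if __name__ == "__main__":
    fs = ToyWafl()
    fs.handle_request("todo", "v1")
    fs.consistency_point()
    fs.handle_request("todo", "v2")      # only in RAM and NVRAM so far
    fs.handle_request("notes", "draft")
    fs.crash()
    fs.recover()
    print(fs.ram)   # {'todo': 'v2', 'notes': 'draft'}
```

The log holds small request records rather than whole modified blocks, which is the space saving the slide attributes to WAFL compared with using NVRAM as a metadata write cache.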
The Block-Map File
• Typical FS uses bit for each free block, 1 is allocated and 0 is free
  – Ineffective for WAFL since may be other snapshots that point to block
• WAFL uses 32 bits for each block
  – For each block, copy “active” bit over to snapshot bit

Write Allocation
• Write times dominate NFS performance
  – Read caches at client are large
  – Up to 5x as many write operations as read operations at server
• WAFL batches write requests (e.g., at consistency points)
• WAFL allows “write anywhere”, enabling inode next to data for better perf
  – Typical FS has inode information and free blocks at fixed location
• WAFL allows writes in any order since uses consistency points
  – Typical FS writes in fixed order to allow fsck to work if unclean shutdown

Outline
• Introduction (done)
• Snapshots: User Level (done)
• WAFL Implementation (done)
• Snapshots: System Level (next)
• Performance
• Conclusions

Creating Snapshots
• Could suspend NFS, create snapshot, resume NFS
  – But can take up to 1 second
• Challenge: avoid locking out NFS requests
• WAFL marks all dirty cache data as IN_SNAPSHOT. Then:
  – NFS requests can read system data, write data not IN_SNAPSHOT
  – Data not IN_SNAPSHOT not flushed to disk
• Must flush IN_SNAPSHOT data as quickly as possible
[Figure labels: flush, IN_SNAPSHOT, Can be used, new]

Flushing IN_SNAPSHOT Data
• Flush inode data first
  – Keeps two caches for inode data, so can copy system cache to inode data file, unblocking most NFS requests
    • Quick, since requires no I/O since inode file flushed later
• Update block-map file
  – Copy active bit to snapshot bit
• Write all IN_SNAPSHOT data
  – Restart any blocked requests as soon as particular buffer flushed (don’t wait for all to be flushed)
• Duplicate root inode and turn off IN_SNAPSHOT bit
• All done in less than 1 second, first step done in 100s of ms

Outline
• Introduction (done)
• Snapshots: User Level (done)
• WAFL Implementation (done)
• Snapshots: System Level (done)
• Performance (next)
• Conclusions
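
The block-map idea from The Block-Map File slide (32 bits per block, with the “active” bit copied to a snapshot bit when a snapshot is taken) can be sketched as follows. The bit layout is an assumption for illustration: bit 0 marks use by the active file system and each higher bit marks use by one snapshot. The class `BlockMap` and its method names are hypothetical.

```python
ACTIVE_BIT = 0   # ASSUMPTION: bit 0 = block is used by the active file system
ENTRY_BITS = 32  # 32 bits per block (from the slide)


class BlockMap:
    """Toy block-map: one 32-bit entry per disk block."""

    def __init__(self, num_blocks):
        self.entries = [0] * num_blocks

    @staticmethod
    def _check_snapshot_bit(snap_bit):
        # Snapshot bits are assumed to occupy bits 1..31 of the entry.
        if not 1 <= snap_bit < ENTRY_BITS:
            raise ValueError("snapshot bit must be in 1..31")

    def allocate(self, block):
        self.entries[block] |= 1 << ACTIVE_BIT

    def free_in_active(self, block):
        # The active file system drops the block; snapshots may still hold it.
        self.entries[block] &= ~(1 << ACTIVE_BIT)

    def create_snapshot(self, snap_bit):
        # For each block, copy the active bit over to the snapshot bit.
        self._check_snapshot_bit(snap_bit)
        for i, entry in enumerate(self.entries):
            if entry & (1 << ACTIVE_BIT):
                self.entries[i] = entry | (1 << snap_bit)
            else:
                self.entries[i] = entry & ~(1 << snap_bit)

    def delete_snapshot(self, snap_bit):
        self._check_snapshot_bit(snap_bit)
        mask = ~(1 << snap_bit)
        self.entries = [entry & mask for entry in self.entries]

    def is_free(self, block):
        # A block is reusable only when no bit at all references it.
        return self.entries[block] == 0


if __name__ == "__main__":
    bm = BlockMap(4)
    bm.allocate(1)
    bm.create_snapshot(1)
    bm.free_in_active(1)
    print(bm.is_free(1))   # False: the snapshot still references the block
    bm.delete_snapshot(1)
    print(bm.is_free(1))   # True
```

This shows why a single allocated/free bit is not enough: a block deleted from the active file system has to stay allocated for as long as any snapshot still points to it.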