NFS Tricks and Benchmarking Traps
Daniel Ellard and Margo Seltzer
FREENIX 2003 - June 12, 2003
Outline
• Motivation
  – Research questions
  – Benchmarking traps
• New NFS Read-Ahead Heuristics
  – Optimize sequential reads
  – Improve non-sequential reads
• Results
• Conclusions
Goal - Improve NFS Read Throughput
• We are interested in improving the throughput of data accessed from disk via NFS.
  – Example: email workload
• Our approach: improve the heuristics that control the amount of read-ahead done by the server.
Why Improve Read-Ahead Heuristics?
• With busy NFS clients, 5-10% of NFS requests arrive at the server out-of-order.
• nfsiods are the primary source of reordering.
  – nfsiod is a client daemon that marshals and schedules NFS requests.
  – Many implementations use multiple nfsiods.
  – Contention for resources and process scheduling effects can cause reordering.
Why Improve Read-Ahead Heuristics?
• Sequential access patterns may appear non-sequential if requests are reordered.
• Servers do less (or no) read-ahead for non-sequential access patterns.
• Read-ahead is necessary for good performance.
Research Questions
• Can we improve performance for sequential reads by improving the way the NFS sequentiality-detection heuristic handles “slightly” out-of-order requests?
• Can we detect non-sequential access patterns that have sequential components and therefore can benefit from read-ahead?
A Micro-Benchmark for NFS Reads
• Long sequential reads
• Many concurrent readers
• Inspired by observed email workloads
• All tests begin with a cold cache on client and server.
  – All data is brought from disk during the benchmark.
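To make this kind of benchmark concrete, here is a minimal C sketch: fork N processes that each read one large file sequentially, and report the wall-clock time until the slowest reader finishes (the measure used for the results later in the talk). The path under /mnt/nfs, the block size, and the assumption that client and server caches were flushed before the run are illustrative placeholders, not the authors' actual harness.

    /* Minimal sketch of a concurrent sequential-read benchmark.
     * Assumes the files already exist on an NFS mount and that the
     * client and server caches were flushed before the run. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define BLOCK (32 * 1024)               /* illustrative read size */

    static void
    reader(int id)
    {
        char path[64], buf[BLOCK];
        ssize_t n;
        int fd;

        /* one large file per reader; the path is a placeholder */
        snprintf(path, sizeof(path), "/mnt/nfs/bench/file.%d", id);
        if ((fd = open(path, O_RDONLY)) < 0)
            exit(1);
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            ;                               /* sequential read to EOF */
        close(fd);
        exit(n < 0);
    }

    int
    main(int argc, char **argv)
    {
        int i, nreaders = argc > 1 ? atoi(argv[1]) : 4;
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        for (i = 0; i < nreaders; i++)
            if (fork() == 0)
                reader(i);
        while (wait(NULL) > 0)              /* time until the slowest reader is done */
            ;
        gettimeofday(&t1, NULL);
        printf("%d readers: %.2f s\n", nreaders,
            (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
        return 0;
    }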
The Testbed
• FreeBSD 4.6.2
• Commodity PCs
  – Note: PCI bus transfer speed of 54 MB/s
• Intel PRO/1000 TX gigabit Ethernet
  – em device driver
  – MTU=1500
  – Raw TCP transfer rate of 49 MB/s
• IDE and SCSI drives
  – The paper discusses SCSI; this talk focuses on IDE
Preliminary Results
• Before measuring the effect of our changes to the NFS server, we must understand the default system.
• Results of our benchmarks were frustrating:
  – Large variance
  – Strange effects
• We decided to investigate these effects before proceeding.
Benchmarking Traps
• Properties of disks and their drivers:
  – ZCAV/disk geometry effects
  – Disk scheduling algorithms
  – Tagged command queues
• Arbitrary limits in the NFS implementation
• Network issues
  – TCP vs. UDP for RPC
ZCAV Effects
• ZCAV - “Zoned Constant Angular Velocity”
  – Disk tracks are grouped into zones.
  – Within each zone, each track has the same number of sectors.
  – The number of sectors is roughly proportional to the length of the track.
• Tracks in the outer zones hold 1.2 - 2 times more data.
  – Outer zones have a higher transfer rate.
  – Outer zones require fewer seeks.
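As an illustrative back-of-the-envelope check: at a fixed rotation rate, sequential transfer rate is proportional to sectors per track, so an outermost track that holds, say, 1.6 times as many sectors as an innermost one streams roughly 1.6 times faster. A gap of that size between test partitions can easily swamp the few-percent effect of a read-ahead change you are trying to measure.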
The ZCAV Effect - Local IDE Disk
[Chart: read throughput (MB/s) vs. number of concurrent readers (1-32), comparing the outermost zones with the inner zones.]
Controlling for ZCAV Effects
• To minimize the ZCAV effect, minimize the difference between the innermost and outermost zones you use.
  – Use a large disk.
  – Run your benchmark in a small partition.
• To measure the effect, create several partitions and repeat your benchmark in each.
Disk Scheduler Issues
• BSD systems use the CSCAN scheduler.
• CSCAN trades fairness for disk utilization.
  – Some requests are serviced much sooner than others.
  – It is not hard to create request streams that starve other requests for the disk.
  – Overall throughput is very good.
• Many scheduling algorithms are unfair.
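As a small illustration of why CSCAN is unfair, the toy sketch below (a model of the ordering policy only, not the BSD driver code) services all pending requests at or beyond the head position in ascending block order, then wraps around to the lowest pending block. A request that arrives just behind the head waits for an entire sweep.

    /* Toy illustration of CSCAN ordering: service pending requests in
     * ascending block order starting at the head position, then wrap
     * around to the lowest pending block.  Not the BSD implementation. */
    #include <stdio.h>
    #include <stdlib.h>

    static int
    cmp_blk(const void *a, const void *b)
    {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    static void
    cscan_order(long *pending, int n, long head)
    {
        int pass, i;

        qsort(pending, n, sizeof(long), cmp_blk);
        for (pass = 0; pass < 2; pass++)    /* blocks at/after head, then the wrap */
            for (i = 0; i < n; i++)
                if ((pass == 0) == (pending[i] >= head))
                    printf("%ld ", pending[i]);
        printf("\n");
    }

    int
    main(void)
    {
        long reqs[] = { 700, 20, 950, 300, 40 };

        cscan_order(reqs, 5, 500);          /* prints: 700 950 20 40 300 */
        return 0;
    }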
Controlling for Scheduler Effects
• Application specific!
• For our purposes:
  – Total throughput for concurrent readers
  – Measure the total time it takes for all the concurrent readers to finish their tasks, instead of the time of each individual reader.
• There is large variation in the time each reader takes, but the time required by the slowest reader is reasonably consistent.
Tagged Command Queues
• SCSI drives have tagged command queues.
  – Disk requests are sent to the drive as soon as they reach the front of the scheduler queue.
  – The drive schedules the requests according to its own scheduling algorithm.
• For our benchmarks and hardware:
  – Tagged command queues increase fairness.
  – Unfortunately, throughput is reduced (almost 50% in the worst case).
Back to the Experiments…
Q: What is the potential for improvement in the read-ahead algorithm?
  – Compare the default system to AlwaysReadAhead, a system that always does as much read-ahead as it can.
A: There is a benefit when the degree of concurrency is high and requests arrive out-of-order.
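Just to pin down the comparison, a hedged sketch of the difference: assume the server maps the sequentiality score to a read-ahead depth capped at some maximum (the mapping and the constant below are assumptions, not FreeBSD's actual code); AlwaysReadAhead simply ignores the score and always uses the maximum.

    /* Illustrative only: how a score-driven read-ahead policy differs
     * from AlwaysReadAhead.  The mapping and the cap are assumptions. */
    #define MAX_READAHEAD 8                 /* illustrative cap, in blocks */

    static int
    readahead_default(int seq_count)
    {
        int ra = seq_count - 1;             /* assumed score-to-depth mapping */

        if (ra < 0)
            ra = 0;
        if (ra > MAX_READAHEAD)
            ra = MAX_READAHEAD;
        return ra;
    }

    static int
    readahead_always(void)
    {
        return MAX_READAHEAD;               /* always read ahead as much as possible */
    }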
NFS Read Throughput (Busy Clients)
[Chart: read throughput (MB/s) vs. number of concurrent readers (1-32) for AlwaysReadAhead and Default.]
The SlowDown Heuristic

Default Heuristic:
  If the access is sequential relative to the previous access:
    seqCount++
  else:
    seqCount = small const

SlowDown Heuristic:
  If the access is sequential relative to the previous access:
    seqCount++
  else if the access is “close” to the previous access:
    seqCount is unchanged
  else:
    seqCount = seqCount / 2
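Below is a minimal C rendering of the two heuristics exactly as stated above; the struct, the CLOSE_WINDOW distance, and the SEQ_INIT constant are illustrative names and values, not the identifiers used in FreeBSD or in the paper's patch.

    /* Per-file sequentiality update, one record per file holding the
     * offset just past the previous read.  Constants are illustrative. */
    #include <sys/types.h>

    #define CLOSE_WINDOW  (8 * 1024)        /* how far "close" reaches, in bytes */
    #define SEQ_INIT      1                 /* the default's "small const" */

    struct seq_state {
        off_t prev_end;                     /* end offset of the previous read */
        int   seq_count;                    /* drives the amount of read-ahead */
    };

    /* Default heuristic: any non-sequential access resets the score. */
    static void
    update_default(struct seq_state *st, off_t off, size_t len)
    {
        if (off == st->prev_end)
            st->seq_count++;
        else
            st->seq_count = SEQ_INIT;
        st->prev_end = off + len;
    }

    /* SlowDown heuristic: "close" accesses keep the score; distant
     * accesses halve it instead of discarding it. */
    static void
    update_slowdown(struct seq_state *st, off_t off, size_t len)
    {
        off_t dist = off > st->prev_end ? off - st->prev_end
                                        : st->prev_end - off;

        if (off == st->prev_end)
            st->seq_count++;
        else if (dist <= CLOSE_WINDOW)
            ;                               /* slightly out of order: leave the score alone */
        else
            st->seq_count /= 2;
        st->prev_end = off + len;
    }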
The Effect of SlowDown
[Chart: read throughput (MB/s) vs. number of concurrent readers (1-32) for AlwaysReadAhead, Default, and SlowDown.]
Why Doesn’t SlowDown Help?
The problem is not SlowDown.
• In FreeBSD, the sequentiality scores are stored in a fixed-size hash table.
• When the table is full, adding a new entry forces the ejection of another.
• The hash table is too small to support more than a few readers.
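For intuition, here is a hedged sketch of a fixed-size heuristic table of the kind described above: the entry for a file is found by hashing its identity, and a collision ejects the previous occupant, so its sequentiality score starts over from scratch. The structure, the hash, and the 16-slot size are assumptions for illustration, not FreeBSD's actual code.

    /* Illustrative fixed-size heuristic table: an entry is located by
     * hashing the file identity, and a collision simply ejects the old
     * occupant, losing its sequentiality score.  Names, the hash, and
     * the table size are assumptions, not FreeBSD's actual code. */
    #include <sys/types.h>
    #include <string.h>

    #define HEUR_SLOTS 16                   /* only ~16 active files fit */

    struct heur_entry {
        ino_t file_id;                      /* stand-in for the NFS file handle */
        off_t prev_end;
        int   seq_count;
        int   valid;
    };

    static struct heur_entry heur_table[HEUR_SLOTS];

    static struct heur_entry *
    heur_lookup(ino_t file_id)
    {
        struct heur_entry *he = &heur_table[file_id % HEUR_SLOTS];

        if (!he->valid || he->file_id != file_id) {
            /* empty slot or collision: eject whatever was here, so a
             * busy server with many readers keeps resetting seq_count */
            memset(he, 0, sizeof(*he));
            he->file_id = file_id;
            he->valid = 1;
        }
        return he;
    }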
SlowDown with the Larger Table
[Chart: read throughput (MB/s) vs. number of concurrent readers (1-32) for AlwaysReadAhead, Default, and SlowDown + New Table.]
The Effect of Increasing the Table Size
• Increasing the hash table size makes SlowDown as fast as AlwaysReadAhead.
• Fixing the table also makes the default algorithm as fast as AlwaysReadAhead.
  – For our current testbed, it is enough simply to have a reasonable value for seqCount.
  – Perhaps in the future having a more accurate value will become important.
Improving Non-Sequential Reads
• Some read patterns are non-sequential, but do contain sequential components.
• One example is two threads reading sequentially from the same file:
  – Thread 1 reads blocks 0, 1, 2, 3, 4 …
  – Thread 2 reads blocks 1000, 1001, 1002, 1003 …
  – The server sees 0, 1000, 1, 1001, 2, 1002, 3, 1003 …
• This pattern is not sequential according to the default or SlowDown read-ahead heuristics.
Using Cursors to Find Components
• For each active file, maintain a set of cursors.
  – Each cursor is a position and a sequentiality score.
• For each read access to the file, choose the cursor with the closest position:
  – If there is no “close” cursor, create one.
  – If there are already too many cursors for this file, eject the least recently used.
  – Update the sequentiality score for the cursor.
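A minimal C sketch of the cursor scheme described above, assuming a small fixed pool of cursors per file; when no cursor is close, the least-recently-used one is recycled, which folds "create" and "eject" into one step. All names, the pool size, the CLOSE_WINDOW distance, and the rule for "close but not exact" accesses are illustrative choices, not the paper's implementation.

    /* Per-file cursor pool: each cursor tracks one sequential stream
     * within the file.  Constants are illustrative. */
    #include <sys/types.h>

    #define CURSORS_PER_FILE 4
    #define CLOSE_WINDOW     (64 * 1024)    /* how near a cursor must be to match */

    struct cursor {
        off_t    next_off;                  /* offset the next sequential read would use */
        int      seq_count;                 /* sequentiality score for this stream */
        unsigned last_used;                 /* for least-recently-used ejection */
    };

    struct file_cursors {
        struct cursor c[CURSORS_PER_FILE];
        unsigned      clock;
    };

    /* Pick the closest cursor for this access, update it, and return
     * the sequentiality score that drives read-ahead. */
    static int
    cursor_update(struct file_cursors *fc, off_t off, size_t len)
    {
        struct cursor *best = NULL, *lru = &fc->c[0];
        off_t best_dist = 0;
        int i;

        for (i = 0; i < CURSORS_PER_FILE; i++) {
            struct cursor *cur = &fc->c[i];
            off_t dist = off > cur->next_off ? off - cur->next_off
                                             : cur->next_off - off;

            if (dist <= CLOSE_WINDOW && (best == NULL || dist < best_dist)) {
                best = cur;
                best_dist = dist;
            }
            if (cur->last_used < lru->last_used)
                lru = cur;
        }

        if (best == NULL) {                 /* no close cursor: recycle the LRU one */
            best = lru;
            best->seq_count = 1;
        } else if (off == best->next_off) {
            best->seq_count++;              /* exactly sequential for this stream */
        }                                   /* close but not exact: score unchanged */

        best->next_off = off + len;
        best->last_used = ++fc->clock;
        return best->seq_count;
    }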
The Effect of Cursors
[Chart: read throughput (MB/s) vs. number of concurrent threads (2-8), comparing Using Cursors with Default Read-Ahead.]
Conclusions
• The SlowDown heuristic does not help much, at least not for our system.
  – Fixing the hash table does help.
• Cursors work well for access patterns that are the composition of sequential access patterns.
• Benchmarking is hard, even for simple changes.
Obtaining Our Code
Daniel Ellard
ellard@eecs.harvard.edu
http://www.eecs.harvard.edu/~ellard/NFS