Passive NFS Tracing of Email and Research Workloads
Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer
FAST 2003 - April 1, 2003

Talk Outline
• Motivation
• Tracing Methodology
• Trace Summary
• New Findings
• Conclusion
4/1/2003 Daniel Ellard - FAST 2003

Motivation - Why Gather Traces?
• Our research agenda: build file systems that
  – tune themselves for their workloads
  – can adapt to diverse workloads
• Underlying assumptions:
  – There is significant variation between workloads.
  – There are workload-specific optimizations that we can apply on the fly.
• We must test whether these assumptions hold for contemporary workloads.

Why Use Passive NFS Traces?
• Passive: no changes to the server or client
  – Sniff packets from the network
  – Non-invasive trace methods are necessary for real-world data collection
• NFS is ubiquitous and important
  – Many workloads to trace
  – Analysis is useful to real users
• Captures exactly what the server sees
  – Matches our research needs

Difficulties of Analyzing NFS Traces
• Underlying file system details are hidden
  – Disk activity
  – File layout
• The NFS interface differs from a native file system interface
  – No open/close, no seek
  – Client-side caching can skew the operation mix
• Some NFS calls and responses are lost
• NFS calls may arrive out of order
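Lost calls and replies become visible when each call is paired with its reply. The sketch below is a minimal illustration, not the authors' tool: it assumes the packets have already been decoded into records carrying a capture time, a client address, and the ONC RPC transaction ID (XID). The RpcMessage type and match_calls function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RpcMessage:
    time: float        # capture timestamp, seconds
    client: str        # client address; each client chooses its own XIDs
    xid: int           # ONC RPC transaction ID
    is_call: bool      # True for a call, False for a reply
    proc: str = ""     # NFS procedure name for calls, e.g. "read"

def match_calls(messages):
    """Pair calls with replies by (client, XID).

    Returns (pairs, unanswered_calls, orphan_replies): pairs is a list of
    (call, reply) tuples; unanswered calls never saw a reply in the trace,
    and orphan replies belong to calls that were never captured.
    """
    pending = {}                 # (client, xid) -> call awaiting its reply
    pairs, orphans = [], []
    for msg in sorted(messages, key=lambda m: m.time):
        key = (msg.client, msg.xid)
        if msg.is_call:
            pending[key] = msg   # a retransmission replaces the earlier copy
        elif key in pending:
            pairs.append((pending.pop(key), msg))
        else:
            orphans.append(msg)
    return pairs, list(pending.values()), orphans
```

Keying on the client address as well as the XID matters because different clients can pick the same XID.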

Our Tracing Software
• Based on tcpdump and libpcap (a packet-capture library)
  – Captures more information than tcpdump
  – Handles RPC over TCP and jumbo frames
• Anonymizes the traces
  – Very important for real-world data collection
  – Tunable to remove/preserve specific information
• Open source, freely available
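The slides do not say how the anonymizer works. One common technique, shown here only as an assumption, is keyed hashing: each name is replaced by an HMAC of the name under a secret key, so the same name always maps to the same pseudonym (preserving access patterns), while without the key the pseudonyms cannot be matched back to names by guessing. The keep_suffix switch is a hypothetical example of "tunable" preservation, and anonymize_name is a made-up name.

```python
import hashlib
import hmac

def anonymize_name(name: str, key: bytes, keep_suffix: bool = True) -> str:
    """Replace a file name with a stable keyed pseudonym (hypothetical scheme).

    The same name maps to the same pseudonym under the same key, so access
    patterns survive anonymization. If keep_suffix is True, a suffix such as
    ".tex" is kept in the clear and only the stem is hashed.
    """
    stem, dot, suffix = name.rpartition(".")
    if not (keep_suffix and dot and stem and suffix):
        stem, suffix = name, ""   # no usable suffix, or caller chose to hide it
    # Keep 16 hex digits (64 bits): short, with collisions unlikely at trace scale.
    digest = hmac.new(key, stem.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return digest + ("." + suffix if suffix else "")
```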

Overview of the Traced Systems
CAMPUS
• Central college facility
• Almost entirely email: SMTP, POP/IMAP, pine
• “Normal” users
• Digital UNIX
• 53 GB of storage (1 of 14 home directory disks)
EECS
• EE/CS facility
• No email
• Research: software R&D projects, experiments
• Research users
• Network Appliance filer
• 450 GB of storage

Summary of Average Daily Activity, 10/21/2001 - 10/27/2001
                                  CAMPUS           EECS
Total ops                         26.7 million     4.4 million
Read ops                          65% (119.6 GB)   10% (5.1 GB)
Write ops                         21% (44.6 GB)    15% (9.1 GB)
Other (getattr, lookup, access)   14%              75%
Read/write op ratio               3.01             0.69

Workload Characteristics
CAMPUS
• Data-oriented
• 95%+ of reads/writes are to large mailboxes
• For newly created files:
  – 96%+ are zero-length
  – Most of the remainder are < 16 KB
  – < 1% are “write-only”
EECS
• Metadata-oriented
• Mix of applications, mix of file sizes
• For newly created files:
  – 5% are zero-length
  – Less than half of the remainder are < 16 KB
  – 57% are “write-only”

How File Data Blocks Die
• CAMPUS:
  – 99.1% of the blocks die by overwriting
  – Most blocks live in “immortal” mailboxes
• EECS:
  – 42.4% of the blocks die by overwriting
  – 51.8% die because their file is deleted
• Overwriting is common, and a potential opportunity to relocate/reorganize blocks on disk
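A tally like the one on this slide can be sketched as follows, assuming (hypothetically) that the trace has already been reduced to a time-ordered stream of block-level write events and whole-file remove events. This is an illustration, not the authors' analysis code; truncation and other causes of death are deliberately not modeled.

```python
from collections import Counter, defaultdict

def block_death_causes(events):
    """Tally why file data blocks die in a time-ordered event stream.

    Each event is (op, file_id, block_no): op "write" stores data in a block,
    op "remove" deletes the whole file (block_no is ignored). Blocks still
    alive at the end of the trace are not counted as deaths.
    """
    live = defaultdict(set)      # file_id -> block numbers currently holding data
    causes = Counter()
    for op, file_id, block_no in events:
        if op == "write":
            if block_no in live[file_id]:
                causes["overwrite"] += 1   # the block's previous contents die
            live[file_id].add(block_no)
        elif op == "remove":
            causes["file deleted"] += len(live.pop(file_id, ()))
    return causes
```

Dividing each count by the total number of deaths gives percentages of the kind shown above.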

File Data Block Life Expectancy
• CAMPUS:
  – More than 50% live longer than 15 minutes
• EECS:
  – Less than 50% live longer than 1 second
  – Of the blocks that survive 1 second, only 50% live longer than two minutes
• Most blocks die in the cache on EECS, but on CAMPUS blocks are more likely to die on disk
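The EECS statistic is conditional: among blocks that survive the first second, the fraction that also survive two minutes. Both kinds of figure can be computed from a list of observed block lifetimes. A minimal sketch; the function names and sample lifetimes are hypothetical.

```python
def fraction_surviving(lifetimes, threshold):
    """Fraction of block lifetimes (in seconds) strictly longer than threshold."""
    if not lifetimes:
        return 0.0
    return sum(1 for t in lifetimes if t > threshold) / len(lifetimes)

def conditional_survival(lifetimes, first, second):
    """Among blocks that outlive `first` seconds, the fraction that outlive `second`."""
    survivors = [t for t in lifetimes if t > first]
    return fraction_surviving(survivors, second)
```

For example, with lifetimes [0.1, 0.5, 0.9, 2, 30, 200, 5000], four of seven blocks outlive 1 second, and half of those outlive two minutes (conditional_survival(lifetimes, 1, 120) returns 0.5).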

Variation of Load Over Time
• EECS: load has detectable patterns
• CAMPUS: load is quite predictable
  – Busiest 9am-6pm and evenings, Monday - Friday
  – Quiet in the late night / early morning
• Each system has idle times, which could be used for file system tuning or reorganization.
• Workload analyses must take time into account.
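Idle periods like these can be found by bucketing operations by hour of day. A minimal sketch, assuming each traced operation has a Unix timestamp; it buckets in UTC for simplicity (a real analysis would use the server's local time zone and could also split by weekday), and the 25%-of-peak idle threshold is an arbitrary illustrative choice.

```python
from collections import Counter
from datetime import datetime, timezone

def ops_per_hour(timestamps):
    """Count operations in each hour of the day (0-23), bucketing in UTC."""
    hours = Counter(datetime.fromtimestamp(t, tz=timezone.utc).hour for t in timestamps)
    return [hours[h] for h in range(24)]

def idle_hours(histogram, fraction=0.25):
    """Hours whose load is below `fraction` of the busiest hour's load."""
    peak = max(histogram, default=0)
    return [h for h, n in enumerate(histogram) if n < fraction * peak]
```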

The Daily Rhythm of CAMPUS (figure)

New Finding: File Names Predict File Properties
• For most files, there is a strong relationship between the file name and its properties
  – Many filenames are chosen by applications
  – Applications are predictable
• The filename suffix is useful by itself, but the entire name is better
• The relationships between filenames and file properties vary from one system to another

Name-Based Hints for CAMPUS
• Files named “inbox” are large, live forever, are overwritten frequently, and are read sequentially.
• Files with names starting with “inbox.lock” or ending with the name of the client host are zero-length lock files and live for a fraction of a second.
• Files with names starting with # are temporary composer files. They always contain data, but are usually short and are deleted after a few minutes.
• Dot files are read-only, except .history.
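These hand-written rules translate directly into a lookup function. A minimal sketch; the category labels are made up for illustration, and the client host name is assumed to be supplied by the caller (for example, from the RPC credentials).

```python
from typing import Optional

def campus_hint(name: str, client_host: Optional[str] = None) -> str:
    """Classify a CAMPUS file name using the hand-written rules above.

    client_host is the client's host name, assumed to be known to the caller.
    """
    if name == "inbox":
        return "mailbox"            # large, long-lived, overwritten, read sequentially
    if name.startswith("inbox.lock") or (client_host and name.endswith(client_host)):
        return "lock"               # zero-length, lives a fraction of a second
    if name.startswith("#"):
        return "composer-temp"      # short, deleted after a few minutes
    if name == ".history":
        return "dotfile-written"    # the one dot file that is not read-only
    if name.startswith("."):
        return "read-only-dotfile"
    return "unknown"
```

A file system could use such a label at create time, for example to place lock files and long-lived mailboxes differently.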

Name-Based Hints for EECS
• On EECS the patterns are harder to see.
• We wrote a program to detect relationships between file names and properties, and to make predictions based on them
  – Developed this tool on CAMPUS and EECS data
  – Also successful on later traces
• We can automatically build a model that accurately predicts important attributes of a file from its name.
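The slides do not describe how the program builds its model. The sketch below is a deliberately simple stand-in, not the authors' method. It learns, from one day's trace, the majority value of a property for each whole file name and each suffix. It then predicts for the next day by trying the whole name first, then the suffix, then the overall majority. That ordering mirrors the observation that the suffix helps but the entire name is better. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def name_keys(name):
    """Lookup keys from most to least specific: the whole name, then its suffix."""
    keys = [("name", name)]
    bare = name.lstrip(".")             # a leading dot does not start a suffix
    if "." in bare:
        keys.append(("suffix", bare.rsplit(".", 1)[1]))
    return keys

def train(examples):
    """Learn majority property values from (file_name, value) pairs (needs >= 1)."""
    votes = defaultdict(Counter)
    overall = Counter()
    for name, value in examples:
        overall[value] += 1
        for key in name_keys(name):
            votes[key][value] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return table, overall.most_common(1)[0][0]

def predict(model, name):
    table, default = model
    for key in name_keys(name):
        if key in table:
            return table[key]
    return default                      # unseen name and suffix: majority class

def accuracy(model, examples):
    examples = list(examples)
    return sum(predict(model, n) == v for n, v in examples) / len(examples)
```

Training on one day and calling accuracy on the next day's (name, value) pairs mirrors the train-on-10/22, test-on-10/23 setup reported for EECS.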

Accuracy of the Models
Accuracy of predictions for EECS, using a model trained on the 10/22/2001 trace and tested on the 10/23/2001 trace.
Prediction     Accuracy   % of files   Accuracy w/o names   % Error reduction
Length = 0     99.4%      12.5%        87.5%                94.9%
Length < 16K   91.8%      35.2%        64.8%                76.6%
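The error-reduction column is consistent with reading it as the share of the name-blind baseline's errors that the model removes: (baseline error - model error) / baseline error. This interpretation is an assumption, since the slide does not define the column. The baseline here appears to be always guessing the more common outcome, since 87.5% = 100% - 12.5%.

```python
def error_reduction(model_accuracy, baseline_accuracy):
    """Fraction of the baseline's errors eliminated by the model.

    Both accuracies are fractions in [0, 1]; the baseline is the accuracy
    achieved without file names.
    """
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error

print(f"{error_reduction(0.994, 0.875):.1%}")   # Length = 0
print(f"{error_reduction(0.918, 0.648):.1%}")   # Length < 16K
```

With the rounded accuracies from the table this prints 95.2% and 76.7%. Both are within rounding of the slide's 94.9% and 76.6%, which were presumably computed from unrounded values.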

New Finding: Out-of-Order Requests
• On busy networks, requests can be delivered to the server in a different order than they were generated by the client
  – nfsiods can re-order requests
  – Network effects can also contribute
• This can break fragile read-ahead heuristics on the server
• We investigated this for FreeBSD and found that read-ahead was affected
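The effect can be illustrated with a toy sequential-access detector. This is a hypothetical sketch, not FreeBSD's read-ahead code. A strict detector counts a read as sequential only if it starts exactly where the furthest read so far ended. A tolerant one also accepts reads within a small window of that point. The window and read sizes are illustrative parameters.

```python
def count_sequential(offsets, size, window):
    """Count reads judged sequential in a stream of fixed-size reads.

    Strict rule (window=0): a read is sequential only if it starts exactly
    where the furthest read seen so far ended. With window > 0, a read that
    lands within `window` reads of that point, before or after, also counts,
    so small reorderings do not reset the detector.
    """
    sequential = 0
    frontier = None  # end offset of the furthest read seen so far
    for off in offsets:
        if frontier is not None and abs(off - frontier) <= window * size:
            sequential += 1
        end = off + size
        frontier = end if frontier is None else max(frontier, end)
    return sequential
```

With reads of size 8 arriving as 0, 16, 8, 24, 32 (two requests swapped), the strict rule (window=0) counts only 2 of the 4 follow-on reads as sequential, while window=2 counts all 4. The cost of tolerance is that a truly random pattern with small jumps may also be mistaken for sequential access.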

Conclusions
• Workloads do vary, sometimes enormously
• New traces are valuable
  – We gain new insights from almost every trace
• We have identified several possible areas for future research:
  – Name-based file system heuristics
  – Handling out-of-order requests

The Last Word
Please contact me if you are interested in exchanging traces or using our tracing software or anonymizer:
http://www.eecs.harvard.edu/sos
ellard@eecs.harvard.edu
Another resource: www.snia.org
