
Hope for the Best, Expect the Worst
or what happens when E[ f(good event) ] > E[ f(bad event) ]
Lukas Kroc
October 12, 2006

Outline
● Overview of file systems
● The basic idea: speculation
● Applying the idea to file systems:
  – local file systems
  – distributed file systems
● Implementation issues
  – performance results
● Conclusion

File Systems: What They Are
● allow and organize access to data
  – operations: create, write, read, delete
● physical scenarios:
  – local file systems
  – distributed file systems
● goal:
  – provide durability and performance given physical limitations (latency, bandwidth)
● consistency added for distributed systems

File Systems: How They Work
[two figure slides, stolen from Paul Francis' CS414 lecture notes]

Main Issues
● a trade-off between durability and performance
  – durability calls for immediate access to the medium
    ● synchronous access
  – performance calls for caching
    ● asynchronous access
● file system speedups:
  – local: use memory cache and disk buffer to delay access
  – distributed: cache fetched files on clients

Papers for Discussion
● Nightingale et al.: Speculative Execution in a Distributed File System (SOSP'05)
  – new way of dealing with issues of distributed file systems
● Nightingale et al.: Rethink the Sync (OSDI'06)
  – applies the ideas of the first paper to issues of local file systems
● same basic idea, different scenarios
  – will reverse the order of presentation, easier first

Basic Idea
“Expect the best, be prepared for the worst”
● best = no power failure, cached data is valid
● worst = power fails, cached data is invalid
● prepared = able to recover a consistent state after a bad event has happened
● expect = speculate that the best case will happen

Conditions for the Basic Idea to Work
● highly predictable results of speculations
  – crash will most likely not occur in the next 5 seconds
  – data in the cache is most likely valid
● computers have spare CPU cycles
  – to perform “free” speculative computation
● the local overhead of speculating is lower than the cost of remote I/O
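The conditions above can be read as an expected-cost comparison. The following is a minimal sketch under a toy model that is not taken from the slides: a speculation succeeds with probability p_success; on success it costs only the local overhead t_spec, and on failure it costs t_spec plus a rollback plus the synchronous operation it tried to avoid. All names and numbers are illustrative assumptions.

```python
def expected_speculative_cost(p_success, t_spec, t_rollback, t_sync):
    """Expected time per operation when speculating (toy model, assumed)."""
    cost_on_success = t_spec
    # On failure we pay the speculation, the rollback, and the sync path anyway.
    cost_on_failure = t_spec + t_rollback + t_sync
    return p_success * cost_on_success + (1.0 - p_success) * cost_on_failure


def speculation_pays_off(p_success, t_spec, t_rollback, t_sync):
    """True if speculating is cheaper on average than always waiting."""
    expected = expected_speculative_cost(p_success, t_spec, t_rollback, t_sync)
    return expected < t_sync
```

With a highly predictable outcome and cheap local overhead the inequality holds easily; with an unpredictable outcome the rollback cost dominates and speculation loses.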

Local File Systems: Traditional Approach (ext3)
● i-node based
● added journaling for increased durability
  – meta-data only for performance reasons
● 2 modes of operation:
  – synchronous: system call returns only after the operation is done
  – asynchronous: system call returns immediately
[figure stolen from Paul Francis' CS414 lecture notes]

Problems of Traditional Approach
● synchronous mode:
  – durable (but only if using write barriers, or with disk buffer disabled), but very slow
● asynchronous mode:
  – not durable, but fast
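From an application's point of view, the two modes differ in when the write call is allowed to return. A minimal sketch using Python's standard os.write and os.fsync; the helper names write_async and write_sync are hypothetical, and, as the slide notes, even the synchronous path is only durable if the disk honours the flush.

```python
import os
import tempfile


def write_async(fd, data):
    """Return as soon as the data is in the OS cache: fast, not durable."""
    os.write(fd, data)


def write_sync(fd, data):
    """Return only after the OS reports the data flushed: slow, durable
    only if the device actually honours the flush (see the caveat above)."""
    os.write(fd, data)
    os.fsync(fd)


def demo():
    """Write one record in each mode to a temporary file and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        write_async(fd, b"fast ")
        write_sync(fd, b"safe")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()
    os.remove(path)
    return contents
```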

Local File Systems: New Approach
● shift of paradigm: don't promise anything to the application, promise it to the user
  – the promise = synchronous guarantees
  – the user = any external entity observing the process
⇒ external synchrony
  – asynchronous internal workings, synchronous external guarantees
  – combines performance and durability benefits of both

External Synchrony
● Idea:
  – speculate that everything will be properly written to disk
● Overview:
  – immediately return from write call (asynchrony)
  – buffer all external output of the application until the write successfully happens
  – if write fails, discard the buffers
● Result:
  – better guarantees AND performance than ext3
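The overview above can be sketched as a small user-level model. This is an illustration only, not the kernel mechanism from the paper: the class ExternallySynchronousProcess and its method names are hypothetical, and "the disk" is reduced to two callbacks.

```python
class ExternallySynchronousProcess:
    """Toy model: writes return at once, external output waits for the commit."""

    def __init__(self):
        self.pending_writes = []    # writes accepted but not yet durable
        self.buffered_output = []   # external output held back meanwhile
        self.visible_output = []    # what an outside observer has seen

    def write(self, data):
        """Asynchronous from the caller's view: returns immediately."""
        self.pending_writes.append(data)

    def emit(self, message):
        """External output is withheld while any write is still pending."""
        if self.pending_writes:
            self.buffered_output.append(message)
        else:
            self.visible_output.append(message)

    def on_commit(self):
        """The writes reached the disk: release everything that waited."""
        self.pending_writes.clear()
        self.visible_output.extend(self.buffered_output)
        self.buffered_output.clear()

    def on_failure(self):
        """The writes were lost: the observer never sees the dependent output."""
        self.pending_writes.clear()
        self.buffered_output.clear()
```

An observer therefore sees "done" only after the data it refers to is durable, which is the synchronous guarantee, while the process itself never blocked on the disk.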

External Synchrony: Schema
[figure]

External Synchrony: Performance
[figure]

Distributed File Systems: Traditional Approach (NFS)
● client-server approach
● synchronous I/O operations required for coherence
  – using RPC
● offers close-to-open consistency
  – weaker than local file systems

Problems of Traditional Approach
● at least 2 round-trip times required per close
  – very slow
● close-to-open consistency isn't very good
  – for how slow it is

Distributed File Systems: New Approach
● Idea:
  – speculate that close is successful, that cached data is valid, ...
● Overview:
  – use asynchronous RPCs that return immediately
  – checkpoint the application (store its state) and buffer all subsequent output
  – on success: release the buffered output; on failure: roll back to the checkpoint
● Result:
  – better guarantee AND performance than NFS
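The three steps of the overview can be sketched in user space as follows. This is a simplified illustration with assumed names (run_speculatively, rpc, speculative_step are all hypothetical): the checkpoint is a deep copy of a state dictionary rather than a process image, and the RPC is any callable that eventually returns True or False.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_speculatively(state, rpc, speculative_step):
    """Issue rpc asynchronously, keep computing, then commit or roll back.

    Returns (state, output). On failure the checkpointed state is returned
    and the buffered output is discarded, so nothing speculative escapes.
    """
    checkpoint = copy.deepcopy(state)   # stand-in for a process checkpoint
    buffered_output = []                # output produced while speculative
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(rpc)                  # returns immediately
        speculative_step(state, buffered_output)    # work on the guess
        succeeded = pending.result()                # speculation resolved
    if succeeded:
        return state, buffered_output   # release the buffered output
    return checkpoint, []               # roll back, drop the output
```

After a rollback the caller would re-execute the step non-speculatively; that part is omitted here.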

Speculative NFS: Schema
[figure]

Speculative NFS: Performance
[figure]

Overview of the Technique
Speculate on...
  – power failure not occurring
  – cache being valid
...by means of...
  – buffering externalized output
  – checkpointing the process
...in order to...
  – improve performance
  – increase consistency

Implementation: Buffering Externalized Output
● any kernel object with commit dependencies is uncommitted
  – any process that accesses an uncommitted object is marked uncommitted, and vice versa
  – any external output of such a process is buffered by the kernel
  – logs are used to track dependencies
● once commit dependencies are removed, the buffers are output to external devices
  – this also allows commits to be grouped
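The dependency rules above can be modelled with a few dictionaries. This is a rough sketch under simplifying assumptions, not the kernel implementation: DependencyTracker and its methods are hypothetical names, any access propagates dependencies in both directions (the real system distinguishes kinds of access and keeps logs), and a commit is identified by a plain integer.

```python
class DependencyTracker:
    """Toy model of commit dependencies between objects and processes."""

    def __init__(self):
        self.deps = {}       # object or process name -> pending commit ids
        self.buffers = {}    # process name -> output held back
        self.released = []   # output that reached the external device

    def mark_uncommitted(self, obj, commit_id):
        """An object was modified by a not-yet-durable transaction."""
        self.deps.setdefault(obj, set()).add(commit_id)

    def access(self, process, obj):
        """Process touches object: dependencies spread in both directions."""
        merged = self.deps.get(process, set()) | self.deps.get(obj, set())
        if merged:
            self.deps[process] = set(merged)
            self.deps[obj] = set(merged)

    def output(self, process, message):
        """Buffer output of uncommitted processes, pass the rest through."""
        if self.deps.get(process):
            self.buffers.setdefault(process, []).append(message)
        else:
            self.released.append(message)

    def commit(self, commit_id):
        """A transaction became durable: drop it and flush freed buffers."""
        for pending in self.deps.values():
            pending.discard(commit_id)
        for process in list(self.buffers):
            if not self.deps.get(process):
                self.released.extend(self.buffers.pop(process))
```

Because one commit can free several processes at once, their buffered output is released together, which is the grouping effect mentioned in the last bullet.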

Buffering Externalized Output (1), (2)
[two figure slides]

Result: xsyncfs
● adapted ext3 file system to use external synchrony
  – internally works asynchronously, but looks synchronous
● commits journal transaction when:
  – journal space is exhausted, the journal is old, ...
  – user calls fsync()
  – buffered output is waiting on it (output-triggered commit)
● adapts for throughput/latency optimization
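The commit triggers listed above amount to a small decision function. The sketch below is illustrative only: JournalState, its fields, and commit_reason are hypothetical names, and the 5-second age threshold is an assumed default rather than a value stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class JournalState:
    used_bytes: int
    capacity_bytes: int
    age_seconds: float
    fsync_requested: bool
    output_waiting: bool   # some buffered external output depends on it


def commit_reason(journal, max_age_seconds=5.0):
    """Return why the open transaction should commit now, or None."""
    if journal.used_bytes >= journal.capacity_bytes:
        return "journal space exhausted"
    if journal.age_seconds >= max_age_seconds:
        return "journal old"
    if journal.fsync_requested:
        return "fsync"
    if journal.output_waiting:
        return "output-triggered"
    return None   # keep batching for throughput
```

Returning None when nothing is waiting lets writes batch for throughput, while the output-triggered case keeps latency low whenever an observer is actually waiting.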

xsyncfs: Performance
[figure; panels: PostMark benchmark, Apache build]

Implementation: Checkpointing a Process
● checkpoint: a state-image of a process
  – copy-on-write fork of the process
  – not placed on the run queue
● output of the running process is buffered while the process is speculative (has a checkpoint)
● depending on the result of the speculation:
  – success: the checkpoint is discarded
  – failure: the process is terminated; the checkpoint assumes its identity and is placed on the run queue

Propagating Causal Dependencies
[figure]

Result: SpecNFS
● preserves existing NFS semantics
  – including close-to-open consistency
● offers much better performance than NFS
● implemented using the same RPCs
  – but in an asynchronous, speculative manner
● follows the external-synchrony paradigm
  – what is observed has been committed

Result: BlueFS
● strong consistency and safety guarantees
  – single-copy file semantics (shared local disk)
● still good performance
  – still outperforms NFS
● prior to read/write, cached versions are speculated to be valid
  – in case of access conflict, roll-back occurs

SpecNFS & BlueFS: Performance
[two figure slides; panels: PostMark benchmark, Apache build]

Conclusions
● Concept of speculation/roll-back introduced
  – known in fault tolerance research already
  – applicable to general I/O issues
  – “Expecting the best, being prepared for the worst”
● Might help resolve the tension between performance and durability in file systems
  – not “proven by time” yet, but looks good
● The idea is applicable in a broader context
  – distributed simulations, processor cache warm-up
