Certifying a Crash-safe File System Nickolai Zeldovich - PowerPoint PPT Presentation

Certifying a Crash-safe File System Nickolai Zeldovich Collaborators: Tej Chajed, Haogang Chen, Alex Konradi, Stephanie Wang, Daniel Ziegler, Adam Chlipala, M. Frans Kaashoek

File systems should not lose data • People use file systems to store permanent data • Computers can crash anytime • power failures • hardware failures (unplug USB drive) • software bugs • File systems should not lose or corrupt data in case of crashes

File systems are complex and have bugs • Linux ext4: ~60,000 lines of code • Some bugs are serious: data loss, security exploits, etc. [Figure: cumulative number of bug patches in Linux file systems (ext3, xfs, jfs, reiserfs, ext4, btrfs), Dec-03 to May-11; y-axis: # of patches for bugs, 0 to 600] [Lu et al., FAST’13]

Research on avoiding bugs in file systems • Most research is on finding bugs (this reduces the # of bugs) • Crash injection (e.g., EXPLODE [OSDI’06]) • Symbolic execution (e.g., EXE [Oakland’06]) • Design modeling (e.g., in Alloy [ABZ’08]) • Some elimination of bugs by proving (incomplete + no crashes): • FS without directories [Arkoudas et al. 2004] • BilbyFS [Keller 2014] • UBIFS [Ernst et al. 2013]

Dealing with crashes is hard • Crashes expose many partially-updated states • Reasoning about all failure cases is hard • Performance optimizations lead to more tricky partial states • Disk I/O is expensive • Buffer updates in memory
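To make "many partially-updated states" concrete, here is a minimal sketch that enumerates the disk states a crash could expose when updates are buffered in memory. It assumes a deliberately simple model that is not taken from the slides: each buffered write targets a distinct block, and each one independently may or may not have reached the disk. The names `crash_states`, `disk`, and `pending` are hypothetical.

```python
from itertools import combinations

def crash_states(disk, buffered_writes):
    """Return every disk state a crash could expose.

    Assumption for this sketch: each buffered write targets a distinct
    block and may or may not have reached the disk, independently.
    """
    states = []
    for k in range(len(buffered_writes) + 1):
        for subset in combinations(buffered_writes, k):
            state = dict(disk)          # start from the on-disk contents
            for addr, value in subset:  # apply the writes that "made it"
                state[addr] = value
            states.append(state)
    return states

disk = {2: "x", 5: "y", 8: "z"}
pending = [(2, "a"), (8, "b"), (5, "c")]
states = crash_states(disk, pending)
print(len(states))  # 2**3 = 8 possible states for 3 buffered writes
```

Under this model the number of states doubles with every buffered write, which is one way to see why reasoning about all failure cases by hand is hard.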

Dealing with crashes is hard
A patch for Linux’s write-ahead logging (jbd) in 2012: “Is it safe to omit a disk write barrier here?”
Callout: It's unlikely this will be necessary, … but we need this to guarantee correctness. Fortunately this function doesn't get called all that often.

commit 353b67d8ced4dc53281c88150ad295e24bc4b4c5
Author: Jan Kara <jack@suse.cz>
Date: Sat Nov 26 00:35:39 2011 +0100
Title: jbd: Issue cache flush after checkpointing

--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -504,7 +503,25 @@ int cleanup_journal_tail(journal_t *journal)
         spin_unlock(&journal->j_state_lock);
         return 1;
     }
+    spin_unlock(&journal->j_state_lock);
+
+    /*
+     * We need to make sure that any blocks that were recently written out
+     * --- perhaps by log_do_checkpoint() --- are flushed out before we
+     * drop the transactions from the journal. It's unlikely this will be
+     * necessary, especially with an appropriately sized journal, but we
+     * need this to guarantee correctness. Fortunately
+     * cleanup_journal_tail() doesn't get called all that often.
+     */
+    if (journal->j_flags & JFS_BARRIER)
+        blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
+    spin_lock(&journal->j_state_lock);
+    if (!tid_gt(first_tid, journal->j_tail_sequence)) {
+        spin_unlock(&journal->j_state_lock);
+        /* Someone else cleaned up journal so return 0 */
+        return 0;
+    }

Goal: certify a file system under crashes A complete file system with a machine-checkable proof that its implementation meets its specification, both under normal execution and under any sequence of crashes, including crashes during recovery.

Contributions • CHL: Crash Hoare Logic • Specification framework for crash-safety of storage • Crash condition and recovery semantics • Automation to reduce proof effort • FSCQ: the first certified crash-safe file system • Basic Unix-like file system (no hard-links, no concurrency) • Precise specification for the core subset of POSIX • I/O performance on par with Linux ext4 • CPU overhead is high

FSCQ runs standard Unix programs [Diagram] • FSCQ (written in Coq): Crash Hoare Logic (CHL), top-level specification, internal specifications, program, proof • Coq proof checker: OK • Mechanical code extraction: FSCQ’s Haskell code, then Haskell compiler, then FSCQ’s FUSE server (with Haskell libraries & FUSE driver) • Unix programs ($ mv src dest, $ git clone repo…, $ make) issue syscalls to the Linux kernel, which forwards them as FUSE upcalls to FSCQ’s FUSE server • Disk read(), write(), sync() go through the Linux kernel to /dev/sda

FSCQ’s Trusted Computing Base [Diagram, same components as on the previous slide] • FSCQ (written in Coq): Crash Hoare Logic (CHL), top-level specification, internal specifications, program, proof • Coq proof checker • Mechanical code extraction, FSCQ’s Haskell code, Haskell compiler, FSCQ’s FUSE server • Haskell libraries & FUSE driver • Linux kernel (syscalls, FUSE upcalls, disk read(), write(), sync()) • /dev/sda

Outline • Crash safety • What is the correct behavior after a crash? • Challenge 1: formalizing crashes • Crash Hoare Logic (CHL) • Challenge 2: incorporating performance optimizations • Disk sequences • Building a complete file system • Evaluation

What is crash safety? • What guarantee should a file system provide when it crashes and reboots? • Look it up in the POSIX standard?

POSIX is vague about crash behavior “[...] a power failure [...] can cause data to be lost. The data may be associated with a file that is still open, with one that has been closed, with a directory, or with any other internal system data structures associated with permanent storage. This data can be lost, in whole or part, so that only careful inspection of file contents could determine that an update did not occur.” (IEEE Std 1003.1, 2013 Edition) • POSIX’s goal was to specify “common-denominator” behavior • Gives freedom to file systems to implement their own optimizations

What is crash safety? • What guarantee should a file system provide when it crashes and reboots? • Look it up in the POSIX standard? (Too vague) • A simple and useful definition is transactional • Atomicity: every file-system call is all-or-nothing • Durability: every call persists on disk when it returns • Run every file-system call inside a transaction, using write-ahead logging.

Write-ahead logging 1. Append writes to the log. Calls: ➡ log_begin() ➡ log_write(2, ‘a’) ➡ log_write(8, ‘b’) ➡ log_write(5, ‘c’) [Disk: unchanged. Log: commit record 0; entries (2, a), (8, b), (5, c)]

Write-ahead logging 1. Append writes to the log 2. Set commit record. Calls: ➡ log_begin() ➡ log_write(2, ‘a’) ➡ log_write(8, ‘b’) ➡ log_write(5, ‘c’) ➡ log_commit() [Disk: unchanged. Log: commit record 3; entries (2, a), (8, b), (5, c)]

Write-ahead logging 1. Append writes to the log 2. Set commit record 3. Apply the log to disk locations. Calls: ➡ log_begin() ➡ log_write(2, ‘a’) ➡ log_write(8, ‘b’) ➡ log_write(5, ‘c’) ➡ log_commit() [Disk: block 2 = a, block 5 = c, block 8 = b. Log: commit record 3; entries (2, a), (8, b), (5, c)]

Write-ahead logging 1. Append writes to the log 2. Set commit record 3. Apply the log to disk locations 4. Truncate the log. Calls: ➡ log_begin() ➡ log_write(2, ‘a’) ➡ log_write(8, ‘b’) ➡ log_write(5, ‘c’) ➡ log_commit() [Disk: block 2 = a, block 5 = c, block 8 = b. Log: commit record 0; empty] • Recovery: after crash, replay (apply) any committed transaction in the log • Atomicity: either all writes appear on disk or none do • Durability: all changes are persisted on disk when log_commit() returns
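The four logging steps and the recovery rule can be sketched as a small in-memory simulation. This is an illustrative model under stated assumptions, not FSCQ's implementation: every persistent update is treated as atomic, truncation clears the log and the commit record in one step, and a hypothetical `budget` counter injects a crash after a chosen number of updates. The helper `run_with_crashes` and the initial block contents are also made up for the example; the `log_*` names follow the slides.

```python
class Crash(Exception):
    """Simulated crash: execution stops, persistent state survives."""

class LoggedDisk:
    def __init__(self, blocks):
        self.disk = dict(blocks)  # persistent data blocks: addr -> value
        self.log = []             # persistent log entries: (addr, value)
        self.commit_record = 0    # persistent count of committed entries
        self.budget = None        # updates allowed before a simulated crash

    def _persist(self):
        # Every persistent update calls this first; it may "crash" instead.
        if self.budget is not None:
            if self.budget == 0:
                raise Crash()
            self.budget -= 1

    def log_begin(self):
        assert self.commit_record == 0 and not self.log

    def log_write(self, addr, value):
        self._persist()
        self.log.append((addr, value))      # 1. append the write to the log

    def log_commit(self):
        self._persist()
        self.commit_record = len(self.log)  # 2. set the commit record
        self.log_apply()                    # 3. apply the log to disk
        self.log_truncate()                 # 4. truncate the log

    def log_apply(self):
        for addr, value in self.log[:self.commit_record]:
            self._persist()
            self.disk[addr] = value

    def log_truncate(self):
        self._persist()
        self.log, self.commit_record = [], 0

    def log_recover(self):
        if self.commit_record > 0:          # committed: replay the log
            self.log_apply()
        self.log_truncate()                 # uncommitted entries are dropped

def run_with_crashes(crash_points):
    """Run one transaction; crash_points[0] bounds the transaction, the
    remaining entries bound successive recovery attempts."""
    d = LoggedDisk({2: "x", 5: "y", 8: "z"})
    budgets = list(crash_points)
    d.budget = budgets.pop(0) if budgets else None
    try:
        d.log_begin()
        d.log_write(2, "a")
        d.log_write(8, "b")
        d.log_write(5, "c")
        d.log_commit()
    except Crash:
        pass
    while True:  # recovery is retried until it completes
        d.budget = budgets.pop(0) if budgets else None
        try:
            d.log_recover()
            return d.disk
        except Crash:
            continue

old = {2: "x", 5: "y", 8: "z"}
new = {2: "a", 5: "c", 8: "b"}
outcomes = [run_with_crashes([n, m]) for n in range(9) for m in range(5)]
assert all(o in (old, new) for o in outcomes)  # atomicity in this model
```

The final assertion tries every crash point in the transaction combined with a crash during the first recovery attempt, and checks that the recovered disk is always either entirely old or entirely new. Testing a small model this way is much weaker than the machine-checked proof the talk is about; it only illustrates the property being proved.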

Example: transactional crash safety

def create(dir, name):
    log_begin()
    newfile = allocate_inode()
    newfile.init()
    dir.add(name, newfile)
    log_commit()

… after crash …

def log_recover():
    if committed:
        log_apply()
    log_truncate()

• Q: How to formally define what happens when the computer crashes? • Q: How to formally specify the behavior of “create” in presence of crash and recovery?

Approach: Crash Hoare Logic {pre} code {post} • SPEC: disk_write(a, v) • PRE: $a \mapsto v_0$ • POST: $a \mapsto v$
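One way to read such a specification is as three predicates over disk states. The sketch below checks them against a toy model of `disk_write`. The PRE and POST predicates follow the slide; the CRASH predicate (the block holds either the old or the new value) is an assumption added for illustration, based on the "crash condition" mentioned under Contributions, and the single-block write is assumed to be atomic. `Spec`, `disk_write_states`, and `check` are hypothetical names, and running a check like this is testing, not a proof.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Disk = Dict[int, str]

@dataclass
class Spec:
    pre: Callable[[Disk], bool]    # must hold before the call
    post: Callable[[Disk], bool]   # must hold when the call returns
    crash: Callable[[Disk], bool]  # must hold wherever a crash can occur

def disk_write_spec(a, v0, v):
    return Spec(
        pre=lambda d: d[a] == v0,         # PRE:   a |-> v0
        post=lambda d: d[a] == v,         # POST:  a |-> v
        crash=lambda d: d[a] in (v0, v),  # CRASH: a |-> v0 or a |-> v (assumed)
    )

def disk_write_states(disk, a, v):
    """Yield the disk state at each point where a crash could happen,
    ending with the final state. Assumes a one-block write is atomic."""
    yield dict(disk)   # crash before the write reaches the disk
    after = dict(disk)
    after[a] = v
    yield after        # crash (or normal return) after the write

def check(spec, states):
    states = list(states)
    assert spec.pre(states[0]), "precondition violated"
    assert all(spec.crash(s) for s in states), "crash condition violated"
    assert spec.post(states[-1]), "postcondition violated"
    return True

disk = {7: "old"}
ok = check(disk_write_spec(7, "old", "new"), disk_write_states(disk, 7, "new"))
```

The extra crash predicate is what separates this from an ordinary Hoare triple: it describes the states recovery may have to start from, not just the state at normal return.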
