Verifying filesystems in ACL2: Towards verifying file recovery tools

Verifying filesystems in ACL2 Towards verifying file recovery tools Mihir Mehta Department of Computer Science University of Texas at Austin mihir@cs.utexas.edu 10 November, 2017 1/34

Outline Motivation and related work Our approach Progress so far Future work 2/34

Why we need a verified filesystem ◮ Filesystems are everywhere, even as operating systems move towards making them invisible. ◮ In the absence of a clear specification of filesystems, users (and sysadmins in particular) are underserved. ◮ Modern filesystems have become increasingly complex, and so have the tools to analyse and recover data from them. ◮ It would be worthwhile to specify and formally verify, in the ACL2 theorem prover, the guarantees claimed by filesystems and tools. 3/34

Related work ◮ In his 2016 dissertation, Haogang Chen uses Coq to build a filesystem, FSCQ, which is proven safe against crashes in a new logical framework named Crash Hoare Logic. ◮ His implementation was exported into Haskell, and showed comparable performance to ext4 when run on FUSE. ◮ Hyperkernel (Nelson et al., SOSP '17) is a "push-button" verification effort, but approximates by changing POSIX system calls for ease of verification. ◮ In our work, we instead aim to model an existing filesystem (FAT32) faithfully and match the resulting disk image byte for byte. 4/34

Choosing an initial model ◮ Our goal here is to verify the FAT32 filesystem, but we need a simpler model to begin with. ◮ Our filesystem’s operations should suffice for running a workload. ◮ Yet, parsimony and avoidance of redundancy are essential for theorem proving. ◮ What’s a necessary and sufficient set of operations? 6/34

Minimal set of operations? ◮ The Google File System suggests a minimal set of operations: create, delete, open, close, read and write. ◮ Of these, open and close require the maintenance of file descriptor state, so they can wait. ◮ However, they are essential when describing concurrency and multiprogramming behaviour. ◮ Thus, we can start by modelling a filesystem with the remaining operations, followed by several refinements thereof. 7/34

Quick overview of models ◮ Model 1: Tree representation of directory structure with unbounded file size and unbounded filesystem size. ◮ Model 2: Model 1 with file length as metadata. ◮ Model 3: Tree representation of directory structure with file contents stored in a "disk". ◮ Model 4: Model 3 with bounded filesystem size and garbage collection. 8/34

Model 1
\
  vmlinuz, "\0\0\0"
  tmp
    ticket1, "Sun 19:00"
9/34

Model 1
\
  vmlinuz, "\0\0\0"
  tmp
    ticket1, "Sun 19:00"
    ticket2, "Tue 21:00"
10/34

Model 1
\
  vmlinuz, "\0\0\0"
  tmp
    ticket2, "Tue 21:00"
11/34

Model 1
\
  vmlinuz, "\0\0\0"
  tmp
    ticket2, "Wed 01:00"
12/34
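The four Model 1 slides can be read as a short trace: create ticket2, delete ticket1, then overwrite ticket2. The following is a minimal Python sketch of that trace, not the ACL2 model from the talk. The function names `create`, `delete` and `write`, their argument order, the choice to pass initial contents to `create`, and the decision to leave the filesystem unchanged on a bad path are all assumptions made for illustration.

```python
# Hypothetical Python analogue of Model 1; the talk's models are ACL2 functions.
# A directory is a dict from name to entry; a regular file is a str.

def create(fs, path, text=""):
    """Return a copy of fs with a new regular file at path (a list of names)."""
    name, rest = path[0], path[1:]
    if not rest:
        if name in fs:
            return fs  # name already taken: leave the filesystem unchanged
        return {**fs, name: text}
    sub = fs.get(name)
    if not isinstance(sub, dict):
        return fs  # parent directory missing
    return {**fs, name: create(sub, rest, text)}

def delete(fs, path):
    """Return a copy of fs without the entry at path."""
    name, rest = path[0], path[1:]
    if name not in fs:
        return fs
    if not rest:
        return {k: v for k, v in fs.items() if k != name}
    sub = fs[name]
    if not isinstance(sub, dict):
        return fs
    return {**fs, name: delete(sub, rest)}

def write(fs, path, start, text):
    """Overwrite the file at path from offset start, padding with NUL if needed."""
    name, rest = path[0], path[1:]
    entry = fs.get(name)
    if not rest:
        if not isinstance(entry, str):
            return fs
        old = entry.ljust(start, "\0")
        return {**fs, name: old[:start] + text + old[start + len(text):]}
    if not isinstance(entry, dict):
        return fs
    return {**fs, name: write(entry, rest, start, text)}

# Replay the four slides.
fs = {"vmlinuz": "\0\0\0", "tmp": {"ticket1": "Sun 19:00"}}
fs = create(fs, ["tmp", "ticket2"], "Tue 21:00")
fs = delete(fs, ["tmp", "ticket1"])
fs = write(fs, ["tmp", "ticket2"], 0, "Wed 01:00")
```

Each operation returns a new tree instead of mutating its argument, which mirrors the applicative style ACL2 requires and keeps earlier states available for comparison.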

Model 2 ◮ Model 1 supports nested directory structures, unbounded file size and unbounded filesystem size. ◮ However, there's no metadata, either to provide additional information or to validate the contents of the file. ◮ With an extra field for length, we can create a simple version of fsck that checks file contents for consistency. ◮ Further, we can verify that create, write, delete, etc. preserve this notion of consistency. 13/34

Model 2
\
  vmlinuz, "\0\0\0", 3
  tmp
    ticket1, "Sun 19:00", 9
14/34

Model 2
\
  vmlinuz, "\0\0\0", 3
  tmp
    ticket1, "Sun 19:00", 9
    ticket2, "Tue 21:00", 9
15/34

Model 2
\
  vmlinuz, "\0\0\0", 3
  tmp
    ticket2, "Tue 21:00", 9
16/34

Model 2
\
  vmlinuz, "\0\0\0", 3
  tmp
    ticket2, "Wed 01:00", 9
17/34
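The consistency check that Model 2 enables can be sketched in a few lines. This is an illustrative Python analogue, not the ACL2 definition: representing a file as a `(contents, length)` tuple, and the names `fsck` and `write`, are assumptions. The point is only that the check recurses over the tree and that a write which updates both fields keeps the check true.

```python
# Hypothetical analogue of Model 2: a regular file is a (contents, length) pair,
# a directory is a dict from name to entry.

def fsck(fs):
    """True when every file's recorded length matches its contents."""
    for entry in fs.values():
        if isinstance(entry, dict):
            if not fsck(entry):
                return False
        else:
            contents, length = entry
            if len(contents) != length:
                return False
    return True

def write(fs, path, start, text):
    """Overwrite from offset start and record the new length alongside."""
    name, rest = path[0], path[1:]
    entry = fs.get(name)
    if not rest:
        if entry is None or isinstance(entry, dict):
            return fs
        old = entry[0].ljust(start, "\0")
        new = old[:start] + text + old[start + len(text):]
        return {**fs, name: (new, len(new))}
    if not isinstance(entry, dict):
        return fs
    return {**fs, name: write(entry, rest, start, text)}

good = {"vmlinuz": ("\0\0\0", 3), "tmp": {"ticket2": ("Tue 21:00", 9)}}
bad = {"vmlinuz": ("\0\0\0", 4)}  # recorded length disagrees with contents
after = write(good, ["tmp", "ticket2"], 0, "Wed 01:00")
```

Checking `fsck(after)` on one example is of course much weaker than the theorem in the talk, which states preservation for all well-formed inputs.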

Model 3 ◮ As the next step, we focus on externalising the storage of file contents. ◮ We also choose to break up file contents into "blocks" of a constant length (8). ◮ Note: this means storing the file length is no longer optional, since without it a read could return garbage between the end of the file and the end of its last block. 18/34

Model 3
\
  vmlinuz, (0), 3
  tmp
    ticket1, (1 2), 9
Table: Disk
  Block  Contents
  0      \0\0\0
  1      Sun 19:0
  2      0
19/34

Model 3
\
  vmlinuz, (0), 3
  tmp
    ticket1, (1 2), 9
    ticket2, (3 4), 9
Table: Disk
  Block  Contents
  0      \0\0\0
  1      Sun 19:0
  2      0
  3      Tue 21:0
  4      0
20/34

Model 3
\
  vmlinuz, (0), 3
  tmp
    ticket2, (3 4), 9
Table: Disk
  Block  Contents
  0      \0\0\0
  1      Sun 19:0
  2      0
  3      Tue 21:0
  4      0
21/34

Model 3
\
  vmlinuz, (0), 3
  tmp
    ticket2, (5 6), 9
Table: Disk
  Block  Contents
  0      \0\0\0
  1      Sun 19:0
  2      0
  3      Tue 21:0
  4      0
  5      Wed 01:0
  6      0
22/34
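The role of the length field in Model 3 is easiest to see when blocks really are fixed-size. The sketch below is a Python illustration under stated assumptions: the names `store` and `fetch` are invented here, blocks are appended to an unbounded disk as on the slides, and the last block of each file is padded with NUL characters (the slides show the last block unpadded, so the padding is an assumption made to expose the problem).

```python
BLOCK_SIZE = 8  # block length used on the slides

def store(disk, text):
    """Append text to the disk as fixed-size blocks.

    Returns (new_disk, block_indices, length). The final block is padded
    with NUL characters, which is an assumption of this sketch.
    """
    blocks = [text[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, "\0")
              for i in range(0, len(text), BLOCK_SIZE)]
    indices = list(range(len(disk), len(disk) + len(blocks)))
    return disk + blocks, indices, len(text)

def fetch(disk, indices, length):
    """Concatenate the indexed blocks and drop everything past length."""
    return "".join(disk[i] for i in indices)[:length]

disk = []
disk, vmlinuz_idx, vmlinuz_len = store(disk, "\0\0\0")
disk, ticket1_idx, ticket1_len = store(disk, "Sun 19:00")

# Without the length, the padding is indistinguishable from file contents.
raw_ticket1 = "".join(disk[i] for i in ticket1_idx)
```

Here `raw_ticket1` is 16 characters long, while `fetch` with the stored length recovers the original 9. The case of vmlinuz is sharper still: its contents are themselves NUL characters, so only the length says that the file holds three of them and not eight.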

Model 4 ◮ In the fourth model, we attempt to implement garbage collection in the form of an allocation vector. ◮ The allocation vector tracks whether blocks in the filesystem are in use by a file. This allows us to reuse unused blocks. 23/34

Model 4
\
  vmlinuz, (0), 3
  tmp
    ticket1, (1 2), 9
Table: Disk
  Block  Contents  In use
  0      \0\0\0    true
  1      Sun 19:0  true
  2      0         true
  3      (empty)   false
  4      (empty)   false
  5      (empty)   false
24/34

Model 4
\
  vmlinuz, (0), 3
  tmp
    ticket1, (1 2), 9
    ticket2, (3 4), 9
Table: Disk
  Block  Contents  In use
  0      \0\0\0    true
  1      Sun 19:0  true
  2      0         true
  3      Tue 21:0  true
  4      0         true
  5      (empty)   false
25/34

Model 4
\
  vmlinuz, (0), 3
  tmp
    ticket2, (3 4), 9
Table: Disk
  Block  Contents  In use
  0      \0\0\0    true
  1      Sun 19:0  false
  2      0         false
  3      Tue 21:0  true
  4      0         true
  5      (empty)   false
26/34

Model 4
\
  vmlinuz, (0), 3
  tmp
    ticket2, (1 2), 9
Table: Disk
  Block  Contents  In use
  0      \0\0\0    true
  1      Wed 01:0  true
  2      0         true
  3      Tue 21:0  false
  4      0         false
  5      (empty)   false
27/34
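The allocation-vector behaviour on the Model 4 slides can be reproduced with a small Python sketch. Everything named here (`allocate`, `release`, the lowest-free-block policy, and allocating new blocks before releasing the old ones on overwrite) is an assumption chosen so that the trace matches the slides; the talk does not spell out the policy.

```python
BLOCK_SIZE = 8

def allocate(disk, alv, text):
    """Place text in the lowest-numbered free blocks.

    Returns (disk, alv, indices), or None if too few blocks are free.
    """
    blocks = [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)]
    free = [i for i, used in enumerate(alv) if not used]
    if len(free) < len(blocks):
        return None
    indices = free[:len(blocks)]
    disk, alv = list(disk), list(alv)  # copy, so callers keep the old state
    for i, block in zip(indices, blocks):
        disk[i], alv[i] = block, True
    return disk, alv, indices

def release(alv, indices):
    """Mark blocks as free; their stale contents stay on the disk."""
    return [used and i not in indices for i, used in enumerate(alv)]

disk, alv = [""] * 6, [False] * 6
disk, alv, vmlinuz = allocate(disk, alv, "\0\0\0")     # block 0
disk, alv, ticket1 = allocate(disk, alv, "Sun 19:00")  # blocks 1 and 2
disk, alv, ticket2 = allocate(disk, alv, "Tue 21:00")  # blocks 3 and 4
alv = release(alv, ticket1)                            # delete ticket1
# Overwrite ticket2: take fresh blocks first, then release the old ones.
disk, alv, new_ticket2 = allocate(disk, alv, "Wed 01:00")
alv = release(alv, ticket2)
```

After the last step the new contents sit in blocks 1 and 2, reusing the space freed by deleting ticket1, while blocks 3 and 4 still hold the stale "Tue 21:0" and "0" but are marked free.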

Proof approaches and techniques ◮ There are many properties that could be considered for correctness, but we choose to focus on the read-over-write theorems from the first-order theory of arrays. ◮ Read n characters starting at position start in the file at path hns in filesystem fs: l1-rdchs(hns, fs, start, n) ◮ Write the string text starting at position start in the file at path hns in filesystem fs: l1-wrchs(hns, fs, start, text) 29/34

Proof approaches and techniques ◮ First read-over-write theorem: reading from a location after writing to the same location should yield the data that was written. Formally, assuming n = length(text) and suitable "type" hypotheses (omitted here): l1-rdchs(hns, l1-wrchs(hns, fs, start, text), start, n) = text ◮ Second read-over-write theorem: reading from a location after writing to a different location should yield the same result as reading before writing. Formally, assuming hns1 != hns2 and suitable "type" hypotheses (omitted here): l1-rdchs(hns1, l1-wrchs(hns2, fs, start2, text2), start1, n1) = l1-rdchs(hns1, fs, start1, n1) 30/34
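Both read-over-write properties can be spot-checked on a Python analogue of Model 1. This is a sketch only: `l1_rdchs` and `l1_wrchs` borrow the argument order of l1-rdchs and l1-wrchs from the slides, but their bodies (including NUL padding on writes past the end and returning `None` for a bad path) are assumptions, and testing two instances is no substitute for the ACL2 proofs.

```python
def l1_rdchs(hns, fs, start, n):
    """Read n characters at offset start from the file at path hns."""
    entry = fs
    for name in hns:
        if not isinstance(entry, dict) or name not in entry:
            return None  # path does not exist
        entry = entry[name]
    if not isinstance(entry, str):
        return None  # path names a directory, not a regular file
    return entry[start:start + n]

def l1_wrchs(hns, fs, start, text):
    """Write text at offset start in the file at path hns; return the new tree."""
    name, rest = hns[0], hns[1:]
    entry = fs.get(name)
    if not rest:
        if not isinstance(entry, str):
            return fs
        old = entry.ljust(start, "\0")
        return {**fs, name: old[:start] + text + old[start + len(text):]}
    if not isinstance(entry, dict):
        return fs
    return {**fs, name: l1_wrchs(rest, entry, start, text)}

fs = {"vmlinuz": "\0\0\0", "tmp": {"ticket1": "Sun 19:00"}}
hns1, hns2 = ["tmp", "ticket1"], ["vmlinuz"]

# First property: read back exactly what was written at the same place.
text = "21:00"
first_ok = l1_rdchs(hns1, l1_wrchs(hns1, fs, 4, text), 4, len(text)) == text

# Second property: a write to a different path does not disturb the read.
second_ok = (l1_rdchs(hns1, l1_wrchs(hns2, fs, 1, "xy"), 0, 9)
             == l1_rdchs(hns1, fs, 0, 9))
```

The omitted "type" hypotheses matter even in this sketch: the first property fails if hns does not name an existing regular file, because the write then leaves the tree unchanged.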

Proof approaches and techniques ◮ For each of the models 1, 2, 3 and 4, we have proofs of correctness of the two read-over-write properties, making use of the proofs of equivalence between models and their successors. ◮ Model 4 presented some unique challenges: proving the read-over-write properties required proving an equivalence between model 4 and model 2, rather than model 3. 31/34

Proof approaches and techniques [Figure: commuting diagrams relating models l2 and l1 through l2-to-l1-fs, one for write (l2 to l2 above, l1 to l1 below) and one for read (l2 to text, l1 to text).] 32/34
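The equivalence argument between a model and its predecessor can be illustrated for models 2 and 1: writing in l2 and then converting should give the same tree as converting and then writing in l1. The sketch below is a Python illustration under assumptions. The name `l2_to_l1_fs` follows l2-to-l1-fs from the slide, while `l2_wrchs`, `splice`, the tuple representation of Model 2 files and all function bodies are invented here.

```python
def l2_to_l1_fs(fs):
    """Forget the length field, turning a Model 2 tree into a Model 1 tree."""
    return {name: l2_to_l1_fs(e) if isinstance(e, dict) else e[0]
            for name, e in fs.items()}

def splice(old, start, text):
    """Overwrite old from offset start with text, padding with NUL if needed."""
    old = old.ljust(start, "\0")
    return old[:start] + text + old[start + len(text):]

def l1_wrchs(hns, fs, start, text):
    """Model 1 write: files are plain strings."""
    name, rest = hns[0], hns[1:]
    e = fs.get(name)
    if rest:
        return {**fs, name: l1_wrchs(rest, e, start, text)} if isinstance(e, dict) else fs
    return {**fs, name: splice(e, start, text)} if isinstance(e, str) else fs

def l2_wrchs(hns, fs, start, text):
    """Model 2 write: files are (contents, length) pairs."""
    name, rest = hns[0], hns[1:]
    e = fs.get(name)
    if rest:
        return {**fs, name: l2_wrchs(rest, e, start, text)} if isinstance(e, dict) else fs
    if isinstance(e, tuple):
        new = splice(e[0], start, text)
        return {**fs, name: (new, len(new))}
    return fs

l2 = {"vmlinuz": ("\0\0\0", 3), "tmp": {"ticket2": ("Tue 21:00", 9)}}
hns = ["tmp", "ticket2"]
via_l2 = l2_to_l1_fs(l2_wrchs(hns, l2, 0, "Wed 01:00"))  # write, then convert
via_l1 = l1_wrchs(hns, l2_to_l1_fs(l2), 0, "Wed 01:00")  # convert, then write
```

When the two paths agree for all inputs, a property already proved for l1, such as read-over-write, can be transferred to l2 without reproving it from scratch.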

Future work ◮ Model and verify file permissions. ◮ Linearise the tree, leaving only the disk. ◮ Add the system calls open and close with the introduction of file descriptors. This would be a step towards the study of concurrent FS operations. ◮ Eventually emulate the FAT32 filesystem as a convincing proof of concept, and move on to fsck and file recovery tools. 34/34
