Studying File Systems
Uses BOB to study how persistence properties vary across file systems
Studied six file systems
- ext2, ext3, ext4, btrfs, xfs, reiserfs
- A total of 16 configurations
CMU SDI Seminar 14

Study Results: Atomicity
Questions asked per column:
- Single Sector: write(512) atomic?
- Multi Sector: write(1GB) atomic?
- Append Content: open(file, O_APPEND), write(12K) atomic?
- Directory Operation: rename(old, new) atomic?
x = property does not hold; - = no mark on the slide

                   Single   Multi    Append    Directory
                   Sector   Sector   Content   Operation
ext2 async         -        x        x         x
ext2 sync          -        x        x         x
ext3 writeback     -        x        x         -
ext3 ordered       -        x        -         -
ext3 journal       -        x        -         -
ext4 writeback     -        x        x         -
ext4 ordered       -        x        -         -
ext4 no-delalloc   -        x        -         -
ext4 journal       -        x        -         -
btrfs              -        x        -         -
xfs default        -        x        -         -
xfs wsync          -        x        -         -

Study Results: Ordering
Example for "Overwrite -> any op": write(4K) -> rename()
x = property does not hold; - = no mark on the slide

                   Overwrite    Append       Dir op       Append(f)
                   -> any op    -> any op    -> any op    -> rename(f)
ext2 async         x            x            x            x
ext2 sync          -            -            -            -
ext3 writeback     x            x            -            x
ext3 ordered       x            -            -            -
ext3 journal       -            -            -            -
ext4 writeback     x            x            -            x
ext4 ordered       x            -            -            -
ext4 no-delalloc   x            -            -            -
ext4 journal       -            -            -            -
btrfs              -            x            x            -
xfs default        x            x            -            -
xfs wsync          x            -            -            -

File-System Study Results
Persistence properties vary widely among file systems
- Even among different configurations of the same file system
Applications should not rely on them
Testing application correctness on a single file system is not enough

Outline
Introduction
Background
Analyzing file systems with BOB
Analyzing applications with ALICE
Application Study
Conclusion and Future Work

Application-Level Intelligent Crash Explorer (ALICE)
ALICE: tool to find Crash Vulnerabilities
Application Crash Vulnerabilities
- code that depends on specific persistence properties for correct behavior
- ex: if the file system doesn't persist two system calls in order, it leads to data corruption

ALICE Methodology
Construct a crash state by violating a single persistence property
Run the application on the crash state (allow recovery)
Examine the application state
If the application is inconsistent, it depended on the persistence property violated in the crash state

ALICE Overview
[Figure: Application Workload (git add file1) -> System-Call Trace (creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm)) -> Crash State Constructor, with the FS Abstract Persistence Model as input -> Crash States -> Application Checker (git status) -> ERROR]

Tracing the Workload
Run the application workload
Collect the system-call traces
System calls converted into logical operations:
- Abstract away current file offset, fd, etc.
- Group writev(), pwrite(), etc. into a single type of operation

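The conversion step can be sketched as follows. This is a minimal illustration, not ALICE's actual code: the raw record format (tuples such as `("write", fd, nbytes)`) and the function name `to_logical_ops` are assumptions made for the example. It shows how file descriptors and the current offset drop out of the trace, and how `write()` and `pwrite()` collapse into the same logical operation types.

```python
def to_logical_ops(trace):
    """Turn raw syscall records (hypothetical format) into fd-free logical operations."""
    fd_path = {}   # fd -> path it refers to
    fd_off = {}    # fd -> current file offset
    size = {}      # path -> file size written so far
    ops = []
    for rec in trace:
        name = rec[0]
        if name == "creat":
            _, fd, path = rec
            fd_path[fd], fd_off[fd], size[path] = path, 0, 0
            ops.append(("creat", path))
        elif name in ("write", "pwrite"):
            fd, nbytes = rec[1], rec[2]
            path = fd_path[fd]
            # pwrite carries its own offset; write uses the fd's current offset
            offset = rec[3] if name == "pwrite" else fd_off[fd]
            kind = "append" if offset >= size[path] else "overwrite"
            ops.append((kind, path, offset, nbytes))
            size[path] = max(size[path], offset + nbytes)
            if name == "write":
                fd_off[fd] = offset + nbytes
        elif name == "fsync":
            ops.append(("fsync", fd_path[rec[1]]))
        elif name == "link":
            ops.append(("link", rec[1], rec[2]))
    return ops


raw = [("creat", 3, "tmp"), ("write", 3, 4096), ("fsync", 3), ("link", "tmp", "perm")]
logical = to_logical_ops(raw)
```

After the conversion, later stages only need to reason about paths, offsets, and sizes.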
Constructing Crash States
ALICE constructs crash states by applying a subset of operations to the initial disk image
[Figure: Initial Disk State + subset of (creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm)) -> Crash State]

Constructing Crash States
Persistence Properties Violated:
1. Atomicity across system calls
- Method: apply prefix of operations
- Trace: creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm)
2. Atomicity within system calls
- Method: apply prefix + partial operation
- Trace: creat(index.lock), creat(tmp), append(tmp, 512) … append(tmp, 512), fsync(tmp), link(tmp, perm)

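The two methods above can be sketched in a few lines. This is an illustrative sketch under assumptions, not ALICE's implementation: operations are plain tuples, the names `prefix_states` and `partial_states` are hypothetical, and a partial append is modeled as only its first N bytes reaching disk, in 512-byte steps.

```python
OPS = [
    ("creat", "index.lock"),
    ("creat", "tmp"),
    ("append", "tmp", 4096),
    ("fsync", "tmp"),
    ("link", "tmp", "perm"),
]


def prefix_states(ops):
    """Atomicity across system calls: the crash lands between two operations."""
    return [ops[:i] for i in range(len(ops) + 1)]


def partial_states(ops, block=512):
    """Atomicity within a system call: a prefix plus a partially applied append."""
    states = []
    for i, op in enumerate(ops):
        if op[0] == "append":
            _, path, size = op
            # assumption: only the first `done` bytes of the append persisted
            for done in range(block, size, block):
                states.append(ops[:i] + [("append", path, done)])
    return states


all_states = prefix_states(OPS) + partial_states(OPS)
```

Each returned list is the sequence of operations to replay on the initial disk image to obtain one crash state.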
Constructing Crash States
Persistence Properties Violated:
3. Ordering among system calls
- Method: ignore an operation, apply prefix
- Trace: creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm)

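"Ignore an operation, apply prefix" can be sketched as below. The function name `reordered_states` and the tuple encoding are assumptions for illustration. The sketch deliberately ignores dependencies between operations (for example, dropping creat(tmp) while keeping append(tmp, 4K)) and the effect of fsync(), both of which a real tool would have to handle.

```python
OPS = [
    ("creat", "index.lock"),
    ("creat", "tmp"),
    ("append", "tmp", 4096),
    ("fsync", "tmp"),
    ("link", "tmp", "perm"),
]


def reordered_states(ops):
    """Ordering violations: take a prefix, then drop one operation before its last one."""
    states = []
    for end in range(2, len(ops) + 1):      # prefix is ops[:end]
        for skip in range(end - 1):         # the dropped op is never the last in the prefix
            states.append(ops[:skip] + ops[skip + 1:end])
    return states


states = reordered_states(OPS)
```

Every state produced this way contains a later operation without an earlier one, which is exactly what a file system that reorders the two could leave on disk after a crash.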
FS Abstract Persistence Model (APM)
Each file system implements persistence properties differently
- Ex: ext4 orders writes of a file before its rename
APM defines which crash states are permitted
APM defines atomicity and ordering constraints
APM allows ALICE to model file-system behavior without the file-system implementation

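One way to picture an APM is as a predicate over pairs of operations that says which reorderings are ruled out. The sketch below is a simplification under assumptions: the names `weak_model`, `writes_before_rename`, and `permitted` are hypothetical, and only ordering constraints are modeled (atomicity constraints are left out).

```python
def weak_model(earlier, later):
    """Hypothetical weak model: no operation is ordered before any later one."""
    return False


def writes_before_rename(earlier, later):
    """Hypothetical model of the ext4 example: writes to a file persist before its rename."""
    return (
        earlier[0] in ("append", "overwrite")
        and later[0] == "rename"
        and earlier[1] == later[1]
    )


def permitted(ops, dropped, must_precede):
    """Is the crash state that loses ops[dropped] but keeps every later op allowed?"""
    return not any(must_precede(ops[dropped], later) for later in ops[dropped + 1:])


OPS = [("append", "tmp", 4096), ("rename", "tmp", "perm")]
lost_append_weak = permitted(OPS, 0, weak_model)
lost_append_ext4_like = permitted(OPS, 0, writes_before_rename)
```

Under the weak model the crash state with the rename but without the append is possible; under the rename-ordering model it is excluded, so an application relying on that order would pass there and fail on the weak model.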
Finding Crash Vulnerabilities
Identify persistence property violated
Identify system calls involved
Identify source code lines involved
Trace: creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm)

ALICE Limitations
Not complete
- does not execute all code paths in application
- does not explore all crash states
- does not test combinations of persistence property violations (ex: atomicity + ordering)
Cannot prove an update protocol is correct

Outline
Introduction
Background
Analyzing file systems with BOB
Analyzing applications with ALICE
Application Study
Conclusion and Future Work

Application Study
Used ALICE to study eleven applications
Version Control Systems: Git, Mercurial
Key-Value Stores: LevelDB, GDBM, LMDB
Relational Databases: PostgreSQL, HSQLDB, SQLite
Distributed Systems: HDFS, ZooKeeper
Virtualization Platforms: VMware Player

Study Goals
Analyzed applications using weak APM
- Minimum constraints on possible crash states
Sought to answer:
- Which persistence properties do applications depend upon?
- What are the consequences of vulnerabilities?
- How many vulnerabilities occur on today’s file systems?
Did not seek to compare applications

Study: Setup
What is correct behavior for an application?
- We use guarantees in documentation
- In case of no documentation, we assume typical user expectations (“committed data is durable”)
Configurations change guarantees
- We test each configuration separately
- Tested 34 configurations across 11 applications
Post-crash, we run all appropriate application recovery mechanisms

Example: Git
store object:
- mkdir(o/x)
- creat(o/x/tmp_y)
- append(o/x/tmp_y)
- fsync(o/x/tmp_y)
- link(o/x/tmp_y, o/x/y)
- unlink(o/x/tmp_y)
git commit:
- do(store object)
- creat(branch.lock)
- append(branch.lock)
- append(branch.lock)
- append(logs/branch)
- append(logs/HEAD)
- rename(branch.lock,x/branch)
- stdout(“finished commit”)
Vulnerabilities highlighted on the trace:
- Atomicity: append(logs/branch)
- Ordering: label placed at the store-object steps, next to append(o/x/tmp_y) and fsync(o/x/tmp_y)
- Durability: label placed next to append(logs/HEAD) and rename(branch.lock,x/branch)

Vulnerability Types
[Figure: stacked bar chart of #vulnerabilities (axis 0 to 10) per application, by type: Multi-call atomicity, Single-call atomicity, Ordering, Durability]
Bar segment labels, in the order printed on the slide:
- Git: 1, 1, 6, 1
- Mercurial: 2, 2, 5, 2
- LevelDB-1.10: 1, 2, 6, 1
- LevelDB-1.15: 1, 2, 3
- GDBM: 1, 1, 1, 2
- LMDB: 1
- PostgreSQL: 1
- HSQLDB: 3, 4, 3
- SQLite: 1
- HDFS: 1, 1
- ZooKeeper: 1, 1, 2
- VMware Player: 1
60 vulnerabilities across 11 applications

Vulnerability Consequences
[Figure: stacked bar chart of #vulnerabilities (axis 0 to 14) per application, by consequence: Silent Errors, Data Loss, Cannot Open, Failed reads/writes, Misc]
Bar segment labels, in the order printed on the slide:
- Git: 1, 3, 5, 3
- Mercurial: 2, 1, 6, 5
- LevelDB-1.10: 1, 1, 5, 4
- LevelDB-1.15: 2, 2, 2
- GDBM: 2, 3
- LMDB: 1
- PostgreSQL: 1
- HSQLDB: 2, 3, 5
- SQLite: 1
- HDFS: 2
- ZooKeeper: 2, 2
- VMware Player: 1
Many vulnerabilities result in data loss, silent errors, and failed reads/writes

Vulnerabilities on Current File Systems
[Figure: bar chart of #vulnerabilities (axis 0 to 60) per persistence model]
- Weak APM: 60
- ext3-writeback: 16
- ext3-ordered: 12
- ext3-journal: 10
- ext4-ordered: 17
- btrfs: 31
Every current file system exposes at least one vulnerability; btrfs exposes more than half

Observations
Applications very careful in overwriting user data
- None required atomicity for multi-block overwrites
Applications not as careful in appending to logs
- Multi-block appends require prefix atomicity
- Ex: write(“ABC”) should result in “A”/“AB”/“ABC”
Atomicity across system calls doesn't seem useful

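Prefix atomicity for appends can be stated as a small check on the file contents found after a crash. The function name `is_prefix_atomic` is hypothetical, and treating "nothing appended yet" as an acceptable outcome is an assumption; the slide lists only “A”/“AB”/“ABC”.

```python
def is_prefix_atomic(before, appended, after_crash):
    """True if after_crash equals before plus some prefix of appended."""
    if not after_crash.startswith(before):
        return False          # old contents were damaged
    tail = after_crash[len(before):]
    # the new tail must be a prefix of the appended data (possibly empty or complete)
    return appended.startswith(tail)


ok_states = [b"log:", b"log:A", b"log:AB", b"log:ABC"]
results = [is_prefix_atomic(b"log:", b"ABC", s) for s in ok_states]
garbage = is_prefix_atomic(b"log:", b"ABC", b"log:A\x00\x00")
```

A crash state where the file grew but the new region holds zeros or stale data fails the check, which is the case a log reader without per-record checksums cannot detect.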