Enabling System Transactions via Lightweight Kernel Extensions
R.P. Spillane, S. Gaikwad, M. Chinni, C.P. Wright, E. Zadok
Stony Brook University
http://www.fsl.cs.sunysb.edu/

Summary
  What is the design complexity of system transactions implemented in the VFS? Low.
    100 lines of code added to page writeback
    4000 lines of module code (log implementation)
  What is the performance?
    Valor: 35% overhead on top of the theoretical best, compared to…
    104% overhead for an efficient user-level alternative
2/28/2009 FAST 2009 - Enabling System Transactions 2

System Transaction
  [Figure: Process 1 issues the system calls TID←sys_tbegin(...), write(TID,...), unlink(TID,...), and sys_tabort(TID) on files f1 and f2; the panels are labeled "FS State: foo" and "FS State: foo’".]

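The call sequence on this slide can be illustrated with a toy in-memory model. This is only a sketch of the intended semantics (an aborted transaction leaves the file system as it was at tbegin), not Valor's kernel implementation. The class `ToyFS`, its snapshot strategy, and the method signatures are assumptions made for illustration, and the model handles one transaction at a time.

```python
import copy


class ToyFS:
    """In-memory stand-in for a file system; NOT Valor's implementation."""

    def __init__(self, files):
        self.files = dict(files)   # file name -> contents
        self._snapshots = {}       # TID -> state saved at tbegin
        self._next_tid = 1

    def tbegin(self):
        # Hypothetical analogue of sys_tbegin: remember the pre-transaction state.
        tid = self._next_tid
        self._next_tid += 1
        self._snapshots[tid] = copy.deepcopy(self.files)
        return tid

    def write(self, tid, name, data):
        assert tid in self._snapshots, "unknown transaction"
        self.files[name] = data

    def unlink(self, tid, name):
        assert tid in self._snapshots, "unknown transaction"
        del self.files[name]

    def tabort(self, tid):
        # Hypothetical analogue of sys_tabort: restore the saved state.
        self.files = self._snapshots.pop(tid)


fs = ToyFS({"f1": "a", "f2": "b"})
tid = fs.tbegin()
fs.write(tid, "f1", "changed")   # file system is now in the intermediate state
fs.unlink(tid, "f2")
fs.tabort(tid)                   # both operations are rolled back together
```

A real implementation cannot copy the whole file system at tbegin; the later slides replace the snapshot with an undo/redo log.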
The Design Spectrum
  Valor side-steps the traditional trade-off by working with the kernel’s page cache in a general way.
  [Figure: systems placed by Transparency & Performance against Design Feasibility; labeled points: Quicksilver, TxF; Valor; Amino; Berkeley DB, KBDB; Stasis.]

Valor’s Process Txn Model
  Transactional Model
    Supported operations:
      dirtying a page
      appending to a file
      modifying an inode
      modifying a directory
    Locking:
      directory locks, inode locks
      page range locks for overwrites
      intent locks for directory renames

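To make "page range locks for overwrites" concrete, here is a minimal sketch of a lock table that refuses a page range when another transaction already holds an overlapping range on the same inode. The class name, the exclusive-only lock mode, and releasing every lock when the transaction ends are assumptions for illustration; the slide does not specify Valor's lock modes or data structures.

```python
class PageRangeLocks:
    """Toy lock table of inclusive page ranges, each owned by one TID."""

    def __init__(self):
        self._held = []  # list of (inode, first_page, last_page, tid)

    def try_lock(self, tid, inode, first, last):
        """Grant pages first..last of inode unless another TID overlaps them."""
        for h_inode, h_first, h_last, h_tid in self._held:
            overlaps = h_inode == inode and first <= h_last and h_first <= last
            if overlaps and h_tid != tid:
                return False
        self._held.append((inode, first, last, tid))
        return True

    def release_all(self, tid):
        """Drop every range held by tid (assumed to happen at commit or abort)."""
        self._held = [h for h in self._held if h[3] != tid]


locks = PageRangeLocks()
locks.try_lock(tid=1, inode=7, first=0, last=15)    # granted
locks.try_lock(tid=2, inode=7, first=8, last=20)    # refused: pages 8..15 overlap
locks.try_lock(tid=2, inode=7, first=16, last=20)   # granted: disjoint range
```

Locking ranges rather than whole files lets transactions that overwrite disjoint parts of the same file proceed without conflict.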
Asynchronous By Default
  ACI (no durability without tsync)
  Similar to asynchronous write(2) with fsync(2)
    Same purpose (performance increase)
  Requires page cache for files updated transactionally

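The durability trade-off can be sketched with a toy model in which a commit only becomes durable once a tsync-like call flushes it, much as write(2) data is only guaranteed on disk after fsync(2). The class `AsyncLog` and its methods are hypothetical names for illustration; only the name tsync comes from the slide.

```python
class AsyncLog:
    """Toy model of asynchronous commits; not Valor's log implementation."""

    def __init__(self):
        self.buffered = []  # commits that exist only in memory
        self.durable = []   # commits that reached stable storage

    def commit(self, tid):
        # Returns immediately: atomic, consistent, and isolated, but not yet durable.
        self.buffered.append(tid)

    def tsync(self):
        # Force everything buffered so far to stable storage.
        self.durable.extend(self.buffered)
        self.buffered.clear()

    def crash(self):
        # A crash loses whatever was never synced; return what survives.
        self.buffered.clear()
        return list(self.durable)


log = AsyncLog()
log.commit(1)
log.tsync()       # transaction 1 is now durable
log.commit(2)     # transaction 2 is committed but only in memory
survivors = log.crash()
```

After the simulated crash only transaction 1 survives, which is the "no D without tsync" behavior.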
Valor Design
  Modify page writeback to support simple write ordering
  Implement an ARIES-style undo/redo log module for FS operations

Page Dirtying: No Txns
  [Figure: Process 1 issues two write(TID,…) calls that turn old pages into new pages; legend: OK, bad, Old Page, New Page; callout: "Uh-oh…".]

Page Dirtying: With Txns
  [Figure: Process 1 issues log_append(TID,…) calls alongside its write(TID,…) calls; legend: Old Page, U/R Page, New Page.]

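The log-then-write pattern in this figure can be sketched as follows. Each write first appends an undo/redo (U/R) record holding the old and new page contents, and only then dirties the page, so an abort can restore the old contents. The record layout and the names `UndoRedoLog`, `txn_write`, and `undo` are assumptions for illustration, not Valor's actual interfaces.

```python
class UndoRedoLog:
    """Toy append-only log of (lsn, tid, page_no, undo_image, redo_image)."""

    def __init__(self):
        self.records = []

    def log_append(self, tid, page_no, undo_image, redo_image):
        lsn = len(self.records)  # log sequence number = position in the log
        self.records.append((lsn, tid, page_no, undo_image, redo_image))
        return lsn


def txn_write(log, pages, tid, page_no, new_data):
    """Log the change first, then dirty the page."""
    old_data = pages.get(page_no)  # None if the page did not exist yet
    log.log_append(tid, page_no, old_data, new_data)
    pages[page_no] = new_data


def undo(log, pages, tid):
    """Roll back tid by applying its undo images in reverse LSN order."""
    for _lsn, r_tid, page_no, undo_image, _redo in reversed(log.records):
        if r_tid != tid:
            continue
        if undo_image is None:
            pages.pop(page_no, None)  # the page was created by the transaction
        else:
            pages[page_no] = undo_image


pages = {0: "old"}
log = UndoRedoLog()
txn_write(log, pages, tid=1, page_no=0, new_data="new")
txn_write(log, pages, tid=1, page_no=5, new_data="extra")
undo(log, pages, tid=1)
```

After `undo`, `pages` holds its pre-transaction contents again.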
Current Kernel Design
  [Figure: Process 2 issues log_append(TID,…) and write(TID,…); pages in the page cache go through page writeback to Ext2, Ext3, XFS, ZFS, …; legend: Old Page, U/R Page, New Page; callout: "Uh-oh…".]

What DBs Do
  Page Cache II: The Wrath of Khan
  [Figure labels: Disk, Cache Flush (fsync), Page Cache, Ext2, XFS, ZFS.]

Simple Write Ordering
  [Figure: the page cache above file systems FS1, FS2, FS3, FS4, with Valor alongside; legend: Old Page, U/R Page, New Page.]

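As a rough sketch of simple write ordering, assume it means that undo/redo (U/R) log pages must reach disk before the data pages they describe. A writeback pass can then order the dirty set so that log pages go first. The function name and the `(kind, page_no)` page representation are hypothetical.

```python
def writeback_order(dirty_pages):
    """Return dirty pages with log (U/R) pages ahead of data pages.

    Each page is a (kind, page_no) tuple where kind is "log" or "data".
    sorted() is stable, so pages keep their relative order within each kind.
    """
    return sorted(dirty_pages, key=lambda page: page[0] != "log")


dirty = [("data", 3), ("log", 1), ("data", 4), ("log", 2)]
ordered = writeback_order(dirty)
```

Because the ordering is applied in page writeback rather than in any one file system, the same rule covers every file system that sits under the page cache.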
Log Module
  [Figure: Process 2 issues tbegin(TID,…), tlog(TID,…), write(TID,…), tlog(TID,…), write(TID,…), and tresolve(TID,…), interleaved with page writebacks, in numbered steps 1 to 9; the Valor module keeps record maps and a state file and writes U/R pages 1 to 5 to the log file on disk; log entries are marked "U/R,1" or "C,1".]

Atomicity Argument
  The transition from pre-writeback to post-writeback disk state is atomic iff:
    1. All writes are preceded by sys_log_append
    2. Simple write ordering is implemented
    3. Writes to a single sector are atomic
  Valor satisfies the first two constraints
  A supported hard disk satisfies the third

Performing Recovery
  Two kinds of recovery are supported:
    System Recovery
    Application Recovery (per-process abort)
  Standard recovery process:
    Reconstruct RAM state from log
    In reverse LSN order, commit/abort landed transactions
    Perform a page writeback

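The recovery steps can be sketched with a simplified ARIES-style procedure: a forward pass rebuilds in-memory state from the log, and a reverse-LSN pass rolls back transactions that never committed. This is a generic illustration under assumed record fields (`type`, `tid`, `page`, `undo`, `redo`), not Valor's exact algorithm or log format.

```python
def recover(log, disk):
    """Return the recovered page map.

    log:  list of records in LSN order; each is a dict with keys
          "type" ("update" or "commit") and "tid", plus "page",
          "undo", and "redo" for updates.
    disk: dict mapping page number -> contents found after the crash.
    """
    pages = dict(disk)
    committed = set()
    # Pass 1 (forward): rebuild in-memory state by replaying every update.
    for rec in log:
        if rec["type"] == "commit":
            committed.add(rec["tid"])
        else:
            pages[rec["page"]] = rec["redo"]
    # Pass 2 (reverse LSN order): roll back transactions that never committed.
    for rec in reversed(log):
        if rec["type"] == "update" and rec["tid"] not in committed:
            pages[rec["page"]] = rec["undo"]
    return pages  # a page writeback would then push this state to disk


log = [
    {"type": "update", "tid": 1, "page": 0, "undo": "a", "redo": "b"},
    {"type": "update", "tid": 2, "page": 1, "undo": "x", "redo": "y"},
    {"type": "commit", "tid": 1},
    {"type": "update", "tid": 2, "page": 1, "undo": "y", "redo": "z"},
]
recovered = recover(log, disk={0: "a", 1: "y"})
```

In this example transaction 1 committed, so its redo image is kept, while transaction 2 is rolled back to its original page contents.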
Evaluation
  We must compare against traditional asynchronous FSes
    benchmark against asynchronous ext3
    do serial transfer benchmarks for large files
  We turn off synchronous transactions for two other controls (for fairness)
    FS built on top of Stasis
    FS built on top of Berkeley DB

Mock ARIES Benchmark
  Important lower bound (not tight)
  [Figure: three configurations, MT-ow-noread, MT-ow, and MT-ow-finite, each drawn with a Disk and a Log.]

Mock ARIES Benchmark
  [Figure: bar chart of elapsed time (sec, axis 0 to 90), with each bar split into Wait, User, and System time; bar annotations: 2%, 16%, 35%, 66%, 104%, and 2x.]

Serial Overwrite
  Transaction size: 16 pages
  [Figure: elapsed time (sec, axis 0 to 1000) against size of serial transfer (256, 512, 1024, 2048 MiB) for BDB, Stasis, Valor, and Ext3; annotations: 22.75 x Ext3, 5.0 x Ext3, 2.75 x Ext3.]

Transaction Throughput
  [Figure: elapsed time (sec, axis 0 to 1200) against size of transaction (1, 4, 16, 64, 256 pages) for BDB, Stasis, Valor, and Ext3, with "Heel" marked for BDB, Stasis, and Valor; annotations: 23.0 x Ext3, 4.2 x Ext3, 2.9 x Ext3.]

Conclusions
  System transactions are feasible
  Valor achieves low overhead
  Minimal changes to existing kernels

Limitations/Future Work
  Limitations
    Locking slows interleaved writes to the same page
    Some FSes/disks do not actually flush to stable storage when fsync() is called
  Future Work
    Explore use of a logging device as a coordinator in a transactional disk array

Q&A

TxF
  TxF is Microsoft’s transactional file system
  Motivation: program installation, system updates, website updates
  Pros
    Backed by Microsoft
  Cons
    Specific to NTFS

Isolation
  Extended mandatory locking
    Allows locking of directories
    Do not have to set group exec/setgid bits
  Locking permissions
    Let users decide if a file can be locked
  All processes acquire locks
    Regular processes hold them only for the duration of the syscall
  Lock inheritance
    Allows multi-process transactions

Valor != Journaling
  Journaling FSes are good at fast recovery…
  …but are too special-purpose:
    No-Steal Caching
      all state modified by a transaction must remain in memory until commit/abort
    Non-Modular Design
      does not handle rollback of VFS and page caches, just disk state on boot