Libnvmmio: Reconstructing SW IO Path with Failure-Atomic Memory-Mapped Interface
  1. Libnvmmio: Reconstructing SW IO Path with Failure-Atomic Memory-Mapped Interface
  Jungsik Choi¹, Jaewan Hong², Youngjin Kwon², Hwansoo Han¹
  ¹Sungkyunkwan University, ²KAIST
  USENIX ATC '20

  2. SW Overhead Greater than Storage Latency
  [Figure: storage latency timeline. HDD and SSD latencies sit in the ms range; TLC 3D NAND, XL-Flash, and Optane SSDs in the µs range; DCPMM and NVDIMM-N persistent memory in the ns range. As device latency shrinks toward ns, SW overhead comes to dominate total IO time.]

  3. Reconstruct SW IO Path with Libnvmmio
  • Libnvmmio is a library that runs on any POSIX FS providing DAX-mmap
    - Transparent MMIO with logging
    - Makes the common IO path efficient
  • Handles data ops (read, write, fsync) at user level via memory-mapped files
  • Routes metadata ops (open, close, mmap, munmap) to the kernel FS
  • Provides low-latency, scalable IO with data-atomicity
  [Figure: applications issue open/read/write/fsync/close; Libnvmmio serves data ops from memory-mapped files with atomic writes to logs, while metadata ops go to the kernel's NVM-aware FS on NVMM. A routing sketch follows below.]
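
To make the routing concrete, here is a minimal sketch of a write() wrapper that serves data ops on mapped files at user level and falls through to the kernel otherwise. The mapping table and log_atomic_write() helper are illustrative assumptions, not Libnvmmio's actual internals.

    /* Hypothetical routing sketch; not Libnvmmio's real API. */
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    struct mapping { char *base; size_t off; }; /* per-fd mmap + log state */
    static struct mapping *table[1024];         /* fd -> mapping (sketch)  */

    static ssize_t log_atomic_write(struct mapping *m, const void *buf,
                                    size_t len)
    {
        /* Real code would append a failure-atomic per-block log entry;
         * here we only copy into the mapped region. */
        memcpy(m->base + m->off, buf, len);
        m->off += len;
        return (ssize_t)len;
    }

    ssize_t my_write(int fd, const void *buf, size_t len)
    {
        struct mapping *m = (fd >= 0 && fd < 1024) ? table[fd] : NULL;
        if (m)                          /* data op: handled at user level */
            return log_atomic_write(m, buf, len);
        return write(fd, buf, len);     /* unmapped file: kernel IO path  */
    }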

  4. User-Level IO is Suitable for NVMM Systems
  • Kernel IO stacks introduce SW overhead
    - read/write must traverse the VFS, the file system, and the device driver
    - Indexing, permission checks, and user/kernel mode switches on every call
  • User-level IO with mmap
    - Accesses files directly with load/store instructions
    - Reduces user/kernel mode switches and avoids the complex IO stacks
  • MMIO is the fastest way to access files on NVMM
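
For reference, a minimal example of user-level IO on a DAX file system: after a single mmap() call, reads and writes are plain loads and stores with no kernel involvement until msync(). The path /mnt/pmem/file is an assumption for illustration.

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/pmem/file", O_RDWR); /* assumed DAX-mmap FS */
        if (fd < 0)
            return 1;
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        memcpy(p, "hello", 5);     /* store: data goes straight to the file */
        char c = p[0];             /* load: data comes straight from it     */
        (void)c;

        msync(p, 4096, MS_SYNC);   /* durable, but not failure-atomic       */
        munmap(p, 4096);
        close(fd);
        return 0;
    }

Note that msync() provides durability but no atomicity: a crash in the middle of the memcpy() can leave a torn write, which is exactly the gap Libnvmmio's logging fills.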

  5. Logging is More Efficient than CoW
  • CoW (or shadow paging)
    - High write amplification
    - Hugepages make CoW even more expensive
    - Frequent TLB shootdowns
  • Logging (or journaling)
    - Writes data twice (to the log, then to the file), but differential logging keeps the log writes small
    - Checkpointing can be postponed
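
As a sketch of differential logging: instead of copying a whole 4KB block, only the modified byte range and its location are appended to the log. The entry layout is an assumption, loosely following the fields shown on slide 10.

    #include <stdint.h>
    #include <string.h>

    /* Differential log entry: records only the modified range. */
    struct diff_entry {
        uint64_t offset;   /* where the write lands in the file */
        uint32_t len;      /* length of the modified range      */
        uint8_t  data[];   /* the delta itself (len bytes)      */
    };

    /* A 100-byte write costs ~100 bytes of log traffic, versus a
     * full 4KB copy under CoW/shadow paging. On NVMM, persisting
     * would additionally need cache-line flushes and fences. */
    void log_append(struct diff_entry *e, uint64_t off,
                    const void *buf, uint32_t len)
    {
        e->offset = off;
        e->len = len;
        memcpy(e->data, buf, len);
    }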

  6. Redo vs. Undo
  • Most logging systems use only one policy (redo or undo)
  • The two have different pros & cons depending on the access type
    - Redo is better for writing: new data goes only to the log, and the file is updated asynchronously
    - Undo is better for reading: old data goes to the log, so reads hit the up-to-date file directly
  [Figure: under undo, the app writes the file in place after the old data is saved to the log; under redo, the app writes the log and the file is written asynchronously. A sketch of the two write paths follows below.]
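
A minimal sketch of the two write paths for a single block, showing why redo favors writes and undo favors reads; plain buffers stand in for the real file and log regions.

    #include <stddef.h>
    #include <string.h>

    /* Undo: save old data to the log, then update the file in place.
     * Reads need no log lookup -- good for read-intensive files. */
    void undo_write(char *file, char *log, const char *buf, size_t len)
    {
        memcpy(log, file, len);    /* old data to the log (for rollback) */
        memcpy(file, buf, len);    /* new data in place                  */
    }

    /* Redo: write only the log now; the file is patched at checkpoint
     * time. Writes are cheap -- good for write-intensive files -- but
     * reads must check the log for the freshest data. */
    void redo_write(char *log, const char *buf, size_t len)
    {
        memcpy(log, buf, len);     /* new data to the log only */
    }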

  7. Hybrid Logging
  • Chooses the logging policy adaptively, per file, based on its access type
    - Read-intensive file → undo logging
    - Write-intensive file → redo logging
  • Maintains per-file read/write counters
  • Re-decides the logging policy on each fsync (see the sketch below)
  • Achieves the best-case performance of the two logging policies
    - Reduces SW overhead and improves logging efficiency
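
A minimal sketch of the fsync-time policy decision, assuming per-file counters; the field names and the counter-reset behavior are assumptions, not Libnvmmio's actual code.

    #include <stdint.h>

    enum policy { UNDO, REDO };

    struct file_meta {
        uint64_t    reads;    /* per-file read counter  */
        uint64_t    writes;   /* per-file write counter */
        enum policy policy;
    };

    /* Called on each fsync: undo for read-intensive files, redo for
     * write-intensive ones; counters restart for the next interval. */
    void choose_policy(struct file_meta *f)
    {
        f->policy = (f->reads > f->writes) ? UNDO : REDO;
        f->reads = 0;
        f->writes = 0;
    }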

  8. Centralized Logging with Fine-Grained Locks
  • Decentralized logging was designed for transactions
    - e.g., per-thread logging, per-transaction logging
  • Centralized logging is appropriate for file IO, but a single coarse lock does not scale
    - Requires fine-grained locks for scalable file IO
  [Figure: decentralized logging keeps a log per thread or transaction; centralized logging keeps one log per file.]

  9. Per-Block Logging
  [Figure: a multi-level radix tree indexes the file, with a separate log per file block.]

  10. Lock-Free Radix Tree
  [Figure: the file offset is split 9/9/9/9/12 and walked through the global, upper, and middle directories (LGD, LUD, LMD) into a table of log entries, one per 4KB block. Per-file metadata holds fields such as policy, epoch, and radix_root; each entry holds rwlock, offset, len, dest, and related fields, and points to a per-block log.]
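
A sketch of the index structures implied by the figure; the field names follow the slide, but the exact sizes, layout, and lock-free installation details (e.g., CAS on empty slots) are assumptions.

    #include <pthread.h>
    #include <stdint.h>

    /* One entry per 4KB block, reached via LGD -> LUD -> LMD -> table. */
    struct log_entry {
        pthread_rwlock_t rwlock;  /* fine-grained, per-block lock       */
        uint64_t offset;          /* start of the logged (delta) range  */
        uint32_t len;             /* length of the logged range         */
        void    *dest;            /* destination block in the file      */
        uint64_t epoch;           /* < global epoch means committed     */
        void    *log;             /* the per-block log itself           */
    };

    /* The file offset splits 9/9/9/9/12: four 512-slot directory
     * levels, with the low 12 bits addressing within the 4KB block. */
    #define IDX(off, shift) (((off) >> (shift)) & 0x1ff)

    struct log_entry *lookup(void **lgd, uint64_t off)
    {
        void **lud   = lgd[IDX(off, 39)];
        void **lmd   = lud[IDX(off, 30)];
        void **table = lmd[IDX(off, 21)];
        return (struct log_entry *)table[IDX(off, 12)];
    }

Lookups traverse the tree without taking any lock; writers synchronize only on the per-block rwlock, which is what lets the centralized log scale.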

  11. Commit & Checkpoint Based on Epochs
  • Per-block logs are atomically committed on fsync
  • Libnvmmio commits by increasing the global epoch value
    - Committed logs are exactly those whose epoch is smaller than the global epoch
  • Background checkpointing applies committed logs to the file (see the sketch below)
  [Figure: fsync advances the per-file epoch; background threads checkpoint per-block logs from earlier epochs into the memory-mapped file.]
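
A minimal sketch of epoch-based commit: a single atomic increment commits every entry tagged with an older epoch, so the fsync path does no per-entry work. The variable and function names are assumptions.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    static _Atomic uint64_t global_epoch = 1;

    /* fsync path: one atomic increment commits, at once, every log
     * entry whose epoch is now smaller than the global epoch. */
    void commit_on_fsync(void)
    {
        atomic_fetch_add(&global_epoch, 1);
    }

    /* Background checkpointer test: committed entries are applied to
     * the file and recycled; current-epoch entries are left alone. */
    bool is_committed(uint64_t entry_epoch)
    {
        return entry_epoch < atomic_load(&global_epoch);
    }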

  12. Design Summary
  Libnvmmio provides low-latency and scalable IO while guaranteeing data-atomicity
  • Low-latency IO
    - User-level IO with mmap
    - Differential logging
    - Hybrid logging
    - Epoch-based committing
  • Scalable IO
    - Per-block logging
    - Lock-free index data structure
    - Various log sizes
    - Background checkpointing

  13. Experimental Setup
  • Experimental machines
    - 32GB NVDIMM-N, 20 cores and 32GB DRAM
    - 256GB Optane DC, 16 cores and 128GB DRAM (in our paper)
  • Comparison systems

    Filesystem    File IO    Data-Atomicity    Kernel
    Ext4-DAX      Kernel     X                 5.1
    PMFS          Kernel     X                 4.13
    NOVA          Kernel     O                 5.1
    SplitFS       User       O                 4.13
    Libnvmmio*    User       O                 5.1

  14. Hybrid Logging
  [Figure: elapsed time (sec) vs. read:write ratio from 0:100 to 100:0, comparing Undo, Redo, and Hybrid. Hybrid tracks the better of the two pure policies across the whole range.]

  15. FIO: Different Access Patterns
  • A single thread, file size=4GB, block size=4KB, time=60s
  [Figure: bandwidth (GiB/s) for sequential read, random read, sequential write, and random write (SR, RR, SW, RW), comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

  16. FIO: Different Write Sizes
  [Figure: bandwidth (GiB/s) across write sizes from 128B up to 1MB, comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

  17. FIO: Random Write with Multiple Threads
  [Figure: bandwidth (GiB/s) vs. number of threads (1 to 16), for a private file and a shared file, comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

  18. TPC-C on SQLite
  • SQLite with its own WAL on the underlying FS, vs. SQLite without WAL running on Libnvmmio over the same FS
  [Figure: normalized tpmC on Ext4-DAX, PMFS, NOVA, and SplitFS, comparing the underlying FS alone against Libnvmmio on that FS.]

  19. SQLite WAL vs. Libnvmmio
  • SQLite WAL
    - Designed for block devices
    - Similar to redo logging
    - Reads both the WAL and the DB file
    - Only one writer at a time
    - Synchronous checkpointing
  • Libnvmmio
    - Designed for NVMM
    - Hybrid logging
    - Reads the DB file directly (undo)
    - Concurrent writes
    - Background checkpointing
  • Libnvmmio is an easy way to improve performance
    - Supports any FS, even one that does not provide data-atomicity

  20. Conclusion
  • It is important to minimize SW overhead in NVMM systems
  • Libnvmmio is a simple and practical solution
    - Reconstructs the SW IO path
    - Runs on any filesystem that provides DAX-mmap
  • Low-latency, scalable IO while guaranteeing data-atomicity
    - 2.2x better throughput
    - 13x better scalability
  • https://github.com/chjs/libnvmmio

  21. Q&A: chjs@skku.edu
