Optimizing Memory-mapped I/O for Fast Storage Devices
Anastasios Papagiannis (1,2), Giorgos Xanthakis (1,2), Giorgos Saloustros (1), Manolis Marazakis (1), and Angelos Bilas (1,2)
(1) Foundation for Research and Technology – Hellas (FORTH), (2) University of Crete
USENIX ATC 2020
Fast storage devices
• Fast storage devices → Flash, NVMe
• Millions of IOPS
• < 10 μs access latency
• Small I/Os are not as big an issue as with rotational disks
• Require many outstanding I/Os for peak throughput
Read/write system calls
[Diagram: cache placement in the I/O path — user space, kernel space, device]
• Read/write system calls + DRAM cache → reduce accesses to the device
• Kernel-space cache
  • Requires system calls also for hits
  • Used for raw (serialized) blocks
• User-space cache
  • Lookups for hits + system calls only for misses
  • Application-specific (deserialized) data
• A user-space cache removes system calls for hits
• However, hit lookups in user space introduce significant overhead [SIGMOD'08] (see the sketch below)
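To illustrate this trade-off, here is a minimal sketch of a user-space block cache layered over pread(): hits avoid the system call but still pay for a lookup on every access, which is the overhead referred to above. The names, file path, and cache geometry are illustrative assumptions, not code from the paper.

```c
/* Minimal sketch of a user-space block cache over pread(): hits are served
 * from a direct-mapped DRAM cache without a system call, misses fall back
 * to pread(). All names and sizes are illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLK_SIZE 4096
#define NSLOTS   1024                    /* 4 MB cache, direct-mapped */

struct slot { int64_t blkno; char data[BLK_SIZE]; };
static struct slot cache[NSLOTS];

/* Read one 4 KB block; returns 0 on success, -1 on I/O error. */
static int cached_read(int fd, int64_t blkno, char *out)
{
    struct slot *s = &cache[blkno % NSLOTS];
    if (s->blkno == blkno) {             /* hit: lookup only, no syscall */
        memcpy(out, s->data, BLK_SIZE);
        return 0;
    }
    if (pread(fd, s->data, BLK_SIZE, blkno * BLK_SIZE) != BLK_SIZE)
        return -1;                       /* miss: one system call */
    s->blkno = blkno;
    memcpy(out, s->data, BLK_SIZE);
    return 0;
}

int main(void)
{
    int fd = open("/tmp/datafile", O_RDONLY);
    if (fd < 0)
        return 1;
    for (int i = 0; i < NSLOTS; i++)
        cache[i].blkno = -1;
    char buf[BLK_SIZE];
    cached_read(fd, 0, buf);             /* miss: pread() */
    cached_read(fd, 0, buf);             /* hit: served from DRAM */
    close(fd);
    return 0;
}
```

Even on the hit path, every access pays for the index computation and tag check; with a real hash table or tree the lookup cost is higher still, which is what the [SIGMOD'08] observation refers to.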
Memory-mapped I/O
• In memory-mapped I/O (mmio), hits are handled in hardware → MMU + TLB
  • Less overhead compared to a cache lookup
• In mmio a file is mapped to the virtual address space
  • Load/store processor instructions to access data
  • The kernel fetches/evicts pages on demand
• Additionally, mmio removes
  • Serialization/deserialization
  • Memory copies between user and kernel
(a minimal mmap example follows)
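A minimal user-space example of memory-mapped file access, assuming a pre-existing file at an illustrative path; this is a generic POSIX sketch, not code from the paper.

```c
/* Minimal sketch of file access through memory-mapped I/O: the file is
 * mapped once, then accessed with plain loads/stores; the kernel pages
 * data in and out on demand. File name is illustrative. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/datafile", O_RDWR);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) < 0)
        return 1;

    char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    /* A load: may trigger a page fault (a "miss") the first time;
     * later accesses to the same page hit in the MMU/TLB. */
    char first = data[0];

    /* A store: the page is marked dirty and written back later. */
    data[0] = first + 1;

    /* Optionally force write-back of the dirty range. */
    msync(data, st.st_size, MS_SYNC);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```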
Disadvantages of mmio
• Misses require a page fault instead of a system call
• 4KB page size → small & random I/Os
  • With fast storage devices this is not a big issue
• The Linux mmio path fails to scale with the number of threads
Mmio path scalability
[Figure: million page-faults/sec (IOPS) vs. number of threads (1–32) for Linux-Read and Linux-Write; device: null_blk, dataset: 4TB, DRAM cache: 192GB]
Mmio path scalability
[Figure: same setup (null_blk, 4TB dataset, 192GB DRAM cache), for Linux kernels 4.14 and 5.4; reads peak at about 2M IOPS and writes at about 1.3M IOPS, with an average queue depth of ≈ 27]
FastMap
• A novel mmio path that achieves high scalability and I/O concurrency
  • In the Linux kernel
• Avoids all centralized contention points
• Reduces CPU processing in the common path
• Uses dedicated data structures to minimize interference
Mmio path scalability
[Figure: same setup (null_blk, 4TB dataset, 192GB DRAM cache), now including FastMap-Read and FastMap-Write; FastMap achieves about 3x higher IOPS for reads and about 6x for writes compared to Linux]
Outline
• Introduction
• Motivation
• FastMap design
• Experimental analysis
• Conclusions
FastMap design: 3 main techniques
• Separates data structures that keep clean and dirty pages
  • Avoids all centralized contention points
• Optimizes reverse mappings
  • Reduces CPU processing in the common path
• Uses a scalable DRAM cache
  • Minimizes interference and reduces latency variability
Linux mmio design
[Diagram: address_space with its page_tree protected by tree_lock, pages, and the VMA; profiling shows 126x more contended lock acquisitions and 155x more page wait time]
• tree_lock is acquired for 2 main reasons
  • Insert/remove elements from page_tree (lookups are lock-free via RCU)
  • Modify tags for a specific entry → used to mark a page dirty
(a simplified sketch of this serialization follows)
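A simplified sketch of the contention, modeled loosely on the Linux 4.14 page cache but using illustrative structure names rather than actual kernel code: both publishing a newly fetched page and tagging a cached page dirty take the same per-file spinlock, so faulting and dirtying threads serialize on it.

```c
/* Simplified sketch (not actual kernel code) of why a single per-file
 * tree_lock serializes the Linux mmio path. */
#include <linux/radix-tree.h>
#include <linux/spinlock.h>

struct my_address_space {
    spinlock_t             tree_lock;   /* single lock per mapped file */
    struct radix_tree_root page_tree;   /* all cached pages */
};

#define MY_TAG_DIRTY 0

/* Fault path: a missing page was read from the device; publish it. */
static int insert_page(struct my_address_space *m, unsigned long idx, void *page)
{
    int err;

    spin_lock(&m->tree_lock);
    err = radix_tree_insert(&m->page_tree, idx, page);
    spin_unlock(&m->tree_lock);
    return err;
}

/* Write-fault path: an already cached page is being written; mark it dirty. */
static void mark_dirty(struct my_address_space *m, unsigned long idx)
{
    spin_lock(&m->tree_lock);            /* same lock as insert_page() */
    radix_tree_tag_set(&m->page_tree, idx, MY_TAG_DIRTY);
    spin_unlock(&m->tree_lock);
}
```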
FastMap design
[Diagram: a VMA points to a PFD structure holding per-core page_tree_0 … page_tree_{N-1} and dirty_tree_0 … dirty_tree_{N-1}]
• Keep dirty pages in a separate data structure
  • Marking a page dirty/clean does not serialize insert/remove ops
• Choose the data structure based on page_offset % num_cpus
• Radix trees keep ALL cached pages → lock-free (RCU) lookups
• Red-black trees keep ONLY dirty pages → sorted by device offset
(a minimal sketch of the partitioning follows)
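A simplified sketch of the partitioning idea, assuming per-core locks and structures; structure and function names are illustrative and this is not FastMap's actual implementation. The per-core structure is selected by page_offset % num_cpus, and dirty marking uses a different tree (and lock) than page insertion.

```c
/* Simplified sketch (not FastMap's code) of per-core clean/dirty trees.
 * Initialization of locks and trees is omitted. */
#include <linux/radix-tree.h>
#include <linux/rbtree.h>
#include <linux/spinlock.h>

struct dirty_node {
    struct rb_node node;
    unsigned long  offset;          /* device offset, keeps the tree sorted */
    void          *page;
};

struct percore_trees {
    spinlock_t             pages_lock;
    struct radix_tree_root pages;   /* ALL cached pages of this file */
    spinlock_t             dirty_lock;
    struct rb_root         dirty;   /* ONLY dirty pages */
};

struct pfd {                        /* one per mapped file */
    int                   ncpus;
    struct percore_trees *trees;    /* array of ncpus entries */
};

static struct percore_trees *pick(struct pfd *p, unsigned long page_offset)
{
    return &p->trees[page_offset % p->ncpus];
}

/* Fault path: publish a newly fetched page in its per-core radix tree. */
static int cache_insert(struct pfd *p, unsigned long off, void *page)
{
    struct percore_trees *t = pick(p, off);
    int err;

    spin_lock(&t->pages_lock);
    err = radix_tree_insert(&t->pages, off, page);
    spin_unlock(&t->pages_lock);
    return err;
}

/* Write-fault path: record the page in the per-core dirty tree only;
 * this does not serialize inserts/removals in the radix trees. */
static void cache_mark_dirty(struct pfd *p, struct dirty_node *dn)
{
    struct percore_trees *t = pick(p, dn->offset);
    struct rb_node **link, *parent = NULL;

    spin_lock(&t->dirty_lock);
    link = &t->dirty.rb_node;
    while (*link) {
        struct dirty_node *cur = rb_entry(*link, struct dirty_node, node);
        parent = *link;
        link = (dn->offset < cur->offset) ? &(*link)->rb_left
                                          : &(*link)->rb_right;
    }
    rb_link_node(&dn->node, parent, link);
    rb_insert_color(&dn->node, &t->dirty);
    spin_unlock(&t->dirty_lock);
}
```

Keeping dirty pages in a red-black tree sorted by device offset also lets writeback issue the batch in offset order.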
FastMap design: 3 main techniques (cont.)
• Next: optimizes reverse mappings → reduces CPU processing in the common path
Reverse mappings
• Find which page table entries map a specific page
  • Page eviction → due to memory pressure or explicit writeback
  • Destroy mappings → munmap
• Linux uses object-based reverse mappings
  • Executables and libraries (e.g. libc) introduce a large amount of sharing
  • Reduces DRAM consumption and housekeeping costs
• Storage applications that use memory-mapped I/O
  • Require minimal sharing
  • Can be applied selectively to certain devices or files
Linux object-based reverse mappings
[Diagram: pages with _mapcount point to the address_space; its i_mmap tree, protected by a read/write semaphore, links to the VMAs and their page tables (PGDs)]
• _mapcount can still result in useless page table traversals
• The rw-semaphore is acquired as read on all operations
  • Cross NUMA-node traffic
  • Spends many CPU cycles
(a simplified walk is sketched below)
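A simplified sketch of the object-based walk, using the kernel's interval-tree iteration over i_mmap; it is illustrative of the cost pattern rather than the exact kernel rmap code, and it omits the page-table walk itself.

```c
/* Simplified sketch (not actual kernel code) of the object-based reverse
 * mapping walk Linux performs to find the PTEs mapping a file page: every
 * VMA covering the page's file offset is visited under the mapping's
 * rw-semaphore, even when only one of them (or none) has the page mapped. */
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

static void walk_file_rmap(struct page *page)
{
    struct address_space *mapping = page->mapping;
    pgoff_t pgoff = page_to_pgoff(page);
    struct vm_area_struct *vma;

    i_mmap_lock_read(mapping);           /* rw-semaphore taken on every walk */
    vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
        unsigned long addr = vma->vm_start +
                             ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
        /* Walk this VMA's page table to locate (and possibly clear) the
         * PTE at addr; the walk is useless for VMAs that never faulted
         * the page in. */
        (void)addr;
    }
    i_mmap_unlock_read(mapping);
}
```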
FastMap full reverse mappings
[Diagram: each page keeps a list of (VMA, vaddr) entries that map it]
• Full reverse mappings
  • Reduce CPU overhead
  • Efficient munmap
• No ordering required ⇒ scalable updates
• More DRAM required per-core
  • Limited by the small degree of sharing in pages
(a minimal sketch follows)
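A simplified sketch of per-page full reverse mappings, assuming a per-page lock and list; names are illustrative and this is not FastMap's actual code.

```c
/* Simplified sketch (not FastMap's code) of full reverse mappings: each
 * cached page carries an explicit list of (vma, virtual address) pairs
 * that map it, so eviction and munmap can find and clear exactly the PTEs
 * involved without walking every VMA of the file. */
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct rmap_entry {
    struct list_head       link;
    struct vm_area_struct *vma;
    unsigned long          vaddr;   /* where this page is mapped in vma */
};

struct cached_page {
    struct page     *page;
    spinlock_t       rmap_lock;
    struct list_head rmaps;         /* usually 0 or 1 entries */
};

/* Fault path: remember who mapped the page. No global ordering is needed,
 * so updates on different pages never contend. */
static int add_rmap(struct cached_page *cp, struct vm_area_struct *vma,
                    unsigned long vaddr)
{
    struct rmap_entry *e = kmalloc(sizeof(*e), GFP_KERNEL);

    if (!e)
        return -ENOMEM;
    e->vma = vma;
    e->vaddr = vaddr;
    spin_lock(&cp->rmap_lock);
    list_add(&e->link, &cp->rmaps);
    spin_unlock(&cp->rmap_lock);
    return 0;
}

/* Eviction/munmap path: visit only the mappings that actually exist. */
static void for_each_mapping(struct cached_page *cp,
                             void (*fn)(struct vm_area_struct *, unsigned long))
{
    struct rmap_entry *e;

    spin_lock(&cp->rmap_lock);
    list_for_each_entry(e, &cp->rmaps, link)
        fn(e->vma, e->vaddr);
    spin_unlock(&cp->rmap_lock);
}
```

The extra DRAM cost is one small entry per mapping; because mmapped storage files see little page sharing, the list almost always holds a single entry.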
FastMap design: 3 main techniques (cont.)
• Next: uses a scalable DRAM cache → minimizes interference and reduces latency variability
Batched TLB invalidations
• Under memory pressure FastMap evicts a batch of clean pages
  • Cache-related operations
  • Page table cleanup
  • TLB invalidation
• A TLB invalidation requires an IPI (Inter-Processor Interrupt)
  • Limits scalability [EuroSys'13, USENIX ATC'17, EuroSys'20]
• Single TLB invalidation for the whole batch
  • Convert the batch to an address range, at the cost of some unnecessary invalidations
(a minimal sketch follows)
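A simplified sketch of converting an eviction batch into one ranged TLB flush; the batch structure is an illustrative assumption, and the flush_tlb_mm_range() call follows the Linux 4.14 x86 interface, which differs across kernel versions.

```c
/* Simplified sketch (not FastMap's code) of batched TLB invalidation:
 * instead of one IPI-backed invalidation per evicted page, the batch of
 * evicted pages is converted into a single virtual address range and
 * flushed once. */
#include <asm/tlbflush.h>
#include <linux/kernel.h>
#include <linux/mm.h>

struct evict_batch {
    struct mm_struct *mm;
    unsigned long    *vaddrs;   /* user addresses of evicted pages */
    int               count;
};

static void flush_batch(struct evict_batch *b)
{
    unsigned long start = ULONG_MAX, end = 0;
    int i;

    if (b->count == 0)
        return;

    /* Convert the batch into one range; pages inside the range that were
     * not evicted get invalidated too, which is the accepted trade-off. */
    for (i = 0; i < b->count; i++) {
        if (b->vaddrs[i] < start)
            start = b->vaddrs[i];
        if (b->vaddrs[i] + PAGE_SIZE > end)
            end = b->vaddrs[i] + PAGE_SIZE;
    }

    /* One ranged invalidation (one round of IPIs) for the whole batch. */
    flush_tlb_mm_range(b->mm, start, end, VM_NONE);
}
```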
Other optimizations in the paper
• DRAM cache
• Eviction/writeback operations
• Implementation details
Outline (cont.)
• Next: Experimental analysis
Testbed
• 2x Intel Xeon E5-2630 v3 CPUs (2.4GHz)
  • 32 hyper-threads
• Different devices
  • Intel Optane SSD DC P4800X (375GB) in workloads
  • null_blk in microbenchmarks
• 256 GB of DDR4 DRAM
• CentOS v7.3 with Linux 4.14.72
Workloads
• Microbenchmarks
• Storage applications
  • Kreon [ACM SoCC'18] – persistent key-value store (YCSB)
  • MonetDB – column-oriented DBMS (TPC-H)
• Extend available DRAM over fast storage devices
  • Silo [SOSP'13] – key-value store with scalable transactions (TPC-C)
  • Ligra [PPoPP'13] – graph algorithms (BFS)
FastMap Scalability
[Figure: million page-faults/sec (IOPS) vs. number of threads (1–80) on 4x Intel Xeon E5-4610 v3 CPUs (1.7 GHz, 80 hyper-threads), comparing FastMap-Rd/Wr, FastMap-Rd/Wr-SPF, and mmap-Rd/Wr; FastMap achieves up to 11.8x more IOPS than mmap]
FastMap execution time breakdown
[Figure: execution samples (x1000) broken down into mark_dirty, address-space, page-fault, and other, for mmap-Read, mmap-Write, FastMap-Read, and FastMap-Write]
Kreon key-value store
• Persistent key-value store based on an LSM-tree
• Designed to use memory-mapped I/O in the common path
• YCSB with 80M records
  • 80GB dataset
  • 16GB DRAM
Kreon – 100% inserts
[Figure: execution time (sec) for 1–32 cores with FastMap vs. mmap, broken down into kreon, ycsb, others, pthread, pgfault, kworker, iowait, and idle; FastMap improves performance by up to 3.2x]
Kreon – 100% lookups
[Figure: execution time (sec) for 1–32 cores with FastMap vs. mmap, broken down into kreon, ycsb, others, pgfault, kworker, iowait, and idle; FastMap improves performance by up to 1.5x]
Batched TLB invalidations
• TLB batching results in 25.5% more TLB misses
• Improvement comes from fewer IPIs (Silo key-value store, TPC-C)
  • 24% higher throughput
  • 23.8% lower average latency
  • Less time in flush_tlb_mm_range(): 20.3% → 0.1%
Conclusions
• FastMap, an optimized mmio path in Linux
  • Scalable with the number of threads & low CPU overhead
• FastMap has significant benefits for data-intensive applications
  • Fast storage devices
  • Multi-core servers
• Up to 11.8x more IOPS with 80 cores and null_blk
• Up to 5.2x more IOPS with 32 cores and Intel Optane SSD