Adaptive Prefetching for Accelerating Read and Write in NVM-based File Systems


  1. Adaptive Prefetching for Accelerating Read and Write in NVM-based File Systems
     Shengan Zheng, Hong Mei, Linpeng Huang, Yanyan Shen, Yanmin Zhu
     Department of Computer Science and Engineering, Shanghai Jiao Tong University

  2. NVMM File Systems
     • Non-Volatile Memory
       √ Non-volatile
       √ Byte-addressable
       × Longer latency than DRAM
       × Lower bandwidth than DRAM
     • NVMM file systems
       – SCMFS, BPFS, PMFS
       – NOVA, SIMFS, HiNFS
       – Not adaptive to different file access patterns

  3. Motivation
     • Higher performance: faster read and faster write

                     Faster Read                           Faster Write
       Bottleneck    Indirection of file inner structure   NVM write latency
       Approach      Continuous file address space         DRAM buffer
       Proposed by   SIMFS (TOC '16)                       HiNFS (EuroSys '16)

  4. Motivation – Faster Read
     • Bottleneck: locating pages with software routines
     • Normal access routine: metadata indexes data blocks that may sit out of
       order in NVM (e.g., blocks in physical order 0 1 4 5 6 7 2 3)
     • Continuous file address space: data blocks 0–7 are mapped contiguously,
       so no per-access indexing is needed (sketched below)
     • But it is not suitable for out-of-place writes
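
To make the bottleneck concrete, here is a minimal userspace sketch (all names are illustrative, not SIMFS code) contrasting the normal software indexing routine with access through a continuous file address space:

```c
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Normal routine: every access walks an index structure to locate the
 * physical block, which may sit anywhere in NVM. */
struct file_index { char *blocks[512]; };

static void *lookup_block(struct file_index *idx, size_t off)
{
    return idx->blocks[off / BLOCK_SIZE] + off % BLOCK_SIZE;
}

/* Continuous file address space: the blocks are mapped back-to-back in
 * virtual memory, so an offset is plain pointer arithmetic and the MMU
 * does the translation; no software indexing on the read path. */
static void *direct_access(char *file_va_base, size_t off)
{
    return file_va_base + off;
}
```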

  5. Motivation – Faster Write
     • Bottleneck: high write latency of NVM
     • DRAM write buffer: lazy-persistent writes go to DRAM, eager-persistent
       writes go straight to NVM; write back is performed on SYNC
     • But the buffer introduces additional lookup overhead for reads
     [Diagram: applications issue reads and writes through the NVMM file
     system, which splits them between the DRAM buffer and NVMM via the
     memory interface]

  6. Motivation – Merge
     • The NVMM file system with the best performance would combine both:
       – Read with a continuous file address space
       – Write with a DRAM write buffer
     [Chart: latency (0–4500) of Normal Read, Optimized Read, Normal Write,
     and Optimized Write, broken down into locating data, read latency, and
     write latency]

  7. Motivation
                     Faster Read                           Faster Write
       Bottleneck    Indirection of file inner structure   NVM write latency
       Approach      Continuous file address space         DRAM buffer
       Proposed by   SIMFS (TOC '16)                       HiNFS (EuroSys '16)

     • Can we merge them into one NVM-based file system intuitively? No.
     • Reasons:
       – The continuous file address space does not work for out-of-place
         writes: optimized read, slower write
       – The DRAM buffer adds lookup overhead: optimized write, slower read

  8. Goal
     • Merge the read and write optimization approaches into one file system
     [Diagram: the application accesses both a continuous file address space
     (read optimization) and a DRAM write buffer (write optimization) on top
     of the NVM main area]

  9. Challenges
     • Merging the read and write optimization approaches into one file
       system raises three challenges:
       – Adaptive optimization
       – Allocation overhead
       – Consistency

  10. Design – WARP Classifier
      • WARP: Write And Read Prefetch
      • Classifies read/write-intensive accesses to files:
        – Opened with a READ_ONLY or WRITE_ONLY flag, or
        – Tagged as read-/write-intensive by the WARP benefit model
      • Assigns each class to its acceleration approach: read-intensive file
        IO goes to the continuous file address space (WARP read),
        write-intensive file IO to the DRAM write buffer (WARP write), and
        the rest to the normal path in the main area (see the sketch below)
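
A minimal sketch of this routing, assuming POSIX-style open flags; the enum and the model tag are illustrative, not the paper's actual interfaces:

```c
#include <fcntl.h>

enum warp_class { WARP_READ, WARP_WRITE, WARP_NORMAL };

/* Route a file to an acceleration approach: explicit READ_ONLY /
 * WRITE_ONLY opens decide immediately; otherwise fall back to the tag
 * the WARP benefit model assigned to this file. */
enum warp_class warp_classify(int open_flags, enum warp_class model_tag)
{
    if ((open_flags & O_ACCMODE) == O_RDONLY)
        return WARP_READ;    /* continuous file address space */
    if ((open_flags & O_ACCMODE) == O_WRONLY)
        return WARP_WRITE;   /* DRAM write buffer */
    return model_tag;        /* read-/write-intensive per benefit model */
}
```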

  11. Design – WARP Benefit Model
      • Chooses the best acceleration approach for each file, based on
        – NVM characteristics
        – file access patterns
      • Estimated access latency:
        T = N_read × L_read_latency + S_read / V_read_bandwidth
          + N_write × L_write_latency + S_write / V_write_bandwidth
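
The estimator is a straight transcription of the formula above; only the function name and the choice of units are ours:

```c
/* Estimated access latency T from slide 11. The caller picks consistent
 * units, e.g. latencies in ns, sizes in bytes, bandwidths in bytes/ns. */
double warp_estimate(double n_read,  double l_read,  double s_read,  double v_read,
                     double n_write, double l_write, double s_write, double v_write)
{
    return n_read  * l_read  + s_read  / v_read
         + n_write * l_write + s_write / v_write;
}
```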

  12. Design – Access Pattern Prediction
      • Prefetching
        – Moves the allocation steps out of the critical path
        – Pre-allocation happens before files are accessed (a sketch of the
          background prefetch loop follows)
      • Collect file access traces and patterns
        – Which file/node to prefetch: successor prediction
        – How to prefetch it: access pattern prediction
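
A userspace analogue of moving allocation off the critical path; pthreads stand in for the kernel background thread, and every name here is illustrative:

```c
#include <pthread.h>

static int pending_inode = -1;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Hypothetical heavy work: reserve address space or buffer pages. */
static void preallocate_for(int inode) { (void)inode; }

/* Called on the access path: cheap, just hands the prediction over. */
void predict_next(int inode)
{
    pthread_mutex_lock(&lock);
    pending_inode = inode;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Background prefetch loop: the expensive pre-allocation happens here,
 * before the predicted file is actually accessed. */
void *warp_prefetch_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending_inode < 0)
            pthread_cond_wait(&cond, &lock);
        int inode = pending_inode;
        pending_inode = -1;
        pthread_mutex_unlock(&lock);
        preallocate_for(inode);
    }
    return NULL;
}
```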

  13. Design – WARP Prefetching
      • Whenever a file is accessed, prefetch the next one
      • Predictions can be file-based or process-based (see the sketch below)
      • Results: high overall performance and high prefetch accuracy
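
A sketch of the trigger. The structures and the predict_next() hand-off reuse the illustrative names from the previous sketch; how the file-based and process-based scopes combine is our assumption, as the slide only names the two options:

```c
struct file_meta { int successor; };        /* per-file recorded follower  */
struct proc_hist { int last_successor; };   /* per-process recent chain    */

void predict_next(int inode);               /* from the sketch above */

/* On every file access, hand the predicted next file to the background
 * prefetcher; -1 means "no prediction yet". */
void on_file_access(struct proc_hist *p, struct file_meta *f)
{
    int next = f->successor;                /* file-based prediction */
    if (next < 0)
        next = p->last_successor;           /* process-based fallback */
    if (next >= 0)
        predict_next(next);
}
```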

  14. Implementation – Granularity
      • Not too small (cacheline, block):
        – Adjacent data blocks share similar access patterns anyway
        – The metadata size would grow too large
      • Not too big (whole file):
        – Optimization becomes imprecise
        – Additional optimization overhead
      • We choose 2MB as the granularity for our implementation (see below)
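
With a 2MB granularity, mapping a file offset to its optimization chunk is a single shift; a small sketch (macro names are ours):

```c
#define WARP_CHUNK_SHIFT 21                     /* 2MB = 1 << 21 bytes */
#define WARP_CHUNK_SIZE  (1UL << WARP_CHUNK_SHIFT)

/* Per-chunk (not per-block, not per-file) state keeps metadata small
 * while still letting hot and cold regions of one file diverge. */
static inline unsigned long warp_chunk_of(unsigned long file_off)
{
    return file_off >> WARP_CHUNK_SHIFT;
}
```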

  15. Implementation – Read Optimization
      • Applied to frequently-read nodes and files opened with READ_ONLY
      • Background thread warp_prefetch:
        – Allocates virtual address space for the whole file
        – Maps each valid block of the node into it (sketched below)
      • After prefetching
        – Data is accessed directly and continuously through the page table
          entries, with translation done by the MMU
        – Out-of-place writes are handled by updating the mapping address
      [Diagram: per-file RB trees and a radix tree index blocks in the DRAM
      buffer and NVM; WARP read maps files A and B contiguously into the
      kernel virtual address space]
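
The mapping step can be pictured with the kernel's vmap() primitive (vmap() is real; the wrapper and its use here are our sketch, not the paper's code):

```c
#include <linux/vmalloc.h>
#include <linux/mm.h>

/* Reserve one contiguous kernel VA range and map every valid block of
 * the file into it. Afterwards reads are plain loads resolved by the
 * MMU; an out-of-place write only has to update the affected PTE. */
static void *warp_map_file(struct page **blocks, unsigned int nr_blocks)
{
    return vmap(blocks, nr_blocks, VM_MAP, PAGE_KERNEL);
}
```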

  16. Implementation – Read Optimization
      [Diagram, continuing slide 15: files A (A1–A4), B (B1–B6), and C
      (C1–C4); buffered blocks A1, A2, B1, and C2–C4 sit in the DRAM write
      buffer, while WARP read maps A3, A4 and B2–B5 into the kernel virtual
      address space]

  17. Implementation – Write Optimization
      • Applied to frequently-written nodes and files opened with WRITE_ONLY
      • Background thread warp_prefetch pre-allocates buffer space, indexed by
        – a radix tree for files
        – a red-black tree for nodes
      • After prefetching
        – Writes are intercepted by the DRAM write buffer (sketched below)
        – Data is written back to NVM only on SYNC
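
A sketch of the intercept-and-writeback path using the kernel's radix-tree API (the API is real; the surrounding structure and names are illustrative, and a real SYNC path would also need cache-line writeback and fencing):

```c
#include <linux/radix-tree.h>
#include <linux/string.h>
#include <linux/types.h>

struct wbuf_page { void *dram; void *nvm; size_t dirty; };

/* file page offset -> buffered page */
static RADIX_TREE(write_buffer, GFP_KERNEL);

/* Write path: land in the pre-allocated DRAM copy, not in NVM. */
void warp_write(unsigned long pgoff, const void *src, size_t len)
{
    struct wbuf_page *p = radix_tree_lookup(&write_buffer, pgoff);
    memcpy(p->dram, src, len);
    p->dirty = len;
}

/* SYNC path: only now pay the NVM write latency. */
void warp_sync(unsigned long pgoff)
{
    struct wbuf_page *p = radix_tree_lookup(&write_buffer, pgoff);
    memcpy(p->nvm, p->dram, p->dirty);
    p->dirty = 0;
}
```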

  18. Implementation – Successor Prediction
      • Predicts future node and file accesses
      • Inner-file prediction
        – The next node within the file to be accessed
        – Stored in the metadata of the node block
      • Inter-file prediction
        – The next file to be accessed
        – Stored in the metadata of the inode block
      • Both the inner-file and the inter-file successor are prefetched
        (illustrative metadata layout below)
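
An illustrative layout of where the two predictions live; the struct and field names are ours:

```c
/* Inner-file successor: kept in each node block's metadata. */
struct warp_node_meta {
    unsigned int next_node;    /* next 2MB node of this file to prefetch */
};

/* Inter-file successor: kept in the inode block's metadata. */
struct warp_inode_meta {
    unsigned int next_inode;   /* next file to prefetch */
};
```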

  19. Implementation – WARP Benefit Model
      • Objective: minimize the node's overall I/O time between two
        consecutive checkpoints
      • Example window: [Checkpoint i] one 8KB read, then three 64KB writes
        [Checkpoint i+1] (at 4KB block granularity: 2 read accesses, 48 write
        accesses, 192KB written)
      • T = [read overhead] + [write overhead] (+ [writeback overhead] on the
        write-optimized path):
        T = N_read × L_read_latency + S_read / V_read_bandwidth
          + N_write × L_write_latency + S_write / V_write_bandwidth
      • Baseline (no optimization):
        T = 2 × (250 + 50) ns + 8KB / (8GB/s)
          + 48 × (250 + 500) ns + 192KB / (2GB/s) = 134 μs
      • N: number of accesses; L: access latency of the file inner structure
        and the memory; S: total size of the I/O access; V: transmission
        bandwidth of the memory

  20. Implementation – WARP Benefit Model
      • T(read opt.): same formula, cheaper read term
        T = 1 × (100 + 50) ns + 8KB / (8GB/s)
          + 48 × (250 + 500) ns + 192KB / (2GB/s) = 133 μs
      • The continuous file inner structure reduces the access latency
        (locating drops from 250 ns to 100 ns, with a single lookup)

  21. Implementation – WARP Benefit Model
      • T(write opt.): adds a writeback term T_wb for the SYNC-time flush
        T = N_read × L_read_latency + S_read / V_read_bandwidth
          + N_write × L_write_latency + S_write / V_write_bandwidth + T_wb
      • T = 2 × (250 + 15) ns + 8KB / (15GB/s)
          + 48 × (250 + 15) ns + 192KB / (10GB/s)
          + 16 × (250 + 500) ns + 64KB / (2GB/s) = 77 μs
      • Writes go to DRAM (15 ns latency) instead of NVM
      • Dirty data is written back to NVM on SYNC
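
The three estimates can be checked mechanically. This standalone program reproduces slides 19–21 with the slides' own numbers; latencies are in ns, sizes in KB, and bandwidths in GB/s, so dividing KB by GB/s yields microseconds directly:

```c
#include <stdio.h>

/* T in microseconds: per-access latency terms are given in ns, and the
 * size/bandwidth terms come out in us because 1 GB/s moves 1 KB per us. */
static double t_us(double n_r, double l_r, double s_r, double v_r,
                   double n_w, double l_w, double s_w, double v_w)
{
    return (n_r * l_r + n_w * l_w) / 1000.0 + s_r / v_r + s_w / v_w;
}

int main(void)
{
    /* Slide 19, baseline NVM path: locate 250 ns; read 50 ns at 8 GB/s;
     * write 500 ns at 2 GB/s. */
    double base = t_us(2, 250 + 50, 8, 8, 48, 250 + 500, 192, 2);

    /* Slide 20, read-optimized: locating drops to 100 ns, one lookup. */
    double ropt = t_us(1, 100 + 50, 8, 8, 48, 250 + 500, 192, 2);

    /* Slide 21, write-optimized: DRAM terms plus the SYNC writeback. */
    double wopt = t_us(2, 250 + 15, 8, 15, 48, 250 + 15, 192, 10)
                + t_us(0, 0, 0, 1, 16, 250 + 500, 64, 2);

    printf("base %.0f us, read-opt %.0f us, write-opt %.0f us\n",
           base, ropt, wopt);   /* prints 134, 133, 77 */
    return 0;
}
```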
