Hibachi: A Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays


  1. Hibachi: A Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays. Ziqi Fan, Fenggang Wu, Dongchul Park¹, Jim Diehl, Doug Voigt², and David H.C. Du. University of Minnesota, ¹Intel, ²HP Enterprise. May 18, 2017. Center for Research in Intelligent Storage (CRIS)

  2. Hardware evolution leads to software and system innovation!

  3. The hardware evolution of non-volatile memory (NVRAM): 3D XPoint (by Intel and Micron), NVDIMM (by HPE), STT-MRAM (by Everspin) ✓ Non-volatile ✓ Low power consumption ✓ Fast (close to DRAM) ✓ Byte addressable ✓ …

  4. How can we innovate our software and systems to exploit NVRAM technologies?

  5. Many Possible Ways: caching systems, application upgrades, OS optimization • This work: design NVRAM-based caching systems to improve storage performance

  6. Research Contributions • Extend solid state drive lifespan: (1) H-ARC (MSST 2014 [1]), (2) WRB (under TOS major revision) • Increase hard disk drive I/O throughput: (3) I/O-Cache (MASCOTS 2015 [2]) • Improve disk array performance: (4) Hibachi (MSST 2017 [3]) • Increase parallel file system (PFS) checkpointing speed: (5) CDBB (under submission)

  7. A Cooperative Hybrid Cache with NVRAM and DRAM for Disk Arrays

  8. Outline • Motivation • Related Work • Design Challenges • Our Approach • Evaluation • Conclusion

  9. Introduction • Despite the rise of SSDs, disk arrays are still the backbone of storage, especially for large data centers • HDDs offer much more capacity per dollar and do not wear out easily • However, as rotational devices, HDDs have limited performance – HDD sequential throughput: ~100 MB/s [4] – HDD random throughput: < 1 MB/s

  10. Introduction • To improve disk performance, we use NVRAM and DRAM as caching devices – A disk cache is much larger than a page cache, and DRAM is more cost-effective than NVRAM – DRAM has lower latency than some types of NVRAM • Crux: how to design a hybrid disk cache that fully utilizes scarce NVRAM and DRAM resources?

  11. Related Work • Cache policies designed for main memory (first-level cache) – Not directly applicable to disk caches – LRU, ARC [5], H-ARC [1] • Multilevel buffer caches (covering both first-level and second-level caches) – Concentrate on improving read performance – Do not consider NVRAM – MQ [6], Karma [7] • Disk caches with DRAM and NVRAM – DRAM as a read cache and NVRAM as a write buffer -> lack of cooperation

  12. Design Challenges • How to analyze and utilize I/O traces behind a first-level cache to design a disk cache as a second-level cache? • How to utilize DRAM to maximize read performance? – Low access latency (high cache hit rate) • How to utilize NVRAM to maximize write performance? – High I/O throughput • How to exploit the synergy of both NVRAM and DRAM? – Help each other out according to workload properties

  13. I/O Workload Characterization of Traces behind a First-level Cache • Existing work only characterizes read requests [10] • On top of existing work, we characterize both read and write requests (Figure: temporal distance histograms of a storage server I/O workload) ✓ For read requests, the stack distance is large -> recency is a poor predictor ✓ For write requests, the stack distance is relatively short -> recency can be useful for cache design ✓ Frequency is useful for both reads and writes
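A temporal (stack) distance histogram like the one referenced above can be derived from a block trace roughly as follows. This is a simplified sketch, not the paper's methodology: it assumes the trace is a list of (operation, block) tuples and uses a straightforward O(n²) distinct-block count.

```python
from collections import Counter

def stack_distance_histograms(trace):
    """trace: list of (op, block) tuples, op in {'R', 'W'}.
    Returns one histogram per request type: stack distance -> count."""
    hists = {'R': Counter(), 'W': Counter()}
    last_pos = {}                              # block -> index of its previous access
    for i, (op, block) in enumerate(trace):
        if block in last_pos:
            # distinct blocks touched since the previous access to this block
            between = {b for _, b in trace[last_pos[block] + 1:i]}
            hists[op][len(between)] += 1
        last_pos[block] = i
    return hists
```

For example, `stack_distance_histograms([('R', 1), ('W', 2), ('R', 1)])` records one read reuse at stack distance 1; large read distances and short write distances would reproduce the trend the slide describes.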

  14. Hibachi – Cooperative Hybrid Disk Cache • Hibachi’s four secret ingredients that make it “taste better” – Right Prediction -> improve the cache hit ratio – Right Reaction -> minimize write traffic and increase read performance – Right Adjustment -> adapt to the workload – Right Transformation -> improve I/O throughput
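The slide states the four ingredients only as goals, so the outline below merely shows where such hooks could sit in a two-tier cache. The class and method names are placeholders of my own, not Hibachi's actual algorithm.

```python
# Structural outline only, assuming a page-granularity cache; the hook names
# and empty bodies are placeholders, not Hibachi's published mechanisms.
class CooperativeHybridCache:
    def __init__(self, dram_pages, nvram_pages):
        self.dram_pages = dram_pages    # volatile tier, favored for clean (read) pages
        self.nvram_pages = nvram_pages  # non-volatile tier, favored for dirty (write) pages

    def predict(self, block):
        """Right prediction: decide what to keep to improve the cache hit ratio."""

    def react(self, op, block, data=None):
        """Right reaction: serve the request while minimizing write traffic."""

    def adjust(self):
        """Right adjustment: adapt the DRAM/NVRAM split to the workload."""

    def transform(self):
        """Right transformation: reorganize dirty pages to improve I/O throughput."""

    def on_request(self, op, block, data=None):
        self.predict(block)
        hit = self.react(op, block, data)
        self.adjust()
        self.transform()
        return hit
```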

  15. Evaluation Setup • Use Sim-ideal [9] to measure read performance • Use software RAID with six disk drives to measure write performance • Comparison algorithms: – Hybrid-LRU: DRAM serves as a cache for clean pages and NVRAM as a write buffer for dirty pages; both use the LRU policy – Hybrid-ARC: an ARC-like algorithm that dynamically splits NVRAM to cache both clean and dirty pages, while DRAM caches only clean pages
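For concreteness, here is a minimal sketch of the Hybrid-LRU baseline as described above: DRAM as a clean cache, NVRAM as a dirty-page write buffer, each with its own independent LRU list. The page-count sizes, the write_back callback, and the omitted data payload on reads are assumptions for illustration, not part of the paper's setup.

```python
from collections import OrderedDict

class HybridLRU:
    def __init__(self, dram_size, nvram_size, write_back):
        self.dram = OrderedDict()       # clean pages (reads), volatile
        self.nvram = OrderedDict()      # dirty pages (writes), survives power loss
        self.dram_size = dram_size      # capacities in pages (assumed granularity)
        self.nvram_size = nvram_size
        self.write_back = write_back    # callback that persists a dirty page to disk

    def read(self, block):
        for tier in (self.dram, self.nvram):
            if block in tier:
                tier.move_to_end(block)          # hit: mark most recently used
                return True
        if len(self.dram) >= self.dram_size:
            self.dram.popitem(last=False)        # clean eviction, no disk write needed
        self.dram[block] = None                  # payload omitted; would be fetched from disk
        return False

    def write(self, block, data):
        self.dram.pop(block, None)               # page becomes dirty, so it moves to NVRAM
        if block in self.nvram:
            self.nvram.move_to_end(block)
        elif len(self.nvram) >= self.nvram_size:
            victim, vdata = self.nvram.popitem(last=False)
            self.write_back(victim, vdata)       # evicting a dirty page forces a disk write
        self.nvram[block] = data
```

Note that the two tiers never lend space to each other, which is exactly the lack of cooperation that Hibachi is designed to overcome.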

  16. Evaluation Results (Figures: read hit rate and throughput in KB/s for Hybrid-LRU, Hybrid-ARC, and Hibachi across total cache sizes from 8 MB to 256 MB) • Hibachi outperforms Hybrid-LRU and Hybrid-ARC in – Read hit ratio – Write hit ratio – I/O throughput

  17. Conclusion • Using NVRAM for caching is a challenging and rewarding research topic • We design Hibachi – a hybrid NVRAM and DRAM cache for disk arrays – Characterize storage-level workloads to obtain design guidance – Our four features make Hibachi stand out • Hibachi outperforms existing work in both read and write performance

  18. References (1/2) • [1] Z. Fan, D. H. C. Du, and D. Voigt, "H-ARC: A non-volatile memory based cache policy for solid state drives," 2014 30th Symposium on Mass Storage Systems and Technologies (MSST), Santa Clara, CA, 2014, pp. 1-11. • [2] Z. Fan, A. Haghdoost, D. H. C. Du, and D. Voigt, "I/O-Cache: A non-volatile memory based buffer cache policy to improve storage performance," 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta, GA, 2015, pp. 102-111. • [3] Z. Fan, F. Wu, D. Park, J. Diehl, D. Voigt, and D. H. C. Du, "Hibachi: A cooperative hybrid cache with NVRAM and DRAM for storage arrays," 2017 33rd Symposium on Mass Storage Systems and Technologies (MSST), Santa Clara, CA, 2017, pp. 1-11. • [4] Figure from https://technet.microsoft.com/en-us/library/dd758814(v=sql.100).aspx • [5] N. Megiddo and D. S. Modha, "Outperforming LRU with an adaptive replacement cache algorithm," Computer, vol. 37, no. 4, pp. 58-65, April 2004.

  19. References (2/2) • [6] Y. Zhou, Z. Chen, and K. Li, "Second-level buffer cache management," IEEE Trans. Parallel Distrib. Syst., vol. 15, pp. 505-519, June 2004. • [7] G. Yadgar, M. Factor, and A. Schuster, "Karma: Know-it-all replacement for a multilevel cache," in Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), Berkeley, CA, USA, pp. 25-25, USENIX Association, 2007. • [8] M. Woods, "Optimizing storage performance and cost with intelligent caching," tech. rep., NetApp, August 2010. • [9] Sim-ideal. git@github.com:arh/sim-ideal.git • [10] Y. Zhou, Z. Chen, and K. Li, "Second-level buffer cache management," IEEE Trans. Parallel Distrib. Syst., vol. 15, pp. 505-519, June 2004.

  20. Questions?
