LightKV: A Cross Media Key Value Store with Persistent Memory to Cut Long Tail Latency
Shukai Han, Dejun Jiang, Jin Xiong
Institute of Computing Technology, Chinese Academy of Sciences
University of Chinese Academy of Sciences
MSST '20, October 29-30, 2020
Outline
✓ Background & Motivation
• Design
• Evaluation
• Conclusion
Key-Value Store
• Key-Value (KV) stores are widely deployed in data centers.
• KV stores are latency-critical: they serve applications with low latency requirements, and their workloads have a high percentage of small KV items [1].
[1] Berk Atikoglu et al., SIGMETRICS 2012.
Log-Structured Merge Tree (LSM-Tree)
[Figure: LSM-Tree write path — 1. append to the write-ahead LOG (WAL); 2. write the KV pair into the MemTable; 3. flush the Immutable MemTable to a Level0 SSTable; 4. compaction merges SSTables between Level1 and Level k. SSTable structure: sorted KV data plus metadata (bloom filter, index, etc.).]
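To make the numbered write path concrete, here is a minimal C++ sketch of steps 1–3 (with compaction only noted). It is a sketch under assumed names — `LsmStore`, `Put`, `kFlushThreshold`, and the use of `std::map` in place of a skiplist are all illustrative, not LevelDB's actual API.

```cpp
#include <fstream>
#include <map>
#include <string>

// Minimal sketch of the LSM-Tree write path (steps 1-3 above).
// All class and member names are illustrative, not LevelDB's API.
class LsmStore {
 public:
  explicit LsmStore(const std::string& log_path)
      : wal_(log_path, std::ios::app) {}

  void Put(const std::string& key, const std::string& value) {
    wal_ << key << '\t' << value << '\n';  // 1. append to the WAL first,
    wal_.flush();                          //    so the update survives a crash
    memtable_[key] = value;                // 2. insert into the MemTable
    if (memtable_.size() >= kFlushThreshold)
      Flush();                             // 3. flush a full MemTable
  }

 private:
  void Flush() {
    // A real flush writes the sorted MemTable out as a Level0 SSTable;
    // overlapping tables are later merged downward (4. compaction).
    memtable_.clear();
  }

  static constexpr size_t kFlushThreshold = 1 << 16;
  std::map<std::string, std::string> memtable_;  // stands in for the skiplist
  std::ofstream wal_;
};
```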
Limitations of Persistent KV Stores: inefficient indexing for cross-media
• On one hand, LSM-Tree adopts a skiplist to index in-memory data (the MemTable and Immutable MemTable).
• On the other hand, LSM-Tree builds manifest files to record the key range of each on-disk SSTable.
No single index spans both media, so a read must consult two separate indexing structures.
Limitations of Persistent KV Stores: high write amplification
• Writing the log and transferring data between levels increase write amplification.
• Write (read) amplification is defined as the ratio between the amount of data written to (read from) the underlying storage device and the amount of data requested by the user.
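Restated as formulas, transcribing the definition above directly:

```latex
\[
\mathrm{WA} = \frac{\text{bytes written to the storage device}}
                   {\text{bytes written by the user}},
\qquad
\mathrm{RA} = \frac{\text{bytes read from the storage device}}
                   {\text{bytes requested by the user}}
\]
```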
[Figure: write amplification of LevelDB, HyperLevelDB, and RocksDB as the dataset grows from 10 GB to 100 GB; values range from about 5.2 to 14.6.]
The write amplification of LSM-Tree can reach 10x or more, and as the amount of data keeps growing, write amplification continues to trend upward.
Limitations of Persistent KV Stores: heavy-tailed read latency under mixed workloads
• We first warm up LevelDB with 100 GB of data.
• We measure the average latency as well as the 99th and 99.9th percentile read latencies every 10 seconds.
• Phase t1: run a mixed workload that randomly reads 50 GB of existing data while randomly inserting another 50 GB of data.
• The maximum 99th and 99.9th percentile read latencies reach 13x and 28x the average read latency.
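A minimal sketch of how such percentile latencies can be computed from a window of samples; the 10-second windowing and sample collection are assumed, and `Percentile` is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// Returns the p-th percentile (0 < p < 1) of a window of latency samples,
// e.g. Percentile(samples, 0.99) for the 99th percentile read latency.
double Percentile(std::vector<double> samples, double p) {
  if (samples.empty()) return 0.0;
  // Nearest-rank selection: partially sort so the element at 'idx' is the
  // one that would appear there in fully sorted order.
  size_t idx = static_cast<size_t>(p * (samples.size() - 1));
  std::nth_element(samples.begin(), samples.begin() + idx, samples.end());
  return samples[idx];
}
```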
• Phase t2: run a read-only workload. After compaction finishes, the read tail latency is significantly reduced.
Reducing write amplification therefore not only cuts the total amount of data written to disk and increases system throughput, but also reduces read tail latency under mixed read-write workloads.
Non-Volatile Memory
• Non-Volatile Memories (NVMs) provide low latency and byte addressability.
• Examples include 3D XPoint, Phase Change Memory (PCM), and Resistive Memory (ReRAM).
• The first PM product, Intel Optane DC Persistent Memory (PM), was announced in April 2019 [19].
Key characteristics:
1. NVM retains data after power off.
2. The write latency of Optane DC PM is close to that of DRAM, while its read latency is 3 to 4 times higher.
3. The write and read bandwidths of Optane DC PM are around 2 GB/s and 6.5 GB/s, about 1/8 and 1/4 of DRAM respectively.
Outline
• Background & Motivation
✓ Design
• Evaluation
• Conclusion
LightKV System Overview
1. Radix Hash Tree (RH-Tree): the index, resident in DRAM.
2. Persistent Memory Write Buffer (PWB): Segments in persistent memory; full Segments are flushed to SSD as SSTables.
3. Main Data Store: SSTables on SSD, organized into Partitions 1..N and compacted per partition.
Challenges
• How does the Radix Hash Tree index KV items across media?
• How does the Radix Hash Tree balance performance and data growth?
• How does the Radix Hash Tree conduct well-controlled data compaction to reduce write amplification?
Radix Hash Tree Structure
[Figure: the RH-Tree is a Prefix Search Tree whose internal nodes partition the key space into ranges (e.g. [0,64], [64,255]) and whose leaves are hash tables. Each 64 B hash table bucket holds four slots: per slot a 4 B signature, a 4 B cache field, and an 8 B KV offset pointer (4B*4 + 4B*4 + 8B*4 = 64 B). The offset pointers reference KV pairs stored in an SSTable or a Segment.]
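A minimal C++ sketch of how such a 64 B bucket might be laid out. Only the sizes (4 B signature, 4 B cache, 8 B offset, four slots per bucket) come from the slide; the field names, the probing helper, and the exact use of the cache field are assumptions.

```cpp
#include <cstdint>

// One 64-byte RH-Tree hash table bucket, sized to a single cache line.
// Per slot: 4 B signature + 4 B cache field + 8 B KV offset = 16 B; 4 slots.
struct alignas(64) Bucket {
  uint32_t signature[4];  // 4B*4 = 16B: short key fingerprints for filtering
  uint32_t cache[4];      // 4B*4 = 16B: per-slot cached metadata (assumed use)
  uint64_t kv_offset[4];  // 8B*4 = 32B: location of the KV pair in a
                          //             Segment (PM) or an SSTable (SSD)
};
static_assert(sizeof(Bucket) == 64, "bucket must fit one cache line");

// Probe a bucket: compare the cheap in-DRAM signature first; only on a
// match does the caller follow kv_offset to verify the full key on PM/SSD.
inline int FindSlot(const Bucket& b, uint32_t sig) {
  for (int i = 0; i < 4; ++i)
    if (b.signature[i] == sig) return i;  // candidate slot
  return -1;
}
```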
RH-Tree Split
[Figure: two split modes. Normal split: leaf node LN1 covering [0,127] under internal node IN1 splits into LN1 [0,63] and LN2 [64,127] under the same parent. Level split: a new internal node IN2 is inserted below IN1, and the key range is re-partitioned one level deeper (e.g. into LN1 [0,127] and LN2 [128,255]).]
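A sketch of the two split modes as the figure suggests them: a normal split halves a leaf's range under the same parent, while a level split adds a new internal node and redistributes the range one tree level deeper. All types, fields, and the halving policy are assumptions based on the figure, not the paper's code.

```cpp
#include <vector>

struct LeafNode { int lo, hi; /* hash table buckets omitted */ };
struct InnerNode {
  int lo, hi;
  std::vector<InnerNode*> children;  // grown by a level split
  std::vector<LeafNode*> leaves;     // grown by a normal split
};

// Normal split: LN1 [0,127] becomes LN1 [0,63] plus a new LN2 [64,127],
// both still attached to the same internal node IN1.
void NormalSplit(InnerNode* in1, LeafNode* ln1) {
  int mid = ln1->lo + (ln1->hi - ln1->lo) / 2;
  LeafNode* ln2 = new LeafNode{mid + 1, ln1->hi};
  ln1->hi = mid;
  in1->leaves.push_back(ln2);
}

// Level split: insert a new internal node IN2 under IN1 and re-partition
// the leaf's range one level deeper (e.g. into [0,127] and [128,255]).
void LevelSplit(InnerNode* in1, LeafNode* ln1) {
  InnerNode* in2 = new InnerNode{ln1->lo, ln1->hi, {}, {}};
  int mid = ln1->lo + (ln1->hi - ln1->lo) / 2;
  in2->leaves.push_back(new LeafNode{ln1->lo, mid});
  in2->leaves.push_back(new LeafNode{mid + 1, ln1->hi});
  in1->children.push_back(in2);
}
```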
Linked Hash Leaf Node
[Figure: three stages. Stage 1: a DRAM leaf node LN1 indexes Segment1 in PM. Stage 2: a new leaf node LN2 indexing Segment2 is linked to LN1. Stage 3: when Segment1 is flushed to an SSTable on SSD, its index is persisted as LN1'.]
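A sketch of how lookups could work over linked leaf nodes: a get probes the newest leaf first and follows the link to older leaves, whose segments may already have been flushed to SSD. The chaining direction and all names are assumptions.

```cpp
#include <cstdint>

// Illustrative: leaf nodes covering the same key range are chained
// newest-first, so a lookup walks from the most recent segment backwards.
struct HashLeaf {
  const HashLeaf* older = nullptr;  // link to the previous leaf/segment
  // Probe this leaf's buckets; stubbed here (see the bucket sketch earlier).
  bool Lookup(uint64_t /*key_hash*/, uint64_t* /*offset*/) const {
    return false;
  }
};

bool Get(const HashLeaf* newest, uint64_t key_hash, uint64_t* offset) {
  for (const HashLeaf* ln = newest; ln != nullptr; ln = ln->older)
    if (ln->Lookup(key_hash, offset))
      return true;  // offset now points into a PM Segment or an SSD SSTable
  return false;
}
```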
RH-Tree Placement
[Figure: the Prefix Search Tree and the hash leaf nodes of the Radix Hash Tree reside in DRAM; the Persistent Memory Write Buffer (Segments) resides in PM; the Main Data Store (SSTables in Partitions 1..N) resides on SSD.]
Partition-Based Data Compaction
[Figure: compaction timeline t1-t6 within one partition, with Compaction Size (CS) = 4 and on the order of log_CS N levels. Level-0 SSTables S1(0)..S19(0) accumulate; every CS tables at a level are merged into one table at the next level (e.g. four level-0 tables into S5(1), later S10(1), S15(1), S20(1); the four level-1 tables then merge into S21(2)).]
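A sketch of the merge policy the timeline suggests: whenever a partition accumulates CS tables at one level, they are merged into a single table at the next level, possibly cascading upward. The names and the exact trigger details are assumptions; the actual merge of table contents is omitted.

```cpp
#include <map>
#include <vector>

// Illustrative partition compaction with Compaction Size CS = 4:
// once CS SSTables pile up at a level, merge them into one table at the
// next level (so S1(0)..S4(0) -> S5(1); four level-1 tables -> S21(2)).
constexpr int kCompactionSize = 4;

struct SSTable { int id; int level; };

void MaybeCompact(std::map<int, std::vector<SSTable>>& levels, int level,
                  int* next_id) {
  auto& tables = levels[level];
  if (tables.size() < static_cast<size_t>(kCompactionSize)) return;
  // Merge-sort the CS input tables into one output table at level+1
  // (the data merge itself is omitted in this sketch).
  tables.clear();
  levels[level + 1].push_back({(*next_id)++, level + 1});
  // The new table may in turn fill the next level, so cascade upward.
  MaybeCompact(levels, level + 1, next_id);
}
```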
Recovery
[Figure: on recovery, the DRAM-resident Prefix Search Tree and Radix Hash Tree are rebuilt from the persisted structures: the Segments of the Persistent Memory Write Buffer in PM and the SSTables of the Main Data Store on SSD (Partitions 1..N).]
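A sketch of the rebuild direction the figure shows: only the DRAM structures are volatile, so they are reconstructed by scanning the persisted indexes on PM and SSD rather than by replaying a log. Every name here is an assumption.

```cpp
#include <vector>

// Illustrative recovery: rebuild the volatile RH-Tree by re-attaching the
// persisted per-segment and per-SSTable indexes instead of replaying a WAL.
struct PersistedIndex { /* per-Segment or per-SSTable index on PM/SSD */ };

struct RHTree {
  void Attach(const PersistedIndex& /*idx*/) { /* relink into the tree */ }
};

RHTree Recover(const std::vector<PersistedIndex>& pm_segments,
               const std::vector<PersistedIndex>& ssd_tables) {
  RHTree tree;
  for (const auto& idx : pm_segments) tree.Attach(idx);  // PM write buffer
  for (const auto& idx : ssd_tables) tree.Attach(idx);   // main data store
  return tree;
}
```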
Outline
• Background & Motivation
• Design
✓ Evaluation
• Conclusion
Experiment Setup
• System and hardware configuration
  – Two Intel Xeon Gold 5215 CPUs (2.5 GHz), 64 GB memory, and one 400 GB Intel DC P3700 SSD.
  – CentOS Linux release 7.6.1810 with the 4.18.8 kernel and the ext4 file system.
• Compared systems: LevelDB, RocksDB, NoveLSM, SLM-DB.
• Workloads
  – db_bench as microbenchmark; YCSB as the actual workload.
  – YCSB workloads:
    A: 50% reads and 50% updates
    B: 95% reads and 5% updates
    C: 100% reads
    D: 95% reads for latest keys and 5% inserts
    E: 95% scans and 5% inserts
    F: 50% reads and 50% read-modify-writes
Reducing Write Amplification
The write amplification of LightKV is 7.1x, 5.1x, 2.9x, and 2.3x lower than that of LevelDB, RocksDB, NoveLSM, and SLM-DB respectively. As the total amount of written data increases, the write amplification of LightKV remains stable (e.g. it grows only from 1.6 to 1.8 as the data amount increases from 50 GB to 100 GB).
Basic Operations
[Figure: LightKV improves basic operation performance over LevelDB, RocksDB, NoveLSM, and SLM-DB by 13.5x, 8.3x, 5.0x, and 4.0x, and by 4.5x, 1.9x, 4.2x, and 1.3x respectively.]
Thanks to the global index and partition compaction, LightKV effectively reduces read-write amplification and improves read and write performance.
Basic Operations
LightKV's short range query performance is lower (reduced by 24.3% and 13.2%). This is because a short range query has to search all SSTables in one or more partitions.
Tail Latency Under Read-Write Workload
[Figure: compared to LevelDB, RocksDB, NoveLSM, and SLM-DB, LightKV's 99th percentile read latency is 17.9x, 10.5x, 6.4x, and 3.5x lower, and its 99.9th percentile read latency is 15.7x, 9.2x, 8.8x, and 3.4x lower.]
Thanks to lower write amplification and global indexing, LightKV provides lower and more stable read and write tail latency.
Results with YCSB
LightKV provides better throughput under YCSB workloads that simulate real-world applications.
Outline
• Background & Motivation
• Design
• Evaluation
✓ Conclusion
Conclusion
• LSM-Tree on traditional storage devices suffers from problems such as read-write amplification.
• At the same time, the emergence of non-volatile memory provides both opportunities and challenges for building efficient key-value storage systems.
• In this paper, we propose LightKV, a cross-media key-value store with persistent memory. LightKV effectively reduces the system's read-write amplification by building an RH-Tree and adopting partition-based data compaction.
• The experimental results show that LightKV reduces write amplification by up to 8.1x and improves read performance by up to 9.2x. It also reduces read tail latency by up to 18.8x under a mixed read-write workload.
THANK YOU! Q & A
Author Email: hanshukai@ict.ac.cn
Sensitivity Analysis
As the maximum number of partitions increases, the read and write performance of LightKV increases, but the NVM capacity consumption also increases.
Sensitivity Analysis
As the compaction size increases, the merging frequency drops and write amplification is reduced, which improves write performance but hurts read performance.