DATA-INTENSIVE COMPUTING SYSTEMS LABORATORY
PinK: High-speed In-storage Key-value Store with Bounded Tails
Junsu Im, Jinwook Bae, Chanwoo Chung*, Arvind*, and Sungjin Lee
Daegu Gyeongbuk Institute of Science & Technology (DGIST), *Massachusetts Institute of Technology (MIT)
2020 USENIX Annual Technical Conference (ATC'20, July 15-17)
Key-Value Store is Everywhere!
The key-value store (KVS) has become a necessary infrastructure for web indexing, caching, and storage systems.
Algorithm: SILK (ATC'19), Dostoevsky (SIGMOD'18), Monkey (SIGMOD'17), ...
System: FlashStore (VLDB'10), WiscKey (FAST'16), LOCS (EuroSys'14), ...
Architecture: BlueCache (VLDB'16), ...
Key-Value (KV) Storage Device
For workloads such as web indexing, caching, and storage systems, a KV-SSD offloads the KVS functionality from the host into the device: instead of a host-side KVS engine running on top of a block device driver and a block SSD, the host only needs a thin KV-SSD device driver.
Benefits: a native key-value interface, fewer host resources, low latency, and high throughput.
Key-Value (KV) Storage Device
KV storage devices that offload KVS functionality have been explored in both academia and industry.
Academia: LightStore (ASPLOS'19), KV-SSD (SYSTOR'19), iLSM-SSD (MASCOTS'19), KAML (HPCA'17), NVMKV (ATC'15), BlueCache (VLDB'16), ...
Industry: Samsung's KV-SSD
Key Challenges of Designing KV-SSD
1. Limited DRAM resource
SSDs usually have DRAM equal to only about 0.1% of their NAND capacity for indexing.
A logical block is 4KB, whereas a KV pair is only about 1KB on average.
DRAM scalability is also slower than NAND scalability: about 1.13x/year versus 1.43x/year.
(Technology and Cost Trends at Advanced Nodes, 2020, https://semiwiki.com/wp-content/uploads/2020/03/Lithovision-2020.pdf)
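A back-of-the-envelope check of why this hurts a KV-SSD more than a block SSD (my arithmetic, using the 4TB-SSD / 4GB-DRAM configuration assumed later in the talk):

\[
\text{block SSD: } \frac{4\,\mathrm{TB}}{4\,\mathrm{KB}} = 1\,\mathrm{G}\ \text{mapping entries},
\qquad
\text{KV-SSD: } \frac{4\,\mathrm{TB}}{1\,\mathrm{KB}} = 4\,\mathrm{G}\ \text{index entries}.
\]

So a KV-SSD has roughly 4x as many objects to index, each identified by a 32B key rather than a small logical block address, while the DRAM budget (about 0.1% of NAND) stays the same.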
Key Challenges of Designing KV-SSD (Cont.)
2. Limited CPU performance
SSDs have low-power (ARM-based) CPUs, which are much slower than host x86 CPUs.
Which algorithm is better for a KV-SSD under these limitations: Hash or Log-structured Merge-tree (LSM-tree)?
Experiments using Hash-based KV-SSD
Device: Samsung KV-SSD prototype (KV-PM983), a hash-based KV-SSD.
Benchmark: KVBench (Samsung's KV-SSD benchmark tool) issuing 32B-key / 1KB-value read requests to the KV-SSD; FIO issuing 1KB read requests to a block SSD for comparison.
Result: the KV-SSD suffers long tail latency and a performance drop relative to the block SSD. What is the reason?
Problem of Hash-based KV-SSD
Setup: 4TB SSD, 4GB DRAM, 32B keys, 1KB values.
KAML (HPCA'17): each in-DRAM hash-bucket entry holds the full key (32B) and a pointer to the value (4B), with values in flash; the index needs 144GB >> 4GB.
FlashStore (VLDB'10): each entry holds only a 2B signature and a 4B pointer to the KV pair, with the full key and value in flash; the index still needs 24GB > 4GB.
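Where those DRAM figures come from (my arithmetic; 4TB of 1KB values is 4G KV pairs):

\[
4\times10^{9}\times(32\,\mathrm{B}+4\,\mathrm{B}) = 144\,\mathrm{GB},
\qquad
4\times10^{9}\times(2\,\mathrm{B}+4\,\mathrm{B}) = 24\,\mathrm{GB}.
\]

Either way the index cannot fit in 4GB of DRAM, so most hash buckets end up in flash and are only cached in DRAM, which sets up the problem on the next slide.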
Problem of Hash-based KV-SSD
Because the hash index does not fit in DRAM, only a subset of hash buckets is cached in DRAM (LRU); the rest live in flash.
Get(key 7): the hash function maps key 7 to bucket 10 with signature 1000. On a cache miss, the in-flash bucket must first be fetched, causing the performance drop.
Signature collision: several entries in bucket 10 share signature 1000, so probing reads other KV pairs (key 16, key 10, ...) from flash only to find that their keys are not 7 before key 7 is finally found. This causes the long tail latency.
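A minimal sketch of that lookup path (my simplification: the bucket layout, cache, and flash helpers are hypothetical, not the actual firmware):

```c
#include <stdint.h>
#include <string.h>

#define SLOTS_PER_BUCKET 4

struct slot   { uint16_t sig; uint32_t kv_addr; };      /* 2B signature + 4B pointer */
struct bucket { struct slot slots[SLOTS_PER_BUCKET]; };

/* Assumed helpers: an LRU bucket cache and raw flash reads. */
extern struct bucket *bucket_cache_lookup(uint32_t bucket_id);     /* NULL on miss */
extern struct bucket *bucket_read_from_flash(uint32_t bucket_id);  /* 1 flash read */
extern int            kv_read_from_flash(uint32_t kv_addr,         /* 1 flash read */
                                          uint8_t *key, uint8_t *val);

int hash_get(uint32_t bucket_id, uint16_t sig,
             const uint8_t key[32], uint8_t *val_out)
{
    struct bucket *b = bucket_cache_lookup(bucket_id);
    if (b == NULL)                       /* cache miss: extra flash read for bucket */
        b = bucket_read_from_flash(bucket_id);

    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (b->slots[i].sig != sig)
            continue;                    /* signature filters most, but not all, slots */

        uint8_t stored_key[32];
        kv_read_from_flash(b->slots[i].kv_addr, stored_key, val_out);
        if (memcmp(stored_key, key, 32) == 0)
            return 0;                    /* hit */
        /* signature collision: we just paid a flash read for the wrong KV pair */
    }
    return -1;                           /* not found (overflow buckets omitted) */
}
```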
LSM-tree? Another Option
The LSM-tree has a low DRAM requirement, has no collisions, and easily serves range queries.
But is the LSM-tree really good enough?
Problem of LSM-tree-based KV-SSD
1. Long tail latency! In the worst case, h-1 flash accesses for one KV pair (h = height of the LSM-tree).
Get(key 7): the key is not in the memtable (level 0); the bloom filters of levels 1 and 2 pass even though those levels do not hold key 7 (false positives), so their in-flash indices are read and searched in vain; key 7 is finally found only at the last level.
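A sketch of that worst-case read path (illustrative only; the memtable, bloom-filter, and per-level search helpers are assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-level structures: an in-DRAM bloom filter and an in-flash index. */
extern int  num_levels;                                  /* h, including the memtable  */
extern bool memtable_get(const uint8_t key[32], uint8_t *val);
extern bool bloom_may_contain(int level, const uint8_t key[32]);
extern bool level_search_in_flash(int level, const uint8_t key[32], /* >= 1 flash read */
                                  uint8_t *val);

int lsm_get(const uint8_t key[32], uint8_t *val)
{
    if (memtable_get(key, val))
        return 0;                                        /* level 0 is in DRAM         */

    for (int lv = 1; lv < num_levels; lv++) {
        if (!bloom_may_contain(lv, key))
            continue;                                    /* filtered out, no flash I/O */
        if (level_search_in_flash(lv, key, val))
            return 0;                                    /* found                      */
        /* bloom filter false positive: one wasted flash access at this level */
    }
    return -1;                                           /* worst case: h-1 flash reads */
}
```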
Problem of LSM-tree-based KV-SSD
2. CPU overhead! Merge sort during compaction and building bloom filters are expensive on the SSD's low-power ARM CPU.
3. I/O overhead! Compaction reads level N and level N+1 from flash and writes the new level N+1 back, adding extra I/O.
Experiments using LSM-tree-based KV-SSD
Device: LightStore (ASPLOS'19), an LSM-tree-based KV-SSD with key-value separation (WiscKey, FAST'16) and bloom filters (Monkey, SIGMOD'17).
Benchmark: YCSB-LOAD and YCSB-C (read only), 32B keys and 1KB values.
Results: long tail latency on YCSB-C, and a compaction time breakdown.
PinK: New LSM-tree-based KV-SSD
Long tail latency? → Level pinning: the upper levels of the LSM-tree are kept in DRAM, so only the last level remains in flash.
CPU overhead? → No bloom filters, plus an HW accelerator for compaction.
I/O overhead? → Level pinning reduces compaction I/O, and GC is optimized by reinserting valid data into the LSM-tree.
Introduction
PinK
  Overview of LSM-tree in PinK
  Bounding tail latency
  Memory requirement
  Reducing search overhead
  Reducing compaction I/O
  Reducing sorting time
Experiments
Conclusion
Overview of LSM-tree in PinK
PinK is based on a key-value-separated LSM-tree.
Level 0 is an in-DRAM skiplist holding recently written KV pairs.
Each of levels 1 to h-1 has a level list in DRAM: a sorted array whose entries hold a start key (e.g., 2, 23, ...) and an address pointer to a meta segment in flash.
Flash is split into a meta segment area and a data segment area: a meta segment stores sorted keys with pointers to KV pairs, and a data segment stores the KV pairs themselves.
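A structural sketch of these pieces (field names and sizes are illustrative, chosen to match the 32B keys and 4B pointers mentioned in the talk, not PinK's actual layout):

```c
#include <stdint.h>

#define KEY_LEN 32

/* One entry of a level list (kept in DRAM): the smallest key of a meta
 * segment and the flash address of that meta segment. */
struct level_entry {
    uint8_t  start_key[KEY_LEN];
    uint32_t meta_seg_addr;
};

/* A level list is a sorted array of such entries. */
struct level_list {
    struct level_entry *entries;
    uint32_t            num_entries;
};

/* One entry inside a meta segment (stored in flash, loaded on demand):
 * a full key and a pointer to the KV pair in the data segment area. */
struct meta_entry {
    uint8_t  key[KEY_LEN];
    uint32_t kv_addr;
};
```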
Bounding Tail Latency
LSM-tree with bloom filters (h = 5 levels): a GET may pass a bloom filter at each of L1-L4 and then binary-search that level's in-flash indices, so the worst case is 4 flash accesses.
PinK: the level lists of all levels sit in DRAM and the upper levels' meta segments are pinned in DRAM, so a GET does binary searches in DRAM and, in the worst case, only 1 flash access (a meta segment of the last level).
But how much memory does this require?
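Continuing the sketch above, a GET under level pinning could look like the following (again a sketch: the `pinned` flag, search helpers, and segment size are assumptions, and it omits the range-pointer and prefix optimizations described on the later slides):

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_LEN 32

/* Same shapes as in the structural sketch above. */
struct level_entry { uint8_t start_key[KEY_LEN]; uint32_t meta_seg_addr; };
struct level_list  { struct level_entry *entries; uint32_t num_entries; };
struct meta_entry  { uint8_t key[KEY_LEN]; uint32_t kv_addr; };

extern int               num_levels;            /* h                                   */
extern struct level_list levels[];              /* all level lists live in DRAM        */
extern bool              pinned[];              /* meta segments of this level in DRAM? */
extern uint32_t          meta_entries_per_seg;

/* Assumed helpers: meta-segment fetch, binary searches, and the value read. */
extern struct meta_entry *meta_in_dram(uint32_t addr);       /* no flash I/O           */
extern struct meta_entry *meta_from_flash(uint32_t addr);    /* one flash read         */
extern int  level_entry_find(const struct level_list *ll, const uint8_t key[]);
extern int  meta_entry_find(const struct meta_entry *m, uint32_t n, const uint8_t key[]);
extern int  read_value(uint32_t kv_addr, uint8_t *val);      /* data segment read      */

int pink_get(const uint8_t key[KEY_LEN], uint8_t *val)
{
    for (int lv = 1; lv < num_levels; lv++) {
        int slot = level_entry_find(&levels[lv], key);   /* DRAM binary search         */
        if (slot < 0)
            continue;                                    /* no candidate segment here  */

        uint32_t addr = levels[lv].entries[slot].meta_seg_addr;
        struct meta_entry *m = pinned[lv] ? meta_in_dram(addr)       /* free           */
                                          : meta_from_flash(addr);   /* only the last,
                                                                        unpinned level */
        int idx = meta_entry_find(m, meta_entries_per_seg, key);
        if (idx >= 0)
            return read_value(m[idx].kv_addr, val);      /* plus one data-segment read */
    }
    return -1;                                           /* key not found              */
}
```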
Memory Requirement
Configuration: 4TB SSD, 4GB DRAM, 32B keys, 1KB values; total number of levels: 5.
Skiplist (L0): 8MB. Level lists (L1-L4): 432MB. Pinning the meta segments of the top 1 / 2 / 3 / 4 levels takes about 1.47MB / 68MB / 3.1GB / 144GB.
Pinning three levels fits the budget (about 3.5GB < 4GB), leaving only the last level's meta segments in flash, so indexing needs only one flash access.
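The 3.5GB figure is consistent with pinning the meta segments of the top three levels (my addition of the per-component numbers above):

\[
\underbrace{8\,\mathrm{MB}}_{\text{skiplist }L_0}
+ \underbrace{432\,\mathrm{MB}}_{\text{level lists}}
+ \underbrace{3.1\,\mathrm{GB}}_{\text{3 pinned levels}}
\approx 3.5\,\mathrm{GB} < 4\,\mathrm{GB},
\]

whereas pinning a fourth level would require about 144GB and is clearly infeasible.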
Reducing Search Overhead
Without fractional cascading, a GET must run a full binary search on every level's level list, giving a total search complexity of O(h² · log T) (T: size ratio between adjacent levels), which is burdensome.
PinK therefore adds range pointers (fractional cascading): each level-list entry points into the overlapped range of the next level, so each lower level only needs a binary search over roughly T candidate entries, reducing the complexity to O(h · log T).
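A quick justification of those two bounds (my derivation, under the standard assumption that level i holds on the order of T^i level-list entries):

\[
\sum_{i=1}^{h-1} \log T^{\,i} \;=\; \Big(\sum_{i=1}^{h-1} i\Big)\log T \;=\; O(h^{2}\log T)
\qquad\text{(full binary search at every level)},
\]
\[
\log |L_{1}| \;+\; (h-2)\,\log T \;=\; O(h\log T)
\qquad\text{(range pointers narrow each lower level to about } T \text{ candidates)}.
\]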
Reducing Search Overhead
Prefix: each level-list entry also stores a 4B prefix of its key next to the 4B pointer, so a binary search first compares only the compact prefixes and falls back to the full 32B keys only when the prefixes are equal.
This reduces comparison overhead and makes the search cache efficient; the prefixes and range pointers together cost only about 10% of the level list's memory usage.
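A sketch of that two-step comparison inside a binary search (illustrative; the prefix-array layout and the big-endian packing are my simplifications):

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

/* Compact, cache-friendly array searched first: 4B prefix + 4B index/pointer. */
struct prefix_entry { uint32_t prefix; uint32_t idx; };

static uint32_t key_prefix(const uint8_t key[KEY_LEN])
{
    /* big-endian packing so integer order matches lexicographic key order */
    return ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
           ((uint32_t)key[2] <<  8) |  (uint32_t)key[3];
}

/* Returns the index of the first entry whose key is >= the search key.
 * full_keys[e.idx] holds the 32B key for entry e.                        */
int prefix_lower_bound(const struct prefix_entry *pe, int n,
                       const uint8_t (*full_keys)[KEY_LEN],
                       const uint8_t key[KEY_LEN])
{
    uint32_t p = key_prefix(key);
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (pe[mid].prefix < p) {
            lo = mid + 1;                       /* prefix alone decides            */
        } else if (pe[mid].prefix > p) {
            hi = mid;
        } else {
            /* equal prefixes: only now touch the full 32B key */
            if (memcmp(full_keys[pe[mid].idx], key, KEY_LEN) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
    }
    return lo;
}
```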
Reducing Compaction I/O
PinK without level pinning: when a level becomes full, its meta segments and those of the next level are read from flash, merged, and written back (6 reads and 6 writes in this example) before the level lists are updated. Burdensome!
PinK with level pinning: the pinned levels' meta segments already reside in (capacitor-backed) DRAM, so the merge only updates the level lists, with no flash reads or writes for those levels.
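A skeleton of how that saving shows up in a compaction routine (hypothetical helpers; the real path also rebuilds level lists incrementally and manages data segments):

```c
/* Assumed helpers. For a pinned level, load_meta()/store_meta() touch only
 * DRAM (no flash I/O); for the unpinned last level they cost real flash
 * reads and writes.                                                        */
extern void *load_meta(int level);
extern void  store_meta(int level, void *meta);
extern void *merge_sorted(const void *upper, const void *lower);
extern void  update_level_list(int level, void *meta);

void compact(int level)
{
    void *upper = load_meta(level);        /* free when `level` is pinned           */
    void *lower = load_meta(level + 1);    /* free when `level + 1` is pinned       */

    void *merged = merge_sorted(upper, lower);

    store_meta(level + 1, merged);         /* flash writes only for an unpinned level */
    update_level_list(level + 1, merged);  /* level lists always live in DRAM         */
}
```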
Reducing Sorting Time
Merge sort on the ARM CPU is slow, so PinK offloads it to a hardware key comparator in the FPGA.
The accelerator takes the meta segment addresses of level N and level N+1, reads the keys from DRAM or flash, compares them (==, >, <), writes the merged result to DRAM or flash, and produces the new level list of level N+1 (the new addresses of its meta segments).
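For reference, the operation being offloaded is essentially a two-way merge of sorted meta-segment entries; a software sketch of the equivalent logic (not the FPGA implementation):

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

struct meta_entry { uint8_t key[KEY_LEN]; uint32_t kv_addr; };

/* Merge the sorted meta entries of level N (a) and level N+1 (b) into `out`.
 * On equal keys the entry from the upper level wins (it is newer).
 * Returns the number of entries written.                                   */
uint32_t merge_meta(const struct meta_entry *a, uint32_t na,
                    const struct meta_entry *b, uint32_t nb,
                    struct meta_entry *out)
{
    uint32_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        int cmp = memcmp(a[i].key, b[j].key, KEY_LEN);
        if (cmp < 0)       out[k++] = a[i++];
        else if (cmp > 0)  out[k++] = b[j++];
        else {             out[k++] = a[i++]; j++; }   /* newer version shadows older */
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}
```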
PinK Summary
Long tail latency? → Level pinning.
CPU overhead? → Removing bloom filters, optimizing binary search, and adopting an HW accelerator in place of the ARM CPU for sorting.
I/O overhead? → Reducing compaction I/O and optimizing GC by reinserting valid data into the LSM-tree.
Please refer to the paper for details!
Introduction
PinK
Experiments
Conclusion
Custom KV-SSD Prototype and Setup
All KV-SSD algorithms were implemented on a Xilinx ZCU102 board.
For fast experiments, the SSD was configured as 64GB with 64MB of DRAM (0.1% of NAND capacity).
KV-SSD platform: ZCU102 (Zynq UltraScale+ SoC: quad-core ARM Cortex-A53 with FPGA, 4GB DRAM) plus a flash expansion card (Artix-7 FPGA, 256GB of raw NAND flash chips) attached through custom connectors.
Host (client/server): Xeon E5-2640 (20 cores @ 2.4GHz, 32GB DRAM) with a 10GbE NIC, connected to the platform over 10GbE.
Benchmark Setup
YCSB with 32B keys and 1KB values.
Workload:     Load    A       B      C      D      E           F
R:W ratio:    0:100   50:50   95:5   100:0  95:5   95:5        50:50 (RMW)
Query type:   Point   Point   Point  Point  Point  Range read  Point
Request distribution: Uniform for the load phase, Latest for D, and Zipfian (highest locality) for the rest.
Two phases. Load: issue 44M unique KV pairs (44GB, 70% of total SSD capacity). Run: issue 44M requests following each workload's description.