HMEH: write-optimal extendible hashing for hybrid DRAM-NVM memory. Xiaomin Zou¹, Fang Wang¹*, Dan Feng¹, Jianxi Chen¹, Chaojie Liu¹, Fan Li¹, Nan Su². ¹Huazhong University of Science and Technology, China; ²Shandong Massive Information Technology Research Institute, China
Outline • Background and motivation • Our Work: HMEH • Performance Evaluation • Conclusion
Background: Non-Volatile Memory (NVM). NVM is expected to complement or replace DRAM as main memory, sitting below the CPU cache hierarchy. Properties: non-volatile, large capacity, high performance, low standby power, limited write endurance, asymmetric read/write properties. Example device: Intel Optane DC Persistent Memory. [Figure: CPU and cache hierarchy backed by NVM]
Background: NVM-based hash structures. Hashing structures are widely used in storage systems: main-memory databases, in-cache indexes, and in-memory key-value stores. Previous work is insufficient for real NVM devices: PFHT [INFLOW 2015], Path hashing [MSST 2017], Level hashing [OSDI 2018], CCEH [FAST 2019].
Motivation: The design of hashing structures. Static vs dynamic hashing. Static hashing resizes inefficiently: growing the table rehashes all items. Dynamic (extendible) hashing avoids full rehashing but needs an extra directory access per lookup, and the read latency of Optane DCPMM is high. [Figure: a static table rehashing all items on resize vs. an extendible-hashing directory of depth-2 entries pointing to buckets]
Motivation: High overhead for data consistency. The volatile/non-volatile boundary lies between the CPU cache and NVM: cache lines are evicted arbitrarily and the CPU may reorder stores, so memory writes can reach NVM out of program order. Example: with `St value; St key;`, the key may be persisted first; a crash at that point leaves an inconsistent record (a key without its value). The remedy is Flush (write cache lines back) and Fence (order the flushes): `St value; Flush(); Fence(); St key;`. But these instructions are expensive. A minimal sketch of this protocol follows. [Figure: animation of a reordered key store reaching NVM before the value, a crash causing inconsistency, and flush+fence restoring the intended order]
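```c
#include <immintrin.h>   /* _mm_clflush, _mm_sfence */
#include <stdint.h>

/* A minimal sketch (not HMEH's actual code) of the flush+fence protocol
 * the slide describes: persist the value before storing the key, so a
 * crash can never expose a key whose value has not reached NVM. Assumes
 * both slots are 8-byte aligned, which makes each store atomic on x86. */
static void persist_ordered(uint64_t *key_slot, uint64_t *val_slot,
                            uint64_t key, uint64_t value)
{
    *val_slot = value;
    _mm_clflush(val_slot);   /* Flush: write the value's cache line to NVM */
    _mm_sfence();            /* Fence: value is durable before the key store */
    *key_slot = key;
    _mm_clflush(key_slot);
    _mm_sfence();
}
```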
Motivation: High overhead for data consistency. We evaluated CCEH [FAST 2019], Level hashing [OSDI 2018], linear hashing, and cuckoo hashing on Optane DCPMM with and without Fence and Flush instructions: removing them improves the throughput of these hashing schemes by 20.3% to 29.1%. Our goal: high-performance dynamic hashing with low data-consistency overhead and fast recovery.
Our Scheme: HMEH — Extendible Hashing for Hybrid DRAM-NVM Memory. A flat-structured (FS) directory in DRAM for fast access and a radix-tree (RT) directory in NVM for recovery. Lookup path: directory → segment → cacheline-sized bucket; the hash key splits into a segment index and a bucket index (sketched below). [Figure: FS-directory in DRAM and RT-directory in NVM, both pointing to segments of cacheline-sized buckets]
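```c
#include <stdint.h>

/* Hedged sketch of HMEH-style addressing (constants and names are our
 * assumptions, not the paper's): the most significant `global_depth`
 * bits of the hash pick a segment via the FS-directory, and the next
 * BUCKET_BITS bits pick a cacheline-sized bucket inside the segment. */
#define BUCKET_BITS 8                 /* assumed: 256 buckets per segment */

typedef struct segment segment_t;

static inline segment_t *locate_segment(segment_t **fs_directory,
                                        uint64_t hash, unsigned global_depth)
{
    return fs_directory[hash >> (64 - global_depth)];     /* segment index */
}

static inline unsigned locate_bucket(uint64_t hash, unsigned global_depth)
{
    return (hash >> (64 - global_depth - BUCKET_BITS))    /* bucket index */
           & ((1u << BUCKET_BITS) - 1);
}
```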
HMEH: Two directories. Flat-structured directory vs radix-tree directory. The radix tree is friendly to NVM; upon recovery, HMEH walks the RT-directory to rebuild the FS-directory (sketch below). Every segment is pointed to by 2^(G−L) directory entries, where G is the global depth and L is the segment's local depth. [Figure: a radix tree of global depth 3 whose leaves, at their local depths, cover directory entries 000–111]
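```c
#include <stdint.h>

/* Hedged sketch of recovery: a depth-first walk of the persistent
 * RT-directory rebuilds the volatile FS-directory. A segment with local
 * depth L under global depth G fills 2^(G-L) consecutive FS-directory
 * entries. The node layout below is our assumption, not the paper's. */
typedef struct segment segment_t;
typedef struct rt_node {
    int             is_leaf;
    unsigned        local_depth;     /* valid for leaves */
    segment_t      *segment;         /* valid for leaves */
    struct rt_node *child[2];        /* valid for inner nodes */
} rt_node_t;

static void rebuild_fs_dir(segment_t **fs_dir, const rt_node_t *node,
                           unsigned global_depth, uint64_t prefix, unsigned depth)
{
    if (node->is_leaf) {             /* leaf sits at depth == local_depth */
        uint64_t first = prefix << (global_depth - depth);
        uint64_t count = 1ULL << (global_depth - node->local_depth);
        for (uint64_t i = 0; i < count; i++)
            fs_dir[first + i] = node->segment;
        return;
    }
    rebuild_fs_dir(fs_dir, node->child[0], global_depth, prefix << 1, depth + 1);
    rebuild_fs_dir(fs_dir, node->child[1], global_depth, (prefix << 1) | 1, depth + 1);
}
```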
HMEH: Low data consistency overhead — the Cross-KV mechanism. Split each KV item into several pieces and alternately store key and value pieces as 8-byte atomic blocks (K1, V1, K2, V2, ...). Because each 8-byte store is atomic, the item remains verifiable after a crash even if cache lines are evicted out of order, avoiding most Flush and Fence instructions. A sketch of the layout follows. [Figure: animation of interleaved key/value pieces being stored; after a crash, the persisted pieces remain consistent]
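```c
#include <stdint.h>

/* Hedged sketch of the Cross-KV layout as we read it from the slides:
 * a 16-byte key and 16-byte value are split into 8-byte pieces stored
 * interleaved (K1, V1, K2, V2). Each aligned 8-byte store is atomic on
 * x86, so no Fence is needed between the pieces, and a single flush of
 * the cacheline-sized slot can persist the whole item. */
typedef struct {
    uint64_t k1, v1, k2, v2;         /* interleaved key/value pieces */
} crosskv_slot_t;

static void crosskv_store(crosskv_slot_t *slot,
                          const uint64_t key[2], const uint64_t val[2])
{
    slot->k1 = key[0];               /* each store: one 8-byte atomic block */
    slot->v1 = val[0];
    slot->k2 = key[1];
    slot->v2 = val[1];
}
```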
HMEH: Improve load factor — resolving hash collisions. Linear probing: probing is allowed across 4 consecutive buckets (256 bytes, the access granularity of Intel Optane DCPMM). Stash: a non-addressable region per segment used to store colliding items. A probing sketch follows. [Figure: a colliding key probes neighboring buckets within its segment before falling back to the stash]
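```c
#include <stdbool.h>
#include <stdint.h>

/* Hedged sketch of bounded linear probing with a stash fallback.
 * PROBE_DIST = 4 buckets (4 x 64B = 256B, the Optane DCPMM access
 * granularity). Structure names, sizes, and the key==0 empty sentinel
 * are our assumptions, not the paper's. */
#define SLOTS_PER_BUCKET 4
#define PROBE_DIST       4
#define STASH_SIZE       16
#define NUM_BUCKETS      256

typedef struct { uint64_t key, val; } slot_t;
typedef struct { slot_t slots[SLOTS_PER_BUCKET]; } bucket_t;
typedef struct {
    bucket_t buckets[NUM_BUCKETS];
    slot_t   stash[STASH_SIZE];      /* non-addressable overflow area */
} segment_t;

static bool segment_insert(segment_t *seg, unsigned home,
                           uint64_t key, uint64_t val)
{
    for (unsigned d = 0; d < PROBE_DIST; d++) {       /* probe 4 buckets */
        bucket_t *b = &seg->buckets[(home + d) % NUM_BUCKETS];
        for (unsigned s = 0; s < SLOTS_PER_BUCKET; s++) {
            if (b->slots[s].key == 0) {
                b->slots[s].key = key;
                b->slots[s].val = val;
                return true;
            }
        }
    }
    for (unsigned s = 0; s < STASH_SIZE; s++) {       /* stash fallback */
        if (seg->stash[s].key == 0) {
            seg->stash[s].key = key;
            seg->stash[s].val = val;
            return true;
        }
    }
    return false;                    /* segment full: triggers a split */
}
```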
HMEH: Optimistic concurrency. A mutex plus a version number protects the directories; segment splits take a fine-grained lock; reads are lock-free; slot updates use compare-and-swap instructions (sketch below). [Figure: two segments whose slots are claimed with CAS while readers proceed without locks]
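```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hedged sketch of optimistic, lock-free reads: sample the version
 * number, perform the lookup, and retry if a concurrent split bumped
 * the version. Writers claim empty slots with compare-and-swap. All
 * names here are ours; the probe routine is hypothetical. */
extern _Atomic uint64_t dir_version;           /* bumped around each split */
extern uint64_t probe_buckets(uint64_t key);   /* hypothetical probe routine */

uint64_t lookup(uint64_t key)
{
    for (;;) {
        uint64_t v = atomic_load(&dir_version);
        if (v & 1)                   /* odd version: split in progress */
            continue;
        uint64_t val = probe_buckets(key);
        if (atomic_load(&dir_version) == v)
            return val;              /* version unchanged: result is valid */
    }
}

bool claim_slot(_Atomic uint64_t *slot_key, uint64_t key)
{
    uint64_t expected = 0;           /* assumed: 0 marks an empty slot */
    return atomic_compare_exchange_strong(slot_key, &expected, key);
}
```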