A Rack-scale Key-value Store for Flash Storage and RDMA
Michalis Vardoulakis 1,2,*, Giorgos Saloustros 1, Pilar González-Férez 3, and Angelos Bilas 1,2
* mvard@csd.uoc.gr
1 Computer Architecture and VLSI Laboratory, Institute of Computer Science, Foundation for Research and Technology – Hellas, Greece
2 Computer Science Department, University of Crete, Greece
3 Department of Computer Engineering, University of Murcia, Spain
Motivation
• Data volume doubles every ~2 years
• Datacenter processing capacity is limited by energy
• Datacenter power is limited by current technology
• Conclusion -> We have to process increasing amounts of data with the same infrastructure
Who uses key-value stores?
• Facebook uses RocksDB to store user information
• Amazon uses Dynamo to store each user's shopping cart
• Stack Overflow uses Redis as a cache for their datacenters
• Netflix uses EVCache to cache frequently used data on AWS EC2
• Also used in machine learning pipelines, big data analytics, and web application backends
Why do they use key-value stores?
• Consistency
• Availability
• Scalability
What is a key-value store?
• Think of it as a hash table
• Data are represented as key-value pairs
• Data are unstructured
• Keys and values do not have set types
• get(k), put(k, v), update(k, v), delete(k) — see the interface sketch below
[Figure: Design of Kreon, the key-value store used in this work — KV pairs held in level L0 (memory) and level L1 (SSD)]
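To make the interface concrete, here is a minimal C sketch of these four operations. The `slice` and `kv_store` types and the function names are illustrative assumptions, not the API of Kreon or any particular store.

```c
#include <stddef.h>

/* Keys and values are untyped byte buffers of arbitrary length. */
struct slice {
    const void *data;
    size_t      len;
};

struct kv_store;  /* opaque store handle; illustrative, not a real API */

/* The four basic operations; each returns 0 on success. */
int kv_get(struct kv_store *s, struct slice key, struct slice *value_out);
int kv_put(struct kv_store *s, struct slice key, struct slice value);
int kv_update(struct kv_store *s, struct slice key, struct slice value);
int kv_delete(struct kv_store *s, struct slice key);
```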
Design
• We design for SSDs: we don't have to sort key-value pairs to achieve sequential I/O access patterns (see the append-only log sketch below)
• We use RDMA to reduce CPU cycles spent on client-server communication and replication
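A minimal sketch of this SSD-friendly layout, assuming an append-only on-disk log with a separate index over it; the struct and function names and the record format are hypothetical, and error handling is elided for brevity.

```c
#include <stdint.h>
#include <unistd.h>

struct value_log {
    int      fd;     /* append-only log file on the SSD */
    uint64_t tail;   /* next write offset in the log */
};

/* Append one key-value pair at the log tail. Writes are always
 * sequential, so pairs never need to be kept sorted; an index
 * mapping keys to log offsets is updated separately. */
uint64_t log_append(struct value_log *log, const void *key, uint32_t klen,
                    const void *val, uint32_t vlen)
{
    uint64_t off = log->tail;
    uint32_t hdr[2] = { klen, vlen };

    pwrite(log->fd, hdr, sizeof(hdr), (off_t)off);
    pwrite(log->fd, key, klen, (off_t)(off + sizeof(hdr)));
    pwrite(log->fd, val, vlen, (off_t)(off + sizeof(hdr) + klen));

    log->tail = off + sizeof(hdr) + klen + vlen;
    return off;   /* the index stores this offset for the key */
}
```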
Remote Direct Memory Access (RDMA)
• Allows a process on one computer to directly access the memory of another computer
• All communication happens in user space (no context switching)
• One-sided communication: the other computer doesn't spend CPU cycles
• TCP/IP -> Send/Receive
• RDMA -> Read/Write remote memory (sketched below)
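The one-sided write below is a sketch using the libibverbs API. It assumes the queue pair, memory registration, and connection setup have already been done (e.g., via rdma_cm), and the helper name `post_rdma_write` is ours, not a library function.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA WRITE: copy `len` bytes from a locally
 * registered buffer into remote memory at (remote_addr, rkey).
 * The remote CPU is not involved; completion shows up on our
 * completion queue. */
int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *local_buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,            /* key of the local registered region */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* ask for a completion */
    wr.wr.rdma.remote_addr = remote_addr;        /* where to write remotely */
    wr.wr.rdma.rkey        = rkey;               /* remote region's key */

    return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
}
```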
RDMA Hardware & Software
[Figure: Data movement in RDMA vs. socket-based protocols]
[Figure: Network stack in Linux]
Challenges
• Scaling RDMA Write to support hundreds of active clients
• Replication with low performance overhead
Scaling RDMA Write
Constraints:
• Memory used for RDMA operations is pinned to physical pages
• We have to work within physical memory constraints
Solution:
• Each server allocates a limited number of memory buffers
• A server splits individual buffers between clients if it runs out of available memory buffers (see the sketch below)
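A rough sketch of that allocation policy, with made-up names and sizes: the server carves full-size buffers out of a pinned region until the region is exhausted, then halves the largest existing buffer to admit the next client.

```c
#include <stddef.h>

#define POOL_BYTES  (64u << 20)   /* illustrative: 64 MiB pinned region */
#define FULL_BUF    (1u << 20)    /* fresh clients get 1 MiB */
#define MIN_BUF     (64u << 10)   /* never split below 64 KiB */
#define MAX_CLIENTS 256

struct client_buf {
    char  *base;   /* start within the pinned, RDMA-registered region */
    size_t size;   /* bytes currently owned by this client */
};

static char   region[POOL_BYTES]; /* stands in for an ibv_reg_mr'd region */
static size_t region_used;
static struct client_buf bufs[MAX_CLIENTS];
static int    nclients;

/* Give a buffer to a newly connected client. While pinned memory
 * remains, carve a fresh full-size buffer; once the region is
 * exhausted, halve the largest existing buffer and hand the new
 * client the second half (notifying the shrunken client of its new
 * bounds is a protocol detail omitted here). */
struct client_buf *admit_client(void)
{
    if (nclients >= MAX_CLIENTS)
        return NULL;

    struct client_buf *nb = &bufs[nclients];

    if (region_used + FULL_BUF <= POOL_BYTES) {
        nb->base = region + region_used;
        nb->size = FULL_BUF;
        region_used += FULL_BUF;
    } else {
        struct client_buf *victim = &bufs[0];
        for (int i = 1; i < nclients; i++)
            if (bufs[i].size > victim->size)
                victim = &bufs[i];
        if (victim->size < 2 * MIN_BUF)
            return NULL;               /* cannot admit more clients */
        victim->size /= 2;             /* victim keeps the first half */
        nb->base = victim->base + victim->size;
        nb->size = victim->size;       /* newcomer gets the second half */
    }
    nclients++;
    return nb;
}
```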
Replication
• Primary-backup replication scheme
• Append key-values to a log, without adding them to the tree index
• Use RDMA Write to append them to the log (see the sketch below)
• The replica only has to periodically flush the last part of the log to ensure fault tolerance
• Efficient L1 index construction in the backup using hints from the primary
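A sketch of the primary's append path under this scheme, assuming each backup exports its log region for RDMA writes. It reuses the hypothetical `post_rdma_write` helper from the earlier sketch; the flush threshold and the control message are illustrative, not Kreon's actual protocol.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* From the earlier RDMA sketch. */
int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *local_buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey);

#define FLUSH_THRESHOLD (4u << 20)  /* illustrative: flush every 4 MiB */

/* The primary's view of one backup's replication log. */
struct replica_log {
    struct ibv_qp *qp;          /* connected queue pair to the backup */
    struct ibv_mr *mr;          /* our registered staging buffer */
    char          *stage;       /* staging area inside that buffer */
    uint64_t       remote_base; /* base address of the backup's log */
    uint32_t       rkey;        /* rkey of the backup's log region */
    uint64_t       tail;        /* next append offset in the remote log */
    uint64_t       unflushed;   /* bytes appended since the last flush */
};

/* Replicate one log record: RDMA-write it to the backup's log tail,
 * spending no backup CPU cycles. The backup only needs to flush the
 * last part of its log to its SSD periodically for fault tolerance. */
int replicate_append(struct replica_log *r, const void *rec, size_t len)
{
    memcpy(r->stage, rec, len);   /* record must sit in registered memory */
    int err = post_rdma_write(r->qp, r->mr, r->stage, len,
                              r->remote_base + r->tail, r->rkey);
    if (err)
        return err;
    r->tail += len;
    r->unflushed += len;
    if (r->unflushed >= FLUSH_THRESHOLD) {
        /* ask_backup_to_flush(r);  -- hypothetical control message */
        r->unflushed = 0;
    }
    return 0;
}
```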
Thank you for your attention! Any questions?