Storm: a fast transactional dataplane for remote data structures
Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Boris Pismenny, Liran Liss, Michael Wei, Dan Tsafrir, Marcos Aguilera
12th ACM International Systems and Storage Conference (SYSTOR)
What is Remote Direct Memory Access (RDMA)?
• Initiate a transfer, hardware executes it, asynchronously poll for completions
• InfiniBand (IB): specialized network stack for RDMA
  • Fully implemented in hardware (PCIe-based adapters)
  • Also: IB transport on top of IP and lossless Ethernet
• Key benefits:
  1. One-sided access
  2. User-level access with a minimal instruction footprint
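To make the initiate/execute/poll pattern above concrete, here is a minimal sketch using the standard libibverbs API. It assumes an already-connected reliable-connection (RC) queue pair `qp`, a locally registered memory region `mr`, and a remote address/rkey exchanged out of band; the function names and setup are illustrative, not taken from the Storm paper.

```cpp
// Minimal sketch of the initiate/execute/poll RDMA pattern (libibverbs).
#include <infiniband/verbs.h>
#include <cstdint>

// Post a one-sided RDMA READ: the remote CPU is not involved at all.
bool rdma_read_async(ibv_qp *qp, ibv_mr *mr, void *local_buf,
                     uint64_t remote_addr, uint32_t rkey, uint32_t len) {
  ibv_sge sge = {};
  sge.addr   = reinterpret_cast<uint64_t>(local_buf);
  sge.length = len;
  sge.lkey   = mr->lkey;

  ibv_send_wr wr = {}, *bad_wr = nullptr;
  wr.opcode              = IBV_WR_RDMA_READ;
  wr.sg_list             = &sge;
  wr.num_sge             = 1;
  wr.send_flags          = IBV_SEND_SIGNALED;  // request a completion entry
  wr.wr.rdma.remote_addr = remote_addr;
  wr.wr.rdma.rkey        = rkey;

  return ibv_post_send(qp, &wr, &bad_wr) == 0;  // initiate; HW executes
}

// Later, e.g. from an event loop: non-blocking poll for completions.
int poll_once(ibv_cq *cq) {
  ibv_wc wc;
  int n = ibv_poll_cq(cq, 1, &wc);
  if (n > 0 && wc.status != IBV_WC_SUCCESS) return -1;  // completion error
  return n;  // 0 = nothing yet, 1 = one completion consumed
}
```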
Remote data structures
• Hash tables, graphs, trees, queues, etc.
  • Fine-grain accesses
  • High fan-out
  • Pointer-linked
  • Transactional access
  • Throughput (IOPS) bound
  • Latency Service Level Objective (SLO)
• Other (perhaps less interesting) use cases: analytics, VM migration
  • Bulk transfers, bandwidth-bound
What are common concerns?
1. Scalability: network state kept in limited hardware resources
   [Figure: rNIC internals (cores, WQE cache, SQ/RQ/CQ state, address translation and protection, connection state cache) connected over PCIe/DMA with DDIO to the CPU cache and DRAM; InfiniBand or Ethernet on the wire]
2. Round-trips: pointer-linked data structures
What are common concerns?
1. Scalability: network state kept in limited hardware resources
   • FaRM: use locks to share QP connections (Dragojevic’14)
   • FaSST/eRPC: don’t use connections (Kalia’19)
   • LITE: enforce protection in the kernel (Tsai’17)
2. Round-trips: pointer-linked data structures
   • FaRM: use the Hopscotch algorithm, one RTT in the common case
   • FaSST/eRPC: leverage RPCs rather than one-sided reads
Outline
• Problem statement
• Key insights
• Storm design
• Results
Key insights (1/2)
• Hardware has gotten much better!
  • ConnectX-4/5 (CX4/5) vs. ConnectX-3 (CX3)
  • 40M IOPS on CX4 → 4x higher than CX3
  • Scales up to 64 machines → on CX3, IOPS collapses beyond ~10 machines
  • CX4 achieves 10M IOPS even with zero NIC cache hits → equal to the max IOPS of an uncontended CX3
  • Break-even point with datagram send/recv is currently at ~4K connections
  • Possible further improvements with ConnectX-6
• How is HW getting better?
  • More concurrency, better prefetching, larger caches, etc.
Key insights (2/2)
• FaRM:
  • Locks degrade throughput unnecessarily
  • Large buckets (due to larger keys) waste throughput
• FaSST/eRPC:
  • Two-sided doesn’t allow for maximum full-duplex throughput
  • Especially for requests larger than a cache line (no inlining)
  • Onloaded congestion control adds overhead
• LITE:
  • Kernel involvement adds overhead (for fine-grain accesses)
  • No support for async operations
Our approach / Storm design principles
1. Use connections, but keep the count minimal
   • Lock-free QP sharing only if really necessary
   • Offloaded congestion control and retransmissions
2. Use one-sided reads whenever possible
   • First one-sided, then RPC ("one-two-sided"; see the sketch below)
   • RPCs also implemented using one-sided writes
3. Leverage abundant memory
   • Cache metadata and/or reduce collisions in hash tables
4. Minimize translation & protection state
   • Use contiguous physical allocation
5. And don’t forget to deploy on new hardware!
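A rough sketch of principle 2, the one-two-sided lookup: attempt a one-sided read at a cached (or guessed) remote address, validate the fetched bytes, and fall back to a two-sided RPC only when the guess misses. All types and helpers below (`rdma_read_and_validate`, `rpc_lookup`, `addr_cache`) are hypothetical stand-ins, not Storm's actual API.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical types standing in for Storm's internals.
using Key   = uint64_t;
using Value = std::string;
struct RemoteAddr { uint64_t addr; uint32_t rkey; };
struct RpcReply  { Value value; RemoteAddr addr; };

// Assumed helpers (declarations only): a validated one-sided read and an RPC.
std::optional<Value> rdma_read_and_validate(Key k, RemoteAddr a);
RpcReply rpc_lookup(Key k);

std::unordered_map<Key, RemoteAddr> addr_cache;  // client-side metadata cache

Value lookup(Key key) {
  // 1) One-sided first: use a cached (or guessed) remote address.
  auto it = addr_cache.find(key);
  if (it != addr_cache.end()) {
    if (auto v = rdma_read_and_validate(key, it->second))
      return *v;                    // hit: one RTT, remote CPU untouched
    addr_cache.erase(it);           // stale metadata, drop it
  }
  // 2) Miss or invalid data: fall back to a two-sided RPC; the handler
  //    returns the value plus its location, which we cache for next time.
  RpcReply reply = rpc_lookup(key);
  addr_cache.insert({key, reply.addr});
  return reply.value;
}
```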
Storm design
[Figure: two machines, each with an rNIC and CPU/MEM in hardware (HW) and a Storm dataplane in software (SW); the dataplane comprises RPC buffer management, a QP & event loop component, and RDMA reads (RR), sitting beneath the data structure implementation & metadata]
Division of responsibilities:
• The Storm dataplane only understands RDMA connections and memory regions
• The data structure understands the data layout and implements metadata caching
Two-sided operations
[Figure: two-sided operation flow through the Storm dataplane: op() issues an RPC, the remote event loop (ev_loop()) invokes the data structure's handler, and the response returns to the caller (steps 1-3, with success/fail outcomes)]
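As noted under principle 2, Storm's RPCs are themselves built on one-sided writes. Below is a minimal sketch of that common pattern (used in a similar form by FaRM-style systems, not necessarily Storm's exact layout): the sender RDMA-writes a request into a slot of a receive ring, and the server's event loop detects it by polling host memory rather than the NIC. It relies on the assumption that the NIC places a write's bytes in increasing address order, so the trailing valid flag lands last; all names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// One slot of a per-sender receive ring, filled remotely by an RDMA write.
// The `valid` flag is the final byte of the slot: assuming in-order byte
// placement by the NIC, the poller never observes a torn request.
struct alignas(64) RpcSlot {
  uint32_t         len;             // request payload length
  uint8_t          payload[1019];   // request bytes (slot totals 1024 B)
  volatile uint8_t valid;           // set to 1 by the incoming write
};

// Server event loop: detect incoming RPCs by polling memory, then hand the
// request to the data structure's RPC handler (cf. the figure above).
void ev_loop(RpcSlot *ring, size_t slots,
             void (*rpc_handler)(const uint8_t *req, uint32_t len)) {
  size_t head = 0;
  for (;;) {
    RpcSlot &s = ring[head];
    if (s.valid) {
      rpc_handler(s.payload, s.len);   // e.g., acquire locks, run commits
      s.valid = 0;                     // recycle the slot for the next write
      head = (head + 1) % slots;
    }
    // ... a real loop would also poll send CQs and emit responses here.
  }
}
```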
One-sided operations
[Figure: one-sided operation flow: op() issues an RDMA read (RR) directly against remote memory; the remote CPU is not involved, and the local event loop (ev_loop()) polls for completion (steps 1-3, success)]
One-two-sided operations
[Figure: first phase of a one-two-sided operation: op() optimistically issues a one-sided RDMA read (RR) at a cached address; the read either succeeds or fails validation (steps 1-3)]
One-two-sided operations
[Figure: fallback phase: when the one-sided read fails validation, the dataplane issues an RPC, the remote event loop (ev_loop()) invokes the handler, and the response returns with fresh location metadata (steps 3-5, success)]
Distributed transactions
[Figure: the same two-machine architecture with a TX component added to the Storm dataplane, alongside RPC buffer management, the QP & event loop, and RDMA reads (RR)]
Support for concurrent data structures using transactions
Data structure API (three callbacks)
• RPC handler
  • Processes two-sided communication
  • Implements complex paths, such as acquiring locks and commits
• Lookup start
  • Checks whether the address is known (cached) or can be guessed
  • If yes, leverage an RDMA read
• Lookup end
  • Checks whether the fetched data is valid, and caches it for future use
A possible rendering of this interface is sketched below.
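This is a hypothetical C++ rendering of the three-callback contract just described; the names and signatures are illustrative, not Storm's actual header.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the three callbacks a data structure implements for the
// Storm dataplane (illustrative interface, not the paper's code).
struct DataStructureCallbacks {
  // Two-sided path: runs on the server event loop and implements the
  // complex cases, such as acquiring locks and committing transactions.
  virtual void rpc_handler(const uint8_t *req, size_t req_len,
                           uint8_t *resp, size_t *resp_len) = 0;

  // Before a lookup: return true and fill `addr` if the remote location
  // is cached (or can be guessed), so the dataplane can issue an RDMA read.
  virtual bool lookup_start(uint64_t key, uint64_t *addr) = 0;

  // After the read completes: validate the fetched bytes and, if valid,
  // cache the location so future lookups can stay one-sided.
  virtual bool lookup_end(uint64_t key, const uint8_t *data, size_t len) = 0;

  virtual ~DataStructureCallbacks() = default;
};
```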
Storm implementation & experimental setup
• 13K LOC of C++, excluding modifications to MICA [Lim’14]
• HPC cluster with 32 Dell machines
  • High-speed InfiniBand network (100 Gbps)
  • Mellanox ConnectX-4, similar in performance to CX5
• Emulation of 3-4x larger clusters possible on Storm
• Benchmarks:
  • Key-value transactional micro-benchmark
  • Telecommunication Application Transaction Processing (TATP)
Outline
• Problem statement
• Key insights
• Storm design
• Results
Baselines
• Emulated FaRM (modified: Lock-free_FaRM)
  • No connection sharing, 1KB “neighborhoods”
• eRPC
  • With and without active congestion control
• LITE (modified: Async_LITE)
  • Added support for asynchronous operations
Storm results
• Single-lookup workload
  • 128B KV pairs, 100M items, 20 threads per machine
[Plot: per-machine lookups/usec vs. number of machines (4-32) for Storm (cache)]
Storm results
• Single-lookup workload
  • 128B KV pairs, 100M items, 20 threads per machine
[Plots: per-machine lookups/usec vs. number of (physical) machines for Storm (cache), Storm (oversub), eRPC with and without congestion control, Lock-free_FaRM, and Async_LITE (projected); an annotation marks where one-two-sided operations take effect]
• TATP: 11.8 million per node with Storm (oversub)
Does Storm scale well?
• Storm scales well up to 64 machines
• Reduce the thread count by 2x
  • 2x fewer threads → 2x fewer QPs
• Do we need more than 10 threads?
  • Lock-free QP sharing
[Plot: per-machine lookups/usec vs. number of emulated machines (32-128) for Storm(cache)-20x and Storm(cache)-10x]
Conclusion & future work
• RDMA datacenter users should get a hardware upgrade
  • More scalable hardware is available
• Take advantage of one-sided primitives
  • Leverage caching and oversubscription (in hash tables)
  • One-sided reads in the common case
• Ongoing research threads:
  • Designing “far” memory data structures (HotOS’19)
  • A memory allocator for repurposing unused memory
  • Lock-free mechanisms for QP sharing