FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs
Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU)
One-slide summary
• Existing systems use one-sided RDMA (READs and WRITEs) for transactions.
• FaSST instead uses RPCs over two-sided ops (SEND/RECV).
• Result: ~2x faster than existing systems; fast, scalable, and simple.
[Diagram: a one-sided READ is served by the remote NIC directly from DRAM, bypassing the remote CPU; a two-sided SEND is matched with a RECV and handled by the remote CPU.]
In-memory distributed transactions
Distributed ACID transactions can be fast in datacenters: FaRM [SOSP 15, NSDI 14], DrTM [SOSP 15, EuroSys 16], RSI [VLDB 16].
Enablers:
1. Cheap DRAM, NVRAM: no slow storage on the critical path
2. Fast networks: low communication overhead
Transaction environment
Data is sharded into per-node hash tables; each key (x, y) has a copy (x', y') on another node.
How to access remote data structures?

             Existing systems    FaSST
Method       One-sided READs     Two-sided RPCs
Round trips  ≥ 2                 1

[Diagram: with one-sided READs, Node 1 issues one READ to Node 2 for the pointer and a second READ for the value; with RPCs, a single request/response pair suffices. See the sketch below.]
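To make the round-trip dependency concrete, here is a minimal C++ sketch of a remote GET under both designs. The rdma_read_ptr / rdma_read_val / rpc_get helpers are hypothetical stand-ins for the transport layer (not FaSST's actual API), stubbed out so the sketch compiles; each call models one network round trip.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical transport stand-ins (not FaSST's real API), stubbed locally.
// Each call models one network round trip.
static uint64_t rdma_read_ptr(int /*node*/, uint64_t addr) { return addr + 64; }
static void rdma_read_val(int /*node*/, uint64_t, void *buf) { std::memset(buf, 0, 32); }
static void rpc_get(int /*node*/, uint64_t, void *buf) { std::memset(buf, 0, 32); }

// GET over one-sided READs: two *dependent* round trips. The second READ
// cannot be issued until the first returns the value's address.
void get_with_reads(int node, uint64_t bucket_addr, void *value_buf) {
  uint64_t value_addr = rdma_read_ptr(node, bucket_addr);  // trip 1: pointer
  rdma_read_val(node, value_addr, value_buf);              // trip 2: value
}

// GET over an RPC: one round trip; the remote CPU walks its own hash table.
void get_with_rpc(int node, uint64_t key, void *value_buf) {
  rpc_get(node, key, value_buf);
}
```

The dependency is the point: with READs, both latency and NIC operations double per GET.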
RPCs vs READs microbenchmark
Experiment: fetch 32-byte values, using either one-sided READs (two per GET: one for the pointer, one for the value) or RPCs (one per GET).

FaRM [SOSP 15, Fig 2] (2x ConnectX-3 NICs), tput/machine (M/s), O(1,0) tput:
• READs: 18.0 (NIC-limited)
• Effective GETs/s with READs (half the READ rate): 9.0
• RPCs: 4.4 (CPU-limited)

FaSST (1x Connect-IB NIC), tput/machine (M/s), O(1,0) tput:
• READs: 49.2
• Effective GETs/s with READs: 24.6
• RPCs: 40.9

With fast RPCs, one RPC per GET (40.9 M/s) beats two READs per GET (49.2 / 2 = 24.6 M/s effective): FaSST RPCs make transactions faster.
Reasons for slow RPCs
Recall the comparison: existing systems use one-sided READs (≥ 2 round trips); FaSST uses two-sided RPCs (1 round trip). Why were RPCs slow before, and what does FaSST change?
• Scalable transport, avoiding the NIC cache misses caused by connected transports
• Lock-free I/O, avoiding the low per-thread throughput caused by shared connections
One-sided RDMA does not scale
READs and WRITEs must use a connected transport (Reliable Connected, RC).
[Diagram: in one-sided systems, READs go over RC; RPC requests and responses are also implemented as RC WRITEs.]
Problem: per-connection state grows with cluster size and overflows the NIC's cache.
[Graph: request rate per node (M/s) vs number of nodes N, up to 100: READ throughput collapses as N grows because of NIC cache overflow; FaSST RPC throughput stays high.]
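A minimal libibverbs sketch of where the connection state comes from: READ/WRITE require Reliable Connected (RC) queue pairs, and each thread needs one RC QP per remote machine, so QP state grows as nodes × threads. The verbs call shown (ibv_create_qp with IBV_QPT_RC) is standard; device setup, memory registration, and the connection handshake are omitted, and the queue depths are arbitrary.

```cpp
#include <infiniband/verbs.h>
#include <vector>

// One RC QP per (local thread, remote machine) pair: state grows as N * T.
std::vector<ibv_qp *> create_rc_qps(ibv_pd *pd, ibv_cq *cq,
                                    int num_remote_nodes) {
  std::vector<ibv_qp *> qps;
  for (int n = 0; n < num_remote_nodes; n++) {
    ibv_qp_init_attr attr = {};
    attr.send_cq = cq;
    attr.recv_cq = cq;
    attr.qp_type = IBV_QPT_RC;  // Reliable Connected: required for READ/WRITE
    attr.cap.max_send_wr = 128;  // arbitrary demo queue depths
    attr.cap.max_recv_wr = 128;
    attr.cap.max_send_sge = 1;
    attr.cap.max_recv_sge = 1;
    // Each QP adds connection state the NIC must cache; with many nodes and
    // threads, this state overflows the NIC's on-chip memory and throughput
    // collapses.
    qps.push_back(ibv_create_qp(pd, &attr));
  }
  return qps;
}
```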
CPU overhead of connection sharing
Sharing connections (QPs) among threads reduces NIC cache pressure, but moves the cost to the CPU.
Single-thread sequencer throughput (M requests/s):
• No sharing: 10.9
• Sharing: 2.1
Local overhead of remote bypass = 5x.
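A sketch of where the 5x goes: when threads share a connected QP, every post to the send queue must be serialized. The mutex-based wrapper below illustrates the cost, not the exact sharing scheme those systems use; ibv_post_send itself is the standard verbs call.

```cpp
#include <infiniband/verbs.h>
#include <mutex>

// A connected QP shared by several threads: posts must be serialized.
struct SharedQp {
  ibv_qp *qp;
  std::mutex lock;  // contended on every send when threads share the QP

  int post_send(ibv_send_wr *wr) {
    ibv_send_wr *bad_wr = nullptr;
    std::lock_guard<std::mutex> guard(lock);  // serialization = CPU overhead
    return ibv_post_send(qp, wr, &bad_wr);
  }
};
// With one QP per thread there is no lock, but connection state grows again;
// FaSST resolves this tension with one datagram QP per thread.
```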
Connectionless transport scales
The Unreliable Datagram (UD) transport needs only one QP per thread regardless of cluster size, but it supports only two-sided (SEND/RECV) operations.
[Diagram: FaSST sends the RPC request and response as SENDs over UD; the NIC cache holds only per-thread state, so it never overflows.]
Do one-sided READs at least save CPU? No: READs don't use fewer CPU cycles than RPCs, so their local overhead offsets the remote gains.
Single-thread sequencer throughput (M requests/s):
• READs (with connection sharing): 2.1
• FaSST RPCs: 3.6
FaSST RPCs make transactions scalable.
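A minimal libibverbs sketch of the two-sided operation FaSST builds on: posting an RPC request as a SEND on a UD queue pair. QP creation, the peer's posted RECVs, buffer registration (ibv_reg_mr), and address-handle setup are assumed to exist already; the qkey value is an arbitrary placeholder.

```cpp
#include <infiniband/verbs.h>
#include <cstdint>

// Post one RPC request as a SEND on an Unreliable Datagram QP.
// 'ah' (address handle), 'remote_qpn', and the registered buffer are
// assumed to have been set up beforehand.
int post_ud_send(ibv_qp *ud_qp, ibv_ah *ah, uint32_t remote_qpn,
                 void *req_buf, uint32_t req_len, uint32_t lkey) {
  ibv_sge sge = {};
  sge.addr = reinterpret_cast<uintptr_t>(req_buf);
  sge.length = req_len;
  sge.lkey = lkey;  // from ibv_reg_mr() on the request buffer

  ibv_send_wr wr = {};
  wr.opcode = IBV_WR_SEND;  // two-sided: matched by a RECV at the peer
  wr.send_flags = IBV_SEND_SIGNALED;
  wr.sg_list = &sge;
  wr.num_sge = 1;
  // UD addressing: one QP can reach any peer, so per-thread NIC state is
  // O(1) in cluster size (unlike one RC QP per remote machine).
  wr.wr.ud.ah = ah;
  wr.wr.ud.remote_qpn = remote_qpn;
  wr.wr.ud.remote_qkey = 0x11111111;  // arbitrary demo qkey, set at QP init

  ibv_send_wr *bad_wr = nullptr;
  return ibv_post_send(ud_qp, &wr, &bad_wr);
}
```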
FaSST RPCs make transactions simpler
Remote-bypass designs are complex:
• Redesign and rewrite data stores
• Hash table [FaRM-KV, NSDI 14], B-Tree [Cell, ATC 15]
RPC-based designs are simple:
• Reuse existing data stores
• Hash table [MICA, NSDI 14], B-Tree [Masstree, EuroSys 12]
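To illustrate the reuse argument: behind an RPC, the server-side handler just calls into an unmodified local data structure. The handler signature below is hypothetical (not FaSST's actual interface), and a plain std::unordered_map stands in for a store like MICA or Masstree.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Any existing local data store works unmodified behind an RPC handler.
static std::unordered_map<uint64_t, std::string> store;

// Hypothetical handler: the RPC layer calls this with the request payload
// and a response buffer; the return value is the response length.
std::size_t get_handler(const uint64_t *key, char *resp_buf,
                        std::size_t resp_cap) {
  auto it = store.find(*key);
  if (it == store.end()) return 0;  // not found: empty response
  std::size_t n = std::min(it->second.size(), resp_cap);
  it->second.copy(resp_buf, n);     // value travels back in the response
  return n;
}
// A one-sided design would instead require laying the table out in
// registered memory so that remote NICs can traverse it with READs.
```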
UD does not provide reliability. But the link layer does!
- No end-to-end reliability
+ Link-layer flow control
+ Link-layer retransmission
No packet loss observed in:
• 69 nodes, 46 hours
• 100 trillion packets
• 50 PB transferred
Handle packet loss similarly to machine failure: see paper.
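Because losses are rare enough to treat like failures, the sender needs no RPC-level retransmission, only a timeout that triggers the machine-failure path. The sketch below shows that policy in spirit; the threshold, types, and recovery hook are illustrative, not FaSST's exact mechanism.

```cpp
#include <chrono>

// Treat a lost RPC like a machine failure: do not retransmit at the RPC
// layer; time out and invoke the same recovery path used when a machine
// dies. Names and the 5-second threshold are illustrative only.
using Clock = std::chrono::steady_clock;

struct PendingRpc {
  Clock::time_point sent_at;
  int dest_node;
};

bool maybe_declare_failed(const PendingRpc &rpc, void (*recover)(int node)) {
  auto waited = Clock::now() - rpc.sent_at;
  if (waited > std::chrono::seconds(5)) {
    recover(rpc.dest_node);  // run machine-failure recovery for this node
    return true;
  }
  return false;  // still waiting; the link layer makes loss extremely rare
}
```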
Performance comparison

         Nodes  NICs           Cores
FaRM     50     2x ConnectX-3  16
DrTM+R   6      1x ConnectX-3  10
FaSST    50     1x ConnectX-3  8

vs FaRM: FaSST uses 50% fewer hardware resources.
vs DrTM+R: FaSST makes no data locality assumptions.

TATP benchmark (80% read-only txns), tput/machine (M/s): FaRM 1.9, FaSST 3.6
SmallBank benchmark (85% read-write txns), tput/machine (M/s): DrTM+R 0.9, FaSST 1.6
Conclusion
Transactions with one-sided RDMA are:
1. Slow: data access requires multiple round trips
2. Non-scalable: connected transports
3. Complex: must redesign data stores
Transactions with two-sided datagram RPCs are:
1. Fast: one round trip
2. Scalable: datagram transport + link-layer reliability
3. Simple: reuse existing data stores
Code: https://github.com/efficient/fasst