Providing Multi-tenant Services with FPGAs: Case Study on a Key-Value Store
Zsolt István*, Gustavo Alonso, Ankit Singla
Systems Group, Computer Science Dept., ETH Zürich
* Now at IMDEA Software Institute, Madrid
FPGAs in the Cloud
• Wider adoption of FPGAs (e.g., Amazon F1, Microsoft Catapult, …)
• Many promising use-cases, but often single-tenant designs
• Clouds are built on sharing and multi-tenancy:
  ❑ High utilization
  ❑ Flexible provisioning
  ❑ Load isolation and QoS guarantees
Providing multi-tenancy with FPGAs
FPGA Virtualization:
• General purpose (PR)
• Few tenants
• Trades off functionality
• Coarse-grained resource allocation
• Tenants “bring” applications
Multi-tenant applications:
• Domain-specific
• Many tenants
• Trades off performance (?)
• Fine-grained resource allocation
• Provider “brings” application
Multi-tenant application as a service: Key-value store
• Widely deployed in the cloud and datacenters
• Different tradeoffs but similar interfaces, e.g.:
  • Memcached – caching, no replication, latency-optimized, main-memory
  • Amazon S3 – BLOB store, replicated, BW-optimized, needs large capacity
Building a multi-tenant KVS (Multes)
• Area well studied in related work
  • Several pipelined designs, all of which saturate the network link
  • Caribou: interfaces and functionality similar to software KVSs [VLDB17]
  • FPGAs can provide replication for fault tolerance [NSDI16]
• Requirements for multi-tenancy:
  • Performance isolation
  • Data isolation
  • Flexibility in resource allocation (focus on network bandwidth)
  • Efficient use of resources regardless of the number of tenants
[VLDB17] Z. István, D. Sidler, G. Alonso: Caribou: Intelligent Distributed Storage.
[NSDI16] Z. István, D. Sidler, G. Alonso, M. Vukolic: Consensus in a Box: Inexpensive Coordination in Hardware.
Designing for multi-tenancy
• Caribou is composed of four modules
  • Requests can take various routes
  • Some traffic is inter-node
  • Hard to reason about load interactions!
• Multes: reorganized pipeline to ensure all requests take the same path (1)
  • Hash table implements parts of the replication log features (multi-version)
  • More coupling between modules (op-codes)
[Figure: Caribou vs. Multes (single pipeline). Both contain a Network Stack (TCP), Value Access + Processing, and Memory; Caribou has Replication + Log Manager and Hash Table + Allocator, while Multes adds Traffic Shapers, Replication, and a Multivers. Hash Table + Allocator. Client and replication messages flow through each pipeline.]
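To make the single-pipeline idea concrete, here is a minimal Python sketch (not the Multes RTL) in which every request, whether a client GET/PUT or an inter-node replication message, visits the same stages in the same order, and each stage decides from the op-code whether to act or simply pass the request on. The op-code names, the stage functions, and the choice to scope keys by tenant ID are illustrative assumptions; sending proposals to other nodes is only modeled as appending to a local log.

```python
from dataclasses import dataclass, field

# Hypothetical op-codes; Multes' real encoding is not shown on the slide.
GET, PUT, REPLICATE = "get", "put", "replicate"


@dataclass
class Request:
    tenant: int
    opcode: str
    key: str
    value: bytes = b""
    path: list = field(default_factory=list)  # stages visited (for illustration)


def traffic_shaper(req, state):
    req.path.append("shaper")  # per-tenant rate limiting would happen here
    return req


def replication(req, state):
    req.path.append("replication")
    if req.opcode == PUT:
        # A leader would also send proposals to its replicas; here we only log the write.
        state.setdefault("log", []).append((req.tenant, req.key))
    return req  # GET and REPLICATE pass through unchanged


def hash_table(req, state):
    req.path.append("hash_table")
    if req.opcode in (PUT, REPLICATE):
        # Keys are scoped by tenant (assumed); every version is kept ("multi-version").
        versions = state.setdefault("table", {}).setdefault((req.tenant, req.key), [])
        versions.append(req.value)
    return req


def value_access(req, state):
    req.path.append("value_access")
    if req.opcode == GET:
        versions = state.get("table", {}).get((req.tenant, req.key), [])
        req.value = versions[-1] if versions else b""
    return req


PIPELINE = [traffic_shaper, replication, hash_table, value_access]


def process(req, state):
    # Client and inter-node requests alike traverse every stage in the same order.
    for stage in PIPELINE:
        req = stage(req, state)
    return req
```

Because the path is fixed, the load a tenant places on any stage depends only on how many of its requests the traffic shaper admits, which is what makes load interactions easier to reason about; the price is the op-code coupling between modules mentioned above.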
Token buckets
• Commonly used in networking scenarios
  • Max. number of tokens (D), adding C tokens every T cycles
  • Limits data rate and burst size
• Buffer space on the FPGA? Queue commands before data movement
• Token buckets can be configured with no overhead at runtime (2)
  • Per-tenant allocations controlled by software
[Figure: Traffic Shaper. Input packets/commands → extract tenant ID → per-tenant token buckets, configured with per-tenant limits (D, C, T) → round-robin → output packets/commands. Each request/command has a metadata part, which encodes the “real cost” of the request, and a body.]
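A minimal Python sketch of a per-tenant traffic shaper, assuming the textbook token-bucket semantics described above: each bucket holds at most D tokens and gains C tokens every T cycles, a queued command leaves only if its tenant's bucket covers its cost, and backlogged tenants are served round-robin. The class and method names, the "buckets start full" initial state, and expressing cost in abstract tokens (for example, bytes moved) are assumptions for illustration, not Multes' actual interface.

```python
from collections import deque
from typing import Dict, Optional, Tuple


class TokenBucket:
    """At most D tokens; C tokens are added every T cycles."""

    def __init__(self, D: int, C: int, T: int):
        self.D, self.C, self.T = D, C, T
        self.tokens = D  # assumption: buckets start full
        self.cycle = 0

    def configure(self, D: int, C: int, T: int) -> None:
        # Runtime reconfiguration only overwrites parameters; queued traffic is untouched.
        self.D, self.C, self.T = D, C, T
        self.tokens = min(self.tokens, D)

    def tick(self) -> None:
        self.cycle += 1
        if self.cycle % self.T == 0:
            self.tokens = min(self.D, self.tokens + self.C)

    def try_consume(self, cost: int) -> bool:
        # A command whose cost exceeds D can never pass; software must keep D >= max cost.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TrafficShaper:
    """Per-tenant token buckets in front of one shared pipeline."""

    def __init__(self, limits: Dict[int, Tuple[int, int, int]]):
        # limits maps tenant ID -> (D, C, T), as set by software.
        self.buckets = {t: TokenBucket(*p) for t, p in limits.items()}
        self.queues = {t: deque() for t in limits}  # queued command costs
        self.order = sorted(limits)
        self.next = 0  # round-robin pointer

    def enqueue(self, tenant: int, cost: int) -> None:
        # cost stands in for the "real cost" encoded in the request metadata.
        self.queues[tenant].append(cost)

    def step(self) -> Optional[int]:
        """Advance one cycle and release at most one command, chosen round-robin."""
        for bucket in self.buckets.values():
            bucket.tick()
        n = len(self.order)
        for i in range(n):
            tenant = self.order[(self.next + i) % n]
            queue = self.queues[tenant]
            if queue and self.buckets[tenant].try_consume(queue[0]):
                queue.popleft()
                self.next = (self.next + i + 1) % n  # resume after the served tenant
                return tenant
        return None
```

For example, `TrafficShaper({0: (4, 4, 1), 1: (1, 1, 4)})` with both queues backlogged with cost-1 commands lets tenant 1 through roughly once every four cycles, while tenant 0 takes the remaining slots. Changing a tenant's allocation only overwrites its (D, C, T) values, which mirrors the claim that buckets can be reconfigured at runtime without overhead.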
Replicated KVS
• Caribou implements inter-FPGA replication (leader-based algorithm)
[Figure: four FPGA nodes shared by the replicated groups of Tenants 1, 2, and 3; a node can be the Leader in one tenant's group and a Replica in another's]
Having multiple roles
• Control state machine at the heart of the replication protocol
  • Data and control handled separately
• Multiple copies not an option: complex logic + plumbing
• SM extended to store state for each tenant: can context switch on each packet (3)
  • Not all states need tenant context
  • Latency inside the SM is not on the critical path
  • Now in registers, but could use BRAMs to store state
[Figure: Replication controller (atomic broadcast protocol) holding Tenant 1, 2, and 3 State (list of peers, role in protocol, outstanding proposals, etc.); it consumes input messages and emits out. commands that encode key, data, op., socket numbers, etc.]
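The context-switching idea can be sketched in Python as a single controller whose per-tenant state (role, peers, outstanding proposals) lives in a table indexed by tenant ID; each incoming message selects its tenant's context, so consecutive packets from different tenants are handled by the same logic. The leader/replica handling below is a simplified majority-ack scheme chosen for illustration; it is not the atomic broadcast protocol Caribou implements, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TenantContext:
    role: str                 # "leader" or "replica" in this tenant's group
    peers: list               # node IDs of the other group members
    next_pid: int = 0         # next proposal ID (used by the leader)
    acks: dict = field(default_factory=dict)  # proposal ID -> votes so far


class ReplicationController:
    """One control state machine shared by all tenants.

    Per-tenant state lives in a context table instead of in separate controller
    copies, so the controller can switch tenants on every incoming message.
    Output commands are (kind, destination node, proposal ID) tuples.
    """

    def __init__(self, contexts):
        self.ctx = contexts  # tenant ID -> TenantContext

    def on_message(self, tenant, kind, payload=None):
        c = self.ctx[tenant]  # "context switch": select this tenant's state
        if kind == "client_write" and c.role == "leader":
            pid = c.next_pid
            c.next_pid += 1
            c.acks[pid] = 1  # the leader counts its own vote
            return [("propose", peer, pid) for peer in c.peers]
        if kind == "propose" and c.role == "replica":
            return [("ack", payload["leader"], payload["pid"])]
        if kind == "ack" and c.role == "leader":
            pid = payload["pid"]
            c.acks[pid] += 1
            majority = (len(c.peers) + 1) // 2 + 1
            if c.acks[pid] == majority:  # commit exactly once, when majority is reached
                return [("commit", peer, pid) for peer in c.peers]
        return []
```

In hardware the context table would be registers (or BRAM) read at the start of each message; the point of the sketch is that supporting another tenant adds one table entry, not another copy of the controller.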
Evaluation of Multes
[Figure: clients of Tenant 1 and Tenant 2 connect over the network to Multes nodes, which run the replication protocol among themselves and use memory/storage]
• Multiple Xilinx VC709s connected to a 10Gbps switch
• 9 load generating machines, Go-based benchmarking tool
• Tenants connect to different TCP port numbers (e.g. 2880, 2881, …)
✓ Multes offers flexible multi-tenancy while efficiently using the network link
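Since tenants are distinguished by the TCP port they connect to, a node could derive a tenant ID directly from the destination port. The sketch below assumes consecutive ports starting at 2880 and a hypothetical cap of 16 tenants; the slides do not say how Multes performs this mapping.

```python
BASE_PORT = 2880   # first port in the example above
MAX_TENANTS = 16   # hypothetical cap on the number of tenants


def tenant_from_port(dst_port: int) -> int:
    """Map a destination TCP port to a tenant ID (assumes one consecutive port per tenant)."""
    tenant = dst_port - BASE_PORT
    if not 0 <= tenant < MAX_TENANTS:
        raise ValueError(f"port {dst_port} is not assigned to any tenant")
    return tenant
```

A mapping like this keeps tenant identification to a single subtraction and range check per connection, which is cheap enough to do in hardware.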
No performance loss due to multi-tenancy
• Read-only throughput on a single node
Load isolation
• Replicated write latency of Tenant 0 (replication group size = 3)
• Additional tenants use their full read bandwidth (1/8 of 10Gbps)
[Figure: replicated write latency [µs] of Tenant 0, without client overhead]
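As a quick check of the per-tenant share, a 1/8 slice of the 10 Gbps link works out as follows. The second line relates it to the token-bucket parameters under two hypothetical assumptions that the slides do not state: one token stands for one byte, and the traffic shaper is clocked at f = 156.25 MHz.

```latex
B_{\text{tenant}} = \frac{10\ \text{Gbps}}{8} = 1.25\ \text{Gbps}
  = \frac{1.25 \times 10^{9}\ \text{bit/s}}{8\ \text{bit/B}} = 156.25\ \text{MB/s}

\frac{C}{T} = \frac{B_{\text{tenant}}}{f}
  = \frac{156.25 \times 10^{6}\ \text{B/s}}{156.25 \times 10^{6}\ \text{cycles/s}}
  = 1\ \text{token per cycle}
```

Under those assumptions, for instance C = 8 tokens every T = 8 cycles would match the share, with D bounding how large a burst the tenant may send at once.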
Resource Usage: Small cost for sharing
[Figure: % of VC709 resources (Logic, BRAM) vs. no. of max. supported tenants in Multes (1–16), compared with Caribou, 2x Caribou, and Multes with T=2]
The FPGA part on the VC709 is XC7VX690T-2FFG1761C.
Thoughts on the future: Platform-as-a-service
• Customize the KVS with tenant-defined processing for different “flavors”
• Combine the multi-tenant application with small PR regions
  • Simple streaming interfaces: can use HLS, OpenCL, etc.
  • A misbehaving PR region does not impact others
[Figure: Multes pipeline with Network Stack (TCP), Traffic Shapers, Replication, Multivers. Hash Table + Allocator, Value Access + Processing, and Memory]
Conclusion
Multes: a multi-tenant KVS service that doesn’t sacrifice performance
Project on GitHub: https://github.com/fpgasystems/caribou
Relied on three techniques:
1) Single-pipeline architecture and traffic shapers → no load interaction
2) Runtime parameterization of control modules → flexible allocations
3) “Contexts” in controlling state machines → no overhead when switching between tenants
→ Applicable to many network-facing applications on FPGAs