Jumpgate: In-Network Processing as a Service for Data Analytics
Craig Mustard, Fabian Ruffy, Anny Gakhokidze, Ivan Beschastnikh, Alexandra Fedorova
University of British Columbia

In-Network Processing Can Accelerate Data Analytics
➔ Switches (P4): 2-8x speedup [NetAccel, DAIET]
➔ >1000x less traffic [Sonata]
➔ Smart NICs (FPGAs): 96% increased throughput [Floem]
[Diagram: the Original Data Path between the Storage Cluster and the Compute Cluster, passing through a programmable NIC and Switch on each side.]

There are many places to do In-Network Processing
➔ 2-16x speedup on Apache Spark when performing filter, project, shuffle, aggregation somewhere in the network.
➔ Software Middleboxes: 4.5x speedup [NetAgg]
➔ Off-path Opportunities: NPUs, ASIC/FPGAs, Ephemeral VMs
[Diagram: the Original Data Path and an Alternative Data Path between the Storage Cluster and the Compute Cluster. Labels: Programmable NIC, Switch, and the off-path opportunities listed above.]

Challenges to actually using NPs
➔ Tough to program:
  ◆ Diverse hardware
  ◆ Requires high performance software
  ◆ Packet-oriented NOT flow-oriented
  ◆ Storage limits (e.g., very little cross-packet state)
➔ Manage multiple devices at the same time
  ◆ Specialized devices not good at all parts of a query
➔ Integration with storage and analytics systems
  ◆ Need suitable protocols and data formats for NPs to operate on data
See our paper or come talk to me for details!
[Diagram labels: Target Devices (Switches, Smart NICs, Ephemeral VMs, N(etwork) PUs, FPGAs, D(ata) PUs); Storage System.]

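To make the "very little cross-packet state" limit concrete, here is a minimal sketch (not Jumpgate's implementation) of partial aggregation with a bounded table. The function names `partial_aggregate` and `final_aggregate`, the `max_slots` parameter, and the oldest-first eviction policy are all assumptions chosen for illustration. The device emits partial sums whenever its table is full, and a downstream host finishes the aggregation.

```python
from collections import OrderedDict


def partial_aggregate(packets, max_slots):
    """Sum values per key while holding at most `max_slots` keys of state.

    Yields (key, partial_sum) pairs. A key may be emitted more than once,
    so the receiver still has to run a final aggregation.
    """
    table = OrderedDict()
    for key, value in packets:
        if key in table:
            table[key] += value
        elif len(table) < max_slots:
            table[key] = value
        else:
            # Table is full: send the oldest entry downstream to free a slot.
            yield table.popitem(last=False)
            table[key] = value
    # End of stream: flush whatever state is left.
    yield from table.items()


def final_aggregate(partials):
    """Combine partial sums into exact totals (runs where state is plentiful)."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals


packets = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
partials = list(partial_aggregate(packets, max_slots=2))
totals = final_aggregate(partials)
```

With only two slots, the device forwards four partial results instead of six packets, and the totals are still exact after the final step. This is one reason a specialized device can help with part of a query without being able to run all of it.
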
How should we incorporate solutions into systems?
[Diagram labels: Target Devices (Switches, Smart NICs, Ephemeral VMs, N(etwork) PUs, FPGAs, D(ata) PUs); Storage System.]

How should we incorporate? One (bad) option:
[Diagram labels: Target Devices (Switches, Smart NICs, Ephemeral VMs, N(etwork) PUs, FPGAs, D(ata) PUs); Storage System.]
Problems:
➔ Not scalable to all analytics systems
➔ Not future-proof to new devices
➔ Hard to share code

Our proposal: Network Processing as a Service
[Diagram labels: Network Processing as a Service (NPaaS); Target Devices (Switches, Smart NICs, Ephemeral VMs, N(etwork) PUs, FPGAs, D(ata) PUs); Storage System.]
Advantages:
➔ Abstracts devices and management
➔ Existing systems need to change once
➔ New devices and systems can be added easily

Jumpgate: a prototype NPaaS, addressing three problems
➔ 1. Abstraction: Client API
  ◆ read data, proj., filter, group by
➔ 2. Programmability: Compiler
  ◆ Maps logical to physical ops.
  ◆ Available Physical Operators: Filter + Project in Storage, Shuffle in Switch, Partial Agg in SW
➔ 3. Management: Orchestrator
  ◆ Deploys NP pipelines
  ◆ Available Devices: Virtual Machines, Switches, NICs, NPUs
  ◆ Deployment Constraints
[Diagram labels: Physical Plan.]

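The compiler step, mapping logical to physical ops., can be sketched as a simple rule lookup. This is an illustrative sketch only, not Jumpgate's compiler. The operator names come from the slide, but the `RULES` table, the `compile_plan` function, the device keys (`"storage"`, `"switch"`, `"vm"`), the reading of "SW" as software on a virtual machine, and the choice to expand `group_by` into a shuffle plus a partial aggregation are all assumptions.

```python
# Hypothetical rules: a group of logical ops -> physical operators that
# implement it, each paired with the kind of device it needs.
RULES = [
    # Assumption: the storage-side operator also performs the read.
    (("read", "project", "filter"),
     [("Filter + Project in Storage", "storage")]),
    # Assumption: group-by is offloaded as a shuffle plus a partial aggregation.
    (("group_by",),
     [("Shuffle in Switch", "switch"), ("Partial Agg in SW", "vm")]),
]


def compile_plan(logical_plan, available_devices):
    """Map logical ops to physical operators where devices allow it.

    Returns (physical_plan, leftover). Ops in `leftover` could not be
    offloaded and would stay in the analytics system.
    """
    physical, leftover = [], list(logical_plan)
    for logical_ops, operators in RULES:
        covered = all(op in leftover for op in logical_ops)
        deployable = all(dev in available_devices for _, dev in operators)
        if covered and deployable:
            physical.extend(name for name, _ in operators)
            leftover = [op for op in leftover if op not in logical_ops]
    return physical, leftover


query = ["read", "project", "filter", "group_by"]
full_plan, full_rest = compile_plan(query, {"storage", "switch", "vm"})
small_plan, small_rest = compile_plan(query, {"storage"})
```

With all three device kinds available, the whole query is offloaded. With only storage available, the group-by stays in the analytics system, so the client only needs the single logical plan in both cases.
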
Jumpgate: example deployment
➔ SQL, via the Client API: read data, proj., filter, group by
➔ Jumpgate Data Path:
  ◆ Filter + Project in Storage
  ◆ Shuffle in Switch
  ◆ Partial Agg in SW
[Diagram: the Original Data Path and the Jumpgate Data Path between the Storage Cluster and the Compute Cluster. Labels: Programmable NIC, Switch, NPUs, ASIC/FPGAs, Ephemeral VMs.]

Open Questions:
We plan to use Jumpgate to investigate these questions and more.
➔ What are the right protocols and formats to use for different NPs?
  ◆ Protocols and formats are dependent on NP restrictions
➔ What are the best devices, and what is the best offload strategy?
  ◆ How to adapt existing query optimizations?
➔ How should we allocate devices w.r.t network topology?
  ◆ How much do we need to know about the topology to compute a good plan?
➔ Failure handling
  ◆ How should NPaaS interact with the client application on failures?
  ◆ Propagate to the client, or automatic recovery?

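The two failure-handling options named above can be contrasted in a few lines. This is a sketch of the design space, not a description of what Jumpgate does: `OffloadError`, `run_query`, and the `policy` values are hypothetical names, and "automatic recovery" is modeled here as re-running the query without offload.

```python
class OffloadError(Exception):
    """Hypothetical error raised when an in-network pipeline fails."""


def run_query(offloaded, fallback, policy="recover"):
    """Run `offloaded`; on failure either propagate or recover.

    policy="propagate": re-raise so the client application decides.
    policy="recover":   silently re-run using `fallback` (no offload).
    """
    try:
        return offloaded()
    except OffloadError:
        if policy == "propagate":
            raise
        return fallback()


def failing_pipeline():
    raise OffloadError("switch unavailable")


result = run_query(failing_pipeline, lambda: "computed without offload")
```

Propagating keeps the service simple but pushes retry logic into every client. Recovering hides failures but can make query latency less predictable, which is part of why the question is open.
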
Takeaways:
➔ In-network processors can be on-demand accelerators for data analytics tasks.
➔ But, large challenges remain to using them.
➔ Instead of building solutions into every analytics framework, we need NPaaS to provide abstractions for using NPs.
➔ Jumpgate is our NPaaS prototype to address API, compilation, and orchestration challenges, and to enable future research in this area.
[Diagram labels: Network Processing as a Service (NPaaS); Target Devices (Switches, Smart NICs, Ephemeral VMs, N(etwork) PUs, FPGAs, D(ata) PUs); Storage System.]
Thanks for listening! Happy to talk more! Questions?