FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds
Daehyeok Kim¹, Tianlong Yu¹, Hongqiang Liu³, Yibo Zhu⁴, Jitu Padhye², Shachar Raindel², Chuanxiong Guo⁴, Vyas Sekar¹, Srinivasan Seshan¹
¹Carnegie Mellon University, ²Microsoft, ³Alibaba Group, ⁴Bytedance
Two Trends in Cloud Applications
• Containerization: lightweight isolation and portability
• RDMA networking: higher networking performance
Benefits of Containerization
[Figure: Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) run network apps behind a software switch on Host 1 (IP 30.0.0.1); Container 2 migrates to Host 2 (IP 40.0.0.1) and keeps its IP 20.0.0.1.]
• Namespace isolation
• Portability
Containerization and RDMA Are in Conflict!
[Figure: RDMA apps bypass the software switch, so the RDMA apps in Container 1 and Container 2 on Host 1 both use the RDMA NIC's IP 10.0.0.1; after migrating to Host 2, Container 2 takes that NIC's IP 20.0.0.1.]
• Namespace isolation and portability are lost.
Existing H/W-based Virtualization Isn't Working
Using Single Root I/O Virtualization (SR-IOV):
[Figure: On Host 1, the NIC switch inside the RDMA NIC exposes virtual functions VF 1 and VF 2, giving the two containers distinct IPs (10.0.0.1 and 10.0.0.2); after migrating to Host 2, Container 2 is attached to a new VF and its IP changes to 20.0.0.1.]
• Namespace isolation is restored, but portability is still lost.
Sub-optimal Performance of Containerized Apps
RDMA networking can improve the training speed of neural network (NN) models by ~10x!
[Figure: Left, image classification (CNN) training speed in images/sec for ResNet-50, Inception-v3 and AlexNet; right, CDF of time per step (sec) for speech recognition (RNN) training. Native RDMA vs. Container+TCP, with speedups of 14.4x and 9.2x annotated.]
Our Work: FreeFlow
• Enables high-speed RDMA networking for containerized applications
• Compatible with existing RDMA applications
• Close to native RDMA performance
• Evaluated with real-world data-intensive applications
Outline
• Motivation
• FreeFlow Design
• Implementation and Evaluation
FreeFlow Design Overview
[Figure: Native RDMA: on one host, RDMA App ➔ Verbs API ➔ Verbs library ➔ NIC command ➔ RDMA NIC. FreeFlow: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) keep using the Verbs API; FreeFlow sits between them and the host's Verbs library and RDMA NIC (IP 30.0.0.1).]
Background on RDMA
"Host 1 wants to write the contents of MEM-1 to MEM-2 on Host 2."
1. Control path
 - Set up the RDMA context (RDMA CTX)
 - Post work requests (e.g., write)
2. Data path
 - The NIC processes work requests
 - The NIC accesses memory directly
[Figure: On each host, an RDMA App with its RDMA CTX and memory region (MEM-1 on Host 1, MEM-2 on Host 2) sits above the Verbs library and the RDMA NIC.]
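To make the control-path/data-path split concrete, here is a toy model in C. It is not the libibverbs API: `struct work_request`, `post_send()` and `nic_process()` are hypothetical names, and a local `memcpy` stands in for the NIC moving data to the remote host. It only illustrates that the application describes the work (control path), while the "NIC" later touches both memory regions on its own (data path).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of RDMA posting; not the libibverbs API. */
enum wr_opcode { WR_RDMA_WRITE };

struct work_request {
    enum wr_opcode opcode;
    const void *local_addr;   /* source buffer (MEM-1) */
    void *remote_addr;        /* destination buffer (MEM-2), local here for simplicity */
    size_t length;
};

#define QUEUE_DEPTH 16

/* Toy send queue standing in for the queue pair inside the RDMA context. */
struct send_queue {
    struct work_request wrs[QUEUE_DEPTH];
    size_t head;  /* next work request the "NIC" will process */
    size_t tail;  /* next free slot for the application */
};

void sq_init(struct send_queue *sq) {
    sq->head = 0;
    sq->tail = 0;
}

/* Control path: the application only enqueues a description of the work.
 * Returns 0 on success, -1 if the queue is full. */
int post_send(struct send_queue *sq, const struct work_request *wr) {
    if (sq->tail - sq->head == QUEUE_DEPTH)
        return -1;
    sq->wrs[sq->tail % QUEUE_DEPTH] = *wr;
    sq->tail++;
    return 0;
}

/* Data path: the "NIC" drains the queue and accesses memory directly,
 * without further help from the application. Returns how many WRs it ran. */
size_t nic_process(struct send_queue *sq) {
    size_t done = 0;
    while (sq->head != sq->tail) {
        const struct work_request *wr = &sq->wrs[sq->head % QUEUE_DEPTH];
        assert(wr->opcode == WR_RDMA_WRITE);
        memcpy(wr->remote_addr, wr->local_addr, wr->length);
        sq->head++;
        done++;
    }
    return done;
}
```

In real RDMA the posting call is `ibv_post_send()`, the NIC runs asynchronously and completions are reported through a completion queue, which this sketch leaves out.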
FreeFlow in the Scene
"Container 1 wants to write the contents of MEM-1 to MEM-2 on Container 2."
[Figure: In each container, the RDMA App's RDMA CTX and MEM (MEM-1, MEM-2) are mirrored by FreeFlow as a shadow context (S-RDMA CTX) and shadow memory (S-MEM-1, S-MEM-2) on top of the Verbs library and the RDMA NIC.]
• C1: How do we forward verbs calls?
• C2: How do we synchronize memory?
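Challenge 1: Verbs Forwarding in the Control Path
The RDMA app in the container calls the Verbs API with pointer-based structures, e.g.
  struct ibv_qp { struct ibv_context *context; … };
  ibv_post_send(struct ibv_qp *qp, …)
and the FreeFlow shim must forward each call to the verbs library, which issues the NIC command.
• Attempt 1: Forward the arguments "as is" ➔ Incorrect: structures such as ibv_qp contain pointers (e.g., context) that are only valid in the app's address space.
• Attempt 2: "Serialize" the arguments and forward them ➔ Inefficient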
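A minimal sketch of Attempt 2, assuming simplified types `fake_sge` and `fake_send_wr` that loosely mirror the pointer-carrying layout of `ibv_sge`/`ibv_send_wr` (they are hypothetical, not the real definitions). Forwarding the raw bytes of such a struct would ship addresses that mean nothing in the router's address space, so a serializer has to chase every pointer and copy what it points to; writing and maintaining this for every verb and nested type is the cost the slide calls inefficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types that loosely mirror ibv_sge / ibv_send_wr. */
struct fake_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

struct fake_send_wr {
    uint64_t wr_id;
    struct fake_sge *sg_list;    /* pointer: valid only in the sender's address space */
    int num_sge;
    struct fake_send_wr *next;   /* pointer: valid only in the sender's address space */
};

/* Append n bytes from src to buf at *off; returns 0 on success, -1 if full. */
static int put(uint8_t *buf, size_t cap, size_t *off, const void *src, size_t n) {
    if (n > cap - *off)
        return -1;
    memcpy(buf + *off, src, n);
    *off += n;
    return 0;
}

/* Deep-serialize a chain of work requests: follow every pointer and copy the
 * data it refers to, so no sender-side address ends up in the message.
 * Layout: u32 count, then per WR: u64 wr_id, i32 num_sge, num_sge * fake_sge.
 * Returns the number of bytes written, or 0 on error/overflow. */
size_t serialize_wr_chain(const struct fake_send_wr *wr, uint8_t *buf, size_t cap) {
    size_t off = 0;
    uint32_t count = 0;
    const struct fake_send_wr *w;

    for (w = wr; w != NULL; w = w->next)
        count++;
    if (put(buf, cap, &off, &count, sizeof count))
        return 0;

    for (w = wr; w != NULL; w = w->next) {
        int32_t n = w->num_sge;
        if (n < 0 || (n > 0 && w->sg_list == NULL))
            return 0;
        if (put(buf, cap, &off, &w->wr_id, sizeof w->wr_id) ||
            put(buf, cap, &off, &n, sizeof n))
            return 0;
        if (n > 0 && put(buf, cap, &off, w->sg_list, (size_t)n * sizeof *w->sg_list))
            return 0;
    }
    return off;
}
```

The receiver would need a matching deserializer that rebuilds the pointers in its own address space, doubling the per-type code.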
Internal Structure of the Verbs Library
• The RDMA app calls ibv_post_send(struct ibv_qp *qp, …) through the Verbs API, passing pointer-based structures such as struct ibv_qp { struct ibv_context *context; … };
• Inside the verbs library, these parameters are turned into a NIC command for the RDMA NIC.
• Parameters are already serialized by the verbs library!
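The observation on this slide can be sketched as follows. The command layout `nic_cmd`, the opcode value and `encode_rdma_write()` are hypothetical and do not match any real NIC's format; they only show that, once the verbs library has done its work, the result is a fixed-size, pointer-free byte image that survives a plain byte copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat command, in the spirit of a NIC work-queue entry:
 * fixed size, plain integers only, no pointers to other host structures. */
struct nic_cmd {
    uint32_t opcode;
    uint32_t qp_num;       /* the QP is named by number, not by a struct pointer */
    uint64_t wr_id;
    uint64_t local_addr;   /* buffer address, resolved by the NIC together with lkey */
    uint32_t length;
    uint32_t lkey;
    uint64_t remote_addr;
    uint32_t rkey;
    uint32_t reserved;
};

_Static_assert(sizeof(struct nic_cmd) == 48, "layout has no padding");

enum { NIC_OP_RDMA_WRITE = 1 };  /* hypothetical opcode value */

/* Conceptually what a verbs library does inside ibv_post_send(): turn the
 * caller's pointer-rich arguments into a self-contained byte image. */
void encode_rdma_write(struct nic_cmd *cmd, uint32_t qp_num, uint64_t wr_id,
                       const void *local, uint32_t len, uint32_t lkey,
                       uint64_t remote_addr, uint32_t rkey) {
    memset(cmd, 0, sizeof *cmd);
    cmd->opcode = NIC_OP_RDMA_WRITE;
    cmd->qp_num = qp_num;
    cmd->wr_id = wr_id;
    cmd->local_addr = (uint64_t)(uintptr_t)local;
    cmd->length = len;
    cmd->lkey = lkey;
    cmd->remote_addr = remote_addr;
    cmd->rkey = rkey;
}
```

`local_addr` is still a virtual address of the app's buffer; making that buffer reachable by the NIC is the data-path problem of Challenge 2.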
FreeFlow Control Path Channel
Idea: leverage the serialized output of the verbs library.
• The RDMA app in the container calls ibv_post_send(struct ibv_qp *qp, …) as usual.
• The FreeFlow library in the container lets the verbs library serialize the parameters, then writes them to the virtual NIC (VNIC): write(VNIC_fd, serialized parameters).
• The FreeFlow router reads the command from the VNIC and passes it to the host's verbs library, which issues the NIC command to the RDMA NIC.
• Parameters are forwarded correctly without manual serialization!
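A sketch of the forwarding step, assuming the VNIC behaves like a byte-stream file descriptor. Here an `AF_UNIX` `socketpair()` stands in for the VNIC, and `vnic_open_pair()`, `vnic_send_cmd()` and `router_recv_cmd()` with their 4-byte length prefix are hypothetical, not FreeFlow's actual interface. The design point is that both sides treat the command as opaque bytes, so neither needs to understand `ibv_*` structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for the VNIC: a connected pair of local stream sockets.
 * fds[0] is the container side, fds[1] the router side. (hypothetical) */
int vnic_open_pair(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Write exactly n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t n) {
    const uint8_t *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Read exactly n bytes; fails if the peer closes early. */
static int read_all(int fd, void *buf, size_t n) {
    uint8_t *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Container side: forward an already-serialized command, length-prefixed,
 * to the VNIC instead of to the NIC. The bytes are never parsed here. */
int vnic_send_cmd(int vnic_fd, const void *cmd, uint32_t len) {
    if (write_all(vnic_fd, &len, sizeof len) != 0)
        return -1;
    return write_all(vnic_fd, cmd, len);
}

/* Router side: receive one opaque command into buf.
 * Returns its length, or -1 on error or if it does not fit in cap bytes. */
long router_recv_cmd(int fd, void *buf, uint32_t cap) {
    uint32_t len;
    if (read_all(fd, &len, sizeof len) != 0 || len > cap)
        return -1;
    if (read_all(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```

Usage: the container side calls `vnic_send_cmd(fds[0], cmd, len)` where it would otherwise ring the NIC, and the router loops on `router_recv_cmd(fds[1], …)` and hands each buffer to its own verbs library.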
Challenge 2: Synchronizing Memory for the Data Path
• Shadow memory (S-MEM) in the FreeFlow router, next to the shadow context (S-RDMA CTX)
 • A copy of the application's memory region (MEM)
 • Accessed directly by the NIC
• S-MEM and MEM must be kept synchronized.
• How do we synchronize S-MEM and MEM?
Strawman Approach for Synchronization
"Container 1 wants to write the contents of MEM-1 to MEM-2 on Container 2."
• Explicit synchronization: the DATA in each container's MEM is copied to or from its shadow S-MEM in the FreeFlow router.
• Synchronizing at high frequency ➔ high overhead
• Synchronizing at low frequency ➔ the app sees wrong (stale) data
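The strawman can be modeled in a few lines, assuming MEM and S-MEM are two separate buffers kept consistent only by explicit copies (`mirrored_region`, `sync_to_shadow()` and `nic_read_byte()` are hypothetical). Every sync copies the whole region, and between syncs the NIC sees old data; the same problem arises in the other direction when the NIC writes incoming data into S-MEM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strawman model: MEM and S-MEM are separate buffers kept consistent
 * only by explicit copies. (hypothetical) */
struct mirrored_region {
    char *mem;            /* application's buffer */
    char *s_mem;          /* shadow buffer the NIC actually reads */
    size_t len;
    size_t bytes_copied;  /* running cost of synchronization */
};

/* Explicit synchronization: copy all of MEM into S-MEM. */
void sync_to_shadow(struct mirrored_region *r) {
    memcpy(r->s_mem, r->mem, r->len);
    r->bytes_copied += r->len;
}

/* What the NIC would transmit for byte i: whatever S-MEM holds right now. */
char nic_read_byte(const struct mirrored_region *r, size_t i) {
    return r->s_mem[i];
}
```

Syncing after every write keeps the data correct but makes `bytes_copied` grow with every update; syncing rarely keeps it small but lets the NIC send stale bytes.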
Containers Can Share Memory Regions
• The FreeFlow router itself runs in a container on the same host.
• MEM (in the app's container) and S-MEM (in the FreeFlow router) can therefore be located on the same physical memory region, shared between the two containers.
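A minimal POSIX sketch of the idea, assuming a file-backed shared mapping: one unlinked temporary file is mapped twice with `MAP_SHARED`, so two virtual addresses refer to the same physical pages. `shared_views` and `map_two_views()` are hypothetical helpers. In a real deployment the two mappings would live in the app container and the router container (which then need a shared IPC or file namespace); here one process maps the object twice to stay self-contained.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of one shared object: the app's MEM and the router's S-MEM. */
struct shared_views {
    char *mem;
    char *s_mem;
    size_t len;
};

/* Map one shared object twice. Returns 0 on success, -1 on failure. */
int map_two_views(struct shared_views *v, size_t len) {
    char path[] = "/tmp/ff_shm_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* no name left on disk; the open fd keeps the object alive */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);     /* mappings remain valid after the fd is closed */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
        return -1;
    }
    v->mem = a;
    v->s_mem = b;
    v->len = len;
    return 0;
}

void unmap_two_views(struct shared_views *v) {
    munmap(v->mem, v->len);
    munmap(v->s_mem, v->len);
}
```

A write through `mem` is visible through `s_mem` with no copy, which is what removes the strawman's synchronization cost.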
Zero-copy Synchronization in the Data Path
How do we allocate MEM-1 in the shadow memory space?
Synchronization without explicit memory copy:
• Method 1: Allocate shared buffers with FreeFlow APIs
• Method 2: Re-map the app's memory space to the shadow memory space
FreeFlow supports both!
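Method 2 can be sketched with `mmap(..., MAP_FIXED, ...)`: copy the buffer's current contents into a shared object, then map that object over the buffer's own address, so the app keeps using the same pointer while the router gets a second view. The helper `remap_to_shared()` and the temporary-file backing are assumptions for illustration; the slides do not say how FreeFlow performs the re-mapping. The buffer must be page-aligned and its length a multiple of the page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-back an existing buffer with a shared object without changing its
 * virtual address, and return a second (router-side) view of the same pages.
 * buf must be page-aligned and len a multiple of the page size.
 * Returns the shadow pointer, or NULL on failure. (hypothetical helper) */
char *remap_to_shared(char *buf, size_t len) {
    char path[] = "/tmp/ff_remap_XXXXXX";
    void *shadow;
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0)
        goto fail;

    /* 1. Preserve the buffer's current contents in the shared object. */
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        goto fail;

    /* 2. Replace the pages behind buf with the shared object, same address. */
    if (mmap(buf, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)
        goto fail;

    /* 3. Map the object again: this is the router's shadow view (S-MEM). */
    shadow = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return shadow == MAP_FAILED ? NULL : shadow;

fail:
    close(fd);
    return NULL;
}
```

Only the one-time copy in step 1 touches the data; afterwards app writes reach the shadow view directly. Method 1 would hand the app a buffer carved from the shared object from the start, so even that copy is unnecessary.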
FreeFlow Design Summary
[Figure: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) reach the FreeFlow router through VNICs; the router drives the Verbs library and the host's RDMA NIC (IP 30.0.0.1).]
• FreeFlow control path channel
• Zero-copy memory synchronization
FreeFlow provides near-native RDMA performance for containers!
Outline
• Motivation
• FreeFlow Design
• Implementation and Evaluation
Implementation and Experimental Setup
• FreeFlow library: adds 4,000 lines of C to libibverbs and libmlx4
• FreeFlow router: 2,000 lines of C++
• Testbed setup
 • Two Intel Xeon E5-2620 8-core CPUs, 64 GB RAM
 • 56 Gbps Mellanox ConnectX-3 NICs
 • Docker containers
Does FreeFlow Support Low Latency?
[Figure: Latency (μs) vs. message size (64 B to 4 KB) for Native RDMA and FreeFlow; annotated gap between the two: 0.38 μs.]
Does FreeFlow Support High Throughput?
[Figure: Throughput (Gbps) vs. message size (2 KB to 1 MB) for Native RDMA and FreeFlow; annotation: FreeFlow's throughput is bounded by the control path channel's performance.]
Do Applications Benefit from FreeFlow?
[Figure: CDF of time per step (sec) for Container+TCP, Native RDMA and FreeFlow; annotated speedup: 8.7x.]
Summary
• Containerized apps today can't benefit from the speed of RDMA.
• Existing NIC virtualization solutions (e.g., SR-IOV) don't work.
• FreeFlow enables containerized apps to use RDMA.
• Challenges and key ideas
 • Control path: leverage the structure of the verbs library for efficient verbs forwarding
 • Data path: zero-copy memory synchronization
• Performance close to native RDMA
github.com/microsoft/freeflow