FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds



  1. FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds. Daehyeok Kim, Tianlong Yu (1), Hongqiang Liu (3), Yibo Zhu (4), Jitu Padhye (2), Shachar Raindel (2), Chuanxiong Guo (4), Vyas Sekar (1), Srinivasan Seshan (1). Affiliations: (1) Carnegie Mellon University, (2) Microsoft, (3) Alibaba Group, (4) Bytedance

  2. Two Trends in Cloud Applications • Containerization: lightweight isolation, portability • RDMA networking: higher networking performance

  3. Benefits of Containerization [Figure: Host 1 and Host 2, each with a software switch and a NIC (host IPs 30.0.0.1 and 40.0.0.1). Each container runs its app in its own network namespace (namespace isolation). Container 1 has IP 10.0.0.1; Container 2 keeps IP 20.0.0.1 when it migrates from Host 1 to Host 2 (portability).]

  4. Containerization and RDMA are in Conflict! [Figure: RDMA apps in containers use the host's RDMA NIC directly. Container 1 and Container 2 on Host 1 both appear as the NIC's IP, 10.0.0.1; after migrating to Host 2, Container 2 appears as that host's NIC IP, 20.0.0.1. The figure rates this setup on namespace isolation and portability.]

  5. Existing H/W-based Virtualization Isn't Working: Using Single Root I/O Virtualization (SR-IOV) [Figure: each container's RDMA app is attached to a virtual function (VF) of the RDMA NIC, behind the NIC's built-in switch. The containers on Host 1 use IPs 10.0.0.1 and 10.0.0.2; after migrating to Host 2, Container 2 uses 20.0.0.1. The figure rates this setup on namespace isolation and portability.]

  6. Sub-optimal Performance of Containerized Apps: RDMA networking can improve the training speed of an NN model by ~10x! [Figure, two plots comparing native RDMA with container+TCP. Speech recognition RNN training: CDF of time per step (sec, 0 to 40). Image classification CNN training: training speed (images/sec, 0 to 3000) for Resnet-50, Inception-v3, and Alexnet, annotated with 14.4x and 9.2x.]

  7. Our Work: FreeFlow • Enable high-speed RDMA networking capabilities for containerized applications • Compatible with existing RDMA applications • Close to native RDMA performance • Evaluation with real-world data-intensive applications

  8. Outline • Motivation • FreeFlow Design • Implementation and Evaluation

  9. FreeFlow Design Overview [Figure, two stacks side by side. Native RDMA: RDMA app → Verbs API → Verbs library → NIC command → RDMA NIC. FreeFlow: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) → Verbs API → FreeFlow → Verbs library → RDMA NIC (host IP 30.0.0.1).]

  10. Background on RDMA. Example: "Host 1 wants to write contents in MEM-1 to MEM-2 on Host 2." 1. Control path: set up the RDMA context; post work requests (e.g., write). 2. Data path: the NIC processes work requests; the NIC directly accesses memory. [Figure: on each host, an RDMA app holding an RDMA CTX and a memory region (MEM-1 on Host 1, MEM-2 on Host 2) sits above the Verbs library and the RDMA NIC.]
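The split between the two paths on slide 10 can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_wr`, `toy_queue`, `toy_post_write`, and `toy_nic_process` are hypothetical names, not libibverbs APIs, and a `memcpy` in the same process stands in for the NIC's direct memory access. The point is that posting a work request (control path) moves no data; the transfer happens only when the "NIC" drains the queue (data path).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_QUEUE_DEPTH = 8 };

/* Hypothetical work request: "write len bytes from local to remote". */
struct toy_wr {
    const void *local;
    void *remote;
    size_t len;
};

/* Hypothetical send queue shared by the application and the "NIC". */
struct toy_queue {
    struct toy_wr wr[TOY_QUEUE_DEPTH];
    size_t head; /* next request the NIC will process */
    size_t tail; /* next free slot for the application */
};

/* Control path: the application only enqueues a description of the write. */
int toy_post_write(struct toy_queue *q, const void *local, void *remote,
                   size_t len)
{
    assert(q != NULL);
    if (q->tail - q->head == TOY_QUEUE_DEPTH)
        return -1; /* queue full */
    q->wr[q->tail % TOY_QUEUE_DEPTH] = (struct toy_wr){ local, remote, len };
    q->tail++;
    return 0;
}

/* Data path: the "NIC" drains the queue and touches memory directly. */
size_t toy_nic_process(struct toy_queue *q)
{
    size_t done = 0;
    assert(q != NULL);
    while (q->head != q->tail) {
        const struct toy_wr *wr = &q->wr[q->head % TOY_QUEUE_DEPTH];
        memcpy(wr->remote, wr->local, wr->len);
        q->head++;
        done++;
    }
    return done;
}
```

A caller would zero-initialize a `struct toy_queue`, post one write from a buffer playing MEM-1 to a buffer playing MEM-2, and see MEM-2 change only after `toy_nic_process` runs.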

  11. FreeFlow in the Scene. Example: "Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2." • C1: How to forward verbs calls? • C2: How to synchronize memory? [Figure: in each container, an RDMA app holds an RDMA CTX and a memory region (MEM-1, MEM-2). Below it, FreeFlow holds the shadow copies (S-RDMA CTX, S-MEM-1, S-MEM-2) above the Verbs library and the RDMA NIC.]

  12. Challenge 1: Verbs Forwarding in Control Path. Code shown: struct ibv_qp { struct ibv_context *context; … }; ibv_post_send(struct ibv_qp *qp, …) • Attempt 1: Forward "as is" ➔ Incorrect • Attempt 2: "Serialize" and forward ➔ Inefficient [Figure: an RDMA app in a container calls the Verbs API; it is an open question how the call reaches the FreeFlow shim, which sits above the Verbs API, the Verbs library, the NIC command, and the RDMA NIC.]

  13. Internal Structure of Verbs Library. Code shown: struct ibv_qp { struct ibv_context *context; … }; ibv_post_send(struct ibv_qp *qp, …) • Parameters are serialized by the Verbs library! [Figure: same stack as the previous slide, highlighting the step where the Verbs library turns a call into a NIC command for the RDMA NIC.]

  14. FreeFlow Control Path Channel. Idea: Leveraging the serialized output of the verbs library • Parameters are forwarded correctly without manual serialization! [Figure: in the container, the RDMA app calls ibv_post_send(struct ibv_qp *qp, …) on the FreeFlow library (the verbs library), which issues Write(VNIC_fd, serialized parameters) to the VNIC. In the FreeFlow Router, the shim passes the parameters to the Verbs API and Verbs library, which send the NIC command to the RDMA NIC.]
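The contrast between forwarding a struct "as is" and forwarding serialized parameters over a file descriptor can be sketched as follows. Everything here is an assumption made for illustration: `toy_context`, `toy_qp`, `toy_cmd`, and the functions are hypothetical stand-ins, not the libibverbs definitions or FreeFlow's wire format, and a pipe plays the role of the VNIC file descriptor. The sketch replaces the pointer field with a plain handle, writes the flat command to the descriptor, and reads it back on the other side.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-ins for verbs objects (not the real definitions). */
struct toy_context { uint32_t handle; };
struct toy_qp {
    struct toy_context *context; /* pointer: valid only in this process */
    uint32_t qp_num;
};

/* Flat, pointer-free command that is safe to send to another process. */
struct toy_cmd {
    uint64_t remote_addr;
    uint32_t ctx_handle;
    uint32_t qp_num;
    uint32_t length;
    uint32_t reserved; /* explicit field so the struct has no padding */
};

/* Replace the pointer with a handle the receiver can look up. */
struct toy_cmd toy_serialize(const struct toy_qp *qp, uint64_t remote_addr,
                             uint32_t length)
{
    assert(qp != NULL && qp->context != NULL);
    struct toy_cmd cmd = {
        .remote_addr = remote_addr,
        .ctx_handle = qp->context->handle,
        .qp_num = qp->qp_num,
        .length = length,
        .reserved = 0,
    };
    return cmd;
}

/* Sender side: write the flat command to the (stand-in) VNIC descriptor. */
int toy_forward(int fd, const struct toy_cmd *cmd)
{
    return write(fd, cmd, sizeof *cmd) == (ssize_t)sizeof *cmd ? 0 : -1;
}

/* Receiver side: read one flat command from the descriptor. */
int toy_receive(int fd, struct toy_cmd *cmd)
{
    return read(fd, cmd, sizeof *cmd) == (ssize_t)sizeof *cmd ? 0 : -1;
}

/* Round trip through a pipe that stands in for the VNIC channel. */
int toy_round_trip(struct toy_cmd *out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    struct toy_context ctx = { .handle = 7 };
    struct toy_qp qp = { .context = &ctx, .qp_num = 42 };
    struct toy_cmd cmd = toy_serialize(&qp, 0x1000, 64);
    int rc = toy_forward(fds[1], &cmd);
    if (rc == 0)
        rc = toy_receive(fds[0], out);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

In this sketch the flattening is written by hand, which is the "Attempt 2" cost the slides call inefficient. The slides' idea is that the verbs library already produces such a flat form when it talks to the NIC, so only the forwarding step needs to be redirected.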

  15. Challenge 2: Synchronizing Memory for Data Path • Shadow memory in FreeFlow router: a copy of the application's memory region, directly accessed by NICs • S-MEM and MEM must be synchronized. • How to synchronize S-MEM and MEM? [Figure: the container's RDMA app holds RDMA CTX and MEM and reaches the FreeFlow Router through the VNIC; the router holds S-RDMA CTX and S-MEM above the Verbs library and the RDMA NIC.]

  16. Strawman Approach for Synchronization. Example: "Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2." Explicit synchronization? • High frequency ➔ High overhead • Low frequency ➔ Wrong data for app [Figure: DATA moves from MEM-1 in the sending container through its VNIC to S-MEM-1 in its FreeFlow Router, across the RDMA NICs to S-MEM-2 in the receiving router, and up to MEM-2 in the receiving container.]
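The trade-off in the strawman can be made concrete with a small simulation. This is a toy model under stated assumptions, not FreeFlow code: `simulate_explicit_sync`, `sync_stats`, and the 4096-byte region size are hypothetical, and a "NIC read" is modelled as comparing the shadow buffer with the application buffer after every update. Copying after every update yields no stale reads but copies the whole region each time; copying rarely saves bandwidth but lets the NIC see old data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { REGION_SIZE = 4096 }; /* assumed region size for the sketch */

struct sync_stats {
    unsigned long bytes_copied; /* cost of explicit synchronization */
    unsigned stale_reads;       /* times the NIC would see old data */
};

/* The app updates MEM `updates` times; MEM is copied to S-MEM only on
 * every `sync_every`-th update. After each update the NIC reads S-MEM. */
struct sync_stats simulate_explicit_sync(uint32_t updates, uint32_t sync_every)
{
    static unsigned char mem[REGION_SIZE];   /* application memory (MEM) */
    static unsigned char s_mem[REGION_SIZE]; /* shadow memory (S-MEM) */
    struct sync_stats stats = { 0, 0 };

    assert(sync_every > 0);
    memset(mem, 0, sizeof mem);
    memset(s_mem, 0, sizeof s_mem);

    for (uint32_t i = 1; i <= updates; i++) {
        memcpy(mem, &i, sizeof i); /* application writes new contents */
        if (i % sync_every == 0) { /* explicit synchronization point */
            memcpy(s_mem, mem, REGION_SIZE);
            stats.bytes_copied += REGION_SIZE;
        }
        if (memcmp(s_mem, mem, REGION_SIZE) != 0)
            stats.stale_reads++; /* NIC would transmit outdated data */
    }
    return stats;
}
```

Calling it with `sync_every` set to 1 and then to a larger value shows the two failure modes side by side: the copied-byte count grows with sync frequency, while the stale-read count grows as syncs become rarer.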

  17. Containers can Share Memory Regions • The FreeFlow router is running in a container. • MEM and S-MEM can be located on the same physical memory region. [Figure: on one host, the app container (RDMA CTX, MEM-1) and the FreeFlow Router (S-RDMA CTX, S-MEM-1) both map a shared memory region, MEM; the router sits above the Verbs library and the RDMA NIC.]

  18. Zero-copy Synchronization in Data Path. How to allocate MEM-1 in the shadow memory space? Synchronization without explicit memory copy: • Method 1: Allocate shared buffers with FreeFlow APIs • Method 2: Re-map the app's memory space to the shadow memory space • FreeFlow supports both! [Figure: on one host, the app container's MEM-1 and the FreeFlow Router's S-MEM-1 are backed by the same shared memory, MEM.]
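The property that makes zero-copy synchronization possible, two views backed by the same physical pages, can be demonstrated with plain POSIX calls. This is a minimal sketch under assumptions: it maps one unlinked temporary file twice inside a single process, whereas the slides describe sharing between an application container and the router container; `shared_views`, `map_shared_views`, and `unmap_shared_views` are hypothetical names, and the slides do not say which system calls FreeFlow itself uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two virtual views of one backing object, playing MEM and S-MEM. */
struct shared_views {
    char *mem;   /* the application's view (MEM) */
    char *s_mem; /* the router's view (S-MEM) */
    size_t len;
};

/* Create a backing object and map it twice with MAP_SHARED. */
int map_shared_views(struct shared_views *v, size_t len)
{
    char path[] = "/tmp/ff_sketch_XXXXXX"; /* assumed scratch location */
    assert(v != NULL && len > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the object lives on until it is unmapped */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the object alive */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }
    v->mem = a;
    v->s_mem = b;
    v->len = len;
    return 0;
}

/* Release both views. */
void unmap_shared_views(struct shared_views *v)
{
    munmap(v->mem, v->len);
    munmap(v->s_mem, v->len);
}
```

After a successful call, a store through `mem` is visible through `s_mem` with no copy step, even though the two pointers differ. Across containers the same effect needs a backing object both sides can open, which is the role the shared memory region plays on the slides.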

  19. FreeFlow Design Summary • FreeFlow control path channel • Zero-copy memory synchronization • FreeFlow provides near-native RDMA performance for containers! [Figure: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) each connect through a VNIC to the FreeFlow Router, which sits above the Verbs library and the RDMA NIC (host IP 30.0.0.1).]

  20. Outline • Motivation • FreeFlow Design • Implementation and Evaluation

  21. Implementation and Experimental Setup • FreeFlow Library: adds 4000 lines of C to libibverbs and libmlx4 • FreeFlow Router: 2000 lines of C++ • Testbed setup: two Intel Xeon E5-2620 8-core CPUs, 64 GB RAM; 56 Gbps Mellanox ConnectX-3 NICs; Docker containers

  22. Does FreeFlow Support Low Latency? [Figure: latency (μs, 0 to 4) vs. message size (64 B, 256 B, 1 KB, 4 KB) for native RDMA and FreeFlow; annotation: 0.38 μs.]

  23. Does FreeFlow Support High Throughput? [Figure: throughput (Gbps, 0 to 60) vs. message size (2 KB to 1 MB) for native RDMA and FreeFlow; annotation: "Bounded by control path channel performance".]

  24. Do Applications Benefit from FreeFlow? [Figure: CDF of time per step (sec, 0 to 40) for container+TCP, native RDMA, and FreeFlow; annotation: 8.7x.]

  25. Summary • Containerization today can't benefit from the speed of RDMA. • Existing solutions for NIC virtualization (e.g., SR-IOV) don't work. • FreeFlow enables containerized apps to use RDMA. • Challenges and key ideas: control path: leveraging the Verbs library structure for efficient Verbs forwarding; data path: zero-copy memory synchronization • Performance close to native RDMA • github.com/microsoft/freeflow
