PAM: When Overloaded, Push Your Neighbor Aside!

SIGCOMM 2018 Student Research Competition (Undergraduate Category). Also in Proceedings of SIGCOMM Posters and Demos 2018. PAM: When Overloaded, Push Your Neighbor Aside! Zili Meng, Jun Bi, Chen Sun, Shuhe Wang, Minhu Wang, Hongxin Hu.


  1. PAM: When Overloaded, Push Your Neighbor Aside!
     Zili Meng, Jun Bi, Chen Sun, Shuhe Wang, Minhu Wang, Hongxin Hu
     SIGCOMM 2018 Student Research Competition (Undergraduate Category). Also in Proceedings of SIGCOMM Posters and Demos 2018.

  2. NFV: Bright Side vs. Dark Side
     • NFV replaces dedicated hardware devices with commodity hardware and virtualization techniques; the NFs of a service chain (VPN, Monitor, Firewall, Load Balancer) run in VMs.
     • Bright side: low cost, flexibility, scalability, ……
     • Dark side: high latency (200 μs ~ 1 ms × 7).

  3. Accelerating NFV with SmartNICs
     • NPU-based multicore SmartNICs
       – Netronome, Mellanox.
       – Offloading NFs to the SmartNIC improves performance.
       – Easy to develop and debug.
     • NF migration between SmartNIC and CPU
       – The SmartNIC may also be overloaded.
       – UNO [SoCC'17]: ensures consistency.

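  4. Existing Solutions Cause Performance Degradation
     • Before migration: the Firewall runs on the CPU; Logger, Monitor, and Load Balancer run on the SmartNIC, and the Monitor is overloaded.
     • Naive migration: the Monitor moves to the CPU alongside the Firewall, while Logger and Load Balancer stay on the SmartNIC. This causes redundant packet transmissions over PCIe.
     [Figure: NF placement across CPU and SmartNIC, connected by PCIe, before migration and after naive migration.]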

  6. Measurement of Transmission Latency
     • Packet transmission time is comparable to the packet processing time.
     • Service chain latency: 644 μs.
     • One round-trip transmission: 100 μs.

  7. Can we reduce the additional transmission latency due to NF migration?

  8. Key Novelty: Push Aside Migration
     When overloaded, push your neighbor aside and occupy its resources.

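  9. Push Aside Migration
     At the service chain scope:
     • Before migration: the Firewall runs on the CPU; Logger, Monitor, and Load Balancer run on the SmartNIC, and the Monitor is overloaded.
     • Push aside migration: the Logger moves to the CPU alongside the Firewall; Monitor and Load Balancer stay on the SmartNIC.
     [Figure: NF placement across CPU and SmartNIC, connected by PCIe, before migration and after push aside migration.]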

  10. Can we reduce the additional transmission latency due to NF migration?
      Yes: push border NFs away to make space for the overloaded NF.
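
To make the latency argument concrete, the sketch below counts how often a packet crosses PCIe under the three placements shown in the slides. The chain order (Firewall, Logger, Monitor, Load Balancer) and the idea that packets enter and leave at the SmartNIC are assumptions made for illustration; the slides only show which device hosts each NF. The function name `pcie_crossings` is hypothetical.

```python
from typing import Dict, List

# Assumed chain order; the slides do not state it explicitly.
CHAIN: List[str] = ["Firewall", "Logger", "Monitor", "Load Balancer"]

# Placements as drawn in the slides.
ORIGINAL: Dict[str, str] = {
    "Firewall": "CPU",
    "Logger": "SmartNIC",
    "Monitor": "SmartNIC",
    "Load Balancer": "SmartNIC",
}
NAIVE = {**ORIGINAL, "Monitor": "CPU"}  # move the overloaded NF itself
PAM = {**ORIGINAL, "Logger": "CPU"}     # push the border neighbor aside


def pcie_crossings(chain: List[str], placement: Dict[str, str],
                   ingress: str = "SmartNIC", egress: str = "SmartNIC") -> int:
    """Count device changes along the packet path (each one is a PCIe transfer)."""
    path = [ingress] + [placement[nf] for nf in chain] + [egress]
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


for name, placement in [("original", ORIGINAL), ("naive", NAIVE), ("PAM", PAM)]:
    print(name, pcie_crossings(CHAIN, placement))
```

Under these assumptions the naive placement doubles the number of PCIe transfers, while the push-aside placement keeps it unchanged.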

  11. Dynamic Scaling of Service Chains
      Which border NF to migrate?
      [Figure: service chain spanning CPU and SmartNIC, with border vNFs marked ℬ.]

  12. Greedy-based Border vNF Selection Algorithm
      Goal: minimize the number of vNFs to migrate.
      • Always select the border vNF with minimum capacity.
        – Minimum capacity given fixed resources is equivalent to maximum resources consumed given fixed throughput.
      • Constraints ensured:
        – Overload on the SmartNIC will be alleviated.
        – Migration should not create new hot spots on the CPU.
      • Please refer to our paper for more details.

  13. Evaluation: Latency Reduction
      • Result: 20% latency reduction.
      • Placements compared (CPU / SmartNIC): Original (FW / LOG, LB, MON); Naive (FW, MON / LOG, LB); PAM (FW, LOG / MON, LB).

  14. Evaluation: Throughput Maintenance
      • Result: throughput maintained.
      • Placements compared (CPU / SmartNIC): Original (FW / LOG, LB, MON); Naive (FW, MON / LOG, LB); PAM (FW, LOG / MON, LB).

  17. Discussions
      • Suitability of vNFs on different devices.
        – Some kinds of NFs may be suitable only for the SmartNIC or only for the CPU.
        – Potential solution (ongoing work): introduce suitability of NFs.
      • Isolation on SmartNICs.
        – Unlike on the CPU, there are no mature isolation mechanisms on the SmartNIC.
        – Potential solution (future work): software isolation (NetBricks [OSDI'16]).
      • Precise analysis of PCIe and SmartNIC resources.
        – Potential solution (future work): PCIe modelling (pcie-bench [SIGCOMM'18]).

  18. Applying PAM to Other Scenarios
      • PAM aims to bring a new direction for NF scaling.
      • When multiple NFs share resources, pushing other NFs away lets the overloaded NF automatically preempt resources for scaling.
      [Figure: a VM hosting NF1, NF2, NF3, and NF4, with an instance NF3' and a sync link.]

  19. Applying PAM to Other Scenarios
      • PAM is designed for the scenario of SmartNIC-CPU cooperation.
      • Can it be extended to other application scenarios?
        – Multiple kinds of devices, e.g., GPU and FPGA.

  20. Future Thoughts on Selecting NFs to Migrate
      • PAM is heuristic, but choosing which NF to migrate is itself a problem.
      • Can we improve the performance further and globally?
        – Inspired by the scheduling problem for cluster jobs, use reinforcement learning for further performance improvement (DeepRM [HotNets'16]).
      [Figure: reinforcement learning loop between Agent and Environment. State: observations from the environment (vNF traffic load). Policy network: $\pi_\theta(s, a)$. Action $a$: vNFs to migrate. Reward $r$: end-to-end performance.]

  21. Future Thoughts on Selecting NFs to Migrate
      • PAM is heuristic, but choosing which NF to migrate is itself a problem.
      • Can we improve the performance further and globally?
        – Inspired by the scheduling problem for cluster jobs, use reinforcement learning for further performance improvement (DeepRM [HotNets'16]).
      [Figure: reinforcement learning loop between Agent and Environment, with labels State, Policy Network, vNF Embedding, and traffic load. Action $a$: vNFs to migrate. Reward $r$: end-to-end performance. The state comes from observations from the environment.]

  22. Conclusion & Takeaway
      • Problem: migration between SmartNIC and CPU degrades performance.
      • Intuition: when one NF is overloaded, we can migrate other NFs away and grab their resources to alleviate the hot spot.
      • Question: which NFs to migrate?
      • Answer: migrate NFs on the border between SmartNIC and CPU with minimum capacity.
      • Evaluation: 18% latency benefits.

  23. Thank you! www.zilimeng.info mengzl15@mails.tsinghua.edu.cn

  24. Backup Slides

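  25. Resource Analysis: Throughput Capacity
      $\theta_i^{\mathcal{S}}$, $\theta_i^{\mathcal{C}}$: throughput capacity of vNF $i$ on SmartNIC ($\mathcal{S}$) or CPU ($\mathcal{C}$).

      | vNF $i$          | $\theta_i^{\mathcal{S}}$ | $\theta_i^{\mathcal{C}}$ |
      |------------------|--------------------------|--------------------------|
      | Firewall         | 10 Gbps                  | 4 Gbps                   |
      | Logger           | 2 Gbps                   | 4 Gbps                   |
      | Monitor          | 3.2 Gbps                 | 10 Gbps                  |
      | Load Balancer    | >10 Gbps                 | 4 Gbps                   |
      | Payload Analyzer | 5 Gbps                   | 200 Mbps                 |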

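  27. Resource Analysis
      Assumption: resource utilization of a vNF increases linearly with its throughput:
      $r_i^{\mathcal{S}} = \frac{\theta_{cur}}{\theta_i^{\mathcal{S}}}, \quad r_i^{\mathcal{C}} = \frac{\theta_{cur}}{\theta_i^{\mathcal{C}}}$
      Deduction: the capacity $\theta'$ of the chain $E_1 \to E_2$:
      $\frac{\theta'}{\theta_1^{\mathcal{S}}} + \frac{\theta'}{\theta_2^{\mathcal{S}}} = 1 \;\Rightarrow\; \theta' = \frac{\theta_1^{\mathcal{S}} \theta_2^{\mathcal{S}}}{\theta_1^{\mathcal{S}} + \theta_2^{\mathcal{S}}}$
      For "Payload Analyzer → Monitor": $\theta'_{measure} = 1.8$ Gbps $\approx \theta'_{theory} = 1.9$ Gbps.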
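
The deduction can be checked numerically. The sketch below is a minimal illustration of the linear utilization model, using the SmartNIC capacities of Payload Analyzer (5 Gbps) and Monitor (3.2 Gbps) from the throughput capacity table. The function names are hypothetical, and extending the two-NF formula to any number of co-located NFs is an assumption that follows from the same linear model, not something the slides state.

```python
from typing import Iterable


def utilization(theta_cur: float, capacity: float) -> float:
    """Resource utilization of one vNF at throughput theta_cur (linear model)."""
    return theta_cur / capacity


def chain_capacity(capacities: Iterable[float]) -> float:
    """Throughput at which the summed utilization of co-located vNFs reaches 1.

    For two vNFs this reduces to theta1 * theta2 / (theta1 + theta2).
    """
    return 1.0 / sum(1.0 / c for c in capacities)


# SmartNIC capacities in Gbps, taken from the throughput capacity table.
PAYLOAD_ANALYZER_S = 5.0
MONITOR_S = 3.2

theta_theory = chain_capacity([PAYLOAD_ANALYZER_S, MONITOR_S])
print(f"theoretical chain capacity: {theta_theory:.2f} Gbps")

# At that throughput the two utilizations add up to (approximately) 1.
total = (utilization(theta_theory, PAYLOAD_ANALYZER_S)
         + utilization(theta_theory, MONITOR_S))
print(f"summed utilization at capacity: {total:.3f}")
```

The computed value is about 1.95 Gbps, in line with the 1.9 Gbps theoretical figure and close to the 1.8 Gbps measurement quoted on the slide.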

  28. Border vNF Selection Algorithm, Step 1: Border vNFs Identification
      • $\mathcal{B}$: border elements on the SmartNIC in a service chain (graph).
      • Check whether an NF is placed together with its upstream/downstream NFs.
      [Figure: service chain spanning CPU and SmartNIC, with border vNFs marked ℬ.]

  29. Border vNF Selection Algorithm, Step 2: Migration vNFs Selection
      • Select the NF with minimum capacity: $b_0 = \arg\min_{b \in \mathcal{B}} \theta_b^{\mathcal{S}}$
        – Intuition: migrating the NF with minimum capacity will alleviate overload more efficiently.
      [Figure: service chain spanning CPU and SmartNIC, with border vNFs marked ℬ.]

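  30. Border vNF Selection Algorithm, Step 3: Overload Alleviation Check
      • ($\mathbb{C}_1$): Migration should not cause new hot spots on the CPU:
        $\sum_{i \in \text{NFs on } \mathcal{C}} \frac{\theta_{cur}}{\theta_i^{\mathcal{C}}} + \frac{\theta_{cur}}{\theta_{b_0}^{\mathcal{C}}} < 1$
      • Migrate $b_0$ if ($\mathbb{C}_1$) is satisfied. Otherwise go back to Step 2.
      [Figure: service chain spanning CPU and SmartNIC, with border vNFs marked ℬ.]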

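  31. Border vNF Selection Algorithm, Step 3: Overload Alleviation Check
      • ($\mathbb{C}_2$): The overload on the SmartNIC should be alleviated:
        $\sum_{i \in \text{NFs on } \mathcal{S},\, i \neq b_0} \frac{\theta_{cur}}{\theta_i^{\mathcal{S}}} < 1$
      • Algorithm ends if ($\mathbb{C}_2$) is satisfied. Otherwise go back to Step 2.
      [Figure: service chain spanning CPU and SmartNIC, with border vNFs marked ℬ.]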
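
Steps 1 to 3 can be sketched as a short loop. This is one possible reading of the slides, not the authors' implementation: it assumes a linear chain, treats "go back to Step 2" after a failed CPU check as "try the next-smallest border vNF", and returns `None` when no candidate is left. The chain order, the 1.2 Gbps load, and the use of 10 Gbps as a lower bound for the Load Balancer's ">10 Gbps" are illustrative assumptions; all function names are hypothetical.

```python
from typing import Dict, List, Optional

SMARTNIC, CPU = "SmartNIC", "CPU"


def border_vnfs(chain: List[str], placement: Dict[str, str]) -> List[str]:
    """Step 1: SmartNIC vNFs with an upstream or downstream neighbor on the CPU."""
    border = []
    for k, nf in enumerate(chain):
        if placement[nf] != SMARTNIC:
            continue
        neighbors = chain[max(k - 1, 0):k] + chain[k + 1:k + 2]
        if any(placement[n] == CPU for n in neighbors):
            border.append(nf)
    return border


def utilization(nfs: List[str], capacity: Dict[str, float], theta_cur: float) -> float:
    """Summed utilization theta_cur / theta_i over the given vNFs."""
    return sum(theta_cur / capacity[nf] for nf in nfs)


def select_migrations(chain: List[str], placement: Dict[str, str],
                      cap_s: Dict[str, float], cap_c: Dict[str, float],
                      theta_cur: float) -> Optional[List[str]]:
    """Return the vNFs to push to the CPU, or None if overload cannot be relieved."""
    placement = dict(placement)  # work on a copy
    migrated: List[str] = []
    rejected = set()             # border vNFs that failed the CPU check (C1)

    def on(device: str) -> List[str]:
        return [nf for nf in chain if placement[nf] == device]

    # Loop until C2 holds: SmartNIC utilization drops below 1.
    while utilization(on(SMARTNIC), cap_s, theta_cur) >= 1:
        candidates = [b for b in border_vnfs(chain, placement) if b not in rejected]
        if not candidates:
            return None          # assumption: give up when no candidate remains
        b0 = min(candidates, key=lambda b: cap_s[b])  # Step 2: minimum capacity
        # C1: migrating b0 must not overload the CPU.
        if utilization(on(CPU), cap_c, theta_cur) + theta_cur / cap_c[b0] < 1:
            placement[b0] = CPU
            migrated.append(b0)
        else:
            rejected.add(b0)
    return migrated


CHAIN = ["Firewall", "Logger", "Monitor", "Load Balancer"]  # assumed order
PLACEMENT = {"Firewall": CPU, "Logger": SMARTNIC,
             "Monitor": SMARTNIC, "Load Balancer": SMARTNIC}
CAP_S = {"Firewall": 10.0, "Logger": 2.0, "Monitor": 3.2, "Load Balancer": 10.0}
CAP_C = {"Firewall": 4.0, "Logger": 4.0, "Monitor": 10.0, "Load Balancer": 4.0}

print(select_migrations(CHAIN, PLACEMENT, CAP_S, CAP_C, theta_cur=1.2))
```

With these inputs the SmartNIC starts at a utilization of about 1.1, the only border vNF is the Logger, and moving it to the CPU satisfies both constraints, which matches the placement the slides label as PAM.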

  33. Policy Gradient Algorithm: REINFORCE
      From https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#reinforce
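
For reference, the REINFORCE gradient and update rule can be written as below. This is the standard textbook form, not anything specific to PAM. Here $\theta$ denotes the policy parameters (as in a policy $\pi_\theta$), not the throughput capacity used in the resource analysis. The symbols $G_t$ (return from step $t$), $\gamma$ (discount factor), and $\alpha$ (learning rate) are the usual notation and are assumptions here, since the slide only shows an image.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[ G_t \, \nabla_\theta \ln \pi_\theta(a_t \mid s_t) \right],
\qquad
\theta \leftarrow \theta + \alpha \, \gamma^{t} \, G_t \, \nabla_\theta \ln \pi_\theta(a_t \mid s_t)
```

In the setting sketched on the "Future Thoughts" slides, the action $a_t$ would be the set of vNFs to migrate and the return would be derived from end-to-end performance.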
