In-Network Computing to the rescue of Faulty Links


  1. In-Network Computing to the rescue of Faulty Links Acknowledgements: Isaac Pedisich (UPenn), Gordon Brebner (Xilinx), DARPA Contracts No. HR0011-17-C-0047 and HR0011-16-C-0056, and NSF grant CNS-1513679. 


  2. [Diagram: a path between Node 1 and Node 2]

  3. Packet loss -> application malfunction.

  4. Congestion: unstable; various mitigations.

  5. Corruption: stable; mitigation is rerouting + replacement.

  6. [Diagram: a link between Node 1 and Node 2; congestion, corruption]

  7. Traffic Engineering [diagram: path between Node 1 and Node 2; congestion, corruption]

  8. Loss and TCP throughput [plot: TCP throughput (Gb/s) over time (seconds) at loss rates from 10^-7 to 10^-1]. Loss is disproportionate to corruption.
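
A back-of-the-envelope illustration of why loss hurts TCP so disproportionately (not from the talk): the well-known Mathis model approximates steady-state TCP throughput as MSS / (RTT * sqrt(2p/3)), so throughput falls with the square root of the loss rate p rather than linearly. The MSS and RTT values below are assumed for illustration only.

    /* Mathis-model throughput estimate for a few loss rates.
     * MSS and RTT are illustrative assumptions, not from the talk. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double mss_bits = 1460 * 8;  /* assumed 1460-byte MSS         */
        const double rtt_s    = 100e-6;    /* assumed 100 us datacenter RTT */
        const double loss[]   = { 1e-7, 1e-5, 1e-3, 1e-1 };

        for (unsigned i = 0; i < sizeof loss / sizeof loss[0]; i++) {
            double p   = loss[i];
            double bps = mss_bits / (rtt_s * sqrt(2.0 * p / 3.0));
            printf("loss %.0e -> ~%.2f Gb/s (uncapped model estimate)\n",
                   p, bps / 1e9);
        }
        return 0;
    }

The estimate is of course capped by link capacity; the point is that by a loss rate of 10^-3 the model is already well under half of a 10 Gb/s line rate, consistent with the slide's claim that loss impact is disproportionate to the corruption rate.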

  9. Current solution: disable link(s).

  10. This talk: Forward Error Correction (FEC).

  11. This talk: FEC.

  12. This talk: an in-network solution relying on computing (FEC).

  13. This talk: an in-network solution relying on computing (FEC), building on recent advances in programmable datacenter networks.

  14. Design goals • Transparent to the rest of the network • Low overhead (beyond the FEC overhead itself) • Low-complexity activation: adjacent elements agree between themselves whether to activate FEC • Support for different traffic classes (affecting latency and redundancy)

  15. Where to decide? Central (a single element decides for other elements’ links) vs. Distributed (each element sees to its own links; faster reaction time).

  16. What to do? Repeat (resend information) vs. add redundancy (in the hope more information gets through).

  17. What layer FEC? Network (end-to-end overhead), Link (overhead on faulty links), Physical (change Ethernet).

  18. What layer FEC? Network (end-to-end overhead), Link (overhead on faulty links), Physical.

  19. What layer FEC? Network, Link (overhead on faulty links), Physical.

  20. Link-layer FEC [diagram: Client - Encoding Switch - faulty link - Decoding Switch - Server] (but the design could work on non-switch-to-switch links).

  21. [Diagram: data frames]

  22. [Diagram: data frames]

  23. [Diagram: data frames and parity frames]

  24. [Diagram: data frames and parity frames]

  25. [Diagram: k data frames and h parity frames grouped into a block]

  28. Stats (see paper).

  29. Tagging.

  31. Parity frames.

  32. 1 block = k data frames + h parity frames.
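
The talk does not spell out which erasure code produces the parity frames, so here is only the simplest possible sketch of the block structure above: with h = 1, a single parity frame is the XOR of the k data frames, and the decoding switch can rebuild any one frame lost within the block. The (k, h) settings with h > 1 evaluated later would need a stronger code (e.g., Reed-Solomon); frame sizes are assumed fixed for simplicity.

    /* Minimal sketch, assuming h = 1 and fixed-size frames (illustrative,
     * not the talk's actual code).  parity = XOR of the k data frames;
     * a single missing frame is recovered by XOR-ing the k surviving
     * frames of the block (k-1 data frames plus the parity frame). */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define FRAME_LEN 1500   /* illustrative frame size */

    void fec_encode_xor(const uint8_t data[][FRAME_LEN], size_t k,
                        uint8_t parity[FRAME_LEN]) {
        memset(parity, 0, FRAME_LEN);
        for (size_t i = 0; i < k; i++)
            for (size_t b = 0; b < FRAME_LEN; b++)
                parity[b] ^= data[i][b];
    }

    /* survivors = the k frames of the block that did arrive. */
    void fec_recover_xor(const uint8_t survivors[][FRAME_LEN], size_t k,
                         uint8_t missing[FRAME_LEN]) {
        fec_encode_xor(survivors, k, missing);
    }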

  33. Traffic classification: protocol+port 
 (Configured by network controller)
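
As a purely illustrative sketch of the protocol+port classification (the table layout, rule values, and (k, h) pairs below are invented, not the talk's configuration), a controller-populated rule set could map a packet's IP protocol and destination port to an FEC class:

    /* Hypothetical classifier: map (IP protocol, destination port) to an
     * FEC class, standing in for a controller-configured match table.
     * All rule values and class parameters are made-up examples. */
    #include <stddef.h>
    #include <stdint.h>

    struct fec_class { uint8_t k; uint8_t h; };   /* block geometry */

    struct class_rule {
        uint8_t  ip_proto;     /* 6 = TCP, 17 = UDP */
        uint16_t dst_port;
        struct fec_class cls;
    };

    static const struct class_rule rules[] = {
        { 6,   443, { 25, 1 } },   /* example: TCP web traffic, light redundancy    */
        { 17, 4791, { 10, 5 } },   /* example: RoCEv2-style UDP, heavier protection */
    };

    struct fec_class classify(uint8_t ip_proto, uint16_t dst_port) {
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
            if (rules[i].ip_proto == ip_proto && rules[i].dst_port == dst_port)
                return rules[i].cls;
        return (struct fec_class){ 25, 1 };        /* example default class */
    }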

  34. Implementation • High-level logic in P4 (e.g., traffic classification) • Two toolchains: Xilinx’s SDNet and P4’s p4c-BMv2 • External logic in C, targeting both an FPGA board (Xilinx ZCU102) and a CPU (x86) • Work in progress: stats gathering, hardware decoding.

  35. FEC implementation [block diagram: Packet In/Out ports, Pre-processor and Post-processor, P4/PX pipeline, FEC UserEngine invoking the FEC external function in C; packet streams and data words flow in and out of the FEC block].

  36. Evaluation • Unmodified host stacks and applications. • Raw throughput: DPDK vs. FPGA/CPU implementations of the encoder; FPGA: 9.3 Gb/s, CPU: 1.4 Gb/s (8 physical cores). • Goodput vs. error rate: iperf vs. model.
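
The paper's goodput model is not reproduced here, but one plausible ingredient can be sketched (assuming the legend pairs on the following plots are (k, h), that frames are lost independently with probability p, and that a block is usable only if at most h of its k + h frames are lost): block survival probability is a binomial tail sum, and the k/(k+h) factor accounts for parity overhead.

    /* Rough goodput estimate (a sketch, not the paper's model):
     * goodput ~= line_rate * (k / (k + h)) * P(at most h of k+h frames lost). */
    #include <math.h>
    #include <stdio.h>

    static double binom(int n, int i) {          /* C(n, i) via lgamma */
        return exp(lgamma(n + 1.0) - lgamma(i + 1.0) - lgamma(n - i + 1.0));
    }

    static double block_ok(int k, int h, double p) {
        int n = k + h;
        double sum = 0.0;
        for (int i = 0; i <= h; i++)
            sum += binom(n, i) * pow(p, i) * pow(1.0 - p, n - i);
        return sum;
    }

    int main(void) {
        const double line_rate = 10.0;                 /* assumed Gb/s  */
        const double p = 1e-2;                         /* 1% frame loss */
        const int cfg[][2] = { {25, 1}, {25, 5}, {25, 10}, {10, 5}, {5, 5} };

        for (unsigned c = 0; c < sizeof cfg / sizeof cfg[0]; c++) {
            int k = cfg[c][0], h = cfg[c][1];
            double g = line_rate * ((double)k / (k + h)) * block_ok(k, h, p);
            printf("(%d, %d): ~%.2f Gb/s at p = %.0e\n", k, h, g, p);
        }
        return 0;
    }

Real TCP behaviour (timeouts, congestion-window collapse) and the encoder's own throughput limits would push measured goodput below an erasure-only estimate like this one.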

  37. Evaluation [plot: throughput (Gb/s), 0-10, vs. error rate (percent of packets lost), 10^-5 to 10^-1, for No FEC and FEC settings (25, 10), (25, 1), (10, 5), (25, 5), (5, 5)]

  38. Evaluation [same plot as slide 37]

  39. Evaluation [same plot as slide 37]

  40. [plot: congestion window size (KB) vs. error rate (percent of packets lost), for the same No FEC and FEC settings]

  41. Conclusions • Design for in-network lossy-link mitigation; components: FEC + management logic. • Goals: network transparency, quick reaction, configurable classes, low non-FEC overhead. • Compatible with existing/centralised approaches, to alert technicians/SREs. • Ongoing work: completing the implementation, integrating new “externs” on heterogeneous host/network.

  42. In-Network Computing to the rescue of Faulty Links Acknowledgements: Isaac Pedisich (UPenn), Gordon Brebner (Xilinx), DARPA Contracts No. HR0011-17-C-0047 and HR0011-16-C-0056, and NSF grant CNS-1513679. 

