RouteBricks: Exploiting Parallelism To Scale Software Routers
SOSP 2009 slide deck (transcript)


1. RouteBricks: Exploiting Parallelism To Scale Software Routers
   Mihai Dobrescu & Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia Ratnasamy
   EPFL, Intel Labs Berkeley, Lancaster University

2. Building routers
   - Fast
   - Programmable
     » custom statistics
     » filtering
     » packet transformation
     » ...

3. Why programmable routers
   - New ISP services
     » intrusion detection, application acceleration
   - Simpler network monitoring
     » measure link latency, track down traffic
   - New protocols
     » IP traceback, Trajectory Sampling, ...
   Enable flexible, extensible networks

4. Today: fast or programmable
   - Fast "hardware" routers
     » throughput: Tbps
     » no programmability
   - Programmable "software" routers
     » processing by general-purpose CPUs
     » throughput < 10 Gbps

5. RouteBricks
   - A router out of off-the-shelf PCs
     » familiar programming environment
     » large-volume manufacturing
   - Can we build a Tbps router out of PCs?

6. Router = packet processing + switching
   [Diagram: N external ports, each at line rate R, feeding packet processing and a switching stage]
   - N: number of external router ports
   - R: external line rate

7. A hardware router
   [Diagram: N linecards, external line rate R per linecard]
   - Processing at rate ~R per linecard

8. A hardware router
   [Diagram: N linecards interconnected by a switch fabric]
   - Processing at rate ~R per linecard
   - Switching at rate N x R by the switch fabric

9. RouteBricks
   [Diagram: N commodity servers joined by a commodity interconnect]
   - Processing at rate ~R per server
   - Switching at rate ~R per server

10. RouteBricks
    [Diagram: same cluster as slide 9]
    - Per-server processing rate: c x R

11-12. Outline
   - Interconnect
   - Server optimizations
   - Performance
   - Conclusions

13. Requirements
    [Diagram: N external ports at rate R over a commodity interconnect]
    - Internal link rates < R
    - Per-server processing rate: c x R
    - Per-server fanout: constant

14. A naive solution
    [Diagram: the N servers connected in a full mesh]

15. A naive solution
    [Diagram: full mesh; every internal link provisioned at capacity R]
    - N external links of capacity R
    - N^2 internal links of capacity R (any input may need to send its entire rate R to a single output)

16. Valiant load balancing
    [Diagram: each input splits its rate-R traffic into N flows of rate R/N]

17. Valiant load balancing
    [Diagram: each packet crosses two internal hops of rate R/N]
    - N external links of capacity R
    - N^2 internal links of capacity 2R/N

18. Valiant load balancing
    - Per-server processing rate: 3R
    - Uniform traffic: 2R
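   The slide gives only the totals; here is a short derivation consistent with slides 16-17 (my restatement, as a LaTeX sketch):

   ```latex
   % VLB over a full mesh of N servers, each with one external port at rate R.
   % Phase 1 spreads each input's traffic uniformly over all N servers;
   % phase 2 relays each packet from its intermediate server to its output.
   \begin{align*}
   \text{per internal link: } & \frac{R}{N}\;(\text{phase 1}) + \frac{R}{N}\;(\text{phase 2}) = \frac{2R}{N}\\
   \text{per server: } & \underbrace{R}_{\text{ingress}} + \underbrace{R}_{\text{relay}} + \underbrace{R}_{\text{egress}} = 3R\\
   \text{uniform traffic: } & \text{phase 1 already delivers to the correct output} \;\Rightarrow\; 2R
   \end{align*}
   ```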

19. Per-server fanout?
    [Diagram: a single server's mesh links to the other N-1 servers]

20-21. Per-server fanout?
   - Increase server capacity (more external ports per server means fewer servers, hence fewer mesh links each)

22. Per-server fanout?
    - Increase server capacity
    - Add intermediate nodes
      » k-degree n-stage butterfly

23. Our solution: combination
    - Assign the maximum number of external ports per server
    - Full mesh, if possible
    - Extra (intermediate) servers, otherwise

24. Example
    - Assuming current servers
      » 5 NICs; 2 x 10G ports or 8 x 1G ports each
      » 1 external port per server
    - N = 32 ports: full mesh
      » 32 servers (feasibility checked in the sketch below)
    - N = 1024 ports: 16-ary 4-fly
      » 2 extra servers per port
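   A back-of-the-envelope check of the 32-port full-mesh claim, as a Python sketch. How the five NICs are split between external and internal links is my assumption, not stated on the slide:

   ```python
   # Feasibility check for the N = 32 full-mesh example (illustrative only).
   # Assumed split: one dual-10G NIC carries the external port; the other
   # four NICs run as 8 x 1G each, giving 32 internal 1G ports.
   R = 10.0                   # external line rate, Gbps
   N = 32                     # servers, one external port each

   internal_ports = 4 * 8     # assumed 1G ports available for mesh links
   links_needed = N - 1       # full mesh: one link to every other server
   per_link_load = 2 * R / N  # VLB bound from slide 17: 2R/N per internal link

   assert links_needed <= internal_ports   # enough physical ports (31 <= 32)
   assert per_link_load <= 1.0             # 0.625 Gbps fits a 1G link
   print(f"{links_needed} links at {per_link_load:.3f} Gbps each: feasible")
   ```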

25. Recap
    [Diagram: N external ports at rate R]
    - Valiant load balancing + full mesh (or k-ary n-fly)
    - Per-server processing rate: 2R - 3R

26. Outline
    - Interconnect
    - Server optimizations (next)
    - Performance
    - Conclusions

27. Setup: NUMA architecture
    [Diagram: ports -> I/O hub -> two sockets, each with cores and local memory; test traffic: min-size packets]
    » Nehalem architecture, QuickPath interconnect
    » CPUs: 2 x (2.8 GHz, 4 cores, 8 MB L3 cache)
    » NICs: 2 x Intel XFSR, 2 x 10 Gbps each
    » kernel-mode Click

28. Single-server performance
    [Diagram: same NUMA server as slide 27]
    - First try: 1.3 Gbps

29. Problem #1: book-keeping
    - Managing packet descriptors
      » moving them between NIC and memory
      » updating descriptor rings
    - Solution: batch packet operations (see the cost sketch below)
      » the NIC batches multiple packet descriptors
      » the CPU polls for multiple packets at a time
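   Why batching helps, as a toy cost model in Python. The constants are hypothetical, chosen only to illustrate the amortization; they are not measurements from the talk:

   ```python
   # Toy model: each NIC/descriptor-ring interaction has a fixed overhead,
   # so handling k packets per interaction amortizes that overhead k ways.
   RING_ACCESS_NS = 500      # hypothetical cost of one NIC/ring operation
   PER_PACKET_NS = 100       # hypothetical per-packet processing cost

   def avg_ns_per_packet(batch: int) -> float:
       """Average per-packet cost when descriptors move in batches."""
       return RING_ACCESS_NS / batch + PER_PACKET_NS

   for k in (1, 4, 16, 32):
       print(f"batch={k:2d}: {avg_ns_per_packet(k):6.1f} ns/packet")
   ```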

30. Single-server performance
    [Diagram: same NUMA server as slide 27]
    - First try: 1.3 Gbps
    - With batching: 3 Gbps

31. Problem #2: queue access
    [Diagram: ports and cores contending for shared queues]

32-35. Problem #2: queue access
   - Rule #1: 1 core per port
   - Rule #2: 1 core per packet

36. Problem #2: queue access
    [Diagram: multi-queue NIC with one queue per core]
    - Rule #1: 1 core per queue
    - Rule #2: 1 core per packet
    (a sketch of both rules follows)
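   A minimal sketch of the two rules in Python (my own illustration, not RouteBricks code): with a multi-queue NIC, each core gets a private RX and TX queue, so no queue is ever shared and no packet ever changes cores:

   ```python
   # One core per queue, one core per packet (illustrative sketch).
   NUM_CORES = 8
   QUEUES_PER_PORT = 8          # assumes the NIC exposes one queue per core

   # Rule #1: a static 1:1 core-to-queue map; no queue is shared, so no locks.
   rx_queue = {core: core for core in range(NUM_CORES)}
   tx_queue = {core: core for core in range(NUM_CORES)}

   def forward(core: int, packet: bytes) -> tuple[int, bytes]:
       """Rule #2: the core that polls a packet also processes and sends it."""
       _ = rx_queue[core]       # only this core ever reads this RX queue
       out = tx_queue[core]     # only this core ever writes this TX queue
       return out, packet       # no hand-off to another core
   ```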

37. Single-server performance
    [Diagram: same NUMA server as slide 27]
    - First try: 1.3 Gbps
    - With batching: 3 Gbps
    - With multiple queues: 9.7 Gbps

38. Recap
    - State-of-the-art hardware
      » NUMA architecture, multi-queue NICs
    - Modified NIC driver
      » batching
    - Careful queue-to-core allocation
      » one core per queue, one core per packet

39. Outline
    - Interconnect
    - Server optimizations
    - Performance (next)
    - Conclusions

40. Single-server performance
    Forwarding rate (Gbps):
                            No-op forwarding   IP routing
    Realistic size mix           24.6             24.6
    Min-size packets              9.7              6.35
    - Realistic size mix: R = 8 - 12 Gbps
    - Min-size packets: R = 2 - 3 Gbps
    (R = the server's rate divided by the VLB per-server factor of 2-3, e.g. 24.6/3 ~ 8 and 24.6/2 ~ 12)

41. Bottlenecks
    (same measurements as slide 40)
    - Realistic size mix: bottleneck is I/O
    - Min-size packets: bottleneck is the CPU

42. With upcoming servers
    Projected forwarding rate (Gbps):
                            No-op forwarding   IP routing
    Realistic size mix           70               70
    Min-size packets             38.8             25.4
    - Realistic size mix: R = 23 - 35 Gbps
    - Min-size packets: R = 8.5 - 12.7 Gbps

43. RB4 prototype
    - N = 4 external ports
      » 1 server per port
      » full mesh
    - Realistic size mix: 4 x 8.75 = 35 Gbps
      » expected R = 8 - 12 Gbps
    - Min-size packets: 4 x 3 = 12 Gbps
      » expected R = 2 - 3 Gbps

44. I did not talk about
    - Reordering
      » avoid per-flow reordering
      » 0.15%
    - Latency
      » 24 microseconds per server (estimate)
    - Open issues
      » power, form factor, programming model

45. Conclusions
    - RouteBricks: a high-end software router
      » a Valiant-load-balanced cluster of commodity servers
    - Programmable with Click
    - Performance:
      » easily R = 1 Gbps with N in the 100s
      » R = 10 Gbps for realistic traffic
      » and for worst-case traffic with upcoming servers

46. Thank you.
    NIC driver and more information at http://routebricks.org
