
Balancing on the edge: Transport affinity without network state
João Taveira Araújo, Lorenzo Saino, Raul Landa and Lennert Buytenhek, NSDI 2018

this is the last slide (sort of)
Faild decomposes load balancing as a division of labour


  1. Faild
     Controller
     FIB                                ARP table
     Destination prefix   Next hop IP   IP address   MAC address
     192.168.0.0/24       10.0.1.A      10.0.1.A     xx:xx:xx:xx:xx:a
     192.168.0.0/24       10.0.2.A      10.0.2.A     xx:xx:xx:xx:xx:a
     192.168.0.0/24       10.0.1.B      10.0.1.B     xx:xx:xx:xx:xx:c
     192.168.0.0/24       10.0.2.B      10.0.2.B     xx:xx:xx:xx:xx:a
     192.168.0.0/24       10.0.1.C      10.0.1.C     xx:xx:xx:xx:xx:c
     192.168.0.0/24       10.0.2.C      10.0.2.C     xx:xx:xx:xx:xx:c
     Hosts: A, B, C
     hosts send health status to controller
     ‣ on drain, update ARP entry
     ‣ balance virtual next hops across available servers
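The controller step on this slide can be sketched as follows. This is a minimal illustration, not the Faild controller itself: the function name `drain`, the dictionary representation of the ARP table, and the least-loaded tie-breaking policy are all assumptions made for the example. The slide only states that ARP entries are updated on drain and that virtual next hops are balanced across available servers.

```python
def drain(arp_table, drained_host, healthy_hosts):
    """Return a new ARP table with the drained host's next hops reassigned.

    arp_table maps a virtual next hop IP to the host whose MAC it resolves to.
    Entries owned by healthy hosts are left untouched, so only traffic that
    was heading to the drained host is moved. (Hypothetical policy: each freed
    next hop goes to the healthy host that currently owns the fewest.)
    """
    if not healthy_hosts:
        raise ValueError("no healthy hosts available")
    if drained_host in healthy_hosts:
        raise ValueError("drained host cannot also be healthy")

    # Count how many next hops each healthy host already serves.
    load = {host: 0 for host in healthy_hosts}
    for host in arp_table.values():
        if host in load:
            load[host] += 1

    new_table = dict(arp_table)
    for next_hop in sorted(arp_table):
        if arp_table[next_hop] == drained_host:
            # Ties are broken by host name to keep the result deterministic.
            target = min(sorted(load), key=load.get)
            new_table[next_hop] = target
            load[target] += 1
    return new_table


arp = {
    "10.0.1.A": "a", "10.0.2.A": "a",
    "10.0.1.B": "b", "10.0.2.B": "b",
    "10.0.1.C": "c", "10.0.2.C": "c",
}
after = drain(arp, "b", ["a", "c"])
```

With the six next hops above and B drained, this sketch leaves A and C with three next hops each. Which of B's two entries lands on which host depends on the tie-breaking chosen here and need not match the slide.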

  2. isn’t this just consistent hashing?

  3. isn’t this just consistent hashing? yes, but we can extend mechanism and avoid resets entirely

  4. Faild
     Controller
     FIB                                ARP table
     Destination prefix   Next hop IP   IP address   MAC address
     192.168.0.0/24       10.0.1.A      10.0.1.A     xx:xx:xx:xx:a:a
     192.168.0.0/24       10.0.2.A      10.0.2.A     xx:xx:xx:xx:a:a
     192.168.0.0/24       10.0.1.B      10.0.1.B     xx:xx:xx:xx:b:c
     192.168.0.0/24       10.0.2.B      10.0.2.B     xx:xx:xx:xx:b:a
     192.168.0.0/24       10.0.1.C      10.0.1.C     xx:xx:xx:xx:c:c
     192.168.0.0/24       10.0.2.C      10.0.2.C     xx:xx:xx:xx:c:c
     Per-entry host labels: a, a, b, b, c, c
     Hosts: A, B, C
     embed mapping history in MAC address
     ‣ append previous target as part of MAC address
     ‣ still results in resets, but…
     ‣ …conveys necessary information down to the host

  5. Faild: build of slide 4 (same tables and bullets), labelled "current host"; per-entry host labels: a, a, b, b, c, c

  6. Faild: build of slide 4, labelled "current host"; per-entry host labels: a a, a a, c b, a b, c c, c c

  7. Faild: build of slide 4, labelled "current host"; per-entry host labels: a, a, c, a, c, c

  8. Faild: build of slide 4, no label; per-entry host labels: a, a, c, a, c, c

  9. Faild: build of slide 4, labelled "previous host"; per-entry host labels: a a, a a, c b, a b, c c, c c

  10. Faild: build of slide 4, no label; per-entry host labels: a a, a a, c b, a b, c c, c c
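A minimal sketch of how mapping history could be packed into a MAC address. The layout is an assumption: the last two octets carry the current and previous target, in that order, following the host-processing slides where xx:xx:xx:xx:c:b is labelled with current and previous target. The prefix `02:00:00:00`, the one-octet host ids and the function names are hypothetical.

```python
# Hypothetical prefix; 0x02 sets the locally administered bit of the first octet.
PREFIX = bytes([0x02, 0x00, 0x00, 0x00])


def encode_mac(current: int, previous: int) -> str:
    """Build a MAC whose last two octets are (current target, previous target)."""
    if not (0 <= current <= 0xFF and 0 <= previous <= 0xFF):
        raise ValueError("host ids must fit in one octet")
    octets = PREFIX + bytes([current, previous])
    return ":".join(f"{octet:02x}" for octet in octets)


def decode_mac(mac: str) -> tuple[int, int]:
    """Recover (current target, previous target) from an encoded MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected six octets")
    return octets[4], octets[5]


# Host 0x0c is the current target, host 0x0b the previous one.
mac = encode_mac(0x0C, 0x0B)
current, previous = decode_mac(mac)
```

Because the history travels in the destination MAC of every frame, the receiving host needs no per-flow state from the switch to learn where a flow used to land.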

  11. Host processing (host C)
      Destination MAC address: xx:xx:xx:xx:c:b (current target, previous target)
      Checks: Match previous? ‣ SYN packet? ‣ Local socket?
      Outcomes: Process, Redirect
      Hosts: A, B, C

  12. Host processing (host C): build of slide 11, annotated "c != b" at "Match previous?"

  13-17. Host processing (host C): further builds of slide 11, no new text

  18. Host processing (hosts B and C)
      Destination MAC address: xx:xx:xx:xx:b:b
      Checks, shown once per host: Match previous? ‣ SYN packet? ‣ Local socket?
      Outcomes: Process, Redirect
      Hosts: A, B, C

  19. Host processing (hosts B and C): build of slide 18, annotated "b == b" at "Match previous?"

  20. Host processing (hosts B and C): build of slide 19, no new text
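The host-side decision in these slides can be sketched as below. This is one reading of the flowchart labels, not the actual implementation: the order of the checks, the `Packet` type, the `local_flows` set standing in for a socket lookup, and the rewrite of a redirected packet to (previous, previous) are assumptions, the last one inferred from the xx:xx:xx:xx:b:b address shown arriving at host B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow: tuple      # hypothetical flow key, e.g. (src_ip, src_port, dst_ip, dst_port)
    syn: bool        # True for a TCP SYN
    current: str     # current target taken from the destination MAC
    previous: str    # previous target taken from the destination MAC


def handle(packet, local_flows):
    """Return ("process", None) or ("redirect", rewritten_packet)."""
    # Current and previous target agree: nothing was remapped, handle locally.
    if packet.current == packet.previous:
        return "process", None
    # A new connection starts on whichever host currently owns the next hop.
    if packet.syn:
        return "process", None
    # An existing connection that already lives on this host stays here.
    if packet.flow in local_flows:
        return "process", None
    # Otherwise the flow probably belongs to the previous target: hand it over,
    # marking the packet so that the previous target processes it directly.
    rewritten = Packet(packet.flow, packet.syn, packet.previous, packet.previous)
    return "redirect", rewritten


flow = ("198.51.100.7", 40000, "192.168.0.1", 443)
action, forwarded = handle(Packet(flow, False, "c", "b"), local_flows=set())
```

In this example host C has no socket for the flow and the packet is not a SYN, so it is redirected with the address rewritten to b:b, and host B then takes the first branch and processes it.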

  21. Host processing
      Low latency
      ‣ expected case: switches do all heavy lifting
      ‣ worst case: detour routing costs 14 μs
      Negligible impact on CPU utilization
      ‣ impact only when refilling
      ‣ peak CPU utilization below 0.3%
      [Figure: cumulative probability of round trip time [µs], steady state vs. draining; median difference: 14 µs]

  22. Host processing
      Low latency
      ‣ expected case: switches do all heavy lifting
      ‣ worst case: detour routing costs 20 μs
      Negligible impact on CPU utilization
      ‣ impact only when refilling (transient)
      ‣ peak CPU utilization below 0.3%
      [Figure: estimated PDF of CPU utilization [%] for steady state, drain and refill]

  23. Timeline 2012 2014 2016 2018

  24. Timeline deployed globally 2012 2014 2016 2018

  25. Timeline (2012, 2014, 2016, 2018): deployed globally; 3x 10 14 requests per day

  26. we suspect it works

  27. Assumption #1 hash buckets are equally loaded

  28. Hashing
      [Figure: requests per second over time [min]]
      Implications for capacity planning
      ‣ you are bound by most loaded host in a cluster

  29. Hashing: build of slide 28, no new text
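A small worked example of the capacity-planning point, under assumptions made up for illustration: every host has the same capacity, and each host's share of traffic stays fixed as total load grows. The function name and the numbers are hypothetical.

```python
def usable_cluster_capacity(host_capacity, host_loads):
    """Total traffic the cluster can take before its busiest host saturates.

    host_loads are observed per-host request rates; only their ratios matter.
    The busiest host hits host_capacity first, so the cluster as a whole is
    limited to host_capacity * total / busiest.
    """
    busiest = max(host_loads)
    if busiest <= 0:
        raise ValueError("need at least one host with non-zero load")
    return host_capacity * sum(host_loads) / busiest


# Four hosts rated at 1000 requests per second each.
even = usable_cluster_capacity(1000, [250, 250, 250, 250])    # 4000.0
skewed = usable_cluster_capacity(1000, [400, 200, 200, 200])  # 2500.0
```

With perfectly even hashing the cluster is good for 4000 requests per second. If one host receives 40% of the traffic, the same hardware saturates at 2500.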

  30. Uneven hashing
      Inject synthetic, equally distributed traffic
      [Figure: normalized bucket load against rank of nexthop]

  31. Uneven hashing: build of slide 30 with a second panel of the same plot

  32. Uneven hashing: build of slide 31, adding
      Significant skew
      ‣ most loaded bucket 6 times more loaded than the least loaded

  33. Uneven hashing: build of slide 32, adding
      Behaviour can depend on number of nexthops
      ‣ some buckets received no traffic for specific number of configured nexthops
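The quantities plotted on these slides can be computed as in the sketch below. It only shows how normalized bucket load and the most-to-least-loaded ratio are derived from per-bucket counts. The SHA-256 based `sha_hash` is a stand-in chosen for determinism and says nothing about what any switch actually implements; all names are hypothetical.

```python
import hashlib


def sha_hash(flow):
    """Stand-in hash over a flow tuple (not a switch's real ECMP hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_loads(flows, n_buckets, hash_fn=sha_hash):
    """Count how many flows land in each of n_buckets next hop buckets."""
    loads = [0] * n_buckets
    for flow in flows:
        loads[hash_fn(flow) % n_buckets] += 1
    return loads


def normalized_ranked(loads):
    """Bucket loads divided by the mean load, most loaded first."""
    mean = sum(loads) / len(loads)
    return sorted((load / mean for load in loads), reverse=True)


def skew(loads):
    """Ratio of most to least loaded bucket; infinite if a bucket is empty."""
    least = min(loads)
    return float("inf") if least == 0 else max(loads) / least


# Synthetic traffic: one flow per source port, spread over 8 buckets.
flows = [("198.51.100.7", port, "192.168.0.1", 443) for port in range(1000)]
loads = bucket_loads(flows, 8)
```

A skew of 6 corresponds to the "6 times more loaded" observation, and a bucket that receives no traffic at all shows up as an infinite ratio.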

  34. Assumption #2 switches hash identically

  35. Hash polarization

  36. Hash polarization

  37. Hash polarization

  38. Hash polarization

  39. Hash polarization
      Vendors were told hash polarization was bad
      ‣ in many cases you can’t configure seed
      ‣ in one case, you can configure the seed, but vendor additionally uses boot order of linecards to add entropy
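For background, hash polarization can be illustrated with a toy two-tier topology. Everything here is an assumption made for the example: the seeded SHA-256 hash stands in for a switch's ECMP hash, and both tiers choose among the same number of next hops.

```python
import hashlib


def ecmp_hash(flow, seed):
    """Stand-in seeded hash over a flow tuple (not a real switch hash)."""
    digest = hashlib.sha256(f"{seed}|{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def used_next_hops(flows, n, seed_tier1, seed_tier2):
    """Next hops a tier-2 switch ends up using for the traffic it receives.

    The tier-2 switch sits behind next hop 0 of a tier-1 switch, so it only
    sees flows that tier 1 hashed to 0. It then hashes them again with its
    own seed to pick among its n next hops.
    """
    arriving = [f for f in flows if ecmp_hash(f, seed_tier1) % n == 0]
    return {ecmp_hash(f, seed_tier2) % n for f in arriving}


flows = [("198.51.100.7", port, "192.168.0.1", 443) for port in range(4000)]
same_seed = used_next_hops(flows, 4, seed_tier1=1, seed_tier2=1)
different_seed = used_next_hops(flows, 4, seed_tier1=1, seed_tier2=2)
```

With identical seeds every arriving flow hashes to 0 again, so the second tier uses a single next hop. Distinct seeds spread the same traffic over all four. This is why vendors add entropy to the hash, and why that entropy conflicts with the assumption that switches hash identically.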

  40. Assumption #3 packets in a flow use same network path

  41. Nope, things break
      Fragmentation
      ‣ returning ICMP packets hash on outer header
      ‣ took draft to IETF in 2014
      ECN
      ‣ some middleboxes hash on TOS field
      ‣ ended up turning ECN negotiation off, breaks anycast too
      ‣ still looking for vendor(s) behind this, affected multiple ISPs
      SYN proxies
      ‣ recent trend in enterprise appliances
      ‣ route lookup after connection handoff results in new path
      ‣ one vendor fixed implementation

  42. paper has lots more stuff
      ‣ SYN cookie handling
      ‣ ARP reconfiguration measurements
      ‣ evaluation of switch and host draining
      ‣ switch controller details
      ‣ host-side implementation quirks
      ‣ ECMP skew results
      ‣ switch memory
      ‣ real flow measurements
      ‣ vendors that don’t test their products
      ‣ …

  43. NSDI the value is not in the implementation

  44. NSDI the value is in the design
