  1. AC ⚡ DC TCP: Virtual Congestion Control Enforcement for Datacenter Networks Keqiang He, Eric Rozner, Kanak Agarwal, Yu Gu, Wes Felter, John Carter, Aditya Akella

  2. Datacenter Network Congestion Control • Congestion is not rare in datacenter networks [Singh, SIGCOMM’15] • Tail latency is huge • 99.9th percentile latency is orders of magnitude higher than the median [Mogul, HotOS’15] • Queueing latency is the major contributor [Jang, SIGCOMM’15] • New datacenter TCP congestion control schemes have been proposed • E.g., DCTCP, TIMELY, DCQCN, TCP-Bolt, ICTCP, etc.

  3. But We Cannot Control VM TCP Stacks • In multi-tenant datacenters, admins cannot control VM TCP stacks • Because VMs are set up and managed by different entities (tenants) • Therefore, outdated, inefficient, or misconfigured TCP stacks can run in the VMs • This leads to two main problems (Figure: VMs of Tenants 1–3, each with its own TCP/IP stack, on top of the virtualization infrastructure of servers, storage, and networking)

  4. Problem #1: Large Queueing Latency (Figure: sender and receiver connected through a switch whose queue fills with packets) • TCP RTT can reach tens of milliseconds because of packet queueing • Without queueing, TCP RTT is around 60 to 200 microseconds

  5. Problem #2: TCP Unfairness • ECN and non-ECN coexistence problem [Judd, NSDI’15] • Non-ECN: e.g., CUBIC • ECN: e.g., DCTCP

  6. Problem #2: TCP Unfairness (cont.) (CC: congestion control) • Different congestion control algorithms lead to unfairness (Figure: dumbbell topology with senders and receivers; 5 flows with different CC algorithms congest a 10G link)

  7. AC ⚡ DC TCP: Administrator Control over Data Center TCP • Implements TCP congestion control in the virtual switch • Ensures VM TCP stacks cannot impact the network

  8. AC ⚡ DC: High-Level View (Figure: on each server, VMs with their apps, OSes, and vNICs sit on a vSwitch whose data path runs AC/DC; the sender-side vSwitch applies uniform per-flow CC and the receiver-side vSwitch returns per-flow CC feedback across the datacenter network; case study: DCTCP; CC is enforced in the vSwitch)

  9. AC ⚡ DC Benefits • No modifications to VMs or hardware • Low latency provided by state-of-the-art CC algorithms • Improved TCP fairness, supporting both ECN and non-ECN flows • Enforce per-flow differentiation via congestion control, e.g.: • East-west and north-south flows can use different CCs (web server) • Give higher priority to “mission-critical” traffic (backend VM)

  10. AC ⚡ DC Design • Obtaining Congestion Control State • DCTCP Congestion Control in the vSwitch • Enforcing Congestion Control • Per-flow Differentiation via Congestion Control

  11. Obtaining Congestion Control State • Per-flow connection tracking • All traffic goes through the virtual switch • We can reconstruct CC state by monitoring all the packets of a connection (Figure: packet → flow classification → updating CC variables) • Maintain per-flow congestion control variables • E.g., CC-related sequence numbers, dupack counter, etc.
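
A minimal sketch (not from the slides) of what such per-flow state could look like in C; the `struct acdc_flow` type and all field names are illustrative assumptions, not the actual Open vSwitch datapath definitions:

```c
/* Sketch of per-flow congestion control state the vSwitch might keep.
 * Field and type names are illustrative; the real AC/DC datapath code
 * in Open vSwitch differs. */
#include <stdint.h>

struct acdc_flow {
    /* 5-tuple identifying the connection (network byte order). */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;

    /* CC-related sequence tracking reconstructed from observed packets. */
    uint32_t snd_una;        /* highest cumulatively ACKed byte            */
    uint32_t snd_nxt;        /* next sequence number seen from the sender  */
    uint32_t dupack_cnt;     /* duplicate ACK counter for loss detection   */

    /* DCTCP-style state maintained by the vSwitch. */
    uint32_t wnd_bytes;      /* window the vSwitch will enforce            */
    uint32_t ecn_bytes;      /* CE-marked bytes seen in the current RTT    */
    uint32_t total_bytes;    /* all acked bytes seen in the current RTT    */
    uint32_t alpha;          /* EWMA of congestion extent (fixed point)    */
    uint32_t next_seq;       /* sequence marking the end of this RTT round */
};
```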

  12. DCTCP Congestion Control in the vSwitch • Universal ECN marking • Get ECN feedback

  13. Universal ECN Marking • Why? • Not all VMs run ECN-Capable Transports (ECT) like DCTCP • Universal ECN marking • All packets entering the fabric should be ECN-marked by the virtual switch • Solves the ECN and non-ECN coexistence problem
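
For illustration, a small C sketch (not the OVS datapath code) of forcing ECT(0) on IPv4 packets entering the fabric; the truncated `ipv4_hdr` struct and the function name are assumptions, and IP header checksum fix-up is omitted here:

```c
/* Sketch: mark every outgoing IPv4 packet as ECN-capable (ECT(0)) so the
 * fabric can CE-mark it regardless of what the guest TCP stack negotiated.
 * Illustrative fragment only; IP checksum recomputation is not shown. */
#include <stdint.h>

#define IP_ECN_MASK    0x03
#define IP_ECN_NOT_ECT 0x00
#define IP_ECN_ECT_0   0x02

struct ipv4_hdr {
    uint8_t ver_ihl;
    uint8_t tos;          /* DSCP (upper 6 bits) + ECN (lower 2 bits) */
    /* ... remaining IPv4 fields omitted for brevity ... */
};

static void acdc_mark_ect(struct ipv4_hdr *ip)
{
    if ((ip->tos & IP_ECN_MASK) == IP_ECN_NOT_ECT)
        ip->tos = (uint8_t)((ip->tos & ~IP_ECN_MASK) | IP_ECN_ECT_0);
}
```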

  14. Get ECN Feedback (Figure: the congested switch sets the Congestion Experienced (CE) mark on packets travelling from the sender side to the receiver side) • Need a way to carry the congestion information back to the sender.

  15. Get ECN Feedback (cont.) (Figure: CE-marked packets traverse the congested switch between the AC/DC sender and the AC/DC receiver) • Congestion feedback is encoded as 8 bytes: {ECN_bytes, Total_bytes} • Piggybacked on an existing TCP ACK (PACK)
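
A minimal sketch of the 8-byte feedback record described on this slide; the struct and function names are assumptions, and the slide does not specify exactly where in the ACK the bytes are carried, so that detail is left abstract:

```c
/* Sketch of the 8-byte congestion feedback the receiver-side vSwitch
 * piggybacks on an existing TCP ACK ("PACK"). Only the two 32-bit
 * counters come from the slide; everything else is illustrative. */
#include <stdint.h>
#include <arpa/inet.h>

struct acdc_feedback {
    uint32_t ecn_bytes;    /* bytes received with CE mark since last PACK */
    uint32_t total_bytes;  /* all bytes received since last PACK          */
};

/* Receiver side: serialize the accumulated counters (network byte order)
 * into the feedback record attached to an outgoing ACK. */
static void acdc_fill_feedback(struct acdc_feedback *fb,
                               uint32_t ecn_bytes, uint32_t total_bytes)
{
    fb->ecn_bytes   = htonl(ecn_bytes);
    fb->total_bytes = htonl(total_bytes);
}
```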

  16. DCTCP Congestion Control in the vSwitch (Flowchart: the DCTCP control law) On an incoming ACK: extract CC info if it is a PACK; update connection-tracking variables; update ⍺ once every RTT. If loss is detected, set ⍺ = max_alpha. If there is congestion (or loss) and the window has not been cut in the last RTT, set wnd = wnd * (1 - ⍺/2); otherwise grow the window as in tcp_cong_avoid(). Finally, AC/DC enforces CC on the flow and sends the ACK to the VM.
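
A hedged C sketch of this per-ACK control law, using fixed-point arithmetic; the constants, field names, and function names are assumptions made for illustration and differ from the actual AC/DC implementation:

```c
/* Sketch of the slide-16 control law. alpha is stored as a fixed-point
 * fraction of ALPHA_MAX; the EWMA gain g = 1/16 follows common DCTCP
 * practice and is an assumption here. */
#include <stdint.h>
#include <stdbool.h>

#define ALPHA_SHIFT   10
#define ALPHA_MAX     (1u << ALPHA_SHIFT)
#define DCTCP_G_SHIFT 4                 /* EWMA gain g = 1/16 */

struct flow_cc {
    uint32_t wnd;            /* window the vSwitch enforces (bytes)      */
    uint32_t alpha;          /* EWMA of fraction of CE-marked bytes      */
    uint32_t ecn_bytes;      /* CE-marked bytes this RTT (from PACKs)    */
    uint32_t total_bytes;    /* all acked bytes this RTT (from PACKs)    */
    bool     cut_this_rtt;   /* window already cut in the current RTT?   */
};

/* Called once every RTT: fold the observed marking fraction into alpha. */
static void update_alpha_per_rtt(struct flow_cc *f)
{
    uint32_t frac = 0;
    if (f->total_bytes)
        frac = (uint32_t)(((uint64_t)f->ecn_bytes << ALPHA_SHIFT) / f->total_bytes);
    /* alpha = (1 - g) * alpha + g * frac */
    f->alpha = f->alpha - (f->alpha >> DCTCP_G_SHIFT) + (frac >> DCTCP_G_SHIFT);
    f->ecn_bytes = f->total_bytes = 0;
    f->cut_this_rtt = false;             /* new RTT: allow one window cut */
}

/* Called for every ACK arriving from the fabric, before it is
 * forwarded to the VM. */
static void on_incoming_ack(struct flow_cc *f, bool congestion, bool loss,
                            uint32_t mss)
{
    if (loss)
        f->alpha = ALPHA_MAX;            /* treat loss as maximal congestion */

    if ((congestion || loss) && !f->cut_this_rtt) {
        /* wnd = wnd * (1 - alpha/2), at most once per RTT */
        f->wnd -= (uint32_t)(((uint64_t)f->wnd * f->alpha) >> (ALPHA_SHIFT + 1));
        f->cut_this_rtt = true;
    } else if (!congestion && !loss) {
        f->wnd += mss;                   /* additive increase, as in tcp_cong_avoid() */
    }
    /* AC/DC then enforces f->wnd on the flow (slide 17) and forwards
     * the ACK to the VM. */
}
```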

  17. Enforcing Congestion Control • TCP sends min(CWND, RWND) • CWND is the congestion window (congestion control) • RWND is the receiver’s advertised window (flow control) • AC ⚡ DC reuses RWND for congestion control purposes • VMs with unaltered TCP stacks will naturally follow our enforcement • Non-conforming flows can be policed by dropping any excess packets not allowed by the calculated congestion window • Loss has to be recovered end-to-end, which incentivizes tenants to respect standards
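
A small C sketch of the RWND overwrite on ACKs heading to the VM. It assumes the window-scale shift negotiated in the SYN handshake has been recorded for the flow and that the TCP checksum fix-up is handled elsewhere (slide 19); the truncated header struct and names are illustrative:

```c
/* Sketch: enforce the computed window by lowering the TCP receive window
 * of ACKs travelling toward the VM. Since an unmodified stack sends
 * min(CWND, RWND), lowering RWND is enough to make it obey the vSwitch. */
#include <stdint.h>
#include <arpa/inet.h>

struct tcp_hdr_min {
    uint16_t src_port, dst_port;
    uint32_t seq, ack_seq;
    uint16_t flags_doff;
    uint16_t window;       /* receiver's advertised window (unscaled) */
    uint16_t checksum;
    uint16_t urg_ptr;
};

static void acdc_enforce_wnd(struct tcp_hdr_min *tcp,
                             uint32_t cc_wnd_bytes, uint8_t wscale)
{
    uint32_t advertised = (uint32_t)ntohs(tcp->window) << wscale;

    if (cc_wnd_bytes < advertised) {
        /* Rewrite RWND so that min(CWND, RWND) <= our computed window.
         * Checksum recomputation is left to NIC offload in this sketch. */
        tcp->window = htons((uint16_t)(cc_wnd_bytes >> wscale));
    }
}
```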

  18. Control Law for Per-flow Differentiation • DCTCP: rwnd = rwnd * (1 − ⍺/2) • AC ⚡ DC TCP: rwnd = rwnd * (1 − (⍺ − ⍺β/2)) • When β is close to 1, it becomes DCTCP. When β is close to 0, it backs off aggressively. Larger β for higher-priority traffic.
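
Continuing the fixed-point sketch from slide 16, a hedged illustration of this priority-aware cut; `beta` here is an assumed fixed-point encoding of the slide's β in [0, 1], with beta = ALPHA_MAX reducing to plain DCTCP and beta = 0 giving the most aggressive back-off:

```c
/* Sketch of the per-flow differentiation cut from slide 18:
 *   rwnd = rwnd * (1 - (alpha - alpha*beta/2))
 * alpha and beta are fixed-point fractions of ALPHA_MAX. */
#include <stdint.h>

#define ALPHA_SHIFT 10
#define ALPHA_MAX   (1u << ALPHA_SHIFT)

static uint32_t acdc_cut_wnd(uint32_t rwnd, uint32_t alpha, uint32_t beta)
{
    /* reduce = alpha - alpha*beta/2 (beta = ALPHA_MAX -> alpha/2, i.e. DCTCP) */
    uint64_t reduce = (uint64_t)alpha
                    - (((uint64_t)alpha * beta) >> (ALPHA_SHIFT + 1));
    return rwnd - (uint32_t)(((uint64_t)rwnd * reduce) >> ALPHA_SHIFT);
}
```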

  19. Implementation • Prototype implementation in the Open vSwitch kernel datapath • ~1200 LoC added • Our design leverages available techniques to improve performance • RCU-enabled hash tables to perform connection tracking • AC ⚡ DC manipulates TCP segments, instead of MTU-sized packets • AC ⚡ DC leverages NIC checksumming so the TCP checksum does not have to be recomputed after header fields are modified (Figure: AC ⚡ DC in the hypervisor manipulates TCP segments before TSO; the NIC segments them into MTU-sized packets and recalculates the TCP checksum)
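
For the connection-tracking point, a kernel-style C sketch of an RCU-protected hash-table lookup keyed on the flow tuple; the table layout, struct, and function names are assumptions and not the actual AC/DC code:

```c
/* Sketch of RCU-protected connection tracking in the style of Linux
 * kernel / OVS datapath code. Illustrative only. */
#include <linux/types.h>
#include <linux/jhash.h>
#include <linux/rculist.h>
#include <linux/string.h>

#define ACDC_TABLE_BITS 10
#define ACDC_TABLE_SIZE (1 << ACDC_TABLE_BITS)

struct acdc_flow_key {
    __be32 src_ip;
    __be32 dst_ip;
    __be16 src_port;
    __be16 dst_port;
};

struct acdc_flow {
    struct hlist_node node;
    struct acdc_flow_key key;
    /* per-flow CC state as sketched for slide 11 ... */
};

static struct hlist_head acdc_table[ACDC_TABLE_SIZE];

/* Caller must hold rcu_read_lock(), as the packet fast path does, so
 * lookups never block concurrent insertions or deletions. */
static struct acdc_flow *acdc_flow_lookup(const struct acdc_flow_key *key)
{
    struct acdc_flow *f;
    u32 bucket = jhash(key, sizeof(*key), 0) & (ACDC_TABLE_SIZE - 1);

    hlist_for_each_entry_rcu(f, &acdc_table[bucket], node)
        if (!memcmp(&f->key, key, sizeof(*key)))
            return f;
    return NULL;
}
```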

  20. Evaluation • Testbed: 17 servers (6-core, 60GB memory), six 10Gbps switches • Microbenchmark topologies (Figure: incast topology with many senders and one receiver; dumbbell topology with multiple senders and receivers)

  21. Evaluation • Macrobenchmark topology: 17 servers attached to a 10G switch • Metrics: TCP RTT, loss rate, flow completion time (FCT)

  22. Experiment Setting (three schemes compared) • CUBIC: CUBIC stack on top of standard OVS • DCTCP: DCTCP stack on top of standard OVS • AC ⚡ DC: CUBIC/Reno/Vegas/HighSpeed/Illinois stacks on top of AC ⚡ DC (Table: VM stack vs. hypervisor vSwitch for each scheme)

  23. Tracking Window Size (dumbbell topology) • Running a DCTCP stack on top of AC ⚡ DC, where AC ⚡ DC only outputs its calculated RWND without enforcing it • AC ⚡ DC closely tracks the window size of DCTCP

  24. Convergence (dumbbell topology; plots for CUBIC, DCTCP, and AC/DC) • AC/DC has convergence properties comparable to DCTCP and better than CUBIC

  25. AC ⚡ DC Improves Fairness When VMs Use Different CCs (dumbbell topology; plots for standard OVS vs. AC ⚡ DC)

  26. Overhead (CPU and Memory) (dumbbell topology, sender side) • Less than 1% additional CPU overhead compared with the baseline • Each connection uses 320 bytes to maintain CC variables (10k connections use 3.2 MB)

  27. TCP Incast: RTT and Drop Rate (incast topology; plots of 50th percentile RTT, 99.9th percentile RTT, and packet drop rate) • AC ⚡ DC tracks the performance of DCTCP closely

  28. Flow Completion Time with Trace-Driven Workloads (17 servers attached to a 10G switch; web-search workload [DCTCP] and data-mining workload [CONGA]) • AC ⚡ DC obtains the same performance as DCTCP • AC ⚡ DC can reduce FCT by 36%–76% compared with default CUBIC

  29. Summary • AC ⚡ DC allows administrators to regain control over arbitrary tenant TCP stacks by enforcing congestion control in the virtual switch • AC ⚡ DC requires no changes to VMs or network hardware • AC ⚡ DC is scalable, lightweight (< 1% CPU overhead), and flexible

  30. Thanks!

  31. Backup Slides

  32. Related Work • DCTCP: ECN-based congestion control for DCNs • TIMELY: latency-based congestion control for DCNs, relying on accurate latency measurements from NIC timestamps • vCC: vCC and AC ⚡ DC are closely related works by two independent teams

  33. ECN and Non-ECN Coexistence (Figure: ECN and non-ECN flows share a switch configured with WRED/ECN) • When queue occupancy is larger than the marking threshold, ECN-capable packets are marked while non-ECN packets are dropped
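
For illustration only, a tiny C sketch (not from the slides) of the mark-or-drop decision such a WRED/ECN-configured switch makes above the marking threshold; the single-threshold simplification and all names are assumptions:

```c
/* Sketch of the switch behavior on this slide: above the marking
 * threshold, ECN-capable packets are CE-marked while non-ECN packets
 * are dropped, which starves non-ECN flows sharing the queue. */
#include <stdbool.h>
#include <stdint.h>

enum acdc_action { FORWARD, MARK_CE, DROP };

static enum acdc_action wred_ecn_decision(uint32_t queue_len,
                                          uint32_t mark_thresh,
                                          bool pkt_is_ect)
{
    if (queue_len <= mark_thresh)
        return FORWARD;
    return pkt_is_ect ? MARK_CE : DROP;
}
```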

  34. IPsec • AC ⚡ DC is not able to inspect the TCP headers of IPsec traffic • It may instead perform approximate rate limiting based on congestion feedback information
