

  1. CompSci 514: Computer Networks Lecture 15 Practical Datacenter Networks Xiaowei Yang

  2. Overview • Wrap up DCTCP analysis • Today – Google’s datacenter networks • Topology, routing, and management – Inside Facebook’s datacenter networks • Services and traffic patterns

  3. The DCTCP Algorithm

  4. Review: The TCP/ECN Control Loop • ECN = Explicit Congestion Notification • [Figure: two senders send through a switch that sets a 1-bit ECN mark under congestion; the receiver echoes the mark back to the senders]

  5. Two Key Ideas
     1. React in proportion to the extent of congestion, not its presence. ✓ Reduces variance in sending rates, lowering queuing requirements.
        ECN marks 1 0 1 1 1 1 0 1 1 1 → TCP: cut window by 50%; DCTCP: cut window by 40%
        ECN marks 0 0 0 0 0 0 0 0 0 1 → TCP: cut window by 50%; DCTCP: cut window by 5%
     2. Mark based on instantaneous queue length. ✓ Fast feedback to better deal with bursts.
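As a sanity check on the example above, here is a minimal sketch (mine, not from the slides) of the two reactions. It assumes α is simply the raw fraction of marked packets in the last window; real DCTCP smooths α with an EWMA, as the algorithm slide below shows.

```python
# Minimal sketch: DCTCP cuts the window in proportion to the fraction of
# marked packets, while TCP always halves it. Assumes alpha equals the raw
# mark fraction (no EWMA smoothing), purely for illustration.

def tcp_cut(marks):
    # TCP reacts to the *presence* of congestion: any mark -> halve cwnd.
    return 0.5 if any(marks) else 0.0

def dctcp_cut(marks):
    # DCTCP reacts to the *extent* of congestion: cut by alpha/2.
    alpha = sum(marks) / len(marks)
    return alpha / 2

for marks in ([1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]):
    print(f"marks={marks}  TCP cut={tcp_cut(marks):.0%}  DCTCP cut={dctcp_cut(marks):.0%}")
# Expected output: 50% vs 40% for the first pattern, 50% vs 5% for the second.
```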

  6. Small Queues & TCP Throughput: The Buffer Sizing Story • Bandwidth-delay product rule of thumb: – A single flow needs a buffer of C × RTT for 100% throughput. [Figure: cwnd sawtooth with buffer size B; throughput stays at 100%]
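To make the rule of thumb concrete, a small sketch with made-up numbers (the 10 Gbps link speed and 100 µs RTT are assumptions, not values from the slides):

```python
# Bandwidth-delay product rule of thumb: a single TCP flow needs a buffer
# of roughly C x RTT to keep the link fully utilized across a window halving.
# Link speed and RTT below are hypothetical datacenter-ish values.

C_bps = 10e9          # 10 Gbps link (assumed)
RTT_s = 100e-6        # 100 microsecond round-trip time (assumed)

buffer_bytes = C_bps * RTT_s / 8
print(f"C x RTT buffer = {buffer_bytes / 1e3:.0f} KB "
      f"(~{buffer_bytes / 1500:.0f} x 1500-byte packets)")
# -> 125 KB, i.e. roughly 83 full-size packets for this single-flow example.
```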

  7. Data Center TCP Algorithm
     Switch side: – Mark packets when queue length > K. [Figure: queue of capacity B with marking threshold K]
     Sender side: – Maintain a running average of the fraction of packets marked (α). In each RTT: α ← (1 − g)·α + g·F, where F is the fraction of packets marked in the last RTT.
     – Adaptive window decrease: W ← (1 − α/2)·W. Note: the decrease factor is between 1 and 2.
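A minimal sender-side sketch of the update rules on this slide; the gain g = 1/16 and the window units are illustrative choices, not mandated by the slide.

```python
# Sketch of the DCTCP sender-side behavior described above.
# alpha is an EWMA of the fraction of ECN-marked packets per RTT; the
# congestion window is cut by (1 - alpha/2) in any RTT that saw marks.

class DctcpSender:
    def __init__(self, cwnd=10.0, g=1.0 / 16):
        self.cwnd = cwnd      # congestion window, in packets
        self.alpha = 0.0      # running estimate of the extent of congestion
        self.g = g            # EWMA gain (illustrative value)

    def on_rtt_end(self, acked, marked):
        """Call once per RTT with counts of ACKed and ECN-marked packets."""
        F = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * F
        if marked:
            # Proportional decrease: between no cut (alpha=0) and a halving (alpha=1).
            self.cwnd *= (1 - self.alpha / 2)
        else:
            self.cwnd += 1    # standard additive increase per RTT

s = DctcpSender()
s.on_rtt_end(acked=10, marked=8)
print(round(s.cwnd, 2), round(s.alpha, 3))   # window shrinks slightly; alpha moves toward F
```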

  8. Analysis • How low can DCTCP maintain queues without loss of throughput? • How do we set the DCTCP parameters? → Need to quantify queue size oscillations (stability). [Figure: window-size sawtooth oscillating between (W*+1)(1 − α/2) and W*+1 over time]

  9. Analysis • How low can DCTCP maintain queues without loss of throughput? • How do we set the DCTCP parameters? → Need to quantify queue size oscillations (stability). [Figure: same sawtooth; the packets sent in the RTT where the window exceeds W* are the ones that get marked]

  10. Analysis • Queue occupancy: Q(t) = N·W(t) − C × RTT • Key observation: with synchronized senders, the queue size exceeds the marking threshold K for exactly one RTT in each period of the sawtooth, before the sources receive ECN marks and reduce their window sizes accordingly. • S(W1, W2) = (W2² − W1²)/2, the number of packets one flow sends while its window grows from W1 to W2 (the window increases by 1 per RTT). • Critical window size at which ECN marking occurs: marking starts when Q = K, i.e., N·W* − C×RTT = K, so W* = (C×RTT + K)/N

  11. • α = S(W*, W*+1) / S((W*+1)(1 − α/2), W*+1) • α²(1 − α/4) = (2W* + 1)/(W* + 1)² ≈ 2/W* – assuming W* >> 1 • α ≈ √(2/W*) • Single-flow oscillation: D = (W*+1) − (W*+1)(1 − α/2) = (W*+1)·α/2 • Queue amplitude: A = N·D = N·(W*+1)·α/2 ≈ (N/2)·√(2W*) = (1/2)·√(2N(C×RTT + K))  (8) • Period of oscillation: T_C = D = (1/2)·√(2(C×RTT + K)/N) (in RTTs)  (9) • Finally, using (3) (i.e., Q(t) = N·W(t) − C×RTT): Q_max = N·(W*+1) − C×RTT = K + N  (10)

  12. Analysis • How low can DCTCP maintain queues without loss of throughput? • How do we set the DCTCP parameters? → Need to quantify queue size oscillations (stability). • Q_min = Q_max − A  (11) = K + N − (1/2)·√(2N(C×RTT + K))  (12) • Choosing K so that Q_min stays above zero needs roughly 85% less buffer than TCP
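To see what the analysis predicts numerically, here is a small sketch that plugs hypothetical parameters (N, C, RTT, and K below are made-up example values, not from the slides) into the formulas above.

```python
# Plug hypothetical parameters into the DCTCP analysis formulas above.
from math import sqrt

N = 10                       # number of synchronized long-lived flows (assumed)
C = 10e9 / 8 / 1500          # link capacity in packets/s (10 Gbps, 1500 B packets, assumed)
RTT = 100e-6                 # round-trip time in seconds (assumed)
K = 65                       # ECN marking threshold, in packets (assumed)

BDP = C * RTT                                 # bandwidth-delay product, in packets
W_star = (BDP + K) / N                        # critical window where marking starts
alpha = sqrt(2 / W_star)                      # steady-state mark fraction
A = 0.5 * sqrt(2 * N * (BDP + K))             # amplitude of queue oscillation (eq. 8)
Q_max = K + N                                 # eq. 10
Q_min = Q_max - A                             # eqs. 11-12

print(f"W* = {W_star:.1f} pkts, alpha = {alpha:.2f}")
print(f"queue oscillates between {Q_min:.1f} and {Q_max} packets (amplitude {A:.1f})")
```

With these particular numbers Q_min stays well above zero, which is the sense in which DCTCP preserves throughput while using a much smaller buffer than TCP's bandwidth-delay product rule would require.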

  13. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat

  14. What’s this paper about • Experience track • How Google’s datacenter network evolved over a decade

  15. Key takeaways • Customized switches built using merchant silicon • Recursive Clos to scale to a large number of servers • Centralized control/management

  16. • Bandwidth demands in the datacenter are doubling every 12-15 months, even faster than the wide area Internet.

  17. Traditional four-post cluster • Top-of-Rack (ToR) switches, each serving 40 1G-connected servers, were connected via 1G links to four 512-port 1G Cluster Routers (CRs), which were interconnected with 10G sidelinks. • Scale: 512 ToRs × 40 servers ≈ 20K hosts

  18. • When a large share of a rack's traffic leaves the rack, the oversubscribed uplinks become a bottleneck and congestion occurs

  19. Solutions • Use merchant silicon to build non-blocking, high-port-density switches • Watchtower: built from 16×10G merchant silicon chips

  20. Exercise • Given 24×10G merchant silicon chips and 12 line cards, build a 288-port non-blocking switch (see the sketch below)
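One way to check the exercise arithmetic: a two-stage folded Clos built from k-port chips, with each edge chip splitting its ports half toward external hosts and half toward the spine, yields k²/2 non-blocking external ports. The sketch below assumes two switch chips per line card, which is an illustrative guess rather than a detail from the slides.

```python
# Sanity-check the exercise: build a non-blocking switch out of 24-port
# (24 x 10G) chips arranged as a two-stage folded Clos.
# "Two chips per line card" is an assumed packaging detail for illustration.

def folded_clos(k_ports, linecards=12, chips_per_linecard=2):
    edge_chips = linecards * chips_per_linecard     # 12 x 2 = 24 edge chips
    uplinks_per_edge = k_ports // 2                 # half of each edge chip's ports go up
    spine_chips = uplinks_per_edge                  # one uplink to each of 12 spine chips
    # This works out exactly because the 24 edge chips match the 24 ports on each spine chip.
    external_ports = edge_chips * (k_ports - uplinks_per_edge)
    return external_ports, edge_chips, spine_chips

ports, edges, spines = folded_clos(k_ports=24)
print(f"{ports}-port non-blocking switch: {edges} edge chips + {spines} spine chips "
      f"= {edges + spines} chips total")
# -> 288-port non-blocking switch: 24 edge chips + 12 spine chips = 36 chips total
```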

  21. Jupiter • Dual redundant 10G links for fast failover • Centauri switch chassis used as the ToR • Four Centauris make up a Middle Block (MB) • Each ToR connects to eight MBs • Six Centauris make up a spine block

  22. • Four MBs per rack • Two spine blocks per rack

  23. [Figure: fabric deployment without cable bundling vs. with bundling]

  24. Summary • Customized switches built using merchant silicon • Recursive Clos to scale to a large number of servers

  25. Inside the Social Network’s (Datacenter) Network Arjun Roy, Hongyi Zeng†, Jasmeet Bagga†, George Porter, and Alex C. Snoeren

  26. Motivation • Measurement can help make design decisions – Traffic pattern determines the optimal network topology – Flow size distribution helps with traffic engineering – Packet size helps with SDN control

  27. Service-level architecture of Facebook • Servers are organized into clusters • A cluster may not fit into one rack

  28. Measurement methodology

  29. Summary • Traffic is neither rack-local nor all-to-all; locality depends upon the service but is stable across time periods from seconds to days • Many flows are long-lived but not very heavy • Packets are small

  30. Today • Wrap up DCTCP analysis • Today – Google’s datacenter networks • Topology, routing, and management – Inside Facebook’s datacenter networks • Services and traffic patterns
