
Reliable Communication for Datacenters (Mahesh Balakrishnan, Cornell)



  1. Reliable Communication for Datacenters. Mahesh Balakrishnan, Cornell University. Outline: Maelstrom, Ricochet, Conclusion.

  2. Datacenters ◮ Internet Services (90s): Websites, Search, Online Stores ◮ Since then: # of low-end volume servers. Installed Server Base 00-05: ◮ Commodity: up by 100% ◮ High/Mid: down by 40% ◮ Today: Datacenters are ubiquitous ◮ How have they evolved? [Figure: installed server base in millions (0 to 30), 2000 to 2005. Data partially sourced from IDC press releases (www.idc.com)]

  3. Networks of Datacenters. Why? Business Continuity, Client Locality, Distributed Datasets or Operations ... Any modern enterprise! [Figure: map of datacenter sites with inter-site RTTs between 100 ms and 220 ms]

  4. Networks of Real-Time Datacenters ◮ Finance, Aerospace, Military, Search and Rescue... ◮ ... documents, chat, email, games, videos, photos, blogs, social networks ◮ The Datacenter is the Computer! ◮ Not hard real-time: real fast, highly responsive, time-critical

  5. Networks of Real-Time Datacenters ◮ Finance, Aerospace, Military, Search and Rescue... ◮ ... documents, chat, email, games, videos, photos, blogs, social networks ◮ The Datacenter is the Computer! ◮ Not hard real-time: real fast, highly responsive, time-critical. Gartner Survey: ◮ Real-Time Infrastructure (RTI): reaction time in secs/mins ◮ 73%: RTI is important or very important ◮ 85%: Have no RTI capability

  6. The Real-Time Datacenter: Systems Challenges. How do we recover from failures within seconds? Real-World, Real-Time

  7. The Real-Time Datacenter: Systems Challenges. How do we recover from failures within seconds? [Diagram: Software Stack surrounded by failure sources: Crashes, Overloads, Bugs, Exploits, Disk Failure, Disasters, Packet Loss] Real-World, Real-Time. Compilers, Databases, Distributed Systems, Machine Learning, Filesystems, Operating Systems, Networking ...

  8. The Real-Time Datacenter: Systems Challenges. How do we recover from failures within seconds? [Diagram: Software Stack surrounded by failure sources (Crashes, Overloads, Bugs, Exploits, Disk Failure, Disasters, Packet Loss), now annotated with systems: Tempest, KyotoFS, SMFS, Plato, Ricochet, Maelstrom; Ricochet and Maelstrom are placed at Packet Loss] Real-World, Real-Time. Compilers, Databases, Distributed Systems, Machine Learning, Filesystems, Operating Systems, Networking ...

  9. Reliable Communication. Goal: Recover lost packets fast! ◮ Existing protocols react to loss: too much, too late ◮ We want proactive recovery: stable overhead, low latencies ◮ Maelstrom: Reliability between datacenters [NSDI 2008] ◮ Ricochet: Reliability within datacenters [NSDI 2007]

  10. Reliable Communication between Datacenters. TCP fails in three ways: (1) Throughput Collapse: 100 ms RTT, 0.1% Loss, 40 Gbps → Tput < 10 Mbps! (2) Massive Buffers required for High-Rate Traffic (3) Recovery Delays for Time-Critical Traffic

  11. Reliable Communication between Datacenters. TCP fails in three ways: (1) Throughput Collapse: 100 ms RTT, 0.1% Loss, 40 Gbps → Tput < 10 Mbps! (2) Massive Buffers required for High-Rate Traffic (3) Recovery Delays for Time-Critical Traffic. Current Solutions: ◮ Rewrite Apps: One Flow → Multiple Split Flows ◮ Resize Buffers ◮ Spend (infinite) money!
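The throughput-collapse and buffer figures on this slide can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not taken from the talk: it assumes the widely used Mathis et al. steady-state approximation for TCP throughput under random loss, a 1460-byte MSS, and a buffer sized to the bandwidth-delay product. The function names are hypothetical.

```python
import math

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (bits/s) under random loss.

    Assumed model (Mathis et al.): rate ~= (MSS / RTT) * C / sqrt(p),
    with C = sqrt(3/2). This is an illustration, not the talk's own model.
    """
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return link_bps * rtt_s / 8

# Slide parameters: 100 ms RTT, 0.1% loss, 40 Gbps link.
# The 1460-byte MSS is an assumption (typical for a 1500-byte Ethernet MTU).
tput = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.100, loss_rate=0.001)
buf = bdp_bytes(link_bps=40e9, rtt_s=0.100)

print(f"TCP throughput: {tput / 1e6:.1f} Mbps")  # about 4.5 Mbps, under the slide's 10 Mbps
print(f"Buffer for full rate: {buf / 1e6:.0f} MB")  # 500 MB per flow
```

Under these assumptions a single flow gets only a few Mbps of a 40 Gbps link, and filling the link would need buffers in the hundreds of megabytes, which matches the first two failure modes listed above.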

  12. TeraGrid: Supercomputer Network

  13. TeraGrid: Supercomputer Network ◮ End-to-End UDP Probes: Zero Congestion, Non-Zero Loss! ◮ Possible Reasons: transient congestion, degraded fiber, malfunctioning HW, misconfigured HW, switching contention, low receiver power, end-host overflow, ... [Figure: histogram of % of Measurements (0 to 30) against % of Lost Packets (0.01 to 1); two bars are labeled 24 and 14]

  14. TeraGrid: Supercomputer Network ◮ End-to-End UDP Probes: Zero Congestion, Non-Zero Loss! ◮ Possible Reasons: transient congestion, degraded fiber, malfunctioning HW, misconfigured HW, switching contention, low receiver power, end-host overflow, ... [Figure: histogram of % of Measurements (0 to 30) against % of Lost Packets (0.01 to 1); two bars are labeled 24 and 14] Electronics: Cluttered Pathways. Optics: Lossy Fiber.

  15. Problem Statement: Run unmodified TCP/IP over lossy high-speed long-distance networks

  16. The Maelstrom Network Appliance [Diagram: Sending End-hosts (Commodity TCP) → Router → link with Packet Loss → Router → Receiving End-hosts (Commodity TCP)]

  17. The Maelstrom Network Appliance [Diagram: Sending End-hosts (Commodity TCP) → Router → link with Packet Loss → Router → Receiving End-hosts (Commodity TCP), with a Maelstrom Send-Side Appliance at the sending site and a Maelstrom Receive-Side Appliance at the receiving site] Transparent: No modification to end-host or network

  18. The Maelstrom Network Appliance [Diagram: Sending End-hosts (Commodity TCP) → Router → link with Packet Loss → Router → Receiving End-hosts (Commodity TCP), with a Maelstrom Send-Side Appliance (FEC Encode) at the sending site and a Maelstrom Receive-Side Appliance (Decode) at the receiving site] Transparent: No modification to end-host or network. FEC = Forward Error Correction

  19. What is FEC? [Diagram: packets A B C D E and three packets marked X on the sender side; C D E and A B on the receiver side] 3 repair packets for every 5 data packets; Receiver can recover from any 3 lost packets. Rate: (r, c), i.e. c repair packets for every r data packets. ◮ Pro: Recovery Latency independent of RTT ◮ Constant Data Overhead: c / (r + c) ◮ Packet-level FEC at End-hosts: Inexpensive, No extra HW

  20. What is FEC? [Diagram: packets A B C D E and three packets marked X on the sender side; C D E and A B on the receiver side] 3 repair packets for every 5 data packets; Receiver can recover from any 3 lost packets. Rate: (r, c), i.e. c repair packets for every r data packets. ◮ Pro: Recovery Latency independent of RTT ◮ Constant Data Overhead: c / (r + c) ◮ Packet-level FEC at End-hosts: Inexpensive, No extra HW ◮ Con: Recovery Latency dependent on channel data rate

  21. What is FEC? [Diagram: packets A B C D E and three packets marked X on the sender side; C D E and A B on the receiver side] 3 repair packets for every 5 data packets; Receiver can recover from any 3 lost packets. Rate: (r, c), i.e. c repair packets for every r data packets. ◮ Pro: Recovery Latency independent of RTT ◮ Constant Data Overhead: c / (r + c) ◮ Packet-level FEC at End-hosts: Inexpensive, No extra HW ◮ Con: Recovery Latency dependent on channel data rate ◮ FEC in the Network: Where and What?
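The overhead formula and the data-rate dependence named on this slide can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not stated in the talk: repair packets only become useful once all r data packets of a group have been sent, so the wait is roughly the time the channel takes to carry r packets. The function names and the 1500-byte packet size are hypothetical.

```python
def fec_overhead(r: int, c: int) -> float:
    """Fraction of transmitted packets that are repair packets: c / (r + c)."""
    return c / (r + c)

def group_fill_time_s(r: int, packet_bytes: int, rate_bps: float) -> float:
    """Time for r data packets to be sent at the given channel data rate.

    Assumption: this approximates how long a receiver waits before the
    repair packets for a group can arrive and be used.
    """
    return r * packet_bytes * 8 / rate_bps

# The slide's example rate (r, c) = (5, 3).
overhead = fec_overhead(5, 3)

# Same FEC rate, different channel data rates (1500-byte packets assumed).
fast = group_fill_time_s(5, 1500, 1e9)  # 1 Gbps channel
slow = group_fill_time_s(5, 1500, 1e6)  # 1 Mbps channel

print(f"overhead = {overhead:.3f}")
print(f"group fill time at 1 Gbps: {fast * 1e6:.0f} us")
print(f"group fill time at 1 Mbps: {slow * 1e3:.0f} ms")
```

The overhead stays fixed at c / (r + c) regardless of RTT, while the wait grows as the data rate drops. Under this model, that is the reason for encoding over aggregated traffic rather than over a single slow flow.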

  22. The Maelstrom Network Appliance [Diagram: Sending End-hosts (Commodity TCP) → Router → link with Packet Loss → Router → Receiving End-hosts (Commodity TCP), with a Maelstrom Send-Side Appliance (FEC Encode) at the sending site and a Maelstrom Receive-Side Appliance (Decode) at the receiving site] Transparent: No modification to end-host or network. FEC = Forward Error Correction. Where: at the appliance. What: aggregated data.

  23. Maelstrom Mechanism. Send-Side Appliance: ◮ Snoop IP packets ◮ Create repair packet = XOR + ‘recipe’ of data packet IDs [Diagram: data packets 25 to 29 (LAN MTU) pass the send-side appliance, which emits repair packet X = XOR + ‘Recipe List’ 25,26,27,28,29 (Lambda Jumbo MTU); packet 27 is lost in transit (LOSS); the receive-side appliance recovers it and forwards 25 26 28 29 27]

  24. Maelstrom Mechanism. Send-Side Appliance: ◮ Snoop IP packets ◮ Create repair packet = XOR + ‘recipe’ of data packet IDs. Receive-Side Appliance: ◮ Lost packet recovered using XOR and other data packets ◮ At receiver end-host: out of order, no loss [Diagram: data packets 25 to 29 (LAN MTU) pass the send-side appliance, which emits repair packet X = XOR + ‘Recipe List’ 25,26,27,28,29 (Lambda Jumbo MTU); packet 27 is lost in transit (LOSS); the receive-side appliance recovers it and forwards 25 26 28 29 27]
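The XOR-plus-recipe mechanism on this slide can be sketched in a few lines. This is a minimal illustration only, not Maelstrom's implementation: packets are modeled as a dictionary from packet ID to payload bytes, one repair packet covers one group, and only a single loss per group is recovered. All function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))

def make_repair(packets: dict[int, bytes]) -> tuple[list[int], bytes]:
    """Send side: build (recipe list of packet IDs, XOR of their payloads)."""
    recipe = sorted(packets)
    payload = reduce(xor_bytes, (packets[i] for i in recipe), b"")
    return recipe, payload

def recover(recipe: list[int], payload: bytes, received: dict[int, bytes]):
    """Receive side: rebuild the one missing packet named in the recipe.

    Returns (packet_id, data), or None unless exactly one packet is missing.
    """
    missing = [i for i in recipe if i not in received]
    if len(missing) != 1:
        return None
    # XORing the repair payload with every surviving packet leaves the lost one.
    data = reduce(xor_bytes, (received[i] for i in recipe if i in received), payload)
    return missing[0], data

# The slide's example: packets 25..29 are sent, packet 27 is lost.
sent = {i: f"pkt-{i}".encode() for i in range(25, 30)}
recipe, repair = make_repair(sent)
received = {i: p for i, p in sent.items() if i != 27}

print(recover(recipe, repair, received))  # (27, b'pkt-27')
```

The recovered packet is produced after packets 28 and 29 have already passed, which is why the end-host sees reordering but no loss. With payloads of unequal length, this sketch would return the lost payload zero-padded to the longest one, so a real encoder would also need to carry the original length.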
