Networking Challenges for the Next Decade


  1. Networking Challenges for the Next Decade. Amin Vahdat, on behalf of Google Technical Infrastructure and Google Cloud Platform. April 4, 2017.

  2. Google Network: more than a collection of data centers. [Map. Cable systems: FASTER (US, JP, TW) 2016; SJC (JP, HK, SG) 2013; Unity (US, JP) 2010. Legend: Network fiber; Points of presence (>100); Google Global Cache edge nodes.]

  3. Google Cloud Regions: adding 11 new regions. [Map of current and future regions, each labeled with its number of zones: Oregon, Iowa, California, N Virginia, S Carolina, Montreal, São Paulo, London, Belgium, Netherlands, Frankfurt, Finland, Mumbai, Singapore, Taiwan, Tokyo, Sydney.]

  4. Ubiquitous Cloud...10x Scaling
     Datacenter (10x): next-gen disaggregation of storage, memory and compute
     Campus & Metro (10x): Cloud regions and campus expansion driving DC interconnect
     WAN (10x): Cloud replication and bandwidth intensive cloud services (e.g., turnkey video, IoT)
     Step Function Disruptions: Bandwidth, Latency, Availability, Predictability

  5. The Pillars of SDN @ Google. B4: WAN Interconnect; Andromeda: NFV and network virtualization; Jupiter: Datacenter Networking.

  6. The Pillars of SDN @ Google. B4: WAN Interconnect; Andromeda: NFV and network virtualization; Jupiter: Datacenter Networking; Espresso: SDN for public Internet.

  7. B4: Google's Software Defined WAN. B4: [Jain et al, SIGCOMM 13]; BwE: [Kumar et al, SIGCOMM 15]

  8. B4: From Copy Network to Business Critical. [Chart: B4 traffic, 2012 to 2016.] B4: [Jain et al, SIGCOMM 13]; BwE: [Kumar et al, SIGCOMM 15]

  9. Andromeda. [Diagram. Google Infrastructure Services. Virtual networks: VNET: 10.1.1/24, VNET: 192.168.32/24, VNET: 5.4/16. NFV: Load Balancing, DoS, ACLs, VPN. Internal Network: four ToR switches serving 10.1.1/24, 10.1.2/24, 10.1.3/24, 10.1.4/24.]

  10. Google Datacenter Network Innovation: and hardware scale that we could not buy. [Chart of capacity over time across fabric generations: 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, Jupiter (1.3Pb/s clusters in 2013).]

  11. The Pillars of SDN @ Google. B4: WAN Interconnect; Andromeda: NFV and network virtualization; Jupiter: Datacenter Networking; Public Internet?

  12. The Pillars of SDN @ Google. B4: WAN Interconnect; Andromeda: NFV and network virtualization; Jupiter: Datacenter Networking; Espresso: SDN for public Internet.

  13. Espresso in Context. [Diagram: Google Data Center, Jupiter, B4.]

  14. Espresso in Context. [Diagram: Google Data Center, Jupiter, B4, B2, Metro, Peering.]

  15. Espresso in Context. [Diagram: Google Data Center, Jupiter, B4, B2, Espresso, Metro, Peering, Internet, User.]

  16. Espresso: Before and After
      Before: Router Centric Protocols; Local view; Connectivity first; Coarse fault recovery
      After: Espresso SDN Peering; Per-metro and global view; Application signals; Real-time optimization
      Slide label: Cloud 1.0

  17. Espresso Architecture Overview. [Diagram: an Espresso Metro containing a Peering Fabric (Label-switched Fabric) and a BGP speaker; eBGP Peering to an External Peer.]

  18. Espresso Architecture Overview. [Diagram: an Espresso Metro containing Hosts, each with a Packet Processor, a Peering Fabric (Label-switched Fabric), and a BGP speaker; labeled packets specify egress; eBGP Peering to an External Peer.]

  19. Espresso Architecture Overview. [Diagram: a Global Controller receiving Application Signals; an Espresso Metro containing Local Control, Hosts, each with a Packet Processor, a Peering Fabric (Label-switched Fabric), and a BGP speaker; labeled packets specify egress; eBGP Peering to an External Peer.]
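
The phrase "labeled packets specify egress" on slides 18 and 19 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not Espresso's actual design: the `EgressTable` class, the `program`, `lookup`, and `encapsulate` names, the numeric labels, the documentation prefixes, and the use of longest-prefix match to pick the label are all hypothetical. The idea shown is only that a host-side packet processor, programmed by a controller, picks the egress and stamps it on the packet as a label, so the fabric can forward on the label alone.

```python
import ipaddress


class EgressTable:
    """Hypothetical host-side table mapping destination prefixes to egress labels."""

    def __init__(self):
        self._routes = []  # list of (ip_network, label) pairs

    def program(self, prefix, label):
        # Stands in for a controller pushing a decision to the host (assumed interface).
        self._routes.append((ipaddress.ip_network(prefix), label))

    def lookup(self, dst):
        # Longest-prefix match is an assumption here, not something the slides state.
        addr = ipaddress.ip_address(dst)
        matches = [(net, label) for net, label in self._routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


def encapsulate(table, dst, payload):
    """Attach the egress label so a label-switched fabric needs no IP lookup."""
    label = table.lookup(dst)
    if label is None:
        raise LookupError(f"no egress programmed for {dst}")
    return {"label": label, "dst": dst, "payload": payload}


table = EgressTable()
table.program("203.0.113.0/24", 100)    # hypothetical label for one peering port
table.program("203.0.113.128/25", 200)  # more specific prefix, different egress
packet = encapsulate(table, "203.0.113.200", b"hello")
```

In this sketch the per-destination decision lives on the host, which is one way to read the split between the packet processors and the label-switched fabric in the diagram.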

  20. Next Decade Challenges in Networking
      The next wave in computing
      • Serverless compute in Cloud 3.0
      • IoT
      • Tightly coupled, general purpose distributed computing
      It’s time to put it all together
      • Agile Scale
      • Jitter
      • Isolation
      • Performance is great, but only meaningful with availability, manageability, and velocity

  21. Last Decade: Cloud 1.0. Virtualization delivers capex savings to enterprise DCs.

  22. Now: HW on Demand. Timeline: Cloud 1.0, Cloud 2.0. Public cloud frees enterprise from private HW infrastructure. Scheduling, load balancing primitives, “big data” query processing.

  23. The Third Wave of Cloud Computing: Compute, not servers. Timeline: Cloud 1.0, Cloud 2.0, Cloud 3.0. Serverless compute, real-time intelligence, and machine learning. Not data placement, load balancing, OS configuration and patching.

  24. The Third Wave of Cloud Computing. Timeline: Cloud 1.0, Cloud 2.0, Cloud 3.0. Networking should be aiming for Cloud 3.0.

  25. Networking and Cloud 3.0. Storage disaggregation: the datacenter is the storage appliance; Seamless telemetry and scale up/down; Transparent live migration; Open Marketplace of services, securely placed and accessed.

  26. Networking and Cloud 3.0. Applications+Functions, not VMs; Policy, not middleboxes; Actionable Intelligence, not data processing; SLOs, not placement/load balancing/scheduling.

  27. Next Decade Challenges in Networking. The network will enable next-generation compute infrastructure. The network can define next-generation storage infrastructure. The right network infrastructure can deliver fundamental new capability.

  28. How we Prioritize Infrastructure Work: Performance, Stranding, Velocity, Manageability, Availability.

  29. Availability is Paramount
      • First things first: an insecure infrastructure is an unavailable infrastructure
      • Stability is more important than efficiency
      • Network management is critical
      • Configuration is hard
      • Automation matters but can be counter to availability
      “Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure.” SIGCOMM 2016.

  30. Build for Velocity
      • Velocity is the speed of iteration
      • Retrospective on “Tussle in Cyberspace: Defining Tomorrow’s Internet”
      • Build for hitless upgrades and self-validation
      • Debugging and tracing matter
        ○ Without visibility, performance does not matter
      • Network fabrics built for expansion and evolution
      • Launch and Iterate

  31. Isolation is Critical; Stranding is Terrible
      Isolation with reservations is easy but leads to huge resource stranding
      ● General-purpose, shared infrastructure to approximate custom-built and reserved
      Isolation has many components
      ● Latency, bandwidth, but also the control plane
      ● Accounting and chargeback are big missing pieces
      Congestion Control is still really hard
      ● Rationalizing multiple control loops: flow, endpoint, flow group, Traffic Engineering
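
The claim on slide 31 that reservations lead to stranding can be made concrete with a small worked example. This is a sketch under stated assumptions, not a description of how Google measures stranding: the function name `reservation_vs_shared`, the tenant names, and the demand numbers are all invented. It compares capacity needed when every tenant reserves its own peak against capacity needed when tenants share one pool sized for the peak of their combined demand.

```python
def reservation_vs_shared(demand_traces):
    """Compare per-tenant peak reservations with one shared pool.

    demand_traces maps a tenant name to a list of per-interval demands;
    all lists are assumed to cover the same intervals.
    Returns (reserved_capacity, shared_capacity, stranded_fraction).
    """
    traces = list(demand_traces.values())
    # Reservations: each tenant is provisioned for its own worst interval.
    reserved = sum(max(trace) for trace in traces)
    # Shared pool: provision for the worst interval of the summed demand.
    shared = max(sum(interval) for interval in zip(*traces))
    return reserved, shared, (reserved - shared) / reserved


# Invented demands (arbitrary units) for three tenants whose peaks do not coincide.
demands = {
    "tenant_a": [8, 2, 1],
    "tenant_b": [1, 9, 2],
    "tenant_c": [2, 1, 7],
}
reserved, shared, stranded = reservation_vs_shared(demands)
```

With these made-up numbers the reservations total 24 units while the shared pool needs 12, so half of the reserved capacity is stranded. The shared pool only wins when peaks do not coincide, and it gives up the easy isolation that reservations provide, which is the tension the slide describes.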

  32. Performance only Matters if End to End
      Amdahl’s law applies, and so an incredible, localized optimization that takes any effort to adopt will be ignored
      1. Scale
      2. Jitter
      3. Storage Disaggregation
      Must optimize from the application all the way to the end user
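
The appeal to Amdahl's law on slide 32 can be checked with a few lines of arithmetic. The function name `amdahl_speedup` and the example numbers are illustrative assumptions; the formula is the standard one, where a fraction `p` of end-to-end time is sped up by a factor `s` and the overall speedup is `1 / ((1 - p) + p / s)`.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of total time is made `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Illustrative numbers: a 100x optimization of a component that accounts
# for 10% of end-to-end time.
overall = amdahl_speedup(0.10, 100.0)
```

Here `overall` is 1 / 0.901, roughly 1.11, so a hundredfold local improvement yields about an 11% end-to-end gain, which is why a localized optimization with any adoption cost tends to be ignored.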

  33. How we Prioritize Infrastructure Work: Performance, Stranding, Velocity, Manageability, Availability.

  34. Next Decade Challenges in Networking
      The next wave of computing
      • Serverless compute in Cloud 3.0
      • IoT
      • Tightly coupled, general purpose distributed computing
      It’s time to put it all together
      • Agile Scale
      • Jitter
      • Isolation
      • Performance is great, but only meaningful with availability, manageability, and velocity

  35. Thank You!

  36. Open Source. [Slide labels: Google Borg, MapReduce, Bigtable, Dremel. Footer: Google Cloud Platform.]

  37. Open Source. [Slide labels: TCP BBR, OpenConfig, QUIC, gRPC, ... Footer: Google Cloud Platform.]
