Flexible Networking at Large Mega-Scale: Exploring issues and solutions
What is “Mega-Scale”? One or more of: ● > 10,000 compute nodes ● > 100,000 IP addresses ● > 1 Tb/s aggregate bandwidth ● Massive East/West traffic between tenants Yahoo is “Mega-Scale”
What are our goals? ● Mega-Scale, with ○ Reliability ■ Yahoo supports ~200 million users/day -- it must be reliable ○ Flexibility ■ Yahoo has 100s of internal and user-facing services ○ Simplicity ■ Undue complexity is the enemy of scale!
Our Strategy Leverage high-performance network design with: ➢ OpenStack ➢ Augmented with additional automation ➢ Hosting applications designed to be “disposable” - Fortunately, we already had many of the needed pieces
Traditional network design ● Large layer 2 domains ● Cheap to build and manage ● Allows great flexibility of solutions ● Leverage pre-existing network design ● IP mobility across the entire domain It’s Simple. But...
L2 Networks Have Limits ● The L2 Domain can only be extended so far ○ Hardware TCAM limitations (size and update rate) ○ STP scaling/stability issues ● But an L3 network can ○ scale larger ○ at lower cost ○ though it limits flexibility
Potential Solutions ● Why not use a Software Defined Network? ○ Overlay allows IP mobility but ■ Control plane limits scale and reliability ■ Overhead at on-ramp boundaries ○ OpenFlow-based solutions ■ Not yet ready for mega-scale with L3 support ■ Control plane complexities Not Ready for Mega-Scale
Our Solution ● Use a Clos-design network backplane ● Each cabinet has a Top-Of-Rack router ○ Cabinet is a separate L2 domain ○ Cabinets “own” one or more subnets (CIDRs) ○ OpenStack is patched to “know” which subnet to use ● Network backplane supports East-West and North-South traffic equally well ● Structure is ideal if we decide to deploy an SDN overlay
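Below is a minimal sketch of the rack-owns-its-subnets idea, assuming a simple host-to-rack map; the names (RACK_CIDRS, HOST_TO_RACK, pick_fixed_ip) are illustrative and not part of any OpenStack patch or API.

```python
# Hypothetical sketch: map each hypervisor to its cabinet's CIDR and
# hand out fixed IPs from that rack-local subnet. Names (RACK_CIDRS,
# HOST_TO_RACK, pick_fixed_ip) are illustrative, not OpenStack APIs.
import ipaddress

# Each cabinet "owns" one or more subnets behind its TOR router.
RACK_CIDRS = {
    "rack-a1": [ipaddress.ip_network("10.10.0.0/24")],
    "rack-a2": [ipaddress.ip_network("10.10.1.0/24")],
}

HOST_TO_RACK = {
    "hv-0101": "rack-a1",
    "hv-0102": "rack-a1",
    "hv-0201": "rack-a2",
}

def subnets_for_host(hypervisor: str):
    """Return the rack-local subnets a new instance may draw an IP from."""
    return RACK_CIDRS[HOST_TO_RACK[hypervisor]]

def pick_fixed_ip(hypervisor: str, in_use: set[str]) -> str:
    """Pick the first free address in the hypervisor's rack subnets."""
    for subnet in subnets_for_host(hypervisor):
        for addr in subnet.hosts():
            if str(addr) not in in_use:
                return str(addr)
    raise RuntimeError(f"no free addresses in rack for {hypervisor}")

print(pick_fixed_ip("hv-0201", in_use={"10.10.1.1"}))  # -> 10.10.1.2
```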
A solution for scale: Layer 3 to the rack [diagram: L3 Clos fabric above, L2 only within each rack; rows of Compute racks plus a Compute + Admin rack] • Clos-based L3 network • TOR (Top Of Rack) routers • Admin = API, DB, MQ, etc.
Adding Robustness With Availability Zones
Problems ● No IP Mobility Between Cabinets ○ Moving a VM between cabinets requires a re-IP ○ Many small subnets rather than one or more large ones ○ Scheduling complexities: ■ Availability zones, rack-awareness ● Other issues ○ Coordination between clusters ○ Integration with existing infrastructure You call that “flexible?”
(re-)Adding Flexibility ● Leverage Load Balancing ○ Allows VMs to be added and removed (remember, our VMs are mostly “disposable”) ○ Conceals IP changes (such as rack/rack movement) ○ Facilitates high-availability ○ Is the key to flexibility in what would otherwise be a constrained architecture
(re-)Adding Flexibility (cont’d) ● Automate it: ○ Load Balancer Management ■ Device selection based on capacity & quotas ■ Association between service groups and VIPs ■ Assignment of VMs to VIPs ○ Availability Zone selection & balancing ○ Multiple cluster integration ● Implement “Service Groups” ○ (external to OpenStack -- for now)
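As a rough illustration of the load balancer management step, the sketch below picks the least-loaded device that still has VIP quota and throughput headroom; the LbDevice fields and the selection policy are assumptions, not the actual tooling.

```python
# Hypothetical sketch of load-balancer device selection: choose the
# device with the most headroom that still has quota for another VIP.
# Device fields and the quota policy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class LbDevice:
    name: str
    capacity_mbps: int      # rated throughput
    allocated_mbps: int     # throughput already promised to VIPs
    vip_count: int
    vip_quota: int

def select_device(devices: list[LbDevice], needed_mbps: int) -> LbDevice:
    """Pick the least-loaded device that can absorb the new service group."""
    candidates = [
        d for d in devices
        if d.vip_count < d.vip_quota
        and d.capacity_mbps - d.allocated_mbps >= needed_mbps
    ]
    if not candidates:
        raise RuntimeError("no load balancer has capacity for this VIP")
    return min(candidates, key=lambda d: d.allocated_mbps / d.capacity_mbps)

devices = [
    LbDevice("lb-east-1", 40_000, 31_000, vip_count=180, vip_quota=200),
    LbDevice("lb-east-2", 40_000, 12_000, vip_count=60, vip_quota=200),
]
print(select_device(devices, needed_mbps=2_000).name)  # -> lb-east-2
```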
Service Groups ● Consists of groups of VMs running the same application ● Can be a layer of an application stack, an implementation of an internal service, or a user-facing server ● Present an API that functions behind a VIP ○ Web services everywhere!
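A hypothetical data-structure sketch of a service group follows: a stable VIP fronting an interchangeable set of VMs, so members can be added, removed, or re-IPed (e.g. on a rack move) without clients noticing. The ServiceGroup class and its fields are illustrative only.

```python
# Hypothetical sketch of a service group: a named set of interchangeable
# VMs pooled behind a single VIP. Clients only see the VIP, so adding,
# removing, or re-IPing members (e.g. on a rack move) is invisible to them.
from dataclasses import dataclass, field

@dataclass
class ServiceGroup:
    name: str
    vip: str                                               # stable, client-facing address
    members: dict[str, str] = field(default_factory=dict)  # vm_id -> current IP

    def register(self, vm_id: str, ip: str) -> None:
        """Add or re-register a VM (a rack move just updates its IP)."""
        self.members[vm_id] = ip

    def deregister(self, vm_id: str) -> None:
        self.members.pop(vm_id, None)

group = ServiceGroup("user-profile-api", vip="203.0.113.10")
group.register("vm-001", "10.10.0.15")
group.register("vm-002", "10.10.1.22")   # different rack, different subnet
group.deregister("vm-001")               # disposable: drained and replaced
print(group.vip, sorted(group.members.values()))
```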
Service Group Creation
Integrating With OpenStack
Putting It Together ● Registration of hosts and services ○ A VM is associated with a service group at creation ○ A tag associated with the service group is accessible to resource allocation ● Control of load balancers ○ Allocates and controls hardware ○ Manages VMs for each service group ○ Provides elasticity and robustness
Putting It Together (cont’d) ● OpenStack Extensions and Patches ○ Three points of integration: 1. Intercept the request before it is issued 2a. Select the network based on the hypervisor 2b. Transmit new-instance information to external automation 3. Transmit deleted-instance information to external automation
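The sketch below restates those integration points as plain callbacks, purely to show the shape of the data flowing to the external automation; the function names and payloads are hypothetical, not the actual Nova patches.

```python
# Hypothetical sketch of the three integration points described above,
# written as plain callbacks rather than real Nova patches. Function and
# payload names are illustrative; the actual hooks live inside OpenStack.

def intercept_request(request: dict) -> dict:
    """1. Intercept the boot request before it is issued (e.g. attach the
    service-group tag so later steps can see it)."""
    request.setdefault("metadata", {})["service_group"] = request["service_group"]
    return request

def select_network(hypervisor: str, rack_cidrs: dict[str, str]) -> str:
    """2a. Select the network/subnet based on the hypervisor's rack."""
    return rack_cidrs[hypervisor]

def on_instance_created(instance: dict) -> None:
    """2b. Tell the external automation about the new instance so it can
    be added to its service group's VIP."""
    print("register", instance["id"], instance["ip"], instance["service_group"])

def on_instance_deleted(instance: dict) -> None:
    """3. Tell the external automation the instance is gone so it can be
    pulled out of the VIP pool."""
    print("deregister", instance["id"], instance["service_group"])
```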
Whither OpenStack? ● Our Goals: ○ Minimize patching code ○ Minimize points of integration with external systems ○ Contribute back patches of general use ○ Replace custom code with community code: ■ Use Heat for automation ■ Use LBaaS to control load balancers ○ Share our experiences
Complications ● OpenStack clusters don’t exist in a vacuum -- this makes scaling them harder ○ Existing physical infrastructure ○ Existing management infrastructure ○ Interaction with off-cluster resources ○ Security and organizational policies ○ Requirements of existing software stack ○ Stateful applications introduce complexities
Conclusion ● Mega-Scale has unique issues ○ Many potential solutions don’t scale sufficiently ○ Some flexibility must be sacrificed *BUT* ○ Mega-Scale also admits solutions that aren’t practical or cost-effective at smaller scale ○ Automation and integration with external infrastructure is key
Questions? email: edhall@yahoo-inc.com