Using PubSub for Scheduling in Azure SDN
Qi Zhang (Microsoft - Azure Networking)
Azure Networking
• DC Hardware: SmartNIC/FPGA, SONiC
• Services: Virtual Networks, Load Balancing, VPN Services, Firewall, DDoS Protection, DNS & Traffic Management
• Intra-Region: DC Networks, Regional Networks, Optical Modules
• WAN Backbone: Software WAN, Subsea Cables, Terrestrial Fiber, National Clouds
• Edge and ExpressRoute: Internet Peering, ExpressRoute
• CDN: Acceleration for applications and content
• Last Mile: E2E monitoring (Network Watcher, Network Performance Monitoring)
[Diagram: Azure Regions 'A' and 'B', each with a regional network, joined by the Microsoft WAN to the Edge, CDN, ExpressRoute, and Internet exchanges, and from there over cable and carrier networks to consumers, enterprise, SMB, and mobile users and to enterprise DC/corpnet.]
Microsoft Global Network
One of the largest private networks in the world
• 8,000+ ISP sessions
• 130+ edge sites
• 44 ExpressRoute locations
• 33,000 miles of lit fiber
• SDN Managed (SWAN, OLS)
[Map: world map of datacenters and edge sites. Legend: Data center, Edge Site, Owned Capacity, Leased Capacity, Moving to Owned. DCs and network sites not exhaustive.]
Software Defined Networking (SDN)
Key to flexibility and scale is SDN
• Azure SDN: basis of all network virtualization in our datacenters
• Control Plane: centralized, hierarchical, highly scalable and available controllers
• Data Plane: host agent, drivers
[Diagram labels: Management, API, Central Controllers, Commodity HW, Host Agents, SmartNIC, vNIC.]
PubSub in SDN
• Scale:
    • 40+ regions, hundreds of DCs, millions of servers
    • millions of VNets and LBs
• Flexible, scalable and efficient scheduling between controllers and agents
• Publisher/Subscriber pattern
[Diagram: the Controller publishes to PubSub (publish flow); PubSub notifies Agent 1 ... Agent i ... Agent N (notification flow).]
Virtual Network in Azure
• Secure per-customer virtual datacenter in the cloud
• Instantiate and configure complex topologies in minutes
• Rich security and networking services
[Diagram: several Virtual Networks linked by VNet Peering, with Cross-premises Connectivity and Internet access.]
CA-PA Mappings
• Payload, including CA, is encapsulated
• Traverses physical network
• Directory Service holds the CA -> PA tables

VM placement shown on Host Nodes 1 to 3 (behind VM-SW1, VM-SW2, VM-SW3):
CA          PA
10.0.0.1    10.1.1.2
10.0.0.7    10.1.1.4
10.0.0.4    10.1.1.3
10.0.0.6    10.1.3.3
10.0.0.4    10.1.3.2
10.0.0.1    10.1.5.3
10.0.0.7    10.1.5.2
. . .       . . .

Example packet: the payload between CA 10.0.0.1 and CA 10.0.0.7 is carried in an outer packet between PA 10.1.1.2 and PA 10.1.5.2.
[Diagram: data traffic flows between host nodes; control messages flow between the Directory Service and the hosts.]
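The encapsulation step on this slide can be sketched roughly as below. This is only an illustration under stated assumptions: `Packet`, `encapsulate`, and `decapsulate` are hypothetical names, the directory is a plain dictionary holding two of the CA/PA pairs from the slide, and the direction of the example packet (10.0.0.1 sending to 10.0.0.7) is assumed. The real data plane runs in the host agent and drivers, not in Python.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical stand-in for the CA -> PA table an agent holds for one VNet.
# Both entries come from the slide; a real table is scoped per virtual network.
CA_TO_PA: Dict[str, str] = {
    "10.0.0.1": "10.1.1.2",
    "10.0.0.7": "10.1.5.2",
}


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: Any


def encapsulate(inner: Packet, mappings: Dict[str, str]) -> Packet:
    """Wrap a CA-addressed packet in an outer PA-addressed packet."""
    # A KeyError here would mean the mapping has not reached this host yet.
    return Packet(src=mappings[inner.src], dst=mappings[inner.dst], payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """Recover the original CA-addressed packet at the destination host."""
    return outer.payload


inner = Packet(src="10.0.0.1", dst="10.0.0.7", payload=b"hello")
outer = encapsulate(inner, CA_TO_PA)
print(outer.src, "->", outer.dst)
```

The point of the sketch is that the physical network only ever sees PA addresses, so every sending host needs an up-to-date copy of the mappings for the VNets it serves.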
PubSub for CA-PA Mapping
Challenges:
• Scale: hundreds of thousands of agents, millions of VNets
• Scope: cluster, regional, global
• VNet size limit: 4K mappings -> 64K mappings, 500 peerings
• Provisioning speed: minutes -> seconds
[Diagram: two layouts side by side. In one, the VNet Controller reaches Agent 1 ... Agent i ... Agent N through the Directory Service; in the other, it reaches them through PubSub.]
Scenario I: Global Peering
[Diagram: Region A / VNET A and Region B / VNET B, each with its own VNet Controller, PubSub, and Agent.]
Scenario II: DataExfil

Service Tunnel Policy:
    {
      id: "policy-123",
      service: "xstore",
      subscription: "{guid}",
      accounts: [ "users", "wiki.*" ],
      storage_type: "blob",
      access: "rw"
    }

Resource "Metadata":
    METADATA (resource A): { subscription: "{guid}", account: "users", storage_type: "blob" }
    METADATA (resource B): { subscription: "{guid}", account: "users", storage_type: "table" }
    METADATA (resource C): { subscription: "{guid}", account: "wikimain", storage_type: "blob" }

[Diagram: NRP publishes the policy through PubSub to the Host Agent (VNetPolicyCache); traffic to Resources A, B, and C passes the Storage FE. A BLOCK label appears next to Resource B.]
Publisher Overview
• Persisted KV Store
    • Hierarchical name space
• Set watcher on a node
    • Single watcher
    • Bulk watcher
• Interfaces
    • Publish (batch/multi supported)
    • Subscribe
    • Notification
    • Query
• State Update/Delivery
    • Initial state
    • Subsequent state updates
[Diagram: a tree rooted at Root with partition keys PK1 ... PKi ... PKn, child nodes a1 to a3 and b1 to b5, and watchers (W) set on nodes. Publish: CreateNode, UpdateNode. Query: GetNodeInfo. Subscribe: watcher, bulkwatcher. Notification: Created, Deleted, DataChanged, ChildrenChanged.]
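A toy version of the model on this slide (a hierarchical key-value store with single and bulk watchers) might look like the sketch below. It is not the service's API: `TinyPubSub` and its methods are hypothetical, the store is in memory rather than persisted, partition key and path are merged into one string, and the `"InitialState"` event label is invented here to show initial-state delivery. The assumption that a bulk watcher covers a whole subtree is also ours; `ChildrenChanged` is left out to keep the sketch short.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, object]            # (event type, node path, data)
Callback = Callable[[Event], None]


class TinyPubSub:
    """Toy in-memory store; the real service is persisted and partitioned."""

    def __init__(self) -> None:
        self.nodes: Dict[str, object] = {}
        self.watchers: Dict[str, List[Callback]] = {}       # exact path -> callbacks
        self.bulk_watchers: Dict[str, List[Callback]] = {}  # subtree root -> callbacks

    @staticmethod
    def _under(path: str, root: str) -> bool:
        return path == root or path.startswith(root.rstrip("/") + "/")

    def _notify(self, event: Event) -> None:
        path = event[1]
        for cb in self.watchers.get(path, []):
            cb(event)
        for root, callbacks in self.bulk_watchers.items():
            if self._under(path, root):
                for cb in callbacks:
                    cb(event)

    def publish(self, path: str, data: object) -> None:
        kind = "DataChanged" if path in self.nodes else "Created"
        self.nodes[path] = data
        self._notify((kind, path, data))

    def delete(self, path: str) -> None:
        if path in self.nodes:
            del self.nodes[path]
            self._notify(("Deleted", path, None))

    def set_watcher(self, path: str, cb: Callback) -> None:
        self.watchers.setdefault(path, []).append(cb)
        if path in self.nodes:                  # initial state first
            cb(("InitialState", path, self.nodes[path]))

    def set_bulk_watcher(self, root: str, cb: Callback) -> None:
        self.bulk_watchers.setdefault(root, []).append(cb)
        for path in sorted(self.nodes):         # initial state for the subtree
            if self._under(path, root):
                cb(("InitialState", path, self.nodes[path]))


store = TinyPubSub()
store.publish("/Vnet/v1/mappings/ipv4/10.0.0.1", "10.1.1.2")

events: List[Event] = []
store.set_bulk_watcher("/Vnet/v1", events.append)   # agent subscribes late
store.publish("/Vnet/v1/mappings/ipv4/10.0.0.7", "10.1.5.2")
store.publish("/Vnet/v1/mappings/ipv4/10.0.0.1", "10.1.5.3")
for event in events:
    print(event)
```

A subscriber that joins after the first publish still receives the existing mapping before any later updates, which is the "initial state, then subsequent state updates" behavior listed on the slide.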
SDN PubSub Service
4 Microservices:
• Stateless services
    • Routing Service
    • Notification Service
• Stateful services
    • Selector Service
    • Madari Service
[Diagram: requests enter the service and are directed by partition key.]
SDN PubSub Service
Publisher (VNet Controller):
    PK: /Vnet/{VnetId1}
    Path: /mappings/ipv4/{CA1}
    Data (bond message): {PA1}
Subscriber (Agent):
    SetBulkWatcher:
        PK: /Vnet/{VnetId2}
        Path: /
    Receives: <notifications>
4 Microservices:
• Stateless services
    • Routing Service
    • Notification Service
• Stateful services
    • Selector Service
    • Madari Service
[Diagram: numbered steps 1 to 6 trace the publish request and the subscribe request through the stateless services, the partition-key lookup for /Vnet/{VnetId1} and /Vnet/{VnetId2}, and the Madari instances MadariService_02 and MadariService_03.]
Madari Selector Service: Data Partitioning
Example: AddPartitionKey("baz") (steps 1 to 3 in the diagram)

Partition Key    Madari Instance
"foo"            MadariService_01
"bar"            MadariService_02
.....            .....
"baz"            MadariService_01

Madari Instance      Total Data Size
MadariService_01     1.05G
MadariService_02     1.9G
MadariService_03     1.6G

[Diagram: the Selector Service in front of MadariService_01, MadariService_02, and MadariService_03.]
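In the slide's example, "baz" is placed on MadariService_01, which is also the instance with the smallest total data size (1.05G). One placement rule consistent with that example is "pick the least-loaded instance"; the slide does not state the actual rule, so treat the sketch below as an assumption. The `Selector` class and its method name are hypothetical.

```python
from typing import Dict


class Selector:
    """Toy selector: maps partition keys to instances (assumed least-loaded rule)."""

    def __init__(self, data_size_gb: Dict[str, float]) -> None:
        self.data_size_gb = dict(data_size_gb)   # instance -> total data size
        self.assignment: Dict[str, str] = {}     # partition key -> instance

    def add_partition_key(self, key: str) -> str:
        # Idempotent: a key that is already placed keeps its instance.
        if key not in self.assignment:
            least_loaded = min(self.data_size_gb, key=self.data_size_gb.get)
            self.assignment[key] = least_loaded
        return self.assignment[key]


# Sizes and existing assignments are the ones shown on the slide.
selector = Selector({
    "MadariService_01": 1.05,
    "MadariService_02": 1.9,
    "MadariService_03": 1.6,
})
selector.assignment.update({"foo": "MadariService_01", "bar": "MadariService_02"})

print(selector.add_partition_key("baz"))
```

Keeping the key-to-instance table in one stateful service means the stateless front ends only need a lookup to find where a partition key lives.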
Subscription through Notification Service
[Diagram: MadariService_02 and MadariService_04 each hold a tree under Root (nodes A, B, C, D and vnet1 to vnet4). Subscribers I, II, and III subscribe to VNets (vnet1, vnet2, vnet3) through NotificationService_03 and NotificationService_08, which sit between the subscribers and the Madari instances.]
Service Fabric Ring
• Service Fabric ring
    • Multiple PaaS tenants form a Service Fabric ring
    • Service Fabric ring is on a VNET
• PubSub as Service Fabric application
    • Routing Service/Notification Service
        • Stateless
        • On every node
    • MadariService/MadariSelectorService(s)
        • Stateful
        • Min 3, target 7
[Diagram: Tenant1 (Cluster1) with nodes n1 to n5, Tenant2 (Cluster2) with nodes n6 to n10, Tenant3 (Cluster3) with nodes n11 to n15.]
Client Libraries
• Managed Libraries
    • Madari.ClientLibrary
        • Publishing through WCF channel
    • Reliable Publisher
        • IMOS-based publishers
        • User implements:
            • Commit hooks
            • Handler
        • NuGet package:
            Madari.ReliablePublisher.RSL
            Madari.ReliablePublisher.ServiceFabric
• Native Libraries
    • Publish
        • NuGet package: Madari.MadariFrontEnd.Native
    • Subscribe
        • NuGet package: Madari.Subscriber.Native
[Diagram: reliable publisher flow. Mark objects modified -> commit hooks triggered (IMOS Lib, Repo, Runtime) -> persist reliable tasks -> worker picks up tasks -> execute handler (retry on failure) -> delete executed tasks on success.]
Hierarchical PubSub Infrastructure
Resource Scope => PubSub Service Scope

Resource           Scope       Publisher          Subscriber
CA-PA mapping      regional    VNet Controller    Agent
DataExfil policy   global      NRP                Agent

[Diagram: a Global PubSub carrying the DataExfil policy sits above several Regional PubSubs, each carrying CA->PA mappings.]
Global PubSub
[Diagram: a Global PubSub Replication Service spans Region A and Region B. Each region runs three PubSub instances, one per availability zone: PubSub (AZ01), PubSub (AZ02), PubSub (AZ03).]
Publish Policy – No Replication (Sync)
[Diagram: numbered steps 1 to 8 trace a publish of /DataExfil/Policies/{policyid} inside Global PubSub through the Routing Service, Selector Service, Madari Service, and Replication Service, and on to the Remote Regional P/S.]
Replication Service
Operation Tracking Table (Madariservice/01, Partition 1):

Op Id   Status        Operation                               Replication Details
1001    Replicated    [add] /DataExfil/Policies/Policy1       {Dest1:Y, Dest2:Y, Dest3:Y}
1002    Replicating   [update] /DataExfil/Policies/Policy1    {Dest1:Y, Dest2:N, Dest3:Y}
1003    Committed     [remove] /DataExfil/Policies/Policy1    {Dest1:N, Dest2:N, Dest3:N}

Destination Tracker (Replicationservice/01):
    Dest1: req1002
    Dest2: req1001
    Dest3: req1001

[Diagram: Replicationservice/01 holds a Replication Queue and sends requests to Partition 1.]
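The three example rows suggest how an operation's status could follow from its per-destination flags: no destination acknowledged is "Committed", some acknowledged is "Replicating", all acknowledged is "Replicated". The slide does not spell this rule out, so the sketch below is an inference from the example rows, and `replication_status` and `acknowledge` are hypothetical names.

```python
from typing import Dict

# Per-destination flags copied from the slide's three example rows (Y = True).
ops: Dict[int, Dict[str, bool]] = {
    1001: {"Dest1": True, "Dest2": True, "Dest3": True},
    1002: {"Dest1": True, "Dest2": False, "Dest3": True},
    1003: {"Dest1": False, "Dest2": False, "Dest3": False},
}


def replication_status(acks: Dict[str, bool]) -> str:
    """Derive a status label from per-destination acknowledgements (assumed rule)."""
    if acks and all(acks.values()):
        return "Replicated"
    if any(acks.values()):
        return "Replicating"
    return "Committed"      # written locally, not yet sent anywhere


def acknowledge(op_id: int, dest: str) -> str:
    """Record that one destination applied the operation; return the new status."""
    ops[op_id][dest] = True
    return replication_status(ops[op_id])


for op_id, acks in ops.items():
    print(op_id, replication_status(acks))

print(acknowledge(1002, "Dest2"))
```

Tracking each destination separately lets a slow or unreachable region hold back only its own flag while the other regions continue to receive updates.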
Global SF Ring
[Diagram: five tenants (Tenant1 to Tenant5) of five nodes each (n1 to n5) form one ring. Each tenant sits in a different region (uswest, useast, europewest, uswestcentral, asiasoutheast) on its own VNet (vnet1 to vnet5).]
Major Performance KPIs
• 15 partitions

KPI                   Value
Write throughput      10k req/s
Read throughput       42k req/s
End to End latency    10ms/300ms (50%/99%)
Max subscribers       500K

• In a large region:
    • < 300k agents
    • < 100K VNets
    • ~1k read/sec, ~200 write/sec
Work in Progress
• Accelerating read flow
• End to end validation
Q & A Thank you!