MULTI-SITE OPENSTACK DEPLOYMENT OPTIONS & CHALLENGES FOR TELCOS
Azhar Sayeed, Chief Architect
asayeed@redhat.com
DISCLAIMER
Important Information
The information described in this slide set does not provide any commitments to roadmaps or availability of products or features. Its intention is purely to provide clarity in describing the problem and to drive a discussion that can then be carried into the open source communities. Red Hat Product Management owns the roadmap and supportability conversation for any Red Hat product.
AGENDA
• Background: OpenStack Architecture
• Telco Deployment Use Case
• Distributed Deployment – Requirements
• Multi-Site Architecture
• Challenges
• Solution and Further Study
• Conclusions
OPENSTACK ARCHITECTURE
WHY MULTI-SITE FOR TELCO?
Compute requirements – not just at the data center:
• Multiple data centers
• Managed service offering
• Managed branch office – thick vCPE
• Mobile edge compute
• vRAN – vBBU locations
• Virtualized central offices – hundreds to thousands of locations
• Primary and backup data center – disaster recovery
• IoT gateways – fog computing
Centrally managed compute, closer to the user.
Multiple DCs or Central Offices – Independent OpenStack Deployments
• Hierarchical connectivity model of central offices
• Remote sites with compute requirements
• Extend OpenStack to these sites
(Diagram: an E2E orchestrator manages a main data center, a backup data center and remote data centers over an overlay tunnel across the Internet; the remote sites host security & firewall, quality of service (QoS), traffic shaping and device management functions.)
A typical service almost always spans multiple DCs.
Multiple DCs – NFV Deployment (Real Customer Requirements)
25 sites, each a fully redundant system (controllers, redundant storage nodes, compute nodes) with L2 or L3 extensions between DCs:
• 2-5 VNFs required at each site
• A maximum of 2 compute nodes per site needed for these VNFs
• Storage requirement = image storage only
• Total number of control nodes = 25 * 3 = 75
• Total number of storage nodes = 25 * 3 = 75
• Total number of compute nodes = 25 * 2 = 50
Configuration overhead: 75% – of the 200 nodes deployed, 150 are controllers and storage and only 50 run workloads.
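The node-count arithmetic above is simple enough to spell out; a minimal Python sketch using only the figures from this slide:

    # Node-count arithmetic for the 25-site NFV deployment described above.
    SITES = 25
    CONTROLLERS_PER_SITE = 3    # HA control plane at every site
    STORAGE_PER_SITE = 3        # redundant storage nodes at every site
    COMPUTE_PER_SITE = 2        # enough for the 2-5 VNFs per site

    controllers = SITES * CONTROLLERS_PER_SITE    # 75
    storage = SITES * STORAGE_PER_SITE            # 75
    compute = SITES * COMPUTE_PER_SITE            # 50

    overhead = (controllers + storage) / (controllers + storage + compute)
    print(f"control={controllers} storage={storage} compute={compute} "
          f"overhead={overhead:.0%}")             # overhead=75%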
Virtual Central Office – Real Customer Challenge
1000+ sites (central offices), each a fully redundant system (controllers, storage nodes, compute nodes) with L2 or L3 extensions between DCs:
• From a few 10s to 100s of VMs per site
• Fully redundant configurations
• Termination of residential, business and mobile services
• Managing 1000 OpenStack islands
• Tier 1 telcos already have >100 sites today
A management challenge.
DEPLOYMENT OPTIONS
OPTIONS
• Multiple independent islands model – seen this already
• Common authentication and management
  – External user policy management with LDAP integration
  – Common Keystone
• Stretched deployment model
  – Extend compute and storage nodes into other data centers
  – Keep central control of all remote resources
• Allow data centers to share workloads – Tricircle approach
• Proxy the APIs – master/slave or cascading model
• Agent-based model
• Something else??
Multiple DCs or Central Offices – Independent OpenStack Deployments
• Feed the load balancer
• Site capacity independent of the others
• User information separate or replicated offline
• Load balancer directs traffic where to go – good for load sharing
• DR – an external problem
(Diagram: a cloud management platform and load balancer sit in front of Regions 1 … N, each a fully redundant system of controllers, storage nodes and compute nodes, with a shared directory and L2 or L3 extensions between DCs.)
Good for a few 10s of sites – what about 100s or thousands of sites?
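A minimal sketch of the island model from the client side, using the openstacksdk: each site is reached through its own clouds.yaml entry and is authenticated and inventoried independently. The site names below are placeholders, not real entries.

    import openstack

    # One independent OpenStack island per site; the clouds.yaml entry names
    # are placeholders for this sketch.
    SITES = ["site1", "site2", "site3"]

    for site in SITES:
        conn = openstack.connect(cloud=site)       # separate auth per island
        servers = list(conn.compute.servers())
        print(f"{site}: {len(servers)} instances")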
Extended OpenStack Model – Shared Keystone Deployment
Common or shared Keystone:
• Single Keystone for authentication
• User information in one location
• Independent resources
• Modify the Keystone endpoint table – endpoint, service, region, IP
(Diagram: a cloud management platform and a single Keystone/directory in front of Regions 1 … N, each a fully redundant system of controllers, storage nodes and compute nodes, with L2 or L3 extensions between DCs.)
Identity: Keystone becomes a single point of control.
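A minimal sketch of what the shared Keystone looks like from a client, using the openstacksdk. The clouds.yaml entry name "central" is an assumption, and listing the endpoint table normally requires admin credentials.

    import openstack

    conn = openstack.connect(cloud="central")   # points at the shared Keystone

    # Regions registered in the shared Keystone (one per remote DC / CO).
    for region in conn.identity.regions():
        print("region:", region.id)

    # The endpoint table: each service endpoint is tagged with its region.
    for endpoint in conn.identity.endpoints():
        print(endpoint.region_id, endpoint.interface, endpoint.url)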
Extended OpenStack Model – Central Controller with Remote Compute & Storage (HCI) Nodes
Central controller:
• Single authentication
• Distributed compute resources
• Single availability zone per region
(Diagram: a central controller and cloud management platform, with a fully redundant set of controllers and a replicated Galera cluster, manage remote compute and storage nodes in Regions 1 … N over L2 or L3 extensions between DCs; Cinder, Glance and image data are held centrally, with manual restore.)
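A minimal sketch of how a workload lands on a remote site under this model, again with the openstacksdk: the single central control plane is asked to boot an instance into the availability zone that represents the remote region. Every name used here (cloud, image, flavor, network, AZ) is a placeholder assumption.

    import openstack

    conn = openstack.connect(cloud="central")

    # Ask the central Nova to place the instance in the remote region's AZ.
    server = conn.create_server(
        name="edge-vnf-1",
        image="cirros",                    # placeholder image name
        flavor="m1.small",                 # placeholder flavor name
        network="remote-net",              # placeholder network name
        availability_zone="region2-az",    # one AZ per remote region (assumption)
        wait=True,
    )
    print(server.status)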
Revisiting the Branch Office – Thick vCPE
Can we deploy compute nodes at all the branch sites and centrally control them?
(Diagram: an E2E network orchestrator and data center control enterprise vCPE sites; each branch runs an x86 server with VNFs for security & firewall, quality of service (QoS), traffic shaping and device management on an NFVI of OpenStack or OpenShift/Kubernetes, connected back over IPsec, MPLS, the Internet or another tunnel mechanism. Deploy nova-compute at the branch.)
How do I scale it to thousands of sites?
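Part of centrally controlling branch compute nodes is simply knowing which remote nova-compute services are still reachable. A minimal sketch with the openstacksdk, where the cloud name is a placeholder:

    import openstack

    conn = openstack.connect(cloud="central")

    # List the nova-compute services the central control plane knows about;
    # a branch site that has gone headless shows up with state "down".
    for svc in conn.compute.services():
        if svc.binary == "nova-compute":
            print(f"{svc.host}: status={svc.status} state={svc.state}")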
OSP 10 – Scale components independently
Most OpenStack HA services and VIPs must be launched and managed by Pacemaker or HAProxy; however, some can be managed via systemctl, thanks to the simplification of Pacemaker constraints introduced in versions 9 and 10.
COMPOSABLE SERVICES AND CUSTOM ROLES
(Diagram: a hardcoded Controller role bundling Keystone, Neutron, RabbitMQ, Glance, Ceilometer, ... is split into a custom Controller role plus custom Ceilometer and Networker roles.)
• Leverage the composable services model – e.g. to define a central Keystone
  – Place functionality where it is needed, i.e. disaggregate
  – Distribute the functionality depending on the DC locations
• Services are deployable standalone on separate nodes or combined with other services into custom role(s).
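A minimal sketch of what such disaggregation could look like, expressed as TripleO-style roles data built and dumped from Python. The role names, node counts and the exact OS::TripleO::Services::* service names are illustrative assumptions and vary between releases; this is not a deployable roles file.

    import yaml   # PyYAML

    # Two illustrative custom roles: a central identity/control role and a
    # slimmed-down edge controller role.
    roles = [
        {
            "name": "CentralKeystone",             # hypothetical role name
            "CountDefault": 3,
            "ServicesDefault": [
                "OS::TripleO::Services::Keystone",
                "OS::TripleO::Services::MySQL",
                "OS::TripleO::Services::RabbitMQ",
            ],
        },
        {
            "name": "EdgeController",              # hypothetical role name
            "CountDefault": 1,
            "ServicesDefault": [
                "OS::TripleO::Services::NovaConductor",
                "OS::TripleO::Services::NeutronApi",
                "OS::TripleO::Services::GlanceApi",
            ],
        },
    ]

    print(yaml.safe_dump(roles, default_flow_style=False))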
Re-visiting the Virtual Central Office Use Case – Real Customer Challenge
(Diagram: Regions 1-4, each a fully redundant system of controllers, storage nodes and compute nodes, connected by L2 or L3 extensions between DCs, with sub-regions 3a and 3b hanging off Region 3.)
Requires flexibility and some hierarchy.
CONSIDERATIONS
Scaling across a thousand sites? Some areas that we need to look at:
• Latency and outage times
  • Delays due to distance between DCs and link speeds – RTT
  • The remote site is lost – headless operation and subsequent recovery
  • Startup storms
• Scaling oslo.messaging
  • RabbitMQ
  • Scaling of nodes => scaling of RabbitMQ/messaging
  • Ceilometer (Gnocchi & Aodh) – heavy user of the MQ
LATENCY AND OUTAGE TIMES
Scaling across a thousand sites?
• Latency between sites – Nova API calls
  • 10, 50, 100 ms round trip time? => queue tuning for the bottleneck link/node speed
• Outage time – recovery time
  • 30 s or more? Nova compute services flapping
  • Confirmation – from provisioning to operation
  • Neutron timeouts – binding issues
  • Headless operation
  • Restart – causes storms
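A minimal sketch for getting a feel for the API round-trip cost per region, using the openstacksdk and wall-clock timing; the cloud and region names are placeholders.

    import time
    import openstack

    REGIONS = ["regionone", "regiontwo"]           # placeholder region names

    for region in REGIONS:
        conn = openstack.connect(cloud="central", region_name=region)
        start = time.monotonic()
        list(conn.compute.servers(limit=1))        # one small Nova API call
        elapsed_ms = (time.monotonic() - start) * 1000
        print(f"{region}: server list took {elapsed_ms:.0f} ms")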
RABBITMQ TUNING
• Tune the buffers – increase buffer size
• Take into account messages in flight – rates and round trip times
  • BDP = bottleneck speed * RTT
• Number of messages
  • Servers * backends * requests/sec = number of messages/sec
• Split into multiple instances of message queues for a distributed deployment
  • Ceilometer into its own MQ – heaviest user of the MQ
  • Nova into a single MQ
  • Neutron into its own MQ
(Diagram: separate MQ instances serving nova-conductor and the computes, Neutron, and the Ceilometer agents and collector.)
• Refer to an interesting presentation on this topic – "Tuning RabbitMQ at Large Scale Cloud", OpenStack Summit, Austin 2016
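The two rules of thumb above are easy to put numbers on; the link speed, RTT and per-service request rates in this sketch are illustrative assumptions, not measurements.

    # Bandwidth-delay product: how much data can be in flight between sites.
    LINK_SPEED_BPS = 100e6     # bottleneck link: 100 Mbit/s (assumption)
    RTT_S = 0.050              # 50 ms round trip between sites (assumption)
    bdp_bits = LINK_SPEED_BPS * RTT_S
    print(f"BDP = {bdp_bits / 8 / 1024:.0f} KiB in flight")    # ~610 KiB

    # Message rate: servers * backends * requests/sec.
    SERVERS = 25               # API servers (assumption)
    BACKENDS = 4               # workers per server (assumption)
    REQS_PER_SEC = 10          # requests/sec per worker (assumption)
    print(f"~{SERVERS * BACKENDS * REQS_PER_SEC} messages/sec to size queues for")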
RECENT AMQP ENHANCEMENTS
• Eliminates the broker-based model
• Enhances AMQP 1.0
  • Separates the messaging endpoint from the message routers
• Newton has an AMQP driver for oslo.messaging
• Ocata provides performance tuning and upstream support for TripleO
• If you must use RabbitMQ:
  • Use clustering and exchange configurations
  • Use the shovel plugin with exchange configurations and multiple instances
(Diagram: brokers arranged hierarchically as a tree vs. a routed mesh.)
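From the application side, moving from a brokered RabbitMQ deployment to the AMQP 1.0 driver is mostly a transport URL change in oslo.messaging. A minimal sketch with placeholder hosts and credentials; the AMQP 1.0 driver also needs its optional dependencies installed.

    from oslo_config import cfg
    import oslo_messaging

    conf = cfg.CONF

    # Brokered RabbitMQ transport (the default driver):
    rabbit_url = "rabbit://user:pass@rabbit-host:5672/"

    # AMQP 1.0 driver pointing at a message router instead of a broker:
    amqp_url = "amqp://user:pass@router-host:5672/"

    transport = oslo_messaging.get_transport(conf, url=amqp_url)
    target = oslo_messaging.Target(topic="demo-topic")
    client = oslo_messaging.RPCClient(transport, target)
    # client.call({}, "ping")   # the RPC would be routed through the router mesh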
OPENSTACK CASCADING PROJECT
• Proxies for the Nova, Cinder, Ceilometer & Neutron subsystems, one set per site
• At the parent – lots of proxies, one set per child
• The user communicates with the parent (master)
(Diagram: a parent OpenStack fronting several child OpenStack instances, each child exposed as an availability zone AZ1 … AZn of the parent.)
TRICIRCLE AND TRIO2O
The cascading solution was split into two projects:
• Tricircle – networking across OpenStack clouds
• Trio2o – single API gateway for Nova and Cinder
Characteristics:
• Make Neutron(s) work as a single cluster
• Expand workloads into other OpenStack instances
• Single region with multiple sub-regions (pods, AZ1 … AZn, AZx)
• Create networking extensions
• Shared or federated Keystone
• Isolation of east-west traffic
• Shared or distributed Glance
• Application HA
• UID = TenantID + PODID
(Diagram: users User1 … UserN reach the Trio2o API gateway in front of pods in different availability zones, with Tricircle providing the cross-pod networking.)
OPNFV Multi-Site Project – Euphrates release