Managing OpenStack in a cloud-native way
Marcel Haerry
• Leading the architecture of Swisscom's ElasticStack and PaaS
• Member of CloudFoundry's Technical Advisory Board
• Automate all the things!
• Background in system engineering and software development

Alberto García
• Red Hat Cloud Architect
• Over 5 years helping companies to adopt emerging technologies
• Network engineer in a previous life
Our motivation
Use Cases https://www.mycloud.ch https://developer.swisscom.com
Modern IT philosophy at Swisscom
• Strong and thorough CI/CD approach: highly automated and tested before promotion through stages
• Rapid release cycles to iterate quickly on new features and bugfixes
• Promoting a DevOps culture across the teams
• High availability and scalability as you grow
• Building platforms for the next generation of workloads: fault-tolerant and secure deployments and lifecycle
Is it doable?
OpenStack control plane
• Components are decoupled: load balancer, messaging bus
• State is in the database (see the sketch below)
• Allows dynamic topologies: the control plane can be scaled in/out as workload drives control-plane load
• Control-plane services can be virtualized
• OpenStack has dedicated projects for deployment automation
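Because the service state lives in the database, any node running the same role is interchangeable and its health can be checked through the API. A minimal sketch of that idea, assuming the openstacksdk and a clouds.yaml entry named "swisscom-lab" (a hypothetical name, not from the talk):

```python
# Minimal sketch (not from the talk): inspect control-plane service state,
# which OpenStack keeps in its database, via the openstacksdk.
import openstack

conn = openstack.connect(cloud="swisscom-lab")  # hypothetical cloud name

# Each nova-* control-plane service reports its own state to the database,
# so any node running the same role is interchangeable.
for svc in conn.compute.services():
    print(svc.binary, svc.host, svc.status, svc.state)
```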
The Pacemaker HA approach
• An all-in-one deployment doesn't scale as it is (RabbitMQ, Galera)
• Big VMs don't fit well in virtual environments
• The lifecycle of bare metal is slow
• CI/CD is more complex: how to iterate on individual components?
• Clustering software is stateful
• Binds the control plane to the infrastructure
HAProxy/Keepalived HA approach
• Based on Javier Peña's architecture: https://github.com/beekhof/osp-ha-deploy/blob/master/HA-keepalived.md
• Pacemaker-free architecture (a Keepalived check-script sketch follows below)
• A distributed control plane fits well in this model
• Virtualization is feasible thanks to the flexibility of the service layout design
• Does not bind the application to the infrastructure
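To illustrate how Keepalived decides when to move the virtual IP away from a failing load balancer, here is a hedged sketch of a check script that Keepalived's vrrp_script mechanism could run. The port and timeout are illustrative assumptions, not values from this deployment.

```python
#!/usr/bin/env python3
# Hedged sketch: a check script Keepalived could run via vrrp_script to
# verify the local HAProxy is still answering before holding the VIP.
# Address, port and timeout are illustrative values.
import socket
import sys

HAPROXY_ADDR = ("127.0.0.1", 443)

try:
    # Exit 0 if HAProxy accepts connections, non-zero otherwise;
    # Keepalived lowers the node priority on failure and the VIP moves.
    with socket.create_connection(HAPROXY_ADDR, timeout=2):
        sys.exit(0)
except OSError:
    sys.exit(1)
```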
Seems doable, let’s design it
Distributed & virtualized control plane
• Pulling the pieces apart towards a distributed architecture
• Horizontally scalable services (wherever possible)
• Virtualized control plane
• Isolate shared state (Galera & RabbitMQ)
(Double) Highly Available Architecture

Component             HA model
Web services          HAProxy
HAProxy               Keepalived
MySQL                 Galera
Mongo                 Replica set
RabbitMQ              RabbitMQ native clustering
Redis                 Sentinel
Non-API components    Resiliency in the application

HA is applied at two levels: the infrastructure level (HAProxy, Keepalived) and the application level (native clustering and resiliency).
Modeling the components

Control Plane
• Simple networking, one network for everything
• Grouping services per major component
• Including lightweight supporting services in the role
• Small-sized virtual machines
• Local ephemeral storage

Compute
• Hyperconverged
• High-density hardware
• Network isolation of storage, control & data
• Network HA with bonding
• Part of a layer 3 spine-leaf design
Lifecycle
• CI/CD framework
  - Multiple stages to gain confidence in changes
  - Clear separation between code and configuration
• Puppet & a deployment orchestrator for Puppet
  - Virtual machines & storage described in code
  - Scale-out purely through API calls (see the sketch below)
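The scale-out-through-API-calls idea can be sketched with the openstacksdk: adding one more VM for a control-plane role is just another server-create call that the orchestrator issues before Puppet configures the node. The cloud, image, flavor and network names below are hypothetical placeholders, not details from the talk.

```python
# Hedged sketch of "scale-out purely through API calls": create one more
# VM for a control-plane role; Puppet then configures it like its peers.
import openstack

conn = openstack.connect(cloud="swisscom-lab")   # hypothetical cloud name

server = conn.compute.create_server(
    name="keystone-03",                          # next instance of the role
    image_id=conn.image.find_image("controlplane-base").id,   # assumed image
    flavor_id=conn.compute.find_flavor("m1.small").id,        # assumed flavor
    networks=[{"uuid": conn.network.find_network("ctlplane").id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```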
Storage
• Hyperconverged compute nodes
• Cinder with ScaleIO scales with the number of disks, and therefore with the number of servers
• Object store fully external (Atmos)
• Glance using an external S3 backend, with caching of images in the control plane
Distributed network services for SDN
Big picture
Our journey
Active-Active HA support in OpenStack components: http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
Bootstrapping clusters
• Monitor health
• Automate simple remediations (a hedged sketch follows below)
NO MAGICAL RECOVERY
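A hedged sketch of what "monitor health, automate simple remediations" can look like for stateless services: probe the API port and restart the local unit when it stops answering. The unit names, ports and systemctl-based restart are assumptions for illustration; anything stateful is deliberately left to a human, since there is no magical recovery.

```python
# Hedged sketch: simple health monitoring with automated, bounded remediation.
# Only stateless API services are restarted; stateful components are not touched.
import socket
import subprocess
import time

CHECKS = {
    # systemd unit (assumed name)   (host, port) to probe
    "openstack-nova-api":   ("127.0.0.1", 8774),
    "openstack-glance-api": ("127.0.0.1", 9292),
}

def port_open(addr, timeout=2):
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

while True:
    for unit, addr in CHECKS.items():
        if not port_open(addr):
            # Simple remediation only: restart the stateless service.
            subprocess.run(["systemctl", "restart", unit], check=False)
    time.sleep(30)
```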
Benefits & drawbacks
Cloud-like architecture
• Control services can be treated as stateless applications
• Operating the OpenStack control plane is similar to operating cloud workloads
• Dynamic and agile control plane for OpenStack
• Cost-effective solution (thanks to virtualization)
• The OpenStack control plane does not depend on the infrastructure
Cloud-like day-2 operations
• Measurable & scalable per component
• Onboarding new services -> deploy new roles
• Parallel deployment of the control plane for upgrades
• Back up only the stateful services, restage everything else
• Redeployment of nodes in case of failures / problems
Drawbacks
• Not fully active/active ready: cinder-volume & Galera
• RabbitMQ/MariaDB don't scale horizontally
• No magical recovery
• Network partitions & Keepalived
• Horizon needs sticky sessions -> round-robin DNS does not work
Future work
• OpenStack components
  - Build services active/active from the beginning
  - Built-in health endpoints in services (e.g. queried from HAProxy or monitoring; see the sketch below)
• Deployment
  - Packaging the deployment as containers (Kolla?!)
• Architecture
  - Decoupling storage from compute?
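As a rough illustration of the built-in health-endpoint idea, the sketch below exposes a tiny HTTP endpoint that HAProxy's httpchk option or a monitoring system could poll. The /healthcheck path and the trivial check are assumptions; a real service would verify database and message-bus connectivity from inside the service itself.

```python
# Hedged sketch of a built-in health endpoint that a load balancer or
# monitoring system could poll; path, port and checks are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthcheck":
            # A real service would verify DB and message-bus access here.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK\n")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```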
THANK YOU