Containers Infrastructure for Advanced Management Federico Simoncelli Associate Manager, Red Hat October 2016
About Me
Kubernetes ● Decoupling problems to hand out to different teams ○ Developers do operations for their application ○ Cluster Admins do operations for cluster software ○ Kernel and Operating System do operations for nodes ○ Hardware operations for clouds ● Layer of abstraction for Application definition ● Machines don’t have an identity or a specific function ○ “All ...machines are created equal” ● Developers do not need to know about Operators’ issues ● Operators do not need to know about Application issues
OpenShift ● 100% based on and compatible with Kubernetes ● Kubernetes influencer for new features ○ Projects and Namespaces ○ Templates ○ Routes and Ingress ● Additional features related to image life-cycle and rolling updates ● Integrated experience in many areas ○ Opinionated metrics and logging solutions ○ Developer Web Console
Application Components Distribution Traditional and Kubernetes distribution of application components
New Set Of (Old) Problems for Operators (growing SCALE and COMPLEXITY) ● One developer: How do I containerize? ● Dev team: How can we move faster? ● Dev meets Ops: How do we run at scale? ● DevOps: Can we turn it into a platform? ● Production Ops: How do we manage at scale?
Deployment Requirements ● Standardized and easy to reproduce ○ Pick a platform: Atomic vs. Traditional ● Automatic and composable ● Deploy-and-forget is not enough ● Maintainable ○ Definition of desired state and reconciliation ● Allow reliable modification of the infrastructure ○ Scaling (add and remove nodes) ○ Change configurations, etc. ● Somewhat similar to Kubernetes principles
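The "desired state and reconciliation" requirement can be pictured with a minimal sketch. It is not taken from any of the deployment tools named in this talk; the `Action`, `plan`, and `reconcile` names are hypothetical, and node sets are plain in-memory Python sets rather than real infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "add" or "remove" (hypothetical action names)
    node: str


def plan(desired: set[str], actual: set[str]) -> list[Action]:
    """Compute the actions that move the actual node set to the desired one."""
    to_add = [Action("add", n) for n in sorted(desired - actual)]
    to_remove = [Action("remove", n) for n in sorted(actual - desired)]
    return to_add + to_remove


def reconcile(desired: set[str], actual: set[str]) -> set[str]:
    """Apply the plan to an in-memory copy of the actual state."""
    state = set(actual)
    for action in plan(desired, actual):
        if action.kind == "add":
            state.add(action.node)
        else:
            state.discard(action.node)
    return state


if __name__ == "__main__":
    desired = {"node-a", "node-b"}
    actual = {"node-b", "node-c"}
    for action in plan(desired, actual):
        print(action.kind, action.node)
```

Running `plan` again on an already reconciled state yields no actions, which is the property that makes the deployment repeatable rather than deploy-and-forget.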
Deployment Status ● Kubernetes ○ kube-up based on SaltStack (turning into kube-deploy) ■ Mostly for GCE (and Vagrant for development) ○ Kargo based on Ansible ○ GKE (possible future) ● OpenShift ○ https://github.com/openshift/openshift-ansible ○ Supports AWS, GCE, libvirt, OpenStack, Vagrant ● Containers on OpenStack ○ Kubernetes and OpenShift Heat templates ○ Magnum container orchestration as first class resources ○ https://github.com/redhat-openstack/openshift-on-openstack
OpenShift-Ansible ● Actively maintained and feature-rich ● Based on a healthy Open Source automation project ○ Large ecosystem ○ Composable with other automations ● Describe your infrastructure as “inventory” ○ Inventory can be versioned and updated ● Simple interactive installation ○ atomic-openshift-installer ● Advanced installation supporting many advanced features ○ Possibly hard to master
Monitoring Objectives ● Notification of incidents ○ Grace period ○ Notifications ● Debug new or unknown issues ○ Quickly have at hand the overall status of the cluster ○ Easy access to metrics and logging ■ Metrics and logging at all levels (infrastructure, etc.) ● Analyze trending and proactively avoid future incidents ○ Scheduled maintenance ○ Datacenter Hardware upgrades
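The grace period before a notification can be sketched as a small state machine. This is an illustration under stated assumptions, not the behavior of any product mentioned here: the `GraceAlert` class is hypothetical, and timestamps are passed in as plain seconds so the logic stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraceAlert:
    grace_seconds: float
    failing_since: Optional[float] = None
    notified: bool = False

    def observe(self, healthy: bool, now: float) -> bool:
        """Return True exactly once, when a failure outlasts the grace period."""
        if healthy:
            # Recovery resets the incident, so a later failure starts a new one.
            self.failing_since = None
            self.notified = False
            return False
        if self.failing_since is None:
            self.failing_since = now
        if not self.notified and now - self.failing_since >= self.grace_seconds:
            self.notified = True
            return True
        return False


if __name__ == "__main__":
    alert = GraceAlert(grace_seconds=60)
    for t, healthy in [(0, False), (30, False), (60, False), (90, False)]:
        print(t, alert.observe(healthy, now=t))
```

Short blips that recover within the grace period never notify, and a persistent failure notifies once instead of on every check.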
Common Monitoring Architecture
Monitor Kubernetes-Based Clusters with Heapster ● Leverage the infrastructure to monitor the same infrastructure ○ What if monitoring is failing continuously? ● Heapster ○ Enables Container Cluster Monitoring and Performance Analysis ○ Different sinks ● Autoscaling ○ Collected data are then used to autoscale Pods (when configured)
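How collected metrics can drive Pod autoscaling is sketched below with a simple proportional rule. The rule, the `desired_replicas` name, and the default bounds are assumptions chosen for illustration; the slide does not specify the algorithm actually used.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to observed vs. target utilization."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp to the configured bounds (assumed defaults, not from the talk).
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 Pods at 90% average CPU with a 60% target.
    print(desired_replicas(4, 90, 60))
```

The clamp keeps a noisy or missing metric from scaling the application to zero or to an unbounded number of Pods.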
Agile Monitoring ● Running a data center continuously (24/7) demands more than Metrics collection ● Contribution to Heapster and cAdvisor is “slow” ● Integrate additional solutions and technologies ● Agile addition of new Metrics ○ No development involved ● Monitoring for known issues ○ Nodes can self-heal ● Statistics on most recurring issues ○ Identify fragile components or architecture ○ Focus development for reliability
Application and Infrastructure Monitoring ● Roles and duties separation (once again) ○ Developers should be interested only in metrics and logs of applications ■ Developers must see only data of objects they own ○ Operators are mostly interested in metrics and logs of the infrastructure (e.g. nodes) ● Metrics, logging and alerts belong to objects ○ Heapster collects metrics per object (node, container, etc.) ● Security considerations ○ Applications and infrastructure in the same data store? ○ Is tenancy in the data store enough for you?
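The separation of duties above can be sketched as a filter over per-object samples. Everything here is hypothetical: the `Sample` record, the convention that infrastructure objects carry no namespace, and the two helper functions are assumptions made for the example, not the data model of Heapster or any data store.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    namespace: Optional[str]  # owning project; None marks an infrastructure object (assumption)
    obj: str
    metric: str
    value: float


def visible_to_developer(samples: Iterable[Sample], owned: set[str]) -> list[Sample]:
    """Developers see only samples of objects in namespaces they own."""
    return [s for s in samples if s.namespace is not None and s.namespace in owned]


def visible_to_operator(samples: Iterable[Sample]) -> list[Sample]:
    """Operators see infrastructure-level samples (e.g. nodes)."""
    return [s for s in samples if s.namespace is None]


if __name__ == "__main__":
    data = [
        Sample("shop", "pod/web", "cpu", 0.5),
        Sample("blog", "pod/app", "cpu", 0.25),
        Sample(None, "node/n1", "cpu", 3.0),
    ]
    print(visible_to_developer(data, {"shop"}))
    print(visible_to_operator(data))
```

A filter like this only enforces tenancy at query time; whether that is enough, or whether applications and infrastructure need separate data stores, is the open question the slide raises.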
Monitoring Architecture Considerations ● Reliability and disruptions isolation ● Scalability of each subsystem ● Data locality ● Reuse of existing solutions ● Security (and isolation of data) ● Monitoring life-cycle (upgrade and rollback) ● Cross correlation of multiple clusters and solutions ● Single technology for Metrics and Logging?
Direct Monitoring
Metrics and Logging Federation
Hawkular and ElasticSearch ● Open Source solutions for metrics and logging ○ Hawkular based on Cassandra ○ ElasticSearch based on Lucene ● Data stores used by many existing projects ● Technologies of choice for OpenShift ○ Work out of the box in OpenShift ● Hawkular trigger definitions for Alerts ● Kibana visualization tool for ElasticSearch
Image and Security ● Security assessment ○ How to trust underlying images? ○ How to keep the images safe? ○ How to enforce security policies? ● Technologies ○ Signed images ○ OpenSCAP assessment tools ○ Atomic Scan and Blackduck
Putting It All Together ● Maintainable deployment solution ○ Support cluster re-shaping ○ Versionable ● Monitoring unexpected events and alerts ● Planning data center evolution over time ● Ability to monitor and cross-link with the underlying infrastructure ● Out-Of-The-Box experience ○ Knowledge gathered from a community of Operators
ManageIQ Comprehensive Cloud Management ● Single-Pane of Glass ○ Monitoring ○ Management ● Private and Public All-Around ○ VMs, Instances, Containers, Storage, Network ● Management Framework ○ Infrastructure applications ● Policies and Alerts ● Reports and Chargeback Reports ● Automation ● Capacity Planning
ManageIQ Project and History ● Virtualization Management since 2006 ● Acquired by Red Hat in December 2012 ● Open-Sourced in June 2014 ● 7 Technical Leaders ● 3 Monthly Stable Builds ● ~50 Core Engineers ● Nightly Builds ● ~100 Contributors (and counting) ● 3 Weeks Sprints ● 3 Companies Involved ● 200 Average PR (per Sprint)
Introducing Containers to ManageIQ 2015 - 2016 ● Inventory collection of major objects ○ Nodes, Pods, Services, Replicators, etc. ● Cross-linking for nodes on known instances ● Dashboard and Topology ● Metrics collection from Hawkular ○ Utilization aggregation (Project, Service, etc.) ● Smart-State Analysis ○ Collection of image packages ● OpenSCAP for container images ● Policies for container objects ● Chargeback
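Utilization aggregation by Project or Service can be pictured as a group-and-sum over per-container samples. This is a sketch only: the dictionary keys (`project`, `service`, `cpu_cores`) and the `aggregate_by` helper are hypothetical and do not reflect the ManageIQ or Hawkular schema.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def aggregate_by(samples: Iterable[Mapping], key: str) -> dict[str, float]:
    """Sum per-container CPU usage under the chosen grouping key."""
    totals: dict[str, float] = defaultdict(float)
    for sample in samples:
        totals[sample[key]] += sample["cpu_cores"]
    return dict(totals)


if __name__ == "__main__":
    samples = [
        {"project": "shop", "service": "web", "cpu_cores": 0.5},
        {"project": "shop", "service": "db", "cpu_cores": 0.25},
        {"project": "blog", "service": "web", "cpu_cores": 1.0},
    ]
    print(aggregate_by(samples, "project"))
    print(aggregate_by(samples, "service"))
```

The same samples roll up differently depending on the key, which is what lets one metrics feed serve both per-Project reporting and per-Service views.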
ManageIQ Inventory and Relationships Service Pod Container Image Cluster Node Instance
Containers Management in ManageIQ in 2017 Current ongoing efforts for 2017 ● Alerts dashboard and life-cycle ● Live Metrics and Alerts ○ Metrics served by Hawkular to ManageIQ ○ Support native Hawkular triggers for Alerts ● Dynamic Metrics and Alerts ○ Custom metrics and alerts on-demand ● Automation ○ Manage and re-provision ManageIQ using Ansible ● Integration with Logging and ELK stack
Get Involved! ● Community http://talk.manageiq.org ● Code https://github.com/ManageIQ/manageiq providers/containers ● Documentation http://manageiq.org/documentation ● Social: ○ Twitter @manageiq #manageiq Federico Simoncelli fsimonce@redhat.com https://twitter.com/simon3z