Monitoring Kubernetes with OMD Labs Edition and Prometheus Michael Kraus - FOSDEM 2017
About me
Michael Kraus, Senior Monitoring Consultant @ ConSol.
Doing monitoring for 12 years, mainly with plain old Nagios, open-source only.
Background
Why: Kubernetes in a classical enterprise
Implementation of a Kubernetes PoC at $customer. We have …
● already running some monitoring instances there
● but no idea about monitoring Kubernetes
With: Enter Prometheus
Natural choice for Kubernetes monitoring:
● Integrated service discovery
● Labels are retained between Kubernetes and Prometheus
How: Where to start
There are excellent tutorials and blog posts available as a starting point, for example:
● coreos.com/blog/ (Fabian Reinartz)
● robustperception.io/blog/ (Brian Brazil)
● … many examples on GitHub
Implementation
Implementation: Prometheus kubernetes_sd
● kubernetes_sd_configs - role: endpoints
● kubernetes_sd_configs - role: node
● kubernetes_sd_configs - role: pod
prometheus-kubernetes.yml from prometheus/examples.
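The three discovery roles listed above could be wired into a Prometheus configuration roughly as follows. This is a minimal sketch, not the upstream example file: the job names are illustrative assumptions, and a real setup would add TLS settings and relabeling rules.

```yaml
# Sketch of a prometheus.yml fragment. Job names are assumptions chosen
# for illustration; only the three roles come from the slide.
scrape_configs:
  - job_name: 'kubernetes-endpoints'   # hypothetical job name
    kubernetes_sd_configs:
      - role: endpoints                # one target per endpoint address of a service

  - job_name: 'kubernetes-nodes'       # hypothetical job name
    kubernetes_sd_configs:
      - role: node                     # one target per cluster node (kubelet)

  - job_name: 'kubernetes-pods'        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                      # one target per declared container port of a pod
```

Each role makes Prometheus ask the Kubernetes API for the matching objects and turns them into scrape targets, attaching the objects' metadata as `__meta_kubernetes_*` labels.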
Implementation: Prometheus kubernetes_sd
Metrics:
● apiserver_*
● container_cpu_*
● container_fs_*
● deployment_*
● etcd_*
● kubelet_*
● ...
Implementation: node_exporter
Prometheus exporter for hardware and OS metrics exposed by the kernel.
● DaemonSet
● prometheus.io/scrape: 'true'
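Running node_exporter as a DaemonSet with the scrape annotation might look like the manifest below. It is a hedged sketch: the object name, the `app` label, the image reference and the `apps/v1` API version are assumptions for illustration (clusters from the time of the talk used an older DaemonSet API group), and it assumes the annotation is evaluated on the pod.

```yaml
# Minimal DaemonSet sketch: one node_exporter pod per node.
apiVersion: apps/v1                 # assumption: current API group for DaemonSets
kind: DaemonSet
metadata:
  name: node-exporter               # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-exporter            # hypothetical label
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: 'true'   # annotation from the slide; marks the pod for scraping
    spec:
      hostNetwork: true             # expose host network statistics, not the pod's
      hostPID: true                 # let the exporter see host processes
      containers:
        - name: node-exporter
          image: prom/node-exporter # assumption: public image, unpinned tag
          ports:
            - name: metrics
              containerPort: 9100   # node_exporter's default port
```

The annotation has no effect by itself; it only works together with a relabeling rule in the Prometheus scrape configuration that keeps annotated targets.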
Implementation: node_exporter
Metrics:
● node_cpu
● node_disk_*
● node_filesystem_*
● node_netstat_*
● node_vmstat_*
● ...
Implementation: kube-state-metrics
“... focused … on the health of the various objects inside, such as deployments, nodes and pods.”
● prometheus.io/scrape: 'true'
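The `prometheus.io/scrape: 'true'` annotation is a convention rather than a built-in feature: Prometheus has to be told to honor it through relabeling. A minimal sketch for pod targets, assuming the annotation is set on the pod and using an illustrative job name:

```yaml
# Keep only pods that carry the annotation prometheus.io/scrape: 'true'.
scrape_configs:
  - job_name: 'kubernetes-pods'     # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Kubernetes SD exposes pod annotations as
      # __meta_kubernetes_pod_annotation_<name>, with characters that are
      # invalid in label names replaced by underscores.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
```

If the annotation is placed on a service instead, the analogous `__meta_kubernetes_service_annotation_prometheus_io_scrape` label would be matched in a job using the endpoints role.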
Implementation: kube-state-metrics
Metrics:
● kube_deployment_*
● kube_node_*
● kube_pod_*
● kube_resource_quota
● ...
Implementation: Demo environment
Based on minikube: github.com/kubernetes/minikube
Sample config: github.com/m-kraus/kubernetes-monitoring
Demo
Implementation: What else?
What we also need:
● persistent storage
● Alertmanager
● Grafana
● Pushgateway
● ...
But we have that already
Classical monitoring: OMD Labs Edition
Monitoring in one package. The reference solution ("Musterlösung") at $customer for monitoring projects:
● completely open-source
● based on Nagios / Icinga
● bundles “best practices” of monitoring
● many years of experience
● no root required
Classical monitoring: OMD Labs Edition
Bundled components: Nagios, Icinga1, Icinga2, Shinken, Naemon, Thruk, Mod-Gearman, PNP4Nagios, LMD, NagVis, Apache, MySQL, InfluxDB, Nagflux, Prometheus, Dokuwiki, Grafana, FreeTDS, JMX4Perl, check_webinject, check_logfiles, Jolokia, check_mysql_health, coshsh, check_mssql_health, rrdcache, check_nsc_web, check_curly, check_nwc_health, check_multi, check_oracle_health, Ansible
Classical monitoring: OMD sites and commands
omd create <MYSITE>
omd cp <PROD> <STAGE>
omd update <STAGE>
omd version
Classical monitoring: OMD Labs Edition
https://labs.consol.de/omd/
Implementation: Connecting OMD
Why not scrape Kubernetes directly from OMD:
● hard to access pods inside Kubernetes
● hard to access API from outside Kubernetes
● API secured via TLS and token only (easily) available from a serviceaccount
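The last point is what makes an in-cluster Prometheus convenient: by default, Kubernetes mounts the serviceaccount's CA certificate and token into the pod, so the scrape configuration can simply point at those files. A minimal sketch, assuming the standard serviceaccount mount path and an illustrative job name; `bearer_token_file` is the form used in Prometheus configurations of that era.

```yaml
# Scrape the kubelets over HTTPS with the credentials mounted into the
# Prometheus pod. An OMD site outside the cluster has no such mount.
scrape_configs:
  - job_name: 'kubernetes-nodes'    # hypothetical job name
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
```

Reproducing this from outside would mean exporting a token and CA certificate from the cluster and keeping them current on the OMD host, which is the friction the bullets above describe.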
Implementation: Connecting OMD
Getting the metrics from Kubernetes to OMD:
● federation
  - job_name: 'kube_federation'
    metrics_path: '/federate'
    honor_labels: true
    params:
      'match[]':
        - '{job=~"^kubernetes.+"}'
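The job above does not say where the in-cluster Prometheus can be reached. A fuller sketch for the Prometheus instance inside OMD could add a static target; the host name and port below are placeholders, not values from the talk, and assume the in-cluster Prometheus is exposed to the outside, for example through a NodePort service.

```yaml
# Federation job on the OMD side, pulling selected series from the
# Prometheus that runs inside Kubernetes.
scrape_configs:
  - job_name: 'kube_federation'
    metrics_path: '/federate'
    honor_labels: true              # keep the job/instance labels of the source server
    params:
      'match[]':
        - '{job=~"^kubernetes.+"}'  # only series from jobs whose name starts with "kubernetes"
    static_configs:
      - targets:
          - 'prometheus.k8s.example.com:30900'   # hypothetical address of the in-cluster Prometheus
```

With `honor_labels: true`, labels that already exist on the federated series win over the labels the OMD-side Prometheus would otherwise attach, so the original Kubernetes job names remain visible.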
OMD
Demo
Issues
Issues: Federation
“... Not quite the purpose of federation.” (Brian Brazil, www.robustperception.io/federation-what-is-it-good-for/)
● Let’s try it anyway ...
Issues: Securing
"Accessing metrics without authentication is ok for a PoC, but not allowed in production..." (internal audit)
● How to secure (federated) Prometheus?
Issues: Integration
"Should Nagios, Alertmanager or both notify?"
“Do we need to define our checks and alerts in both Nagios and Prometheus?”
● How to route alerts
● How to ease or centralize configuration
Issues: Long-term storage
“How can we store (some of) our graphs for a longer period of time?”
● InfluxDB?
Issues: Coverage
“Our Kubernetes cluster died. We had no monitoring until it was up again...” (operations team)
● external monitoring of crucial components
  ○ machine health
  ○ important services
  ○ important API queries
Thanks for watching