
Monitoring Kubernetes with OMD Labs Edition and Prometheus

Monitoring Kubernetes with OMD Labs Edition and Prometheus. Michael Kraus, FOSDEM 2017. About me: Michael Kraus, Senior Monitoring Consultant @ ConSol. Doing monitoring for 12 years, mainly with plain old Nagios, open-source only.


  1. Monitoring Kubernetes with OMD Labs Edition and Prometheus Michael Kraus - FOSDEM 2017

  2. About me

  3. About me
     Michael Kraus, Senior Monitoring Consultant @ ConSol.
     Doing monitoring for 12 years, mainly with plain old Nagios, open-source only.

  4. Background

  5. Why: Kubernetes in a classical enterprise
     Implementation of Kubernetes PoC at $customer: We have …
     ● already running some monitoring instances there.
     ● but no idea about monitoring Kubernetes.

  6. With: Enter Prometheus
     Natural choice for Kubernetes monitoring:
     ● Integrated service discovery
     ● Labels are retained between Kubernetes and Prometheus

  7. How: Where to start
     There are excellent tutorials and blog posts available as a starting point, for example by
     ● coreos.com/blog/ (Fabian Reinartz)
     ● robustperception.io/blog/ (Brian Brazil)
     ● … many examples on GitHub

  8. Implementation

  9. Implementation: Prometheus kubernetes_sd
     ● kubernetes_sd_configs - role: endpoints
     ● kubernetes_sd_configs - role: node
     ● kubernetes_sd_configs - role: pod
     prometheus-kubernetes.yml from prometheus/examples.
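To make the three discovery roles on this slide concrete, a minimal scrape configuration might look like the sketch below. It is loosely modelled on the upstream example file and is not taken from the talk. The job names, the annotation-based keep rules and the in-cluster service-account paths are assumptions for illustration.

```yaml
# Sketch only: one scrape job per Kubernetes service-discovery role.
# Assumes Prometheus runs inside the cluster with a service account.
scrape_configs:
  # Nodes: scrape each kubelet over HTTPS with the service-account token.
  - job_name: 'kubernetes-nodes'          # job name is an assumption
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Copy Kubernetes node labels onto the scraped series.
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  # Endpoints: keep only services annotated with prometheus.io/scrape: 'true'.
  - job_name: 'kubernetes-service-endpoints'   # job name is an assumption
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true

  # Pods: keep only pods annotated with prometheus.io/scrape: 'true'.
  - job_name: 'kubernetes-pods'           # job name is an assumption
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```

The `labelmap` rule is one way labels can be retained between Kubernetes and Prometheus: every node label becomes a label on the resulting time series.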

  10. Implementation: Prometheus kubernetes_sd
      Metrics:
      ● apiserver_*
      ● container_cpu_*
      ● container_fs_*
      ● deployment_*
      ● etcd_*
      ● kubelet_*
      ● ...

  11. Implementation: node_exporter
      Prometheus exporter for hardware and OS metrics exposed by the kernel.
      ● DaemonSet
      ● prometheus.io/scrape: 'true'
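A DaemonSet carrying the scrape annotation could be sketched as follows. This is not the manifest from the talk. The namespace, labels and image tag are assumptions, and a production manifest would typically add host mounts and security settings. The annotation only has an effect if the Prometheus configuration contains a relabel rule that keeps targets annotated this way.

```yaml
# Sketch only: run node_exporter on every node and mark its pods for scraping.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring            # assumption: any namespace works
spec:
  selector:
    matchLabels:
      app: node-exporter           # assumption: label chosen for illustration
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        # Read by an annotation-based keep rule in the Prometheus config.
        prometheus.io/scrape: 'true'
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter    # assumption: pin a version in practice
          ports:
            - name: metrics
              containerPort: 9100      # node_exporter's default listen port
```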

  12. Implementation: node_exporter
      Metrics:
      ● node_cpu
      ● node_disk_*
      ● node_filesystem_*
      ● node_netstat_*
      ● node_vmstat_*
      ● ...

  13. Implementation: kube-state-metrics
      “... focused … on the health of the various objects inside, such as deployments, nodes and pods.”
      ● prometheus.io/scrape: 'true'
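For kube-state-metrics the annotation can also sit on a Service, so that an endpoints-role scrape job picks it up. The sketch below assumes a kube-state-metrics deployment whose pods carry the label `app: kube-state-metrics` and listen on port 8080. The namespace and label are illustrative, not taken from the talk.

```yaml
# Sketch only: expose kube-state-metrics through an annotated Service.
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring              # assumption
  annotations:
    # Read by an annotation-based keep rule in the Prometheus config.
    prometheus.io/scrape: 'true'
spec:
  selector:
    app: kube-state-metrics          # assumption: must match the pod labels
  ports:
    - name: metrics
      port: 8080                     # assumption: the port the pods serve metrics on
      targetPort: 8080
```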

  14. Implementation: kube-state-metrics
      Metrics:
      ● kube_deployment_*
      ● kube_node_*
      ● kube_pod_*
      ● kube_resource_quota
      ● ...

  15. Implementation: Demo environment
      Based on minikube: github.com/kubernetes/minikube
      Sample config: github.com/m-kraus/kubernetes-monitoring

  16. Demo

  17. Implementation: What else?
      What we also need:
      ● persistent storage
      ● Alertmanager
      ● Grafana
      ● Pushgateway
      ● ...

  18. But we have that already

  19. Classical monitoring: OMD Labs Edition
      "Musterlösung" (reference solution) at $customer for monitoring projects: monitoring in one package.
      ● completely open-source
      ● based on Nagios / Icinga
      ● bundles “best practices” of many years of experience
      ● no root required

  20. Classical monitoring: OMD Labs Edition
      Nagios, Icinga1, Icinga2, Shinken, Naemon, Thruk, Mod-Gearman, PNP4Nagios, LMD, NagVis, Apache, MySQL, InfluxDB, Nagflux, Prometheus, Dokuwiki, Grafana, FreeTDS, JMX4Perl, check_webinject, check_logfiles, Jolokia, check_mysql_health, coshsh, check_mssql_health, rrdcache, check_nsc_web, check_curly, check_nwc_health, check_multi, check_oracle_health, Ansible

  21. Classical monitoring: OMD sites and commands
      omd create <MYSITE>
      omd cp <PROD> <STAGE>
      omd update <STAGE>
      omd version

  22. omd create <MYSITE>
      omd cp <PROD> <STAGE>

  23. Classical monitoring: OMD Labs Edition
      https://labs.consol.de/omd/

  24. Implementation: Connecting OMD
      Why not scrape Kubernetes directly from OMD:
      ● hard to access pods inside Kubernetes
      ● hard to access API from outside Kubernetes
      ● API secured via TLS and token only (easily) available from a serviceaccount

  25. Implementation: Connecting OMD
      Getting the metrics from Kubernetes to OMD:
      ● federation
        - job_name: 'kube_federation'
          metrics_path: '/federate'
          honor_labels: true
          params:
            'match[]':
              - '{job=~"^kubernetes.+"}'
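Placed in a complete configuration file for the Prometheus instance on the OMD side, the federation job from this slide might look like the following sketch. The target address is hypothetical and stands for wherever the in-cluster Prometheus is exposed to the outside, for example via a NodePort or an ingress. The slide does not say how the author exposed it.

```yaml
# Sketch only: the OMD-side Prometheus pulls selected series from the
# in-cluster Prometheus through its /federate endpoint.
scrape_configs:
  - job_name: 'kube_federation'
    metrics_path: '/federate'
    # Keep the job and instance labels of the original series instead of
    # overwriting them with those of the federation job.
    honor_labels: true
    params:
      # Only series whose job label starts with "kubernetes" are pulled.
      'match[]':
        - '{job=~"^kubernetes.+"}'
    static_configs:
      - targets:
          - 'prometheus.k8s.example.com:30900'   # hypothetical address and port
```

With `honor_labels: true`, series arrive in OMD with the same `job` and `instance` labels they have inside the cluster, so queries and dashboards can use identical selectors on both sides.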

  26. OMD

  27. Demo

  28. Issues

  29. Issues: Federation
      “... Not quite the purpose of federation.” (Brian Brazil, www.robustperception.io/federation-what-is-it-good-for/)
      ● Let’s try it anyway ...

  30. Issues: Securing
      "Accessing metrics without authentication is ok for a PoC, but not allowed in production..." (internal audit)
      ● How to secure (federated) Prometheus?

  31. Issues: Integration
      "Should Nagios, Alertmanager or both notify?"
      “Do we need to define our checks and alerts in both Nagios and Prometheus?”
      ● How to route alerts
      ● How to ease or centralize configuration

  32. Issues: Long-term storage
      “How can we store (some) of our graphs for a longer period of time?”
      ● InfluxDB?

  33. Issues: Coverage
      “Our Kubernetes cluster died. We had no monitoring until it was up again...” (operations team)
      ● external monitoring of crucial components
        ○ machine health
        ○ important services
        ○ important API queries

  34. Thanks for watching
