Monitoring Kubernetes with Prometheus Henri Dubois-Ferriere @henridf Percona Live, 2018-11-06
Hello. Henri Dubois-Ferriere Technical Director, Sysdig Doing “observability” for many, many years, from networks to web apps via many startups. PhD in CS from EPFL. Repatriated from San Francisco to Switzerland
Outline ● Kubernetes ● Prometheus ● Kubernetes metrics & sources ● Deployment
Monitor why? ● Know about outages before users tell me ● Understand my production environment (or try…) ● Plan/trend/forecast
Kubernetes
Kubernetes - Container orchestration system - aka “OS for your cluster” - Abstracts away the underlying infra - declarative APIs with control loops
https://commons.wikimedia.org/wiki/File:Kubernetes.png
Prometheus
Prometheus ❏ Started at SoundCloud in 2012 ❏ Motivated by challenges with monitoring dynamic environments ❏ Made public in 2015; now the second CNCF “graduate”
More than a TSDB https://prometheus.io/assets/architecture.png
It’s all about the pull - Prom scrapes targets to get metrics - Nice side effect: know when target down - Needs to know what to scrape
What should Prometheus scrape? - Service discovery provides answer - Azure, Consul, GCE, K8S, EC2, ... - Can also watch a file containing target list
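As a sketch, this is roughly what Kubernetes service discovery looks like in prometheus.yml. The job name is illustrative and this is not a complete config; `role` selects which object type to discover:

```yaml
# Illustrative scrape job using Kubernetes service discovery.
# The job_name is an example; a real config also needs relabel rules.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod        # other roles: node, service, endpoints, ingress
```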
Dimensional data model Query: http_requests_total{code="200", method="get"} Metric name · Selector (aka filter)
Dimensional data model Query: http_requests_total{code="200", method="get"} Response: http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741 http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920 Label/value pairs (aka dimensions)
Dimensional data model Query: http_requests_total{code="200", method="get"} Response: http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741 http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920 Timestamp, value
Metadata discovery - SD also provides metadata - Metadata can be mixed in with metrics - Powerful relabelling feature for label manipulation at ingest
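Service discovery attaches `__meta_kubernetes_*` labels to each target, and `relabel_configs` act on them at ingest. A sketch, using the common `prometheus.io/scrape` annotation convention (as used by the stable/prometheus Helm chart, not something Prometheus mandates):

```yaml
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Copy the pod's namespace from discovery metadata into a real label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
```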
Instrumentation
Off-the-shelf or write your own
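Either way, what a scrape returns is the Prometheus text exposition format. Real services should use an official client library (e.g. `prometheus_client` for Python); the sketch below only illustrates what the wire format looks like, with a hypothetical helper:

```python
# Minimal sketch of the Prometheus text exposition format that a
# /metrics endpoint returns on scrape. render_counter is a hypothetical
# helper for illustration, not part of any client library.

def render_counter(name, help_text, samples):
    """samples: list of (label_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

print(render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"code": "200", "method": "get", "route": "/api/users"}, 1741)],
))
# prints:
# # HELP http_requests_total Total HTTP requests.
# # TYPE http_requests_total counter
# http_requests_total{code="200",method="get",route="/api/users"} 1741
```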
Kubernetes metrics
Monitoring resources and methods - For resources like memory, queues, CPUs, disks… - USE Method: Utilization, Saturation, Errors - http://www.brendangregg.com/usemethod.html - For services - “RED” Method: Request rate, Error rate, Duration - https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
node_exporter: node metrics - Host metrics - CPU - Memory - Disk - Network - ... - Not K8S specific, but useful as referential and for totals
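As a sketch, USE-style utilization queries over node_exporter metrics (metric names as exposed by node_exporter ≥ 0.16; older versions name them differently):

```
# CPU utilization per node (fraction of time not idle, over 5m)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Memory utilization per node
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
```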
cAdvisor: container metrics - Runs in kubelet (usually, for now..) - Resource stats about running containers - Mostly container and node-level labels… - (k8s: plus namespace and pod_name)
Sample cAdvisor metric queries Percent of total cluster memory used: sum(container_memory_rss) / sum(machine_memory_bytes) Memory used by kubernetes namespace: sum(container_memory_rss) by (namespace) Top 5 pods by network I/O: topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
Kube-state metrics $ kubectl get deploy my-app -o yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: my-app ... spec: replicas: 4 ... status: replicas: 4 ...
Kube-state metrics $ kubectl get deploy my-app -o yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: my-app ... spec: replicas: 4 → kube_deployment_spec_replicas{deployment="my-app", ...} ... status: replicas: 4 → kube_deployment_status_replicas{deployment="my-app", ...} ... (Metrics created by kube-state-metrics, with label set from this deployment)
Sample kube-state-metrics queries Deployments with issues kube_deployment_spec_replicas != kube_deployment_status_replicas_available Top 10 longest-running pods (“reverse uptime”) topk(10, sort_desc(time() - kube_pod_created))
Kube core service metrics - API Server - etcd3 - kube-dns - scheduler, controller-manager
Metrics recap
Source                               Deployment mode         How many               Metrics about
node_exporter                        daemonset               1 per node             node resources
cAdvisor                             inside kubelet          1 per node             container resources
kube-state-metrics                   deployment              singleton              k8s object state
etcd, API server, controller         core service            singleton or HA group  itself
manager, ...
Deploying
Monitoring from the inside - Monitoring runs inside thing being monitored? - Yes. It’s fine really. Really, it’s fine. - (And being outside has own challenges)
Deployment outline - Metrics services - node_exporter - kube-state-metrics - (cAdvisor usually enabled out of box) - Prometheus running - Storage - Read access to API server (for service discovery) - Service discovery config for above - Service discovery config for apps/services
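The “read access to API server” piece is typically granted via RBAC. An illustrative read-only ClusterRole for service discovery (names are examples; a real setup also needs a ServiceAccount and ClusterRoleBinding):

```yaml
# Illustrative RBAC for Prometheus Kubernetes service discovery
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: [nodes, services, endpoints, pods]
    verbs: [get, list, watch]
```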
Helm-based install helm fetch --untar stable/prometheus vi prometheus/values.yaml # configure install helm upgrade -i prometheus ./prometheus # or manually deploy yaml
Prometheus operator - Uses Kubernetes API facilities to make Prometheus “native” - New Prometheus-related objects: `kubectl get prometheus` - PrometheusRule, ServiceMonitor, Alertmanager, AlertingSpec, ... - Prometheus configuration is abstracted via these objects - Young but promising - Consider the more direct route first (hand-rolled or Helm), and the Operator once familiar with the challenges of the direct route
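For a flavor of those objects, a ServiceMonitor sketch (the Operator turns this into Prometheus scrape config; names and labels are illustrative):

```yaml
# Illustrative ServiceMonitor (Prometheus Operator CRD)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app          # scrape Services carrying this label
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s
```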
Thank You. Henri Dubois-Ferriere @henridf
Pointers - Prometheus SD for Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config - KSM metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation - Prometheus Helm chart: https://github.com/helm/charts/tree/master/stable/prometheus - Prometheus operator: https://github.com/coreos/prometheus-operator - “A deep dive into Kubernetes metrics” blog series: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae