Monitoring Kubernetes with Prometheus Henri Dubois-Ferriere @henridf Percona Live, 2018-11-06
Hello. Henri Dubois-Ferriere Technical Director, Sysdig Doing “observability” for many, many years, from networks to web apps via many startups. PhD in CS from EPFL. Repatriated from San Francisco to Switzerland
Outline ● Kubernetes ● Prometheus ● Kubernetes metrics & sources ● Deployment
Monitor why? ● Know about outages before users tell me ● Understand my production environment (or try…) ● Plan/trend/forecast
Kubernetes
Kubernetes - Container orchestration system - aka “OS for your cluster” - Abstracts away the underlying infra - declarative APIs with control loops
https://commons.wikimedia.org/wiki/File:Kubernetes.png
Prometheus
Prometheus ❏ Started at SoundCloud in 2012 ❏ Motivated by challenges with monitoring dynamic environments ❏ Made public in 2015; now the second CNCF “graduate”
More than a TSDB https://prometheus.io/assets/architecture.png
It’s all about the pull - Prom scrapes targets to get metrics - Nice side effect: know when target down - Needs to know what to scrape
What should Prometheus scrape? - Service discovery provides answer - Azure, Consul, GCE, K8S, EC2, ... - Can also watch a file containing target list
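As a sketch, this is roughly what Kubernetes service discovery looks like in prometheus.yml. The job name is illustrative and this is not a complete config; `role` selects which object type to discover:

```yaml
# Illustrative scrape job using Kubernetes service discovery.
# The job_name is an example; a real config also needs relabel rules.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod        # other roles: node, service, endpoints, ingress
```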
Dimensional data model Query: http_requests_total{code="200", method="get"} Metric name · Selector (aka filter)
Dimensional data model Query: http_requests_total{code="200", method="get"} Response: http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741 http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920 Label/value pairs (aka dimensions)
Dimensional data model Query: http_requests_total{code="200", method="get"} Response: http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741 http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920 Timestamp, value
Metadata discovery - SD also provides metadata - Metadata can be mixed in with metrics - Powerful relabelling feature for label manipulation at ingest
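Service discovery attaches `__meta_kubernetes_*` labels to each target, and `relabel_configs` act on them at ingest. A sketch, using the common `prometheus.io/scrape` annotation convention (as used by the stable/prometheus Helm chart, not something Prometheus mandates):

```yaml
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Copy the pod's namespace from discovery metadata into a real label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
```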
Instrumentation
Off-the-shelf or write your own
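Either way, what a scrape returns is the Prometheus text exposition format. Real services should use an official client library (e.g. `prometheus_client` for Python); the sketch below only illustrates what the wire format looks like, with a hypothetical helper:

```python
# Minimal sketch of the Prometheus text exposition format that a
# /metrics endpoint returns on scrape. render_counter is a hypothetical
# helper for illustration, not part of any client library.

def render_counter(name, help_text, samples):
    """samples: list of (label_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

print(render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"code": "200", "method": "get", "route": "/api/users"}, 1741)],
))
# prints:
# # HELP http_requests_total Total HTTP requests.
# # TYPE http_requests_total counter
# http_requests_total{code="200",method="get",route="/api/users"} 1741
```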
Kubernetes metrics
Monitoring resources and methods - For resources like memory, queues, CPUs, disks… - USE Method: Utilization, Saturation, Errors - http://www.brendangregg.com/usemethod.html - For services - “RED” Method: Request rate, Error rate, Duration - https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
node_exporter: node metrics - Host metrics - CPU - Memory - Disk - Network - ... - Not K8S specific, but useful as referential and for totals
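As a sketch, USE-style utilization queries over node_exporter metrics (metric names as exposed by node_exporter ≥ 0.16; older versions name them differently):

```
# CPU utilization per node (fraction of time not idle, over 5m)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Memory utilization per node
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
```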
cAdvisor: container metrics - Runs in kubelet (usually, for now..) - Resource stats about running containers - Mostly container and node-level labels… - (k8s: plus namespace and pod_name)
Sample cAdvisor metric queries Percent of total cluster memory used: sum(container_memory_rss) / sum(machine_memory_bytes) Memory used by kubernetes namespace: sum(container_memory_rss) by (namespace) Top 5 pods by network I/O: topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
Kube-state metrics $ kubectl get deploy my-app -o yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: my-app ... spec: replicas: 4 ... status: replicas: 4 ...
Kube-state metrics $ kubectl get deploy my-app -o yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: my-app ... spec: replicas: 4 → kube_deployment_spec_replicas{deployment="my-app", ...} ... status: replicas: 4 → kube_deployment_status_replicas{deployment="my-app", ...} ... (Metrics created by kube-state-metrics, with label set from this deployment)
Sample kube-state-metrics queries Deployments with issues kube_deployment_spec_replicas != kube_deployment_status_replicas_available Top 10 longest-running pods (“reverse uptime”) topk(10, sort_desc(time() - kube_pod_created))
Kube core service metrics - API Server - etcd3 - kube-dns - scheduler, controller-manager
Metrics recap
Source                               Deployment mode         How many               Metrics about
node_exporter                        daemonset               1 per node             node resources
cAdvisor                             inside kubelet          1 per node             container resources
kube-state-metrics                   deployment              singleton              k8s object state
etcd, API server, controller         core service            singleton or HA group  itself
manager, ...
Deploying
Monitoring from the inside - Monitoring runs inside thing being monitored? - Yes. It’s fine really. Really, it’s fine. - (And being outside has own challenges)
Deployment outline - Metrics services - node_exporter - kube-state-metrics - (cAdvisor usually enabled out of box) - Prometheus running - Storage - Read access to API server (for service discovery) - Service discovery config for above - Service discovery config for apps/services
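The “read access to API server” piece is typically granted via RBAC. An illustrative read-only ClusterRole for service discovery (names are examples; a real setup also needs a ServiceAccount and ClusterRoleBinding):

```yaml
# Illustrative RBAC for Prometheus Kubernetes service discovery
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: [nodes, services, endpoints, pods]
    verbs: [get, list, watch]
```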
Helm-based install helm fetch --untar stable/prometheus vi prometheus/values.yaml # configure install helm upgrade -i prometheus ./prometheus # or manually deploy yaml
Prometheus operator - Uses Kubernetes API facilities to make Prometheus “native” - New Prometheus-related objects: `kubectl get prometheus` - PrometheusRule, ServiceMonitor, Alertmanager, AlertingSpec, ... - Prometheus configuration is abstracted via these objects - Young but promising - Consider the more direct route first (hand-rolled or Helm), and the Operator once familiar with the challenges of the direct route
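For a flavor of those objects, a ServiceMonitor sketch (the Operator turns this into Prometheus scrape config; names and labels are illustrative):

```yaml
# Illustrative ServiceMonitor (Prometheus Operator CRD)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app          # scrape Services carrying this label
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s
```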
Thank You. Henri Dubois-Ferriere @henridf
Pointers - Prometheus SD for Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config - KSM metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation - Prometheus Helm chart: https://github.com/helm/charts/tree/master/stable/prometheus - Prometheus operator: https://github.com/coreos/prometheus-operator - “A deep dive into Kubernetes metrics” blog series: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae