Kubernetes the Very Hard Way

Slides by Laurent Bernaille, Staff Engineer (PowerPoint presentation)



  1. Kubernetes the Very Hard Way. Laurent Bernaille, Staff Engineer, Infrastructure (@lbernail)

  2. Datadog
     ● 10000s of hosts in our infra
     ● 10s of k8s clusters with 50-2500 nodes
     ● Multi-cloud
     ● Very fast growth
     ● Over 350 integrations
     ● Over 1,200 employees
     ● Over 8,000 customers
     ● Runs on millions of hosts
     ● Trillions of data points per day

  3. Why Kubernetes?
     ● Dogfooding: improve k8s integrations
     ● Immutable: move from Chef
     ● Multi Cloud: common API
     ● Community
     ● Large and Dynamic

  4. The very hard way?

  5. It was much harder

  6. This talk is about the fine print
     ● “Of course, you will need a HA master setup”
     ● “Oh, and yes, you will have to manage your certificates”
     ● “By the way, networking is slightly more complicated, look into CNI / ingress controllers”

  7. What happens after “Kube 101”
     1. Resilient and Scalable Control Plane
     2. Securing the Control Plane
        a. Kubernetes and Certificates
        b. Exceptions?
        c. Impact of Certificate Rotation
     3. Efficient networking
        a. Giving pod IPs and routing them
        b. Ingresses: Getting data in the cluster


  9. Resilient and Scalable Control Plane

  10. Kube 101 Control Plane
     [Diagram: one master runs etcd, the apiserver, the scheduler and the controllers; the kubelet, kubectl and in-cluster apps (through the Service) talk to the apiserver]

  11. Making it resilient
     [Diagram: three masters, each running etcd, apiserver, scheduler and controllers; kubelet and kubectl reach the apiservers through a LoadBalancer, in-cluster apps through the Service]


  13. Separate etcd nodes
     [Diagram: etcd runs on dedicated nodes; the three masters keep apiserver, scheduler and controllers behind the LoadBalancer]

  14. Single active controller/scheduler
     [Diagram: same layout; only one scheduler and one controller manager are active at a time, through leader election]

  15. Split scheduler/controllers
     [Diagram: apiservers behind the LoadBalancer; controllers and schedulers run on their own dedicated nodes]

  16. Split etcd
     [Diagram: a second etcd cluster, “etcd events”, is added next to the main etcd; apiservers, controllers and schedulers stay split]
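The slide does not say how the apiservers are pointed at the events cluster. One mechanism kube-apiserver offers is the `--etcd-servers-overrides` flag, whose entries have the form `group/resource#servers` with servers separated by `;`. The Go helper below only formats such an entry; the helper name and the etcd hostnames are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// etcdOverride formats one entry of kube-apiserver's --etcd-servers-overrides
// flag: group/resource#server1;server2. Events belong to the core API group,
// whose name is the empty string, hence the leading "/".
func etcdOverride(group, resource string, servers []string) string {
	return group + "/" + resource + "#" + strings.Join(servers, ";")
}

func main() {
	// Hypothetical endpoints of a dedicated "etcd events" cluster.
	events := []string{
		"https://etcd-events-0:2379",
		"https://etcd-events-1:2379",
		"https://etcd-events-2:2379",
	}
	fmt.Println("--etcd-servers-overrides=" + etcdOverride("", "events", events))
}
```

Multiple overrides are comma-separated, so other high-churn resources could be split out the same way.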

  17. Sizing the control plane
     ● etcd: 2 clusters (main + events) of 3 or 5 nodes each; bound by disk and network I/O
     ● apiservers: X nodes; bound by RAM and network I/O
     ● controllers: 2 nodes; bound by CPU
     ● schedulers: 2 nodes; bound by CPU
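The slide sizes each etcd cluster at 3 or 5 nodes. The usual reason for odd sizes is Raft quorum: a cluster of n members needs a strict majority, floor(n/2)+1, to commit writes. A short Go check of that arithmetic (function names are illustrative):

```go
package main

import "fmt"

// quorum is the number of members that must agree to commit a write.
func quorum(members int) int { return members/2 + 1 }

// faultTolerance is how many members can fail while quorum is still reachable.
func faultTolerance(members int) int { return members - quorum(members) }

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("%d members: quorum %d, tolerates %d failure(s)\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

A 4-node cluster tolerates the same single failure as a 3-node one while adding one more member to every write, which is why 3 or 5 is the common choice.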


  19. Kubernetes and Certificates

  20. From “the hard way”

  21. “Our cluster broke after ~1y”

  22. Certificates in Kubernetes
     ● Kubernetes uses certificates everywhere
     ● Very common source of incidents
     ● Our strategy: rotate all certificates daily

  23. Certificate management
     [Diagram: Vault hosts the etcd PKI, which issues the etcd peer/server cert used by etcd and the etcd client cert used by the apiserver]

  24. Certificate management
     [Diagram: as above, plus a kube PKI in Vault issuing the apiserver/kubelet client cert, the controller client cert, the scheduler client cert, and the kubelet client/server cert]

  25. Certificate management
     [Diagram: as above, plus a Vault KV store (kube kv) holding the service account (SA) key pair: the controllers sign SA tokens for in-cluster apps with the SA private key, and the apiserver verifies them with the SA public key]

  26. Certificate management
     [Diagram: as above, plus an apiservice PKI issuing the apiservice certs (proxy/webhooks) for in-cluster apiservices and webhooks]

  27. Certificate management
     [Diagram: as above, plus an OIDC provider: kubectl authenticates to the apiserver with OIDC]

  28. Exception? Incident...

  29. Kubelet: TLS Bootstrap
     1. An admin creates a bootstrap token
     2. The admin adds the bootstrap token to Vault
     3. The controllers get the signing key from the kube PKI in Vault
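Bootstrap tokens have a fixed shape, `<token-id>.<token-secret>`, with 6 and 16 characters from `[a-z0-9]`. Kubernetes looks valid tokens up in kube-system Secrets named `bootstrap-token-<token-id>` and authenticates them as user `system:bootstrap:<token-id>` in group `system:bootstrappers`. The Go sketch below generates and parses such a token; the function names are hypothetical and the Vault step from the slide is not shown.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"regexp"
)

const tokenAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// Expected token shape: 6-char id, a dot, 16-char secret.
var bootstrapTokenRE = regexp.MustCompile(`^([a-z0-9]{6})\.([a-z0-9]{16})$`)

// randomString draws n characters uniformly from tokenAlphabet with crypto/rand.
func randomString(n int) (string, error) {
	out := make([]byte, n)
	alphabetSize := big.NewInt(int64(len(tokenAlphabet)))
	for i := range out {
		idx, err := rand.Int(rand.Reader, alphabetSize)
		if err != nil {
			return "", err
		}
		out[i] = tokenAlphabet[idx.Int64()]
	}
	return string(out), nil
}

// newBootstrapToken generates a token an admin could then store for the kubelets.
func newBootstrapToken() (string, error) {
	id, err := randomString(6)
	if err != nil {
		return "", err
	}
	secret, err := randomString(16)
	if err != nil {
		return "", err
	}
	return id + "." + secret, nil
}

// bootstrapUsername returns the username a valid token authenticates as.
func bootstrapUsername(token string) (string, bool) {
	m := bootstrapTokenRE.FindStringSubmatch(token)
	if m == nil {
		return "", false
	}
	return "system:bootstrap:" + m[1], true
}

func main() {
	tok, err := newBootstrapToken()
	if err != nil {
		panic(err)
	}
	user, _ := bootstrapUsername(tok)
	fmt.Println(tok, "->", user)
}
```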

  30. Kubelet: TLS Bootstrap
     1. The kubelet gets the bootstrap token
     2. It authenticates to the apiserver with the token
     3. The apiserver verifies the token and maps groups
     4. The kubelet creates a CSR
     5. The controllers verify RBAC for the CSR creator
     6. The controllers sign the certificate
     7. The kubelet downloads the certificate
     8. It authenticates with the certificate
     9. It registers the node

  31. Kubelet certificate issue
     1. One day, some kubelets were failing to start or took tens of minutes to start
     2. Nothing in the logs
     3. Everything looked good, but they could not get a certificate
     4. It turned out we had a lot of CSRs in flight
     5. The signing controller was having a hard time evaluating them all
     [Graph: CSR resources in the cluster. Lower is better!]

  32. Why?
     Kubelet authentication
     ● Initial creation: bootstrap token, mapped to group “system:bootstrappers”
     ● Renewal: current node certificate, mapped to group “system:nodes”
     Required RBAC permissions
     ● CSR creation
     ● CSR auto-approval
     Permissions per group
     ● system:bootstrappers: CSR creation OK, CSR auto-approval OK
     ● system:nodes: CSR creation OK, CSR auto-approval missing
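The slide's permission gap can be modelled in a few lines. As far as we know, the CSR approving controller auto-approves a new kubelet client CSR if the requester may `create` the `certificatesigningrequests/nodeclient` subresource, and a renewal if it may create `certificatesigningrequests/selfnodeclient`. The Go model below is a simplification of that check, not the controller's code; the type and function names are hypothetical.

```go
package main

import "fmt"

// permission is a (group, CSR subresource) pair that RBAC may grant "create" on.
type permission struct {
	group       string
	subresource string
}

// autoApproved models the approver's decision: the CSR is approved only if one
// of the requester's groups is granted the subresource matching the request
// type. Unapproved CSRs stay pending, i.e. "in flight".
func autoApproved(groups []string, renewal bool, granted map[permission]bool) bool {
	sub := "nodeclient"
	if renewal {
		sub = "selfnodeclient"
	}
	for _, g := range groups {
		if granted[permission{g, sub}] {
			return true
		}
	}
	return false
}

func main() {
	// Grants as on the slide: only bootstrappers get auto-approval.
	granted := map[permission]bool{
		{"system:bootstrappers", "nodeclient"}: true,
	}
	fmt.Println("bootstrap CSR approved:", autoApproved([]string{"system:bootstrappers"}, false, granted))
	fmt.Println("renewal CSR approved:", autoApproved([]string{"system:nodes"}, true, granted))

	// Granting system:nodes the renewal subresource lets renewals through.
	granted[permission{"system:nodes", "selfnodeclient"}] = true
	fmt.Println("renewal CSR approved:", autoApproved([]string{"system:nodes"}, true, granted))
}
```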

  33. Exception 2? Incident 2...

  34. Temporary solution
     [Diagram: the admin creates the webhook with a self-signed cert as its CA and adds the self-signed cert + key to Vault (kube kv); the webhook gets its cert and key from Vault]
     One day, after ~1 year:
     ● Creation of resources started failing (luckily only a Custom Resource)
     ● The cert had expired...

  35. Take-away
     ● Rotate server/client certificates
     ● Not easy
     But “If it’s hard, do it often” => no expiration issues anymore

  36. Impact of Certificate rotation

  37. Apiserver certificate rotation

  38. Impact on etcd
     [Graphs: apiserver restarts, etcd traffic, etcd slow queries]
     ● We have multiple apiservers and restart each one daily
     ● Significant etcd network impact (caches are repopulated)
     ● Significant impact on etcd performance

  39. Impact on load balancers
     [Graphs: apiserver restarts, ELB surge queue]
     ● Significant impact on the LB as connections are re-established
     ● Mitigation: increase queues on the apiservers with net.ipv4.tcp_max_syn_backlog and net.core.somaxconn
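Before raising the two queues it helps to see their current values on an apiserver host; the slide gives no target values. On Linux each sysctl is a file under `/proc/sys`, with dots mapped to slashes. As far as we know, Go's `net` package on Linux also sizes its `listen()` backlog from `net.core.somaxconn`, so raising it affects Go servers such as the apiserver. A small Go reader (function names hypothetical):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// sysctlPath maps a dotted sysctl name to its file under /proc/sys.
// Names containing literal dots (e.g. VLAN interfaces) are not handled.
func sysctlPath(name string) string {
	return "/proc/sys/" + strings.ReplaceAll(name, ".", "/")
}

// readSysctl returns the current integer value of a sysctl (Linux only).
func readSysctl(name string) (int, error) {
	data, err := os.ReadFile(sysctlPath(name))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

func main() {
	for _, name := range []string{"net.ipv4.tcp_max_syn_backlog", "net.core.somaxconn"} {
		v, err := readSysctl(name)
		if err != nil {
			fmt.Println(name, "unavailable:", err)
			continue
		}
		fmt.Printf("%s = %d\n", name, v)
	}
}
```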

  40. Impact on apiserver clients
     [Graphs: apiserver restarts, coredns memory usage]
     ● Apiservers restart
     ● Clients reconnect and refresh their cache
     => Memory spike for impacted apps
     No real mitigation today

  41. Impact on traffic balance
     [Graph: per-apiserver traffic ranging from 15MB/s to 2.5MB/s, and from 2300 to 300 connections]
     ● Number of connections and traffic are very unbalanced, because connections are very long-lived
     ● More clients => bigger impact clusterwide

  42. Why? Simple simulation
     Simulation for 48h:
     ● 5 apiservers
     ● 10000 connections (4 x 2500 nodes)
     ● Every 4h, one apiserver restarts
     ● Reconnections evenly dispatched
     Cause:
     ● Cloud TCP load balancers use round-robin
     ● Long-lived connections
     ● No rebalancing

  43. Kubelet certificate rotation

  44. Pod graceful termination
     [Diagram: an admin or controller deletes the pod through the apiserver; the kubelet asks containerd to stop the container with the “terminationGracePeriodSeconds” timeout; containerd sends SIGTERM to the container and, after the timeout, SIGKILL]

  45. Restarts impact graceful termination
     [Diagram: same flow, but containerd sends SIGKILL after the timeout or when the stop request's context is cancelled; a kubelet restart therefore ends graceful termination]
     Fixed upstream: “Do not SIGKILL container if container stop is cancelled” https://github.com/containerd/cri/pull/1099
