Kubernetes the Very Hard Way
Laurent Bernaille, Staff Engineer, Infrastructure (@lbernail)
Datadog
● 10000s of hosts in our infra
● 10s of k8s clusters with 50-2500 nodes
● Multi-cloud
● Very fast growth
● Over 350 integrations
● Over 1,200 employees
● Over 8,000 customers
● Runs on millions of hosts
● Trillions of data points per day
Why Kubernetes?
● Dogfooding
● Immutable
● Improve k8s integrations
● Move from Chef
● Multi-cloud
● Community
● Common API
● Large and Dynamic
The very hard way?
It was much harder
This talk is about the fine print:
● “Of course, you will need an HA master setup”
● “Oh, and yes, you will have to manage your certificates”
● “By the way, networking is slightly more complicated, look into CNI / ingress controllers”
What happens after “Kube 101”
1. Resilient and Scalable Control Plane
2. Securing the Control Plane
   a. Kubernetes and Certificates
   b. Exceptions?
   c. Impact of Certificate Rotation
3. Efficient networking
   a. Giving pod IPs and routing them
   b. Ingresses: Getting data into the cluster
Resilient and Scalable Control Plane
Kube 101 control plane (diagram): a single master runs etcd, the apiserver, the scheduler and the controllers; kubelets, kubectl and in-cluster apps (through the Service) talk to the apiserver.
Making it resilient (diagram): three masters, each running etcd, an apiserver, a scheduler and the controllers; kubelets and kubectl reach the apiservers through a LoadBalancer, in-cluster apps through the Service.
Separate etcd nodes (diagram): etcd moves to dedicated nodes; the three masters keep the apiserver, scheduler and controllers behind the LoadBalancer.
Single active controller/scheduler (diagram): same layout, but only one controller manager and one scheduler are active at a time; the others wait on leader election.
Split scheduler/controllers (diagram): the controllers and the schedulers move off the apiserver nodes onto their own nodes; the apiservers stay behind the LoadBalancer.
Split etcd (diagram): Events get their own etcd cluster, separate from the main etcd cluster; apiservers, controllers and schedulers keep the previous layout.
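One way to route Events to a separate cluster is kube-apiserver's `--etcd-servers-overrides` flag (format `group/resource#servers`, servers separated by `;`); the slide does not say this is exactly how the setup was configured. A minimal sketch that builds the two flags, assuming hypothetical hostnames (`etcd-main-*`, `etcd-events-*`) and the default client port 2379:

```python
def etcd_flags(main_servers, events_servers):
    """Build kube-apiserver flags that send Events to a dedicated etcd cluster.

    Events live in the core API group, whose name is empty, hence "/events".
    """
    return [
        "--etcd-servers=" + ",".join(main_servers),
        "--etcd-servers-overrides=/events#" + ";".join(events_servers),
    ]


# Hypothetical hostnames: three members in each cluster.
main = [f"https://etcd-main-{i}:2379" for i in range(1, 4)]
events = [f"https://etcd-events-{i}:2379" for i in range(1, 4)]
for flag in etcd_flags(main, events):
    print(flag)
```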
Sizing the control plane
● etcd: 2 clusters (main and events), 3 or 5 nodes each; sized for disk and network I/O
● apiservers: X nodes; sized for RAM and network I/O
● controllers: 2 nodes; sized for CPU
● schedulers: 2 nodes; sized for CPU
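Why 3 or 5 etcd nodes: etcd replicates through Raft, so a write needs a majority quorum of floor(n/2) + 1 members, and a cluster of n members survives n - quorum failures. A quick check shows that an even member count adds load without adding fault tolerance:

```python
def quorum(members: int) -> int:
    """Majority needed for etcd (Raft) to commit a write."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a quorum remains."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four members tolerate no more failures than three, and six no more than five, which is why clusters use 3 or 5.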
Kubernetes and Certificates
From “the hard way” lbernail
“Our cluster broke after ~1y”
Certificates in Kubernetes
● Kubernetes uses certificates everywhere
● Very common source of incidents
● Our strategy: rotate all certificates daily
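Rotating daily usually means issuing short-lived certificates and renewing each one well before it expires. The sketch below assumes a hypothetical policy that is not from the talk: renew once half of the validity window has elapsed. With a 48-hour TTL that renews daily and leaves a full day of margin if a renewal fails.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime, fraction: float = 0.5) -> datetime:
    """Point in the validity window at which the certificate should be renewed.

    `fraction` is a hypothetical policy knob: 0.5 renews at half-life.
    """
    return not_before + (not_after - not_before) * fraction


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime,
                  fraction: float = 0.5) -> bool:
    """True once `now` has reached the renewal point."""
    return now >= renewal_time(not_before, not_after, fraction)


issued = datetime(2019, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=48)   # hypothetical 48h TTL
print(renewal_time(issued, expires))     # renewal is due 24h after issuance
print(needs_renewal(issued, expires, issued + timedelta(hours=25)))
```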
Certificate management (diagram, built up step by step): Vault hosts an etcd PKI that issues the etcd peer/server certificates and the etcd client certificate used by the apiserver.
Certificate management: in addition, a kube PKI in Vault issues the apiserver's kubelet client certificate, the controller and scheduler client certificates, and the kubelet client/server certificate.
Certificate management: Vault's kube kv store also holds the service account (SA) key pair; the controllers sign SA tokens with the private key, the apiserver verifies them with the public key, and in-cluster apps authenticate with their SA tokens.
Certificate management: an apiservice PKI issues the certificates for aggregated API services (proxy) and webhooks.
Certificate management (complete): kubectl users authenticate to the apiserver with OIDC, through an OIDC provider.
Exception? Incident...
Kubelet: TLS Bootstrap (setup)
1. An admin creates a bootstrap token
2. The admin adds the bootstrap token to Vault
3. The controllers get the signing key from Vault
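Step 1 creates a bootstrap token. Kubernetes bootstrap tokens have the form `<token-id>.<token-secret>`, 6 then 16 characters from `[a-z0-9]`, and live in `kube-system` as a Secret named `bootstrap-token-<token-id>`. A minimal sketch of generating one; pushing it to Vault (step 2) is left out because the talk does not show how that was done.

```python
import re
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")


def new_bootstrap_token() -> str:
    """Generate a token in the <6 chars>.<16 chars> bootstrap token format."""
    token_id = "".join(secrets.choice(ALPHABET) for _ in range(6))
    token_secret = "".join(secrets.choice(ALPHABET) for _ in range(16))
    return f"{token_id}.{token_secret}"


def secret_name(token: str) -> str:
    """Name of the kube-system Secret that holds this token."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError("not a bootstrap token")
    return "bootstrap-token-" + match.group(1)


token = new_bootstrap_token()
print(secret_name(token))
```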
Kubelet: TLS Bootstrap (flow)
1. The kubelet gets the bootstrap token from Vault
2. It authenticates to the apiserver with the token
3. The apiserver verifies the token and maps it to groups
4. The kubelet creates a CSR
5. The controllers verify RBAC for the CSR creator
6. The controllers sign the certificate
7. The kubelet downloads the certificate
8. It authenticates with the certificate
9. It registers the node
Kubelet certificate issue
1. One day, some kubelets were failing to start or took 10s of minutes
2. Nothing in logs
3. Everything looked good but they could not get a cert
4. Turns out we had a lot of CSRs in flight
5. The signing controller was having a hard time evaluating them all
(Graph: CSR resources in the cluster; lower is better!)
Why? Kubelet authentication
● Initial creation: bootstrap token, mapped to group “system:bootstrappers”
● Renewal: current node certificate, mapped to group “system:nodes”
Required RBAC permissions:
● system:bootstrappers: CSR creation OK, CSR auto-approval OK
● system:nodes: CSR creation OK, CSR auto-approval missing
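In upstream Kubernetes, auto-approval of first-time node client CSRs is granted through the `system:certificates.k8s.io:certificatesigningrequests:nodeclient` ClusterRole and auto-approval of renewals through `system:certificates.k8s.io:certificatesigningrequests:selfnodeclient`. The toy model below is a hypothetical illustration, not the controller's code: it shows how renewal CSRs pile up when the second binding is missing.

```python
# Permissions as described on the slide: system:nodes may create CSRs
# but has no auto-approval binding.
PERMISSIONS = {
    "system:bootstrappers": {"create", "auto-approve"},
    "system:nodes": {"create"},
}


def submit_csr(group: str, pending: list) -> str:
    """Toy model of one kubelet CSR submission."""
    perms = PERMISSIONS.get(group, set())
    if "create" not in perms:
        return "forbidden"
    if "auto-approve" in perms:
        return "signed"
    pending.append(group)  # stays pending until someone approves it by hand
    return "pending"


pending = []
print(submit_csr("system:bootstrappers", pending))  # first certificate
for _ in range(1000):                               # renewals from 1000 nodes
    submit_csr("system:nodes", pending)
print(len(pending), "CSRs waiting for approval")
```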
Exception 2? Incident 2...
Temporary solution
● An admin creates the webhook with a self-signed cert as CA
● The admin adds the self-signed cert + key to Vault (kube kv)
● The webhook gets its cert and key from Vault
One day, after ~1 year:
● Creation of resources started failing (luckily only a Custom Resource)
● The cert had expired...
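A periodic expiry check would have flagged the webhook certificate in advance. This minimal sketch uses only the standard library: `ssl.cert_time_to_seconds` parses the `notAfter` string format that `ssl.SSLSocket.getpeercert()` returns. The 30-day warning threshold is an arbitrary assumption.

```python
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter date (e.g. "Jan 31 00:00:00 2020 GMT")."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check(name: str, not_after: str, warn_days: float = 30) -> str:
    """Return a one-line status; the warning threshold is hypothetical."""
    left = days_left(not_after)
    if left < 0:
        return f"{name}: EXPIRED {-left:.0f} days ago"
    if left < warn_days:
        return f"{name}: expires in {left:.1f} days"
    return f"{name}: ok"


print(check("webhook", "Jan  1 00:00:00 2020 GMT"))
```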
Take-away
● Rotate server/client certificates
● Not easy
● But, “If it’s hard, do it often” => no expiration issues anymore
Impact of Certificate rotation
Apiserver certificate rotation
Impact on etcd
● We have multiple apiservers
● We restart each one daily
● Significant etcd network impact (caches are repopulated)
● Significant impact on etcd performance
(Graphs: apiserver restarts, etcd traffic, etcd slow queries)
Impact on load balancers
● Significant impact on the LB as connections are re-established (graphs: apiserver restarts, ELB surge queue)
● Mitigation: increase queues on apiservers (net.ipv4.tcp_max_syn_backlog, net.core.somaxconn)
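`net.ipv4.tcp_max_syn_backlog` bounds the queue of half-open connections, and `net.core.somaxconn` caps the accept queue that a `listen()` call can request. On Linux, Go's net package reads `net.core.somaxconn` to size its listen backlog, so raising it should raise kube-apiserver's backlog (for listeners opened afterwards) without any apiserver flag. A small read-only sketch for checking the current values on a host, Linux only; the `/proc/sys` layout maps the dots in a sysctl name to slashes:

```python
from pathlib import Path

SYSCTLS = ("net.ipv4.tcp_max_syn_backlog", "net.core.somaxconn")


def proc_path(name: str) -> Path:
    """Map a sysctl name to its /proc/sys file: dots become slashes."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> int:
    """Read an integer sysctl from /proc/sys (raises OSError off Linux)."""
    return int(proc_path(name).read_text().strip())


if __name__ == "__main__":
    for name in SYSCTLS:
        try:
            print(f"{name} = {read_sysctl(name)}")
        except OSError:
            print(f"{name}: not available (non-Linux host?)")
```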
Impact on apiserver clients
● Apiserver restarts
● Clients reconnect and refresh their cache
● Memory spike for impacted apps (graph: coredns memory usage)
● No real mitigation today
Impact on traffic balance
● Number of connections and traffic are very unbalanced across apiservers (graphs: from 300 to 2300 connections and from 2.5MB/s to 15MB/s per apiserver)
● Because connections are very long-lived
● More clients => bigger impact clusterwide
Why? Simple simulation
Simulation for 48h:
● 5 apiservers
● 10000 connections (4 x 2500 nodes)
● Every 4h, one apiserver restarts
● Reconnections evenly dispatched
Cause:
● Cloud TCP load-balancers use round-robin
● Long-lived connections
● No rebalancing
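The simulation can be reproduced in a few lines. This sketch follows the slide's assumptions (balanced start, one restart every 4h, reconnections spread evenly over the apiservers that are up) and adds one of its own: no new connections arrive, so a restarted apiserver stays empty until its peers restart.

```python
def simulate(servers=5, connections=10_000, hours=48, restart_every=4):
    """Return per-apiserver connection counts after `hours` of rolling restarts."""
    # Start perfectly balanced: the LB spread the initial connections evenly.
    conns = [connections // servers] * servers
    for i in range(connections % servers):
        conns[i] += 1
    for k, _hour in enumerate(range(0, hours, restart_every)):
        down = k % servers                    # apiservers restart in turn
        dropped, conns[down] = conns[down], 0
        others = [s for s in range(servers) if s != down]
        # Dropped clients reconnect through the LB; round-robin spreads them
        # evenly over the apiservers that are still up. Nothing ever moves
        # connections back to the restarted apiserver.
        share, extra = divmod(dropped, len(others))
        for j, s in enumerate(others):
            conns[s] += share + (1 if j < extra else 0)
    return conns


print(simulate())
```

The apiserver restarted last ends with no connections at all, while the total stays at 10000: round-robin only balances new connections, and long-lived ones are never redistributed.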
Kubelet certificate rotation
Pod graceful termination
1. An admin or a controller deletes the pod through the apiserver
2. The kubelet asks containerd to stop the container with timeout “terminationGracePeriodSeconds”
3. containerd sends SIGTERM to the container
4. After the timeout, containerd sends SIGKILL
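On the container side, graceful termination only helps if the process handles SIGTERM. A minimal sketch of a worker that stops taking new work on SIGTERM and exits cleanly; the work loop is hypothetical, and the drain must finish within `terminationGracePeriodSeconds` (30s by default) or SIGKILL arrives.

```python
import signal
import threading

stop = threading.Event()


def on_sigterm(signum, frame):
    """Ask the main loop to finish; never block inside a signal handler."""
    stop.set()


def serve():
    """Hypothetical worker loop, meant to be the container's entrypoint."""
    signal.signal(signal.SIGTERM, on_sigterm)
    while not stop.is_set():
        # One unit of work would go here; stop.wait() doubles as an
        # interruptible sleep that returns as soon as SIGTERM is handled.
        stop.wait(1.0)
    # Drain in-flight work and close connections here, within the grace period.
    print("drained, exiting")
```

Call `serve()` from the container's entrypoint; `signal.signal` must be called from the main thread.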
Restarts impact graceful termination
● Same flow, but containerd sends SIGKILL after the timeout or when the stop request's context is cancelled
● Kubelet restarts (e.g. for certificate rotation) cancel the context and end graceful termination
● Fixed upstream: “Do not SIGKILL container if container stop is cancelled” https://github.com/containerd/cri/pull/1099
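The post-fix stop sequence can be sketched with `subprocess`: send SIGTERM, wait up to the grace period, then SIGKILL, but if the stop itself is cancelled (as when the kubelet restarts mid-stop), return without escalating so the container can finish shutting down. This is a simplified POSIX-only model, not containerd's code; `cancelled` is a `threading.Event` standing in for the Go context.

```python
import subprocess
import threading


def stop_container(proc: subprocess.Popen, grace_seconds: float,
                   cancelled: threading.Event, poll_interval: float = 0.05) -> str:
    """Stop `proc` gracefully. Returns "exited", "killed" or "cancelled"."""
    proc.terminate()                      # SIGTERM
    waited = 0.0
    while waited < grace_seconds:
        if proc.poll() is not None:
            return "exited"               # graceful shutdown finished in time
        if cancelled.is_set():
            # Post-fix behavior: a cancelled stop must not SIGKILL.
            return "cancelled"
        cancelled.wait(poll_interval)     # sleep, but wake up early on cancel
        waited += poll_interval
    if proc.poll() is not None:
        return "exited"
    proc.kill()                           # grace period over: SIGKILL
    proc.wait()
    return "killed"
```

Before the fix, the cancelled branch fell through to SIGKILL, which is why a kubelet restart cut graceful termination short.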
Recommendations
More recommendations