Running Kafka on Kubernetes with Strimzi
Sean Glover, Lightbend @seg1o
Who am I?
I’m Sean Glover
• Principal Engineer at Lightbend
• Member of the Lightbend Pipelines team
• Organizer of Scala Toronto (scalator)
• Author and contributor to various projects in the Kafka ecosystem, including Kafka, Alpakka Kafka (reactive-kafka), Strimzi, Kafka Lag Exporter, and DC/OS Commons SDK
seg1o · https://seanglover.com/ · sean@seanglover.com · @seg1o
Operations Is Hard
“Technology will make our lives easier”
• Technology makes running other technology easier
• Automate as much operations work as we can
Motivating Example: Zero-downtime Kafka Upgrade
Motivating Example: Upgrading Kafka
High-level steps to upgrade Kafka:
1. Rolling update to explicitly define the broker properties inter.broker.protocol.version and log.message.format.version
2. Download the new Kafka distribution and perform a rolling upgrade, 1 broker at a time
3. Rolling update to upgrade inter.broker.protocol.version to the new version
4. Upgrade Kafka clients
5. Rolling update to upgrade log.message.format.version to the new version
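The five steps can be captured as data that an automation tool iterates over. This is a minimal Go sketch, assuming hypothetical phase names and placeholder versions "2.0" and "2.1"; it only records which broker properties each phase sets and whether brokers restart one at a time. It does not talk to Kafka.

```go
package main

import "fmt"

// Phase is one step of a hypothetical upgrade plan.
type Phase struct {
	Name            string
	Overrides       map[string]string // broker properties set in this phase
	RestartsBrokers bool              // true if brokers restart one at a time
}

// upgradePlan models the five high-level steps for upgrading from Kafka
// version `from` to version `to`. The versions are placeholders.
func upgradePlan(from, to string) []Phase {
	return []Phase{
		{"pin protocol and message format", map[string]string{
			"inter.broker.protocol.version": from,
			"log.message.format.version":    from,
		}, true},
		{"install new Kafka distribution", nil, true},
		{"bump inter-broker protocol", map[string]string{
			"inter.broker.protocol.version": to,
		}, true},
		{"upgrade clients", nil, false},
		{"bump message format", map[string]string{
			"log.message.format.version": to,
		}, true},
	}
}

func main() {
	for i, p := range upgradePlan("2.0", "2.1") {
		fmt.Printf("%d. %s (restarts brokers: %v) %v\n", i+1, p.Name, p.RestartsBrokers, p.Overrides)
	}
}
```

Keeping the plan as data makes it easy to resume from a given phase after a failure instead of starting over.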
Motivating Example: Upgrading Kafka
Any update to the Kafka cluster must be performed as a serial “rolling update”. The complete Kafka upgrade process requires 3 “rolling updates”.
Each broker update requires:
• Secure login
• Configuration linting: any change to a broker requires a rolling broker update
• Graceful shutdown: send a SIGINT signal to the broker
• Broker initialization: wait for the broker to join the cluster and signal it’s ready
This operation is error-prone to do manually and difficult to model declaratively using generalized infrastructure automation tools.
Automation
“If it hurts, do it more frequently, and bring the pain forward.”
Jez Humble, Continuous Delivery
Automation of Operations
Upgrading Kafka is just one of many complex operational concerns. For example:
• Initial deployment
• Managing ZooKeeper
• Replacing brokers
• Topic partition rebalancing
• Decommissioning or adding brokers
How do we automate complex operational workflows in a reliable way?
Container Orchestrated Clusters
Cluster Resource Managers
Task Isolation with Containers
Linux Containers (LXC)
• Cluster Resource Managers use Linux Containers to constrain resources and provide isolation
• cgroups constrain resources
• Namespaces isolate file system/process trees
• Docker is just a project to describe and share containers efficiently (others: rkt, LXC, Mesos)
• Containers are available for several platforms: Jail, Linux Container, Windows Container
Diagram: containers run in user space on a container engine driven by the cluster resource manager; in kernel space, the Linux kernel provides namespaces, cgroups, modules, and drivers, on a physical or virtual machine.
Kubernetes and the Operator Pattern
The Operator Pattern
1. Controller/Operator: watches CRUD changes to the “Kafka” custom resource and runs an active reconciliation loop
    // Active Reconciliation Loop
    for {
      desired := getDesiredState()
      current := getCurrentState()
      makeChanges(desired, current)
    }
2. Configuration State: the “Kafka” custom resource
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: Kafka
    metadata:
      name: simple-strimzi
    spec:
      kafka:
        config:
          ...
The controller deploys its reconciliation plan to the Kafka cluster.
Stateful Services in Kubernetes
StatefulSets provide:
• Stable pod & network identity
• Stable persistent storage
• Ordered deployment and updates
• Ordered graceful deletion and termination
• Ordered automated rolling updates
Diagram: StatefulSet kafka-brokers → Pod kafka-brokers-0 → PersistentVolumeClaim data-kafka-brokers-0 → PersistentVolume pvc-2a4f8bcb-45cd
Abstracting Persistence
• PersistentVolumeClaim: name data-kafka-brokers-0, size 10GB, storage class aws-ebs
• StorageClass: name aws-ebs, provisioner kubernetes.io/aws-ebs
• The aws-ebs provisioner creates an AWS EBS volume, exposed as PersistentVolume pvc-2a4f8bcb-45cd
Strimzi An operator-based Kafka on Kubernetes project
Strimzi
Strimzi is an open source, operator-based Apache Kafka project for Kubernetes and OpenShift
• Announced Feb 25th, 2018
• Evolved from a non-operator project known as Barnabas by Paolo Patierno, Red Hat
• Part of the Red Hat Developer Program
• “Streams” component of Red Hat AMQ, a commercial product of messaging technologies by Red Hat
The Cluster Operator watches the “Kafka” CRD and deploys:
• Kafka StatefulSet (broker pods)
• ZooKeeper StatefulSet (ZK pods)
• Entity Operators (User and Topic Operator)
Demo: ./resources/simple-strimzi.yaml
Entity Operator (User and Topic Operators)
The Entity Operators synchronize with the Kafka and ZooKeeper StatefulSets:
• Topic Operator watches the “KafkaTopic” CRD
• User Operator watches the “KafkaUser” CRD
Demo: ./resources/simple-topic.yaml
Strimzi Storage Modes
1. Ephemeral: the broker pod uses an emptyDir volume (transient)
2. Persistent: the broker pod uses a PersistentVolume (PV) (persistent)
2 (b). Persistent JBOD: the broker pod uses several PVs (persistent), with broker config log.dirs = [PV1, PV2, PV3]
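In JBOD mode the key broker setting is Kafka's log.dirs property, a comma-separated list of directories. This Go sketch assumes a simplified, hypothetical Storage type that borrows Strimzi's storage type names (ephemeral, persistent-claim, jbod) and uses made-up mount paths; Strimzi generates the real value itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Storage is a hypothetical, simplified view of a broker's storage.
type Storage struct {
	Type       string   // "ephemeral", "persistent-claim" or "jbod"
	MountPaths []string // one mount path per volume
}

// logDirs renders Kafka's log.dirs: a comma-separated list of directories,
// one per volume. Only JBOD may have more than one.
func logDirs(s Storage) (string, error) {
	if len(s.MountPaths) == 0 {
		return "", fmt.Errorf("storage %q has no volumes", s.Type)
	}
	if s.Type != "jbod" && len(s.MountPaths) != 1 {
		return "", fmt.Errorf("storage %q expects exactly one volume, got %d", s.Type, len(s.MountPaths))
	}
	return strings.Join(s.MountPaths, ","), nil
}

func main() {
	// Mount paths are made up for illustration.
	jbod := Storage{Type: "jbod", MountPaths: []string{"/var/lib/kafka/pv1", "/var/lib/kafka/pv2", "/var/lib/kafka/pv3"}}
	dirs, err := logDirs(jbod)
	if err != nil {
		panic(err)
	}
	fmt.Println("log.dirs=" + dirs)
}
```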
Operational Concerns
Install Strimzi
Installing and running a Strimzi Kafka cluster is a two-step process:
1. Install the Strimzi Helm Chart
2. Create a Kafka Kubernetes resource
Helm Chart Install:
    helm repo add strimzi http://strimzi.io/charts/
    helm install strimzi/strimzi-kafka-operator
Demo: ./demo/01-create-simple-strimzi-cluster.sh
Connecting Clients
Fully qualified service hostname:
    simple-strimzi-kafka-bootstrap.strimzi.svc.cluster.local:9092
• simple-strimzi: Kafka resource metadata.name
• kafka-bootstrap: broker load balancer name
• strimzi: Namespace
• svc.cluster.local: K8s Service
Ports:
• “Plain” 9092
• TLS 9093
• Interbroker 9094
• Prometheus 9404
Demo: ./demo/02-connecting-clients.sh run-kafka-perf-producer.sh
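The bootstrap hostname is built mechanically from the Kafka resource name and namespace. A small Go sketch, assuming the default cluster domain cluster.local; bootstrapAddress and clientPorts are hypothetical helpers.

```go
package main

import "fmt"

// clientPorts lists the client-facing listener ports from the slide.
var clientPorts = map[string]int{"plain": 9092, "tls": 9093}

// bootstrapAddress builds <cluster>-kafka-bootstrap.<namespace>.svc.cluster.local:<port>,
// assuming the default cluster domain "cluster.local".
func bootstrapAddress(cluster, namespace string, port int) string {
	return fmt.Sprintf("%s-kafka-bootstrap.%s.svc.cluster.local:%d", cluster, namespace, port)
}

func main() {
	for _, l := range []string{"plain", "tls"} {
		fmt.Printf("%s: %s\n", l, bootstrapAddress("simple-strimzi", "strimzi", clientPorts[l]))
	}
}
```

The resulting string is what an in-cluster client would use as its bootstrap.servers value.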
Rolling Configuration Updates
Rolling Configuration Process
1. Watched Kafka resource changes
2. Apply the new config to the Kafka StatefulSet spec
3. Starting from pod 0, delete the pod and allow the StatefulSet to recreate it
4. The Kafka pod generates a new broker.config
5. Kafka is started
6. Wait until the readiness check passes
7. Repeat from step 3 for the next pod
Demo: ./demo/03-broker-config-update.sh
Scaling Brokers Up
1. Increase the replica count: spec.kafka.replicas
2. Reassign partitions: ./bin/kafka-reassign-partitions.sh
Before: kafka-0 hosts P0, P1, P2
After: kafka-0, kafka-1, and kafka-2 each host one of P0, P1, P2
Demo: ./demo/04-scale-brokers.sh ./partition-reassignment/generate-plan-output.json
Rolling Broker Upgrades
Rolling Broker Upgrade Process:
1. Upgrade the Strimzi Cluster Operator
2. Update config:
   a. (Optional) Set the log.message.format.version broker config to the current version
   b. Set the desired Kafka release version
   Rolling Updates (1-2x)
3. (Optional) Upgrade clients using the cluster
4. (Optional) Set the log.message.format.version broker config to the new version
   Rolling Update (0-1x)
Broker Replacement & Movement
Replacing brokers is common with large, busy clusters:
    $ kubectl delete pod kafka-1
Broker replacement is also useful to facilitate broker movement across the cluster:
1. Research the max bitrate per partition for your cluster
2. Move partitions off the broker to replace
3. Replace the broker
4. Rebalance/move partitions to the new broker
Broker Replacement & Movement
1. Research the max bitrate per partition for your cluster
Run a controlled test:
• Bitrate depends on message size, producer batch, and consumer fetch size
• Create a standalone cluster with 1 broker, 1 topic, and 1 partition
• Run producer and consumer perf tests using average message/client properties
• Measure the broker metrics for average bitrate:
    kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
    kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
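Turning the measurement into a reassignment throttle is simple arithmetic. A Go sketch, assuming you have already scraped BytesInPerSec samples (bytes/sec) from the test broker; meanRate and throttleFor are hypothetical helpers, and the sample values are invented. With a measured 12.5 MB/s, 80% gives the 10000000 bytes/sec throttle used with --execute.

```go
package main

import (
	"errors"
	"fmt"
)

// meanRate averages BytesInPerSec (or BytesOutPerSec) samples in bytes/sec,
// collected while the perf test runs. Scraping them is out of scope here.
func meanRate(samples []int64) (int64, error) {
	if len(samples) == 0 {
		return 0, errors.New("no samples")
	}
	var sum int64
	for _, s := range samples {
		sum += s
	}
	return sum / int64(len(samples)), nil
}

// throttleFor returns percent% of the measured max rate, in bytes/sec,
// the unit kafka-reassign-partitions expects for --throttle.
func throttleFor(maxBytesPerSec, percent int64) int64 {
	return maxBytesPerSec * percent / 100
}

func main() {
	// Hypothetical samples from a 1-broker, 1-topic, 1-partition test.
	rate, err := meanRate([]int64{12_000_000, 13_000_000, 12_500_000})
	if err != nil {
		panic(err)
	}
	for _, pct := range []int64{75, 80} {
		fmt.Printf("%d%% of %d B/s -> --throttle %d\n", pct, rate, throttleFor(rate, pct))
	}
}
```

Integer arithmetic keeps the throttle an exact byte count rather than a rounded float.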
Broker Replacement & Movement
2. Move partitions off the broker to replace
Diagram: partitions (P) spread across Brokers 0, 1, and 2 are reassigned so that Broker 1 holds none.
Use the Kafka partition reassignment tool:
• Generate an assignment plan without old broker 1
• Pick a fraction of the measured max bitrate found in step 1 (Ex. 75%, 80%)
• Apply the plan with a bitrate throttle (the --throttle value is in bytes/sec)
• Wait till complete; running --verify after completion also removes the throttle
    kafka-reassign-partitions … --topics-to-move-json-file topics.json --broker-list "0,2" --generate
    kafka-reassign-partitions … --reassignment-json-file reassignment.json --execute --throttle 10000000
    kafka-reassign-partitions … --reassignment-json-file reassignment.json --verify
Broker Replacement & Movement
3. Replace broker
Diagram: Broker 1 is marked X; its partitions already live on Brokers 0 and 2.
Replace the broker pod instance with kubectl:
    $ kubectl delete pod kafka-1
• The old broker 1 instance is shut down and its resources deallocated
• The StatefulSet provisions a new broker 1 instance
• The new broker 1 is assigned the same id as the old broker 1: 1
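The new pod keeps broker id 1 because Strimzi derives the broker id from the pod's StatefulSet ordinal. A Go sketch of that derivation, assuming the id is simply the numeric suffix of the pod name; brokerID is hypothetical and is not Strimzi's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// brokerID takes the StatefulSet ordinal after the last '-' in a pod name,
// e.g. "kafka-1" -> 1. Because a recreated pod keeps its name, it keeps its id.
func brokerID(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 || i == len(podName)-1 {
		return 0, fmt.Errorf("%q has no ordinal suffix", podName)
	}
	return strconv.Atoi(podName[i+1:])
}

func main() {
	for _, name := range []string{"kafka-1", "simple-strimzi-kafka-1"} {
		id, err := brokerID(name)
		fmt.Println(name, "->", id, err)
	}
}
```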
Recommend
More recommend