The Best of Apache Kafka Architecture Ranganathan Balashanmugam - PowerPoint PPT Presentation

Apache: Big Data 2015 The Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than

Helló Budapest

About Me ❏ Graduated as Civil Engineer. ❏ <dev> 10+ years </dev> ❏ <Thoughtworker from="India"/> ❏ Organizer of Hyderabad Scalability Meetup with 2000+ members.

“Form follows function.” - Louis Sullivan

Gravity Dam Indirasagar Dam, India img src: http://www.montanhydraulik.in

Forces on a gravity dam (diagram labels: Dam, Head Water, Weight, Tail Water, Uplift)

❏ publish-subscribe messaging service ❏ distributed commit/write-ahead log ❏ “producers produce, consumers consume, in a large, distributed, reliable way, in real time”

Why Kafka? ❏ DBs ❏ Logs ❏ Brokers ❏ HDFS ❏ “For highly distributed messages, Kafka stands out.”

Kafka Vs ________ src: https://softwaremill.com/mqperf/

Timeline (2011 2012 2013 2014 2015) ❏ Open sourced by LinkedIn, as version 0.6 ❏ Graduated from the Apache Incubator ❏ Several engineers who built Kafka create Confluent ❏ Latest stable - 0.8.2.1

A Kafka Message ❏ CRC ❏ magic ❏ attributes ❏ key length ❏ key ❏ message length ❏ message content ❏ kafka.message.Message ❏ Change requested: KAFKA-2511
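The field layout above can be sketched in a few lines. This is an illustration only, not the code in kafka.message.Message: it assumes big-endian fields, 4-byte length prefixes with -1 for a missing key or message, and a CRC32 computed over everything after the CRC field. The function names `encode_message` and `decode_message` are hypothetical.

```python
import struct
import zlib


def _length_prefixed(data):
    """Encode bytes as a 4-byte big-endian length plus payload; None becomes length -1."""
    if data is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(data)) + data


def encode_message(key, value, magic=0, attributes=0):
    """Pack CRC | magic | attributes | key length | key | message length | message content."""
    body = struct.pack(">bb", magic, attributes)
    body += _length_prefixed(key) + _length_prefixed(value)
    crc = zlib.crc32(body) & 0xFFFFFFFF  # assumed: CRC covers magic onwards
    return struct.pack(">I", crc) + body


def decode_message(buf):
    """Check the CRC and return (magic, attributes, key, value)."""
    crc, magic, attributes = struct.unpack_from(">Ibb", buf, 0)
    if zlib.crc32(buf[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    pos = 6  # 4 bytes CRC + 1 byte magic + 1 byte attributes
    fields = []
    for _ in range(2):  # first the key, then the message content
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if length < 0:
            fields.append(None)
        else:
            fields.append(buf[pos:pos + length])
            pos += length
    return magic, attributes, fields[0], fields[1]
```

The CRC lets a broker or consumer detect a corrupted message before trusting its contents.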

Producers - push Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]] Kafka Broker Response => [TopicName [Partition ErrorCode Offset]] org.apache.kafka.clients.producer.KafkaProducer
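A minimal sketch of how the request grammar above maps to bytes, assuming the encoding conventions of the 0.8 wire protocol (big-endian integers, 2-byte length-prefixed strings, 4-byte array counts). The common request header is left out, and `encode_produce_body` is a hypothetical helper, not part of KafkaProducer.

```python
import struct


def encode_string(text):
    """2-byte big-endian length followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def encode_produce_body(required_acks, timeout_ms, topics):
    """Encode RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]].

    topics maps topic name -> {partition id -> already-encoded message set bytes}.
    """
    out = struct.pack(">hi", required_acks, timeout_ms)
    out += struct.pack(">i", len(topics))  # outer array count
    for name, partitions in topics.items():
        out += encode_string(name)
        out += struct.pack(">i", len(partitions))  # inner array count
        for partition, message_set in partitions.items():
            out += struct.pack(">ii", partition, len(message_set))
            out += message_set
    return out
```

For example, `encode_produce_body(1, 1000, {"events": {0: message_set_bytes}})` asks for an acknowledgement from the leader only, with a one-second timeout.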

Topic ❏ Remove messages based on: number of messages ❏ time ❏ size ❏ kafka.common.Topic

Partitions Serves: Horizontal scaling, Parallel consumer reads kafka.cluster.Partition
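One way to picture how partitions give horizontal scaling: each keyed message is mapped to one partition, so the same key always lands in the same place while different keys spread out. The sketch below uses CRC32 as a stand-in hash; that choice and the name `choose_partition` are assumptions for illustration, not the hash Kafka's partitioners use.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key (bytes) to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def spread(keys, num_partitions):
    """Group keys by the partition they map to."""
    groups = {}
    for key in keys:
        groups.setdefault(choose_partition(key, num_partitions), []).append(key)
    return groups
```

Because each partition can be read independently, consumers can read different partitions in parallel.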

Consumers - pull Consumer 2 Consumer 1 kafka.consumer.ConsumerConnector, kafka.consumer.SimpleConsumer

Consumer offsets committing and fetching consumer offsets img src: http://www.reynanprinting.com/photos/undefined/impresion-offset1.jpg

kafka:// - protocol “Binary protocol over TCP” ● Metadata ● Send ● Fetch ● Offsets ● Offset commit ● Offset fetch
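A rough sketch of what "binary protocol over TCP" means in practice, assuming the 0.8 conventions: every request is a 4-byte size followed by a header (API key, API version, correlation id, client id) and an API-specific body. The API key numbers follow the Kafka protocol guide, with "Send" on the slide corresponding to Produce; `frame_request` and `read_frame` are hypothetical helper names.

```python
import struct

# API keys for the six request types listed on the slide.
API_KEYS = {
    "Produce": 0,
    "Fetch": 1,
    "Offsets": 2,
    "Metadata": 3,
    "OffsetCommit": 8,
    "OffsetFetch": 9,
}


def frame_request(api, api_version, correlation_id, client_id, body=b""):
    """Build size-prefixed bytes: Size | ApiKey ApiVersion CorrelationId ClientId | body."""
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", API_KEYS[api], api_version, correlation_id, len(cid)) + cid
    payload = header + body
    return struct.pack(">i", len(payload)) + payload


def read_frame(buf):
    """Split one complete frame off the front of buf.

    Returns (payload, remaining bytes), or (None, buf) if the frame is incomplete.
    """
    if len(buf) < 4:
        return None, buf
    (size,) = struct.unpack_from(">i", buf, 0)
    if len(buf) < 4 + size:
        return None, buf
    return buf[4:4 + size], buf[4 + size:]
```

The size prefix is what lets a reader pull whole requests out of a TCP byte stream, and the correlation id lets a client match responses to requests.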

Mechanical Sympathy "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Petroski Image source: http://www.theguide2surrey.com

Persistence “Everything is faster till the disk IO.”

Sequential disk access can be faster than random RAM access src: http://queue.acm.org/detail.cfm?id=1563874

Linear Reads & Writes At a high level there are only two operations: ❏ Append to end of log ❏ Fetch messages from a partition beginning from a particular message id ❏ sequential file I/O
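The two operations can be sketched with an ordinary file: writes only ever go to the end, and a fetch seeks once and then reads forward. This is a toy, assuming 4-byte length-prefixed records and an in-memory position index; the class name `PartitionLog` is hypothetical and unrelated to Kafka's own log classes.

```python
import struct


class PartitionLog:
    """Append-only file of length-prefixed messages, addressed by message id."""

    def __init__(self, path):
        self._path = path
        self._positions = []  # byte position of each message; list index is the message id
        self._end = 0  # current size of the file in bytes
        open(path, "wb").close()  # the sketch always starts from an empty file

    def append(self, payload):
        """Write one message at the end of the log and return its message id."""
        record = struct.pack(">i", len(payload)) + payload
        with open(self._path, "ab") as f:
            f.write(record)
        self._positions.append(self._end)
        self._end += len(record)
        return len(self._positions) - 1

    def fetch(self, message_id, max_messages):
        """Read up to max_messages sequentially, starting at message_id."""
        if message_id < 0:
            raise ValueError("message_id must be non-negative")
        if message_id >= len(self._positions):
            return []
        out = []
        with open(self._path, "rb") as f:
            f.seek(self._positions[message_id])  # one seek, then a linear read
            while len(out) < max_messages:
                header = f.read(4)
                if len(header) < 4:
                    break  # reached the end of the log
                (length,) = struct.unpack(">i", header)
                out.append(f.read(length))
        return out
```

Nothing here ever rewrites the middle of the file, which is what keeps the disk access pattern sequential.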

“Let us play pictionary”

Linux Page Cache “Kafka ate my RAM”

ZeroCopy src: http://www.ibm.com/developerworks/library/j-zerocopy/

Batching Trade a small amount of latency for better throughput img src: https://prashanthpanduranga.files.wordpress.com/2015/05/tirupati.jpg
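A small sketch of the batching trade-off: records wait in a buffer until the batch is full or the oldest record has waited long enough, then go out together. The `Batcher` class and its parameters are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class Batcher:
    """Collect records and release them as a batch by size or by age."""

    def __init__(self, max_size, max_wait_s):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self._items = []
        self._first_at = None  # arrival time of the oldest buffered record

    def add(self, item, now):
        """Buffer one record; return a full or overdue batch, else None."""
        if not self._items:
            self._first_at = now
        self._items.append(item)
        return self._drain() if self._due(now) else None

    def poll(self, now):
        """Return the pending batch if it has waited long enough, else None."""
        if self._items and self._due(now):
            return self._drain()
        return None

    def _due(self, now):
        full = len(self._items) >= self.max_size
        overdue = now - self._first_at >= self.max_wait_s
        return full or overdue

    def _drain(self):
        batch, self._items = self._items, []
        return batch
```

A larger `max_wait_s` means fewer, bigger sends at the cost of each record sitting in the buffer a little longer.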

Compression Bandwidth between facilities is more expensive per byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility. kafka.message.CompressionCodec
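Compression pays off most when similar messages are compressed together rather than one at a time, because the repetition between messages is what the compressor removes. The sketch below compares the two using zlib from the Python standard library; the sample messages and helper names are made up for illustration, and newline-joining is a simplification that assumes messages contain no newlines.

```python
import zlib


def compress_individually(messages):
    """Compress every message on its own."""
    return [zlib.compress(m) for m in messages]


def compress_batch(messages):
    """Compress a whole batch as one blob (messages must not contain newlines)."""
    return zlib.compress(b"\n".join(messages))


def decompress_batch(blob):
    """Recover the original list of messages from a compressed batch."""
    data = zlib.decompress(blob)
    return data.split(b"\n") if data else []


# Hypothetical sample data: many small messages with the same shape.
messages = [b'{"user": %d, "event": "click"}' % i for i in range(100)]
individual_bytes = sum(len(c) for c in compress_individually(messages))
batch_bytes = len(compress_batch(messages))
raw_bytes = sum(len(m) for m in messages)
```

Comparing `raw_bytes`, `individual_bytes` and `batch_bytes` shows why batching and compression work well together.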

Log compaction kafka.log.LogCleaner, LogCleanerManager img src: http://kafka.apache.org/083/documentation.html
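The idea behind log compaction can be shown in a few lines: for each key, only the most recent record survives, and surviving records keep their original offsets. This sketches the concept only, not kafka.log.LogCleaner; treating a `None` value as a delete marker that can optionally be dropped is a simplification.

```python
def compact(log, drop_deletes=False):
    """Keep the latest record per key, in offset order.

    log is a list of (offset, key, value) tuples sorted by offset.
    A value of None marks a delete for that key.
    """
    latest = {}
    for offset, key, _value in log:
        latest[key] = offset  # later records overwrite earlier ones
    kept = [rec for rec in log if latest[rec[1]] == rec[0]]
    if drop_deletes:
        kept = [rec for rec in kept if rec[2] is not None]
    return kept
```

After compaction the log still holds the final state of every key, which is enough to rebuild a table or cache from it.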

Message Delivery ❏ At least once ❏ At most once ❏ Exactly once
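The difference between at-most-once and at-least-once comes down to whether a consumer commits its offset before or after processing a message. The simulation below is a sketch under that assumption; `run_consumer`, `deliver` and the `crash_at` parameter are hypothetical names, and a crash is modelled as stopping between the two steps.

```python
def run_consumer(messages, committed, commit_first, crash_at=None):
    """Consume from the committed offset; return (processed, committed offset).

    crash_at is the offset at which the consumer dies between its two steps.
    """
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:  # at most once: commit, then process
            committed = offset + 1
            if offset == crash_at:
                break  # committed but never processed: the message is lost
            processed.append(messages[offset])
        else:  # at least once: process, then commit
            processed.append(messages[offset])
            if offset == crash_at:
                break  # processed but never committed: it will be redelivered
            committed = offset + 1
        offset += 1
    return processed, committed


def deliver(messages, commit_first, crash_at):
    """Run until the crash, restart from the committed offset, and return everything processed."""
    first, committed = run_consumer(messages, 0, commit_first, crash_at)
    second, _ = run_consumer(messages, committed, commit_first)
    return first + second
```

With messages `["a", "b", "c"]` and a crash at offset 1, committing first loses `"b"`, while processing first handles `"b"` twice. Exactly once needs more than reordering the two steps, for example making the processing idempotent or storing the offset together with the output.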

Replication un-replicated = replication factor of one

Quorum based ● Better latency ● To tolerate “f” failures, need “2f+1” replicas
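The replica arithmetic is easy to check with a few numbers. The sketch below restates the 2f+1 rule from the slide and, for comparison, the f+1 replicas that Kafka's design documentation gives for its in-sync replica approach; the function names are hypothetical.

```python
def quorum_replicas(failures):
    """Replicas needed by a majority quorum to tolerate `failures` failures (2f+1)."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return 2 * failures + 1


def quorum_tolerated_failures(replicas):
    """Failures a majority quorum of `replicas` nodes can tolerate."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (replicas - 1) // 2


def primary_backup_replicas(failures):
    """Replicas needed when one surviving in-sync replica is enough (f+1)."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return failures + 1
```

For example, tolerating two failures takes five replicas with a majority quorum but three with primary-backup, which is the cost side of the quorum's better latency.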

Primary-backup replication (diagram: Topics 1 to 3, each with three replicas, spread across Brokers 1 to 4)

ZooKeeper cluster coordinator

THANK YOU For questions or suggestions: Ran.ga.na.than B ranganab@thoughtworks.com @ran_than
