APACHE PULSAR - THE NEXT GENERATION MESSAGING AND QUEUING
Karthik Ramasamy, Senior Director of Engineering, Splunk (@karthikz)
Connected World
Ubiquity of Real-Time Data Streams & Events
EVENT/STREAM DATA PROCESSING
✦ Events are analyzed and processed as they arrive
✦ Decisions are timely, contextual and based on fresh data
✦ Decision latency is eliminated
✦ Data in motion (ingest/buffer, analyze, act)
EVENT/STREAM PROCESSING PATTERNS
Monitoring, microservices, workflows, analytics, model inference
STREAM PROCESSING PATTERN
(diagram: messaging for data ingestion, compute for data processing, data storage for results, and a serving layer)
APACHE PULSAR
Flexible messaging + queuing system backed by durable log storage
Key Concepts
Core concepts: tenants, namespaces, topics
(diagram: a single Apache Pulsar cluster hosting tenants such as Marketing, Sales, and Data Integration; each tenant owns namespaces such as Analytics, Campaigns, Transactions, Log events, Security, and Microservices, and each namespace contains topics such as Visits, Conversions, Responses, Interactions, Signatures, and Accesses)
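The three levels compose into a fully qualified topic name. A small illustration in Java (the tenant, namespace, and topic names are placeholders loosely based on the diagram above; newer Pulsar releases use the tenant/namespace/topic form, while the older client examples later in this deck include a cluster component):

// Topic names follow persistent://<tenant>/<namespace>/<topic>
String visits = "persistent://marketing/analytics/visits";
String signatures = "persistent://data-integration/security/signatures";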
Topics
(diagram: producers append messages to a topic over time while consumers read from it)
Topic partitions
(diagram: a topic split into partitions P0, P1, and P2, each with its own producers and consumers over time)
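Partitioned topics are created through the admin interface. A minimal sketch with the Java admin client (the admin URL, topic name, and partition count are placeholder assumptions):

import org.apache.pulsar.client.admin.PulsarAdmin;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        // Connect to the broker's HTTP admin endpoint (placeholder URL)
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker.usw.example.com:8080")
                .build();

        // Create a topic with 3 partitions (P0, P1, P2)
        admin.topics().createPartitionedTopic(
                "persistent://my-property/us-west/my-namespace/my-topic", 3);

        admin.close();
    }
}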
Segments
(diagram: each partition P0, P1, P2 is stored as a sequence of segments, Segment 1 through Segment 4, over time)
Architecture
APACHE PULSAR
✦ Serving layer: brokers can be added independently; traffic can be shifted quickly across brokers
✦ Storage layer: bookies can be added independently; new bookies will ramp up traffic quickly
APACHE PULSAR - BROKER
✦ The broker is the only point of interaction for clients (producers and consumers)
✦ Brokers acquire ownership of groups of topics and "serve" them
✦ The broker has no durable state
✦ Provides a service discovery mechanism for clients to connect to the right broker
APACHE PULSAR - CONSISTENCY
(diagram: a producer writes through the broker, which writes each entry to multiple bookies)
APACHE PULSAR - DURABILITY (NO DATA LOSS)
(diagram: the broker writes each entry to multiple bookies, and every bookie fsyncs the entry to its journal)
APACHE PULSAR - ISOLATION
APACHE PULSAR - SEGMENT STORAGE
(diagram: each partition's entries are grouped into segments that are spread across the bookies)
APACHE PULSAR - RESILIENCY
(diagram: segment copies distributed across bookies for resiliency to bookie failure)
APACHE PULSAR - SEAMLESS CLUSTER EXPANSION
(diagram: new segments such as Segment X, Y, and Z are placed on newly added bookies)
APACHE PULSAR - TIERED STORAGE
(diagram: older segments are offloaded to low-cost storage)
Multi-tiered storage and serving
• Tailing reads: served from the brokers' in-memory cache
• Catch-up reads: served from the warm persistent storage layer
• Historical reads: served from cold storage
PARTITIONS VS SEGMENTS - WHY SHOULD YOU CARE?
Legacy architectures:
✦ Storage co-resident with processing
✦ Partition-centric
✦ Cumbersome to scale: data redistribution, performance impact
Apache Pulsar:
✦ Storage decoupled from processing
✦ Partitions stored as segments
✦ Flexible, easy scalability
DEPLOYMENT IN K8S
(diagram: load balancers LB1-LB3 in front of brokers Broker1-Broker3, with segments stored separately across the storage layer)
PARTITIONS VS SEGMENTS - WHY SHOULD YOU CARE?
✦ In Kafka, partitions are assigned to brokers "permanently"
✦ A single partition is stored entirely on a single node
✦ Retention is limited by a single node's storage capacity
✦ Failure recovery and capacity expansion require expensive "rebalancing"
✦ Rebalancing has a big impact on the system, affecting regular traffic
UNIFIED MESSAGING MODEL - STREAMING
(diagram: with an Exclusive subscription, only Consumer 1 is attached to subscription A; a second consumer is rejected)
UNIFIED MESSAGING MODEL - STREAMING
(diagram: with a Failover subscription, Consumer 2 takes over subscription B in case of failure of Consumer 1)
UNIFIED MESSAGING MODEL - QUEUING
(diagram: with a Shared subscription, traffic is equally distributed across Consumers 1-3 on subscription C)
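The subscription mode is selected when a consumer subscribes. A minimal sketch using the builder-style Java client (newer than the client API shown on the producer/consumer slides below; the service URL and names are placeholder assumptions):

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.usw.example.com:6650") // placeholder URL
                .build();

        // Shared subscription: messages are distributed across all attached consumers (queuing)
        Consumer<byte[]> shared = client.newConsumer()
                .topic("persistent://my-property/us-west/my-namespace/my-topic")
                .subscriptionName("subscription-c")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Exclusive (single consumer) and Failover (standby consumer) are selected the same way:
        //   .subscriptionType(SubscriptionType.Exclusive)
        //   .subscriptionType(SubscriptionType.Failover)

        shared.close();
        client.close();
    }
}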
DISASTER RECOVERY
✦ Topic T1 is replicated across data centers A, B, and C, each with its own producers, consumers, and subscription S1
✦ Simple configuration to add/remove regions
✦ Asynchronous (default) and synchronous replication
✦ Integrated in the broker message flow
Asynchronous replication example
• Two independent clusters, primary and standby
• Configured tenants and namespaces replicate to standby
• Data published to primary is asynchronously replicated to standby
• Producers and consumers restarted in second datacenter upon primary failure
Synchronous replication example
• Each topic owned by one broker at a time, i.e. in one datacenter
• ZooKeeper cluster spread across multiple locations
• Broker commits writes to bookies in both datacenters
• In event of datacenter failure, broker in surviving datacenter assumes ownership of topic
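Asynchronous geo-replication is enabled per namespace through the admin interface. A hedged sketch with the Java admin client (the admin URL, namespace, and cluster names are assumptions, and the exact method signature varies slightly across Pulsar versions):

import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableGeoReplication {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker.usw.example.com:8080") // placeholder admin URL
                .build();

        // Replicate all topics in this namespace to the listed clusters
        admin.namespaces().setNamespaceReplicationClusters(
                "my-property/my-namespace",
                Set.of("us-west", "us-east"));

        admin.close();
    }
}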
Replicated subscriptions
(diagram: subscription state is kept in sync between Pulsar clusters 1 and 2 via markers carried in the replicated message stream)
MULTITENANCY - CLOUD NATIVE
✦ Authentication
✦ Authorization
✦ Software isolation
๏ Storage quotas, flow control, back pressure, rate limiting
✦ Hardware isolation
๏ Constrain some tenants on a subset of brokers/bookies
(diagram: a single Apache Pulsar cluster shared by tenants such as ETL, Fraud Detection, Risk Classification, and Marketing, each with its own namespaces, topics, and storage quotas, e.g. 5 TB, 7 TB, 10 TB)
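Tenants and namespaces are provisioned through the admin interface. A minimal sketch with the Java admin client (the tenant name, roles, and clusters are placeholder assumptions; older Pulsar versions construct TenantInfo through a constructor rather than a builder):

import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class ProvisionTenant {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker.usw.example.com:8080") // placeholder admin URL
                .build();

        // Create a tenant restricted to specific admin roles and clusters
        admin.tenants().createTenant("marketing", TenantInfo.builder()
                .adminRoles(Set.of("marketing-admin"))
                .allowedClusters(Set.of("us-west"))
                .build());

        // Namespaces group topics and carry per-namespace policies (quotas, rate limits, ...)
        admin.namespaces().createNamespace("marketing/campaigns");

        admin.close();
    }
}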
PULSAR CLIENTS
(diagram: Java, Python, Go, C++ and C clients connecting to an Apache Pulsar cluster)
PULSAR PRODUCER

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");

Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");

// Handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
PULSAR CONSUMER

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");

Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");

while (true) {
    // Wait for a message
    Message msg = consumer.receive();

    System.out.println("Received message: " + new String(msg.getData()));

    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}
SCHEMA REGISTRY
✦ Provides type safety to applications built on top of Pulsar
✦ Two approaches:
๏ Client side: type safety enforcement is left to the application
๏ Server side: the system enforces type safety and ensures that producers and consumers remain synced
✦ The schema registry enables clients to upload data schemas on a per-topic basis
✦ Schemas dictate which data types are recognized as valid for that topic
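As an illustration of server-side type safety, a typed producer can declare its schema when it is created. A minimal sketch with the newer builder-style Java client (the POJO, topic, and service URL are placeholder assumptions):

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class TypedProducer {
    // Hypothetical event type; its fields define the JSON schema registered for the topic
    public static class PageVisit {
        public String userId;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.usw.example.com:6650") // placeholder URL
                .build();

        // The broker checks this schema against the schema registered for the topic
        Producer<PageVisit> producer = client.newProducer(Schema.JSON(PageVisit.class))
                .topic("persistent://my-property/us-west/my-namespace/visits")
                .create();

        PageVisit visit = new PageVisit();
        visit.userId = "user-42";
        visit.timestamp = System.currentTimeMillis();
        producer.send(visit);

        producer.close();
        client.close();
    }
}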