Dive into Streams with Brooklin
Celia Kung, LinkedIn
Outline
• Background
• Scenarios
• Application Use Cases
• Architecture
• Current and Future
Background
Nearline Applications
• Require near real-time response
• Thousands of applications at LinkedIn
  ○ E.g. Live search indices, Notifications
Nearline Applications
• Require continuous, low-latency access to data
  ○ Data could be spread across multiple database systems
• Need an easy way to move data to applications
  ○ App devs should focus on event processing and not on data access
Heterogeneous Data Systems
• Microsoft EventHubs
• Espresso (LinkedIn’s document store)
Building the Right Infrastructure
• Build separate, specialized solutions to stream data from and to each different system?
  ○ Slows down development
  ○ Hard to manage!
[Diagram: Streaming Systems A, B, C, D, ... each sit between a data system (e.g. Microsoft EventHubs) and the Nearline Applications]
Need a centralized, managed, and extensible service to continuously deliver data in near real-time
Brooklin
Brooklin
• Streaming data pipeline service
• Propagates data from many source types to many destination types
• Multitenant: Can run several thousand streams simultaneously
• Streams are dynamically provisioned and individually configured
• Extensible: Plug-in support for additional sources/destinations
Pluggable Sources & Destinations
[Diagram: Sources are Databases (Espresso) and Messaging Systems (Kafka, EventHubs, Kinesis); Destinations are Kafka, Kinesis, and EventHubs, which serve Applications]
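One way to picture the plug-in idea is a pair of small interfaces plus a registry keyed by type name. The sketch below is only an illustration: `Source`, `Destination`, `ListSource`, `ListDestination`, the two registries, and `run_pipeline` are hypothetical names, not Brooklin's actual API.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class Source(Protocol):
    """Hypothetical plug-in interface: anything that can yield events."""
    def read(self) -> Iterable[str]: ...


class Destination(Protocol):
    """Hypothetical plug-in interface: anything that can accept events."""
    def write(self, event: str) -> None: ...


class ListSource:
    """Stand-in for a real source such as a database or a Kafka topic."""
    def __init__(self, events: List[str]) -> None:
        self.events = events

    def read(self) -> Iterable[str]:
        return iter(self.events)


class ListDestination:
    """Stand-in for a real destination; it just collects events in memory."""
    def __init__(self) -> None:
        self.received: List[str] = []

    def write(self, event: str) -> None:
        self.received.append(event)


# Registries map a type name to a factory, so new source or destination
# types can be added without changing the pipeline code below.
SOURCE_TYPES: Dict[str, Callable[..., Source]] = {"list": ListSource}
DESTINATION_TYPES: Dict[str, Callable[..., Destination]] = {"list": ListDestination}


def run_pipeline(source: Source, destination: Destination) -> int:
    """Copy every event from source to destination; return the count."""
    count = 0
    for event in source.read():
        destination.write(event)
        count += 1
    return count


source = SOURCE_TYPES["list"](["profile-updated", "job-changed"])
destination = DESTINATION_TYPES["list"]()
copied = run_pipeline(source, destination)
```

Supporting another system would then mean registering one more factory under a new type name while `run_pipeline` stays unchanged.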
Scenarios
Scenario 1: Change Data Capture
Capturing Live Updates
1. Member updates her profile to reflect her recent job change
2. LinkedIn wants to inform her colleagues of this change
Capturing Live Updates
[Diagram: updates are written to Member DB; the News Feed Service, Search Indices Service, Notifications Service, Standardization Service, ... each query Member DB directly]
Change Data Capture (CDC)
• Brooklin can stream database updates to a change stream
• Data processing applications consume from change streams
• Isolation: Applications are decoupled from the sources and don’t compete for resources with online queries
• Applications can be at different points in change timelines
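The last point, applications sitting at different positions in the change timeline, can be sketched with an append-only log and one read offset per consumer. This is a minimal in-memory illustration; `ChangeStream`, `publish`, and `poll` are hypothetical names and do not describe how Brooklin or any messaging system is implemented.

```python
from typing import Dict, List


class ChangeStream:
    """In-memory stand-in for a change stream (an append-only log)."""

    def __init__(self) -> None:
        self._log: List[dict] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next index to read

    def publish(self, change: dict) -> None:
        """Called by the capture side for every database update."""
        self._log.append(change)

    def poll(self, consumer: str, max_events: int = 10) -> List[dict]:
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


stream = ChangeStream()
for title in ["Engineer", "Senior Engineer", "Staff Engineer"]:
    stream.publish({"table": "Profile", "member": 42, "title": title})

# The two consumers read at their own pace; neither one queries the database.
news_feed = stream.poll("news-feed", max_events=3)
search = stream.poll("search-indices", max_events=1)
search_later = stream.poll("search-indices", max_events=5)
```

Because each consumer only advances its own offset, a slow application delays nobody else and adds no load to the source database.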
Change Data Capture (CDC)
[Diagram: updates are written to Member DB and streamed to a Messaging System; the Notifications Service, Standardization Service, Search Indices Service, and News Feed Service consume from the Messaging System]
Scenario 2: Streaming Bridge
Stream Data from X to Y
● Across…
  ○ cloud services
  ○ clusters
  ○ data centers
Streaming Bridge
• Data pipe to move data between different environments
• Enforce policy: Encryption, Obfuscation, Data formats
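Policy enforcement in a bridge can be pictured as a chain of functions applied to each event while it is in transit. The sketch below shows only obfuscation (hashing one field) and a data-format step (JSON output); encryption is left out. `obfuscate`, `bridge`, and the event fields are hypothetical and chosen purely for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]
Policy = Callable[[Event], Event]


def obfuscate(field: str) -> Policy:
    """Build a policy that replaces one field with its SHA-256 digest."""
    def apply(event: Event) -> Event:
        masked = dict(event)  # copy so the source event is left untouched
        if field in masked:
            masked[field] = hashlib.sha256(masked[field].encode("utf-8")).hexdigest()
        return masked
    return apply


def bridge(events: Iterable[Event], policies: List[Policy]) -> List[str]:
    """Apply every policy in order, then emit each event as a JSON string."""
    out = []
    for event in events:
        for policy in policies:
            event = policy(event)
        out.append(json.dumps(event, sort_keys=True))
    return out


source_events = [{"member": "42", "email": "member@example.com"}]
delivered = bridge(source_events, [obfuscate("email")])
```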
Mirroring Kafka Data
● Aggregating data from all data centers into a centralized place
● Moving data between LinkedIn and external cloud services (e.g. Azure)
● Brooklin has replaced Kafka MirrorMaker (KMM) at LinkedIn
  ○ Issues with KMM: didn’t scale well, difficult to operate and manage, poor failure isolation
Use Brooklin to Mirror Kafka Data
[Diagram: Sources (Databases, Microsoft EventHubs, Messaging systems) on one side and Destinations (Databases, Microsoft EventHubs, Messaging systems) on the other]
Kafka MirrorMaker Topology
[Diagram: Datacenters A, B, C, ...; each holds tracking and metrics topics plus aggregate tracking and aggregate metrics topics, with many separate KMM instances mirroring between them]
Brooklin Kafka Mirroring Topology
[Diagram: Datacenters A, B, C, ...; each holds tracking and metrics topics plus aggregate tracking and aggregate metrics topics, with one Brooklin per datacenter doing the mirroring]
Brooklin Kafka Mirroring
● Optimized for stability and operability
● Manually pause and resume mirroring at every level
  ○ Entire pipeline, topic, topic-partition
● Can auto-pause partitions facing mirroring issues
  ○ Auto-resumes the partitions after a configurable duration
● Flow of messages from other partitions is unaffected
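The auto-pause behaviour can be sketched as a per-partition "resume at" timestamp. This is an illustration under assumptions, not Brooklin's implementation: `PauseTracker`, `report_error`, `is_paused`, and the 60-second duration are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


class PauseTracker:
    """Tracks which topic-partitions are paused and until when."""

    def __init__(self, pause_seconds: float) -> None:
        self.pause_seconds = pause_seconds  # the configurable duration
        self._resume_at: Dict[Partition, float] = {}

    def report_error(self, partition: Partition, now: float) -> None:
        """Auto-pause a partition that hit a mirroring error."""
        self._resume_at[partition] = now + self.pause_seconds

    def is_paused(self, partition: Partition, now: float) -> bool:
        """A partition auto-resumes once its pause window has elapsed."""
        resume_at = self._resume_at.get(partition)
        if resume_at is None:
            return False
        if now >= resume_at:
            del self._resume_at[partition]  # window elapsed: resume
            return False
        return True


tracker = PauseTracker(pause_seconds=60.0)
tracker.report_error(("tracking", 3), now=0.0)

paused_early = tracker.is_paused(("tracking", 3), now=30.0)  # inside the window
other_ok = tracker.is_paused(("tracking", 4), now=30.0)      # a different partition
paused_late = tracker.is_paused(("tracking", 3), now=60.0)   # window has elapsed
```

Keeping the pause state per topic-partition is what lets the remaining partitions keep flowing while one of them is held back.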
Application Use Cases
Application Use Cases
• Security
• Repartitioning
• Search Indices
• ETL or Data warehouse
• Cache
• Materialized Views or Replication
Application Use Cases
• Adjunct Data
• Bridge
• Serde, Encryption, Policy
• Standardization, Notifications …
Architecture
Example: Stream updates made to Member Profile
Capturing Live Updates
[Diagram: updates are written to Member DB and flow to the News Feed Service]
Example
● Scenario: Stream Espresso Member Profile updates into Kafka
  ○ Source Database: Espresso (Member DB, Profile table)
  ○ Destination: Kafka
  ○ Application: News Feed service
Datastream
• Describes the data pipeline
• Mapping between source and destination
• Holds the configuration for the pipeline

Name: MemberProfileChangeStream
Source: MemberDB/ProfileTable
  Type: Espresso
  Partitions: 8
Destination: ProfileTopic
  Type: Kafka
  Partitions: 8
Metadata:
  Application: News Feed service
  Owner: newsfeed@linkedin.com
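The datastream definition on this slide maps naturally onto a small nested data structure. The sketch below models it with dataclasses and serializes it to JSON; the class names `Endpoint` and `Datastream` and the exact field names are assumptions made for illustration and are not Brooklin's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class Endpoint:
    """One side of the pipeline: where data is read from or written to."""
    name: str
    type: str
    partitions: int


@dataclass
class Datastream:
    """Hypothetical model of the datastream definition on the slide."""
    name: str
    source: Endpoint
    destination: Endpoint
    metadata: Dict[str, str] = field(default_factory=dict)


stream = Datastream(
    name="MemberProfileChangeStream",
    source=Endpoint(name="MemberDB/ProfileTable", type="Espresso", partitions=8),
    destination=Endpoint(name="ProfileTopic", type="Kafka", partitions=8),
    metadata={"application": "News Feed service", "owner": "newsfeed@linkedin.com"},
)

# asdict() recurses into nested dataclasses, so the result is plain JSON data.
payload = json.dumps(asdict(stream), indent=2)
```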
1. Client makes REST call to create datastream
[Diagram: ZooKeeper; three Brooklin Instances, each running a Coordinator (one of them the Leader), a Datastream Management Service (DMS), an Espresso Consumer, and a Kafka Producer; a Load Balancer in front of the instances; the Brooklin Client sends create (POST /datastream); Member DB and the News Feed service sit on either side]
2. Create request goes to any Brooklin instance
3. Datastream is written to ZooKeeper
4. Leader coordinator is notified of new datastream
5. Leader coordinator calculates work distribution
6. Leader coordinator writes the assignments to ZooKeeper
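The six steps above can be walked through with a small in-memory simulation. Everything here is a stand-in: `FakeZooKeeper` replaces a real ZooKeeper ensemble, `LeaderCoordinator` is a hypothetical class, and the round-robin assignment is an assumed strategy, since the slides only say that the leader "calculates work distribution".

```python
from typing import Callable, Dict, List


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper: stores datastreams and assignments."""

    def __init__(self) -> None:
        self.datastreams: Dict[str, dict] = {}
        self.assignments: Dict[str, List[int]] = {}  # instance -> partitions
        self._watchers: List[Callable[[str], None]] = []

    def watch_datastreams(self, callback: Callable[[str], None]) -> None:
        self._watchers.append(callback)

    def create_datastream(self, name: str, definition: dict) -> None:
        self.datastreams[name] = definition  # step 3: datastream is written
        for callback in self._watchers:      # step 4: leader is notified
            callback(name)


class LeaderCoordinator:
    """Hypothetical leader: spreads partitions round-robin over instances."""

    def __init__(self, zk: FakeZooKeeper, instances: List[str]) -> None:
        self.zk = zk
        self.instances = instances
        zk.watch_datastreams(self.on_new_datastream)

    def on_new_datastream(self, name: str) -> None:
        partitions = self.zk.datastreams[name]["partitions"]
        plan: Dict[str, List[int]] = {i: [] for i in self.instances}
        for p in range(partitions):          # step 5: calculate distribution
            plan[self.instances[p % len(self.instances)]].append(p)
        self.zk.assignments = plan           # step 6: write the assignments


zk = FakeZooKeeper()
LeaderCoordinator(zk, ["instance-1", "instance-2", "instance-3"])

# Steps 1-2: the client's create request reaches some instance, which
# writes the new datastream to ZooKeeper.
zk.create_datastream("MemberProfileChangeStream", {"partitions": 8})
```

With 8 partitions and 3 instances, this assumed strategy gives two instances three partitions each and the third instance two.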