1. Dive into Streams with Brooklin. Celia Kung, LinkedIn

2. Outline: Background, Scenarios, Application Use Cases, Architecture, Current and Future

  3. Background

  4. Nearline Applications • Require near real-time response • Thousands of applications at LinkedIn ○ E.g. Live search indices, Notifications

  5. Nearline Applications • Require continuous, low-latency access to data ○ Data could be spread across multiple database systems • Need an easy way to move data to applications ○ App devs should focus on event processing and not on data access

6. Heterogeneous Data Systems (diagram): Microsoft EventHubs, Espresso (LinkedIn’s document store)

7. Building the Right Infrastructure (diagram: nearline applications wired to Streaming Systems A-D, Microsoft EventHubs, ...) • Build separate, specialized solutions to stream data from and to each different system? ○ Slows down development ○ Hard to manage!

  8. Need a centralized, managed, and extensible service to continuously deliver data in near real-time

  9. Brooklin

10. Brooklin • Streaming data pipeline service • Propagates data from many source types to many destination types • Streams are dynamically provisioned and individually configured • Extensible: plug-in support for additional sources/destinations • Multitenant: can run several thousand streams simultaneously

11. Pluggable Sources & Destinations (diagram). Sources: databases (Espresso), messaging systems (Kafka, EventHubs, Kinesis). Destinations: messaging systems (Kafka, Kinesis, EventHubs), applications
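What makes the source and destination set pluggable is a common connector contract that every integration implements. Below is a minimal sketch of the idea; this is not the open-source project's actual API, and the interface and method names are hypothetical:

```java
import java.util.List;

/**
 * Hypothetical sketch of a Brooklin-style source plug-in.
 * The real project defines its own connector API; these names
 * are illustrative only.
 */
interface SourceConnector {
    // Begin consuming from the source system.
    void start();

    // Stop consuming and release resources.
    void stop();

    // Called when the leader coordinator (re)assigns work,
    // e.g. a set of source partitions, to this instance.
    void onAssignmentChange(List<String> assignedPartitions);
}
```

Each concrete integration (Espresso, Kafka, EventHubs, Kinesis, ...) implements the same contract, which is what lets one service host many source and destination types at once.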

  12. Scenarios

  13. Scenario 1: Change Data Capture

  14. Capturing Live Updates 1. Member updates her profile to reflect her recent job change

  15. Capturing Live Updates 2. LinkedIn wants to inform her colleagues of this change

16. Capturing Live Updates (diagram): updates flow into Member DB; the News Feed Service queries it

17. Capturing Live Updates (diagram): updates flow into Member DB; the News Feed Service and Search Indices Service each query it

18-20. Capturing Live Updates (diagram, repeated across three build slides): updates flow into Member DB, which the News Feed, Search Indices, Notifications, Standardization, and other services each query directly

21. Change Data Capture (CDC) • Brooklin can stream database updates to a change stream • Data processing applications consume from change streams • Isolation: applications are decoupled from the sources and don’t compete for resources with online queries • Applications can be at different points in change timelines

22. Change Data Capture (CDC) (diagram): updates flow into Member DB; the change stream is published to a messaging system, which the Notifications, Standardization, Search Indices, and News Feed services consume
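Since the change stream is delivered through a standard messaging system, a downstream application consumes it like any other topic. A minimal Kafka consumer sketch, assuming string-serialized change events on a topic named ProfileTopic (the destination used in the architecture example later); the broker address and group id are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProfileChangeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "news-feed-service");       // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("ProfileTopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // React to the profile change, e.g. update the member's feed.
                    System.out.printf("profile change for key %s: %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```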

  23. Scenario 2: Streaming Bridge

  24. Stream Data from X to Y ● Across… ○ cloud services ○ clusters ○ data centers

25. Streaming Bridge • Data pipe to move data between different environments • Enforce policy: encryption, obfuscation, data formats

  26. Mirroring Kafka Data ● Aggregating data from all data centers into a centralized place ● Moving data between LinkedIn and external cloud services (e.g. Azure) ● Brooklin has replaced Kafka MirrorMaker (KMM) at LinkedIn ○ Issues with KMM: didn’t scale well, difficult to operate and manage, poor failure isolation

27. Use Brooklin to Mirror Kafka Data (diagram): the pluggable sources-and-destinations picture, with databases, Microsoft EventHubs, and messaging systems on both sides and Kafka as both source and destination for mirroring

28. Kafka MirrorMaker Topology (diagram): in each of Datacenter A, B, and C, banks of KMM clusters copy the tracking and metrics topics from every datacenter into local aggregate-tracking and aggregate-metrics topics

29. Brooklin Kafka Mirroring Topology (diagram): each of Datacenter A, B, and C runs a single Brooklin cluster that produces the local aggregate-tracking and aggregate-metrics topics from every datacenter’s tracking and metrics topics

30. Brooklin Kafka Mirroring ● Optimized for stability and operability ● Manually pause and resume mirroring at every level ○ Entire pipeline, topic, topic-partition ● Can auto-pause partitions facing mirroring issues ○ Auto-resumes the partitions after a configurable duration ● Flow of messages from other partitions is unaffected
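The per-partition auto-pause behavior can be illustrated with Kafka's consumer pause/resume API. This is a sketch of the idea described on the slide, not Brooklin's implementation; the pause duration here stands in for Brooklin's configurable setting:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

/** Sketch: auto-pause a misbehaving partition, auto-resume it later. */
class AutoPauser {
    private final KafkaConsumer<?, ?> consumer;
    private final Duration pauseDuration; // stand-in for Brooklin's configurable duration
    private final Map<TopicPartition, Instant> pausedAt = new HashMap<>();

    AutoPauser(KafkaConsumer<?, ?> consumer, Duration pauseDuration) {
        this.consumer = consumer;
        this.pauseDuration = pauseDuration;
    }

    /** Pause only the troubled partition; other partitions keep flowing. */
    void autoPause(TopicPartition tp) {
        consumer.pause(Set.of(tp));
        pausedAt.put(tp, Instant.now());
    }

    /** Call periodically from the poll loop to resume expired pauses. */
    void maybeResume() {
        Instant now = Instant.now();
        pausedAt.entrySet().removeIf(e -> {
            if (Duration.between(e.getValue(), now).compareTo(pauseDuration) >= 0) {
                consumer.resume(Set.of(e.getKey()));
                return true;
            }
            return false;
        });
    }
}
```

Because pausing is scoped to a single TopicPartition, a poison message or a slow destination on one partition never blocks the rest of the pipeline, which is the failure-isolation property the slide emphasizes.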

  31. Application Use Cases

32-36. Application Use Cases (progressive build): Security, Cache, Search Indices, ETL or Data Warehouse, Materialized Views or Replication, Repartitioning

37-40. Application Use Cases (progressive build, continued): Adjunct Data, Bridge, Serde / Encryption / Policy, Standardization, Notifications, …

  41. Architecture

  42. Example: Stream updates made to Member Profile

43. Capturing Live Updates (diagram): updates flow into Member DB and are streamed to the News Feed Service

44. Example ● Scenario: Stream Espresso Member Profile updates into Kafka ○ Source database: Espresso (Member DB, Profile table) ○ Destination: Kafka ○ Application: News Feed service

45. Datastream • Describes the data pipeline • Mapping between source and destination • Holds the configuration for the pipeline. Example: Name: MemberProfileChangeStream; Source: MemberDB/ProfileTable (Type: Espresso, Partitions: 8); Destination: ProfileTopic (Type: Kafka, Partitions: 8); Metadata: Application: News Feed service, Owner: newsfeed@linkedin.com
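For illustration, the same definition can be written down as a small data type. The record below is a hypothetical stand-in, not Brooklin's actual datastream schema; its field names simply mirror the slide:

```java
import java.util.Map;

/** Hypothetical stand-in for a Brooklin datastream definition. */
record DatastreamSpec(String name,
                      String source, String sourceType, int sourcePartitions,
                      String destination, String destinationType, int destinationPartitions,
                      Map<String, String> metadata) {

    /** The example pipeline from the slide. */
    static DatastreamSpec memberProfileChangeStream() {
        return new DatastreamSpec(
            "MemberProfileChangeStream",
            "MemberDB/ProfileTable", "Espresso", 8,
            "ProfileTopic", "Kafka", 8,
            Map.of("Application", "News Feed service",
                   "Owner", "newsfeed@linkedin.com"));
    }
}
```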

46. Step 1: Client makes a REST call (POST /datastream) to create the datastream. (Diagram, repeated through slides 46-51: a Brooklin client calls through a load balancer into a cluster of Brooklin instances; each instance runs a Datastream Management Service (DMS), a Coordinator (one instance is the leader), an Espresso consumer, and a Kafka producer, all coordinating through ZooKeeper; Member DB is the source and the News Feed service the destination application.)
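Step 1 can be sketched with Java's built-in HTTP client. Only the POST /datastream path and the field values come from the slides; the host name and the exact JSON shape are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateDatastream {
    public static void main(String[] args) throws Exception {
        // Field values from the Datastream slide; the JSON layout is an assumption.
        String body = """
            {
              "name": "MemberProfileChangeStream",
              "source": {"connectionString": "MemberDB/ProfileTable", "type": "Espresso", "partitions": 8},
              "destination": {"connectionString": "ProfileTopic", "type": "Kafka", "partitions": 8},
              "metadata": {"Application": "News Feed service", "Owner": "newsfeed@linkedin.com"}
            }""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://brooklin.example.com/datastream")) // placeholder host behind the load balancer
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```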

47. Step 2: The create request is routed, via the load balancer, to any Brooklin instance.

48. Step 3: The datastream is written to ZooKeeper.

49. Step 4: The leader coordinator is notified of the new datastream.

50. Step 5: The leader coordinator calculates the work distribution.

51. Step 6: The leader coordinator writes the assignments to ZooKeeper.
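Steps 5 and 6 amount to the leader splitting the datastream's partitions across the live instances and publishing the result to ZooKeeper. Below is a minimal sketch of one plausible strategy (round-robin); Brooklin's actual assignment logic is not shown in the slides:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignWork {
    /** Round-robin a datastream's partitions across live Brooklin instances. */
    static Map<String, List<Integer>> assign(List<String> instances, int numPartitions) {
        Map<String, List<Integer>> assignments = new LinkedHashMap<>();
        instances.forEach(i -> assignments.put(i, new ArrayList<>()));
        for (int p = 0; p < numPartitions; p++) {
            assignments.get(instances.get(p % instances.size())).add(p);
        }
        return assignments;
    }

    public static void main(String[] args) {
        // 8 partitions (matching the example datastream) over 3 instances.
        System.out.println(assign(List.of("instance-1", "instance-2", "instance-3"), 8));
        // prints {instance-1=[0, 3, 6], instance-2=[1, 4, 7], instance-3=[2, 5]}
    }
}
```

Once the assignments land in ZooKeeper, each instance's coordinator is notified of its share and starts its Espresso consumer and Kafka producer for the assigned partitions.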
