Patterns Of Streaming Applications - Monal Daxini, 11/6/2018 (PowerPoint PPT Presentation)



  1. Patterns Of Streaming Applications Monal Daxini 11/6/2018 @monaldax

  2. Profile • 4+ years building stream processing platform at Netflix • Drove technical vision, roadmap, led implementation • 17+ years building distributed systems • @monaldax

  3. Structure Of The Talk: Stream Processing? Set The Stage. 8 Patterns: 5 Functional, 3 Non-Functional @monaldax

  4. Disclaimer Inspired by True Events encountered building and operating a Stream Processing platform, and use cases that are in production or in ideation phase in the cloud. Some code and identifying details have been changed, and artistic liberties have been taken, to protect the privacy of streaming applications and to share the know-how. Some use cases may have been simplified. @monaldax

  5. Stream Processing? Processing Data-In-Motion @monaldax

  6. Lower Latency Analytics

  7. User Activity Stream - Batched Feb 26 Feb 25 Flash Jessica Luke @monaldax

  8. Sessions - Batched User Activity Stream Feb 26 Feb 25 Flash Jessica Luke @monaldax

  9. Correct Session - Batched User Activity Stream Feb 26 Feb 25 Flash Jessica Luke @monaldax

  10. Stream Processing Natural For User Activity Stream Sessions Flash Jessica Luke @monaldax

  11. Why Stream Processing? 1. Low latency insights and analytics 2. Process unbounded data sets 3. ETL as data arrives 4. Ad-hoc analytics and event-driven applications @monaldax

  12. Set The Stage Architecture & Flink

  13. Stream Processing App Architecture Blueprint Stream Source Sink Processing Job @monaldax

  14. Stream Processing App Architecture Blueprint Side Input Source Stream Sinks Source Processing Job Source @monaldax

  15. Why Flink?

  16. Flink Programs Are Streaming Dataflows – Streams And Transformation Operators @monaldax Image adapted, source: Flink Docs

  17. Streams And Transformation Operators - Windowing 10 Second @monaldax Image source: Flink Docs
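The 10-second window on this slide can be sketched without Flink. A minimal tumbling-window count in plain Java, assuming nothing beyond the slide (class and method names are illustrative, not the Flink API):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of a 10-second tumbling window count: each event is assigned
// to the window its timestamp falls into, and events per window are counted.
public class TumblingWindow {
    static Map<Long, Long> countPer10s(List<Long> timestampsMillis) {
        return timestampsMillis.stream()
            .collect(Collectors.groupingBy(
                ts -> ts / 10_000 * 10_000,   // window start in millis
                TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        // Events at 1s, 4s, 11s, 12s: window [0,10s) gets 2, [10s,20s) gets 2
        System.out.println(countPer10s(List.of(1_000L, 4_000L, 11_000L, 12_000L)));
        // {0=2, 10000=2}
    }
}
```

In Flink the same grouping is declared on the stream (e.g. a window transformation) rather than computed eagerly over a finished list; the bucketing-by-timestamp idea is the same.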

  18. Streaming Dataflow DAG @monaldax Image adapted, source: Flink Docs

  19. Scalable Automatic Scheduling Of Operations (diagram: a Job Manager process schedules operator subtasks with parallelism 2 and a sink with parallelism 1 across worker processes) @monaldax Image adapted, source: Flink Docs

  20. Flexible Deployment Containers VM / Cloud Bare Metal @monaldax

  21. Stateless Stream Processing No state maintained across events @monaldax Image adapted from: Stephan Ewen

  22. Fault-tolerant Processing – Stateful Processing (diagram: Source / Producers → Streaming Application in a Flink TaskManager → Sink, with in-memory / on-disk local state access; Checkpoints are periodic, asynchronous, incremental; Savepoints are explicitly triggered) @monaldax
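The checkpoint/restore idea on this slide can be sketched minimally. This is illustrative only, in plain Java: Flink actually snapshots state asynchronously and incrementally through its state backends, and all names here are hypothetical:

```java
import java.util.*;

// Sketch of checkpointing: keyed state is updated per event, a checkpoint
// takes a consistent snapshot, and restore() rolls back to that snapshot.
public class CheckpointSketch {
    Map<String, Long> state = new HashMap<>();
    Map<String, Long> lastCheckpoint = Map.of();

    void onEvent(String key) { state.merge(key, 1L, Long::sum); }
    void checkpoint() { lastCheckpoint = Map.copyOf(state); }    // snapshot
    void restore()    { state = new HashMap<>(lastCheckpoint); } // recover

    public static void main(String[] args) {
        CheckpointSketch job = new CheckpointSketch();
        job.onEvent("flash"); job.onEvent("flash");
        job.checkpoint();
        job.onEvent("luke");  // update made after the last checkpoint
        job.restore();        // failure: roll back to the snapshot
        System.out.println(job.state); // {flash=2}
    }
}
```

The missing piece relative to real Flink is replay: after restoring, the source rewinds to the checkpointed offsets so the lost events ("luke" above) are reprocessed, which is what gives exactly-once state semantics.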

  23. Levels Of API Abstraction In Flink Source: Flink Documentation

  24. Describing Patterns @monaldax

  25. Describing Design Patterns ● Use Case / Motivation ● Pattern ● Code Snippet & Deployment mechanism ● Related Pattern, if any @monaldax

  26. Patterns Functional

  27. 1. Configurable Router @monaldax

  28. 1.1 Use Case / Motivation – Ingest Pipelines • Create ingest pipelines for different event streams declaratively • Route events to data warehouse, data stores for analytics • With at-least-once semantics • Streaming ETL - Allow declarative filtering and projection @monaldax

  29. 1.1 Keystone Pipeline – A Self-serve Product • SERVERLESS • Turnkey – ready to use • 100% in the cloud • No code, Managed Code & Operations @monaldax

  30. 1.1 UI To Provision 1 Data Stream, A Filter, & 3 Sinks

  31. 1.1 Optional Filter & Projection (Out of the box)

  32. 1.1 Provision 1 Kafka Topic, 3 Configurable Router Jobs (fan-out: 3) – Events Producer → Kafka → three Configurable Router Jobs, each applying Filter → Projection → Connector, sinking to Elasticsearch (play_events) and other consumers @monaldax

  33. 1.1 Keystone Pipeline Scale ● Up to 1 trillion new events / day ● Peak: 12M events / sec, 36 GB / sec ● ~4 PB of data transported / day ● ~2000 Router Jobs / 10,000 containers @monaldax

  34. 1.1 Pattern: Configurable Isolated Router Configurable Router Job Sink Declarative Processors Declarative Processors Events Producer @monaldax

  35. 1.1 Code Snippet: Configurable Isolated Router (No User Code)

  val kafkaSource = getSourceBuilder.fromKafka("topic1").build()
  val selectedSink = getSinkBuilder()
    .toSelector(sinkName).declareWith("kafkasink", kafkaSink)
    .or("s3sink", s3Sink).or("essink", esSink).or("nullsink", nullSink).build()
  kafkaSource
    .filter(KeystoneFilterFunction).map(KeystoneProjectionFunction)
    .addSink(selectedSink)

  @monaldax

  36. 1.2 Use Case / Motivation – Ingest Large Streams With High Fan-out, Efficiently • Popular stream / topic has a high fan-out factor • Requires large Kafka clusters, expensive (diagram: Events Producer → Kafka TopicA / TopicB on Cluster1 → multiple Router jobs with Filter and Projection) @monaldax

  37. 1.2 Pattern: Configurable Co-Isolated Router – Merge routing to the same Kafka cluster (diagram: Events Producer → Kafka TopicA / TopicB on Cluster1 → one Co-Isolated Router with Filter and Projection) @monaldax

  38. 1.2 Code Snippet: Configurable Co-Isolated Router (No User Code)

  ui_A_Clicks_KafkaSource
    .filter(filter)
    .map(projection)
    .map(outputConverter)
    .addSink(kafkaSinkA_Topic1)

  ui_A_Clicks_KafkaSource
    .map(transformer)
    .flatMap(outputFlatMap)
    .map(outputConverter)
    .addSink(kafkaSinkA_Topic2)

  @monaldax

  39. 2. Script UDF* Component [Static / Dynamic] *UDF – User Defined Function @monaldax

  40. 2. Use Case / Motivation – Configurable Business Logic Code for operations like transformations and filtering Managed Router / Streaming Job Biz Logic Source Job DAG Sink @monaldax

  41. 2. Pattern: Static or Dynamic Script UDF (stateless) Component Comes with all the Pros and Cons of scripting engine Script Engine executes function defined in the UI Streaming Job UDF Source Sink @monaldax
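As a sketch of this pattern (not the Netflix implementation): the job DAG stays fixed while the function applied at a step is resolved by name at runtime. Plain Java, with a map of named functions standing in for a script engine and dynamic config; all names are hypothetical:

```java
import java.util.*;
import java.util.function.*;

// Sketch of a script/UDF component: the pipeline shape is fixed, but the
// transformation applied is looked up by a runtime-configurable name.
public class UdfComponent {
    // Stand-in for a script engine: named, stateless functions.
    static final Map<String, UnaryOperator<String>> UDFS = Map.of(
        "upper", String::toUpperCase,
        "trim",  String::trim);

    static List<String> run(List<String> events, String udfName) {
        UnaryOperator<String> udf = UDFS.get(udfName); // resolved at runtime
        return events.stream().map(udf).toList();
    }

    public static void main(String[] args) {
        // Same job, different behavior, chosen by configuration:
        System.out.println(run(List.of("play", "pause"), "upper")); // [PLAY, PAUSE]
    }
}
```

A real script engine generalizes the map lookup to evaluating arbitrary user-supplied code, which is exactly where the slide's "pros and cons of a scripting engine" caveat comes from (flexibility versus sandboxing, performance, and debuggability).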

  42. 2. Code Snippet: Script UDF Component (contents configurable at runtime)

  // Script function reference from dynamic config
  val xscript = new DynamicConfig("x.script")

  kafkaSource
    .map(new ScriptFunction(xscript))
    .filter(new ScriptFunction(xscript2))
    .addSink(new NoopSink())

  // Script engine evaluating the configured function
  val sm = new ScriptEngineManager()
  val se = sm.getEngineByName("nashorn")
  se.eval(script)

  @monaldax

  43. 3. The Enricher @monaldax

  44. Next 3 Patterns (3-5) Require Explicit Deployment @monaldax

  45. 3. Use Case - Generating Play Events For Personalization And Show Discovery @monaldax

  46. 3. Use Case: Create play events with current data from services and a lookup table, for analytics. Using a lookup table keeps originating events lightweight. (diagram: Play Logs → Streaming Job → sink; rate-limited service calls to Playback History Service and Video Metadata; periodically updating lookup data) @monaldax

  47. 3. Pattern: The Enricher • Rate limit with a source or service rate limiter, or with resources • Pull or push data, sync / async • Side input: service call, lookup from a data store, or static / periodically updated lookup data (diagram: Source → Streaming Job → Sink) @monaldax
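The periodically-updated-lookup variant of the Enricher can be sketched as follows. A minimal illustration in plain Java, assuming a hypothetical lookup source; the refresh would be driven by a timer in practice, and the rate limiter and async I/O from the slide are omitted:

```java
import java.util.*;
import java.util.function.*;

// Sketch of the Enricher: events are joined against lookup data that is
// refreshed periodically, instead of calling the service once per event.
public class Enricher {
    private Map<String, String> lookup = Map.of();
    private final Supplier<Map<String, String>> source;

    Enricher(Supplier<Map<String, String>> lookupSource) {
        this.source = lookupSource;
        refresh();
    }

    void refresh() { lookup = source.get(); } // run on a schedule in practice

    String enrich(String event) {
        return event + "|" + lookup.getOrDefault(event, "unknown");
    }

    public static void main(String[] args) {
        Enricher e = new Enricher(() -> Map.of("play", "title=StrangerThings"));
        System.out.println(e.enrich("play")); // play|title=StrangerThings
    }
}
```

The design trade-off the slide points at: per-event service calls give the freshest data but need rate limiting to protect the service, while a periodically refreshed lookup table bounds load at the cost of slightly stale enrichment.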

  48. 3. Code Snippet: The Enricher

  val kafkaSource = getSourceBuilder.fromKafka("topic1").build()
  val parsedMessages = kafkaSource.flatMap(parser).name("parser")
  val enrichedSessions = parsedMessages.filter(reflushFilter).name("filter")
    .map(playbackEnrichment).name("service")
    .map(dataLookup)
  enrichedSessions.addSink(sink).name("sink")

  @monaldax

  49. 4. The Co-process Joiner @monaldax

  50. 4. Use Case – Play-Impressions Conversion Rate @monaldax

  51. 4. Impressions And Plays Scale • 130+ M members • 10+ B Impressions / day • 2.5+ B Play Events / day • ~2 TB Processing State @monaldax

  52. 4. Join Large Streams With Delayed, Out-Of-Order Events Based On Event Time • # Impressions per user play • Impression attributes leading to the play (diagram: impressions I1, I2 and plays P1, P3 from Kafka topics → Streaming Job → Sink) @monaldax

  53. Understanding Event Time: input arrives in processing-time order (10:00 to 15:00), but output is assigned to 1-hour windows by event time (10:00 to 15:00). Image adapted from the Apache Beam presentation material.
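The event-time idea on this slide can be sketched in plain Java: events are bucketed by their own timestamp, so a late arrival still lands in its original window. Names and timestamps are illustrative:

```java
import java.util.*;

// Sketch: with event time, an event is assigned to the window of its own
// timestamp, so out-of-order arrivals still land in the right bucket.
public class EventTimeWindows {
    record Event(String user, long eventTimeMillis) {}

    static Map<Long, List<String>> byHour(List<Event> arrivalOrder) {
        Map<Long, List<String>> windows = new TreeMap<>();
        for (Event e : arrivalOrder) {
            long windowStart = e.eventTimeMillis / 3_600_000 * 3_600_000;
            windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(e.user);
        }
        return windows;
    }

    public static void main(String[] args) {
        // "flash" arrives second but belongs to the earlier hour.
        List<Event> arrivals = List.of(
            new Event("jessica", 3_605_000L),  // second hour window
            new Event("flash",   10_000L));    // first hour window, arrives late
        System.out.println(byHour(arrivals)); // {0=[flash], 3600000=[jessica]}
    }
}
```

What this sketch leaves out is the hard part Flink solves with watermarks: deciding when a window can be closed and emitted, given that an even later event might still arrive.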

  54. 4. Use Case: Join Impressions And Plays Stream On Event Time (diagram: impressions and plays from Kafka topics are keyed by keyBy F1 / keyBy F2 into keyed state in a co-process streaming job; matching events, e.g. I2 and P2, are merged and emitted) @monaldax

  55. 4. Pattern: The Co-process Joiner • Process and coalesce events for each stream, grouped by the same key • Join if there is a match; evict when joined or timed out (diagram: Source 1 keyBy F1 → State 1, Source 2 keyBy F2 → State 2, co-process → Sink, Streaming Job with keyed state) @monaldax
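A minimal sketch of the co-process join logic in plain Java (not the Flink CoProcessFunction API; the timeout/eviction timer from the slide is omitted and all names are hypothetical): whichever side arrives first is buffered in keyed state, and a match from the other side emits the joined result and evicts the state.

```java
import java.util.*;

// Sketch of the Co-process Joiner over two keyed streams.
public class CoProcessJoiner {
    private final Map<String, String> pendingImpressions = new HashMap<>();
    private final Map<String, String> pendingPlays = new HashMap<>();
    final List<String> joined = new ArrayList<>();

    void onImpression(String key, String impression) {
        String play = pendingPlays.remove(key);   // match + evict
        if (play != null) joined.add(key + ":" + impression + "+" + play);
        else pendingImpressions.put(key, impression); // buffer in keyed state
    }

    void onPlay(String key, String play) {
        String imp = pendingImpressions.remove(key);
        if (imp != null) joined.add(key + ":" + imp + "+" + play);
        else pendingPlays.put(key, play);
    }

    public static void main(String[] args) {
        CoProcessJoiner j = new CoProcessJoiner();
        j.onPlay("profile1_title9", "P1");       // play arrives first (out of order)
        j.onImpression("profile1_title9", "I1"); // impression matches: emit + evict
        System.out.println(j.joined); // [profile1_title9:I1+P1]
    }
}
```

The eviction-on-timeout the slide mentions is what bounds the ~2 TB of processing state: an unmatched event cannot wait in keyed state forever.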

  56. 4. Code Snippet – The Co-process Joiner, Setup Sources

  env.setStreamTimeCharacteristic(EventTime)

  val impressionSource = kafkaSrc1
    .filter(eventTypeFilter)
    .flatMap(impressionParser)
    .keyBy(in => s"${profile_id}_${title_id}")

  val playSource = kafkaSrc2
    .flatMap(playbackParser)
    .keyBy(in => s"${profile_id}_${title_id}")

  @monaldax
