Patterns Of Streaming Applications Monal Daxini 11/6/2018 @monaldax
Profile • 4+ years building stream processing platform at Netflix • Drove technical vision, roadmap, led implementation • 17+ years building distributed systems @monaldax
Structure Of The Talk • Stream Processing? • Set The Stage • 8 Patterns: 5 Functional, 3 Non-Functional @monaldax
Disclaimer Inspired by true events encountered building and operating a stream processing platform, and by use cases that are in production or in the ideation phase in the cloud. Some code and identifying details have been changed, and artistic liberties have been taken, to protect the privacy of streaming applications and to share the know-how. Some use cases may have been simplified. @monaldax
Stream Processing? Processing Data-In-Motion @monaldax
Lower Latency Analytics
User Activity Stream - Batched (diagram: Flash, Jessica, Luke activity on Feb 25 / Feb 26) @monaldax
Sessions - Batched User Activity Stream (diagram) @monaldax
Correct Sessions - Batched User Activity Stream (diagram) @monaldax
Stream Processing Is Natural For User Activity Stream Sessions (diagram) @monaldax
Why Stream Processing? 1. Low latency insights and analytics 2. Process unbounded data sets 3. ETL as data arrives 4. Ad-hoc analytics and event-driven applications @monaldax
Set The Stage Architecture & Flink
Stream Processing App Architecture Blueprint: Source → Stream Processing Job → Sink @monaldax
Stream Processing App Architecture Blueprint: Sources (+ side input) → Stream Processing Job → Sinks @monaldax
Why Flink?
Flink Programs Are Streaming Dataflows – Streams And Transformation Operators @monaldax Image adapted, source: Flink Docs
Streams And Transformation Operators - Windowing (10-second window) @monaldax Image source: Flink Docs
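The slides show windowing only as a diagram; a minimal sketch of a keyed 10-second window in Flink's Scala DataStream API follows. The PlayEvent type, its fields, and the sample elements are illustrative assumptions, not from the talk.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WindowedCounts {
  // Hypothetical event type; the name and fields are assumptions for illustration.
  case class PlayEvent(titleId: String, count: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements(PlayEvent("show-a", 1), PlayEvent("show-b", 1), PlayEvent("show-a", 1))
      .keyBy(_.titleId)               // partition the stream by key
      .timeWindow(Time.seconds(10))   // 10-second windows, as in the diagram
      .sum("count")                   // aggregate counts within each window
      .print()

    env.execute("windowed-counts")
  }
}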
Streaming Dataflow DAG @monaldax Image adapted, source: Flink Docs
Scalable Automatic Scheduling Of Operations – the Job Manager (process) schedules operator subtasks across TaskManager processes (e.g., operators at parallelism 2, sink at parallelism 1) @monaldax Image adapted, source: Flink Docs
Flexible Deployment Containers VM / Cloud Bare Metal @monaldax
Stateless Stream Processing No state maintained across events @monaldax Image adapted from: Stephan Ewen
Fault-Tolerant Processing – Stateful Processing: the streaming application (Flink TaskManager) has in-memory / on-disk local state access from source to sink; state is persisted via checkpoints (periodic, asynchronous, incremental) and savepoints (explicitly triggered) @monaldax
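As a rough sketch (not from the talk) of how a job opts into this fault-tolerance model, the snippet below enables periodic checkpoints and an incremental RocksDB state backend; the checkpoint interval, mode, and S3 path are placeholder assumptions.

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointedJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Periodic, asynchronous checkpoints every 30 seconds with exactly-once semantics.
    env.enableCheckpointing(30000, CheckpointingMode.EXACTLY_ONCE)

    // RocksDB keeps local state on disk; the second argument enables incremental
    // checkpoints. The S3 URI is a placeholder for the durable checkpoint store.
    env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/checkpoints", true))

    // Savepoints are triggered explicitly, e.g. `flink savepoint <jobId>` from the CLI.
    env.fromElements(1, 2, 3).map(_ * 2).print()
    env.execute("checkpointed-job")
  }
}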
Levels Of API Abstraction In Flink Source: Flink Documentation
Describing Patterns @monaldax
Describing Design Patterns ● Use Case / Motivation ● Pattern ● Code Snippet & Deployment mechanism ● Related Pattern, if any @monaldax
Patterns Functional
1. Configurable Router @monaldax
1.1 Use Case / Motivation – Ingest Pipelines • Create ingest pipelines for different event streams declaratively • Route events to data warehouse, data stores for analytics • With at-least-once semantics • Streaming ETL - Allow declarative filtering and projection @monaldax
1.1 Keystone Pipeline – A Self-serve Product • SERVERLESS • Turnkey – ready to use • 100% in the cloud • No code, Managed Code & Operations @monaldax
1.1 UI To Provision 1 Data Stream, A Filter, & 3 Sinks
1.1 Optional Filter & Projection (Out of the box)
1.1 Provision 1 Kafka Topic (play_events) And 3 Configurable Router Jobs – fan-out of 3; each router job runs Filter, Projection, and a Connector to its sink (e.g., Elasticsearch, Kafka consumer) @monaldax
1.1 Keystone Pipeline Scale ● Up to 1 trillion new events / day ● Peak: 12M events / sec, 36 GB / sec ● ~4 PB of data transported / day ● ~2000 Router Jobs / 10,000 containers @monaldax
1.1 Pattern: Configurable Isolated Router – producer events flow through a Configurable Router Job (declarative processors) to a sink @monaldax
1.1 Code Snippet: Configurable Isolated Router No User Code
val kafkaSource = getSourceBuilder.fromKafka("topic1").build()
val selectedSink = getSinkBuilder()
  .toSelector(sinkName).declareWith("kafkasink", kafkaSink)
  .or("s3sink", s3Sink).or("essink", esSink).or("nullsink", nullSink).build()
kafkaSource
  .filter(KeystoneFilterFunction).map(KeystoneProjectionFunction)
  .addSink(selectedSink)
@monaldax
1.2 Use Case / Motivation – Ingest Large Streams With High Fan-out Efficiently • Popular stream / topic has a high fan-out factor • Requires large Kafka clusters, expensive (diagram: producer events into Kafka TopicA / TopicB on Cluster1, multiple router jobs with filter and projection) @monaldax
1.2 Pattern: Configurable Co-Isolated Router – merge routing to the same Kafka cluster into one job (producer events filtered and projected into TopicA / TopicB on Cluster1) @monaldax
1.2 Code Snippet: Configurable Co-Isolated Router No User Code
ui_A_Clicks_KafkaSource
  .map(transformer)
  .flatMap(outputFlatMap)
  .map(outputConverter)
  .addSink(kafkaSinkA_Topic2)

ui_A_Clicks_KafkaSource
  .filter(filter)
  .map(projection)
  .map(outputConverter)
  .addSink(kafkaSinkA_Topic1)
@monaldax
2. Script UDF* Component [Static / Dynamic] *UDF – User Defined Function @monaldax
2. Use Case / Motivation – Configurable Business Logic Code for operations like transformations and filtering, embedded in a managed router / streaming job (Source → Job DAG with Biz Logic → Sink) @monaldax
2. Pattern: Static or Dynamic Script UDF (stateless) Component – a script engine executes a function defined in the UI inside the streaming job (Source → UDF → Sink); comes with all the pros and cons of a scripting engine @monaldax
2. Code Snippet: Script UDF Component Contents configurable at runtime
// Script engine
val sm = new ScriptEngineManager()
val se: ScriptEngine = sm.getEngineByName("nashorn")
se.eval(script)

// Job using script UDFs, configurable at runtime
val xscript = new DynamicConfig("x.script")
kafkaSource
  .map(new ScriptFunction(xscript))
  .filter(new ScriptFunction(xscript2))
  .addSink(new NoopSink())
@monaldax
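The ScriptFunction used above isn't defined in the slides; one plausible shape for such a wrapper, assuming a Nashorn-evaluated JavaScript snippet exposing a transform(event) function, is sketched below. The class and function names are hypothetical.

import javax.script.{Invocable, ScriptEngineManager}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical UDF wrapper: evaluates a JavaScript snippet with Nashorn and applies
// its transform(event) function to each record.
class ScriptMapFunction(script: String) extends RichMapFunction[String, String] {

  @transient private var invocable: Invocable = _

  override def open(parameters: Configuration): Unit = {
    // The script engine is not serializable, so it is created per task in open().
    val engine = new ScriptEngineManager().getEngineByName("nashorn")
    engine.eval(script)
    invocable = engine.asInstanceOf[Invocable]
  }

  override def map(event: String): String =
    invocable.invokeFunction("transform", event).asInstanceOf[String]
}

// Usage sketch: in practice the script body would come from dynamic configuration.
// kafkaSource.map(new ScriptMapFunction("function transform(e) { return e.toUpperCase(); }"))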
3. The Enricher @monaldax
Next 3 Patterns (3-5) Require Explicit Deployment @monaldax
3. Use Case - Generating Play Events For Personalization And Show Discovery @monaldax
3. Use-case: Create play events enriched with current data from services and a lookup table, for analytics. Using a lookup table keeps originating events lightweight. (Streaming job: Play Logs → rate-limited service calls to Playback History Service and Video Metadata, plus periodically updated lookup data) @monaldax
3. Pattern: The Enricher - Rate limit with a source or service rate limiter, or with resources - Pull or push data, sync / async - Side input: service call, lookup from a data store, or static / periodically updated lookup data (Streaming Job: Source → enrich → Sink) @monaldax
3. Code Snippet: The Enricher
val kafkaSource = getSourceBuilder.fromKafka("topic1").build()
val parsedMessages = kafkaSource.flatMap(parser).name("parser")
val enrichedSessions = parsedMessages
  .filter(reflushFilter).name("filter")
  .map(playbackEnrichment).name("service")
  .map(dataLookup)
enrichedSessions.addSink(sink).name("sink")
@monaldax
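The snippet above enriches synchronously with plain map operators. Where the pattern mentions sync / async calls, an asynchronous variant could use Flink's AsyncDataStream; the sketch below is an assumption-laden illustration (PlayEvent, EnrichedPlayEvent, and videoMetadataClient are made-up stand-ins), not the talk's code.

import java.util.concurrent.TimeUnit

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.async.{AsyncFunction, ResultFuture}

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.util.{Failure, Success}

object AsyncEnrichmentSketch {
  // Hypothetical types and client; placeholders for the real events and services.
  case class PlayEvent(titleId: String)
  case class EnrichedPlayEvent(event: PlayEvent, metadata: String)
  object videoMetadataClient {
    def lookup(titleId: String): String = s"metadata-for-$titleId" // stand-in for a real call
  }

  // Issues the lookup off the main thread and completes the result future when done.
  class MetadataEnricher extends AsyncFunction[PlayEvent, EnrichedPlayEvent] {
    override def asyncInvoke(event: PlayEvent,
                             resultFuture: ResultFuture[EnrichedPlayEvent]): Unit =
      Future(videoMetadataClient.lookup(event.titleId)).onComplete {
        case Success(meta) => resultFuture.complete(Iterable(EnrichedPlayEvent(event, meta)))
        case Failure(e)    => resultFuture.completeExceptionally(e)
      }
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events = env.fromElements(PlayEvent("t1"), PlayEvent("t2"))

    // Non-blocking enrichment: 1-second timeout, at most 100 in-flight requests.
    AsyncDataStream
      .unorderedWait(events, new MetadataEnricher, 1000, TimeUnit.MILLISECONDS, 100)
      .print()

    env.execute("async-enrichment-sketch")
  }
}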
4. The Co-process Joiner @monaldax
4. Use Case – Play-Impressions Conversion Rate @monaldax
4. Impressions And Plays Scale • 130+ M members • 10+ B impressions / day • 2.5+ B play events / day • ~2 TB processing state @monaldax
4. Join Large Streams With Delayed, Out-Of-Order Events Based On Event Time • # impressions per user play • Impression attributes leading to the play (streaming job joins the impressions and plays Kafka topics into a sink) @monaldax
Understanding Event Time – input events arriving between 10:00 and 15:00 in processing time are assigned to 1-hour windows by event time in the output. Image adapted from the Apache Beam presentation material
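Event-time processing requires the job to extract timestamps and emit watermarks. A minimal sketch (not from the talk) using a bounded-out-of-orderness extractor and a 1-hour event-time window follows; the Impression type, the 10-minute lateness bound, the placeholder aggregation, and the sample data are assumptions.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object EventTimeSketch {
  // Hypothetical event carrying its own event-time timestamp in milliseconds.
  case class Impression(profileId: String, titleId: String, eventTimeMs: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    env
      .fromElements(
        Impression("p1", "t1", 1000L),
        Impression("p1", "t1", 5000L))
      .assignTimestampsAndWatermarks(
        // Tolerate up to 10 minutes of delayed, out-of-order events.
        new BoundedOutOfOrdernessTimestampExtractor[Impression](Time.minutes(10)) {
          override def extractTimestamp(e: Impression): Long = e.eventTimeMs
        })
      .keyBy(i => s"${i.profileId}_${i.titleId}")
      .timeWindow(Time.hours(1))       // 1-hour event-time window, as in the figure
      .reduce((first, _) => first)     // placeholder aggregation per key and window
      .print()

    env.execute("event-time-sketch")
  }
}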
4. Use Case: Join Impressions And Plays Stream On Event Time – both Kafka topics are keyed (keyBy F1 / keyBy F2), the co-process step holds events in keyed state, merges a matching impression and play (e.g., I2 + P2), and emits the result @monaldax
4. Pattern: The Co-process Joiner • Process and coalesce events for each stream, grouped by the same key (keyed state) • Join if there is a match; evict when joined or timed out (Source 1 → keyBy F1 → State 1, Source 2 → keyBy F2 → State 2, co-process → Sink) @monaldax
4. Code Snippet – The Co-process Joiner, Setup sources
env.setStreamTimeCharacteristic(EventTime)

val impressionSource = kafkaSrc1
  .filter(eventTypeFilter)
  .flatMap(impressionParser)
  .keyBy(in => s"${in.profile_id}_${in.title_id}")

val playbackSource = kafkaSrc2
  .flatMap(playbackParser)
  .keyBy(in => s"${in.profile_id}_${in.title_id}")
@monaldax
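The slide shows only the source setup; a rough sketch of the co-process step is below, assuming hypothetical Impression / Play types, a one-hour eviction timer, and event-time timestamps assigned upstream (the talk's actual types, output, and timeout are not shown).

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector

// Hypothetical event and output types; placeholders for the real job's types.
case class Impression(profileId: String, titleId: String)
case class Play(profileId: String, titleId: String)
case class ImpressionPlay(impression: Impression, play: Play)

// Buffers the first event from each side in keyed state, emits the joined pair when
// both have arrived, and relies on an event-time timer to evict unmatched state.
class ImpressionPlayJoiner extends CoProcessFunction[Impression, Play, ImpressionPlay] {

  @transient private var impressionState: ValueState[Impression] = _
  @transient private var playState: ValueState[Play] = _

  override def open(parameters: Configuration): Unit = {
    impressionState = getRuntimeContext.getState(
      new ValueStateDescriptor("impression", classOf[Impression]))
    playState = getRuntimeContext.getState(
      new ValueStateDescriptor("play", classOf[Play]))
  }

  override def processElement1(impression: Impression,
                               ctx: CoProcessFunction[Impression, Play, ImpressionPlay]#Context,
                               out: Collector[ImpressionPlay]): Unit = {
    val play = playState.value()
    if (play != null) {                      // matching play already buffered: join and evict
      out.collect(ImpressionPlay(impression, play))
      playState.clear()
    } else {                                 // otherwise buffer and arm an eviction timer
      impressionState.update(impression)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60L * 60 * 1000)
    }
  }

  override def processElement2(play: Play,
                               ctx: CoProcessFunction[Impression, Play, ImpressionPlay]#Context,
                               out: Collector[ImpressionPlay]): Unit = {
    val impression = impressionState.value()
    if (impression != null) {
      out.collect(ImpressionPlay(impression, play))
      impressionState.clear()
    } else {
      playState.update(play)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60L * 60 * 1000)
    }
  }

  override def onTimer(timestamp: Long,
                       ctx: CoProcessFunction[Impression, Play, ImpressionPlay]#OnTimerContext,
                       out: Collector[ImpressionPlay]): Unit = {
    // Timed out without a match: drop whatever is still buffered for this key.
    impressionState.clear()
    playState.clear()
  }
}

// Wiring, following the slide's source names:
// impressionSource.connect(playbackSource).process(new ImpressionPlayJoiner).addSink(sink)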