Introduction to Stream Processing
Guido Schmutz
Frankfurt, 21.2.2019
@gschmutz
guidoschmutz.wordpress.com
Agenda
1. Motivation for Stream Processing?
2. Capabilities for Stream Processing
3. Implementing Stream Processing Solutions
4. Demo
5. Summary
Guido Schmutz
• Working at Trivadis for more than 22 years
• Oracle Groundbreaker Ambassador & Oracle ACE Director
• Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
• Head of Trivadis Architecture Board
• Technology Manager @ Trivadis
• More than 30 years of software development experience
• Contact: guido.schmutz@trivadis.com
• Blog: http://guidoschmutz.wordpress.com
• Slideshare: http://www.slideshare.net/gschmutz
• Twitter: gschmutz
Motivation for Stream Processing?
Big Data solves Volume and Variety – not Velocity
[Diagram: Bulk Source (File, DB) feeds a Big Data Platform on a Hadoop cluster (File Import / SQL Import, Raw Storage, Parallel Processing, Refined Storage, Results). Results reach the Enterprise Data Warehouse via SQL Export and are consumed by BI Tools, Search / Explore, and Enterprise Apps (Service, API, Logic). Annotation: high latency.]
Big Data solves Volume and Variety – not Velocity
[Diagram: the same Big Data Platform picture (Bulk Source, Hadoop cluster, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps), now with an Event Source (Mobile Apps, IoT Data, Location, Social, Telemetry) delivering an Event Stream. Annotation: high latency.]
Big Data solves Volume and Variety – not Velocity
[Diagram: the same Big Data Platform picture, with the Event Source (Mobile Apps, IoT Data, Location, Social, Telemetry) now connected through an Event Hub. Parallel Processing is annotated with Machine Learning, Graph Algorithms, and Natural Language Processing. Annotation: high latency.]
"Data at Rest" vs. "Data in Motion"
• Data at Rest: Store → Analyze → Act
• Data in Motion: Analyze → Act → Store
When to Stream / When not?
• Real-Time: constant low milliseconds and under
• Near-Real-Time: low milliseconds to seconds, delay in case of failures
• Batch: 10s of seconds or more, re-run in case of failures
Source: adapted from Cloudera
"No free lunch"
• The same spectrum runs from Real-Time ("difficult" architectures, lower latency) to Batch ("easier" architectures, higher latency)
Stream Processing Architecture solves Velocity
[Diagram: Event Source (Mobile Apps, IoT Data, Location, Social, Telemetry) sends Event Streams through an Event Hub to a Stream Analytics Platform (Stream Analytics, Reference / Models). Results feed a Dashboard, Search / Explore, BI Tools, the Enterprise Data Warehouse, and Enterprise Apps (Service, API, Logic). Bulk Source (File, DB) remains alongside. Annotation: low(est) latency, no history.]
Big Data for all historical data analysis
[Diagram: the Stream Analytics Platform picture, extended with a Big Data Platform (File Import / SQL Import, Raw Storage, Parallel Processing, Refined Storage, Results) that also receives events from the Event Hub via a Data Flow.]
Integrate existing systems with lower latency through CDC
[Diagram: the combined Big Data Platform and Stream Analytics Platform picture, extended with Change Data Capture feeding changes from existing systems into the Event Hub.]
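To illustrate the idea of change data capture (CDC) named on this slide, the following is a minimal sketch that derives change events by comparing two snapshots of a table. It is an assumption made for illustration only: the function name `capture_changes` and the record shapes are hypothetical, and real CDC tools typically read the database transaction log rather than diffing snapshots.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Change = Tuple[str, str, Row]  # (operation, primary key, row); hypothetical shape

def capture_changes(before: Dict[str, Row], after: Dict[str, Row]) -> List[Change]:
    """Derive insert/update/delete events from two snapshots keyed by primary key."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))   # row is new
        elif before[key] != row:
            changes.append(("update", key, row))   # row content changed
    for key, row in before.items():
        if key not in after:
            changes.append(("delete", key, row))   # row disappeared
    return changes

before = {"1": {"name": "Ann"}, "2": {"name": "Bob"}}
after = {"1": {"name": "Ann"}, "2": {"name": "Bobby"}, "3": {"name": "Cy"}}
changes = capture_changes(before, after)
for change in changes:
    print(change)  # each change could be published to an event hub
```

Each emitted tuple plays the role of an event that downstream stream processors can consume, without the existing system having to change its own code.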
New systems participate in event-oriented fashion
[Diagram: the previous picture, extended with a Microservice Platform whose microservices (API, State) consume and produce Event Streams, and a Stream Processor with State inside the Stream Analytics Platform.]
Edge computing allows processing close to data sources
[Diagram: the previous picture, extended with an Edge Node (Event Hub, Storage, Rules, Data Flow) placed between the Event Source and the central Event Hub.]
Unified Architecture for Modern Data Analytics Solutions
[Diagram: all building blocks combined: Bulk Source, Event Source, Edge Node, Event Hub, Change Data Capture, Data Flow, Big Data (Raw Storage, Parallel Processing, Refined Storage, Results), Stream Analytics (Stream Processor, State), Microservices (API, State), Enterprise Data Warehouse, BI Tools, Search / Explore, and Enterprise Apps.]
Two Types of Stream Processing (by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases
• filter and enrich the data
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
• Complex events may signify threats or opportunities that require a response from the business
Source: Gartner, Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
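The two types can be contrasted in a small Python sketch. This is an illustration under stated assumptions, not taken from the slides: the event shape, the `DEVICE_LOCATION` lookup, and the "overheating" rule are all hypothetical. `integrate` filters and enriches events (stream data integration), while `analyze` aggregates and detects a pattern to produce a complex event (stream analytics).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical reference data used for enrichment.
DEVICE_LOCATION = {"d1": "Frankfurt", "d2": "Basel"}

def integrate(events: Iterable[Dict]) -> Iterator[Dict]:
    """Stream data integration: filter out unusable events, enrich the rest."""
    for e in events:
        if e.get("temp") is None:   # filter: drop events without a reading
            continue
        # enrich: add the location looked up from reference data
        yield {**e, "location": DEVICE_LOCATION.get(e["device"], "unknown")}

def analyze(events: Iterable[Dict], limit: float, min_count: int) -> List[str]:
    """Stream analytics: count hot readings per location and emit a
    complex event once a location reaches `min_count` of them."""
    hot: Counter = Counter()
    complex_events: List[str] = []
    for e in events:
        if e["temp"] > limit:
            hot[e["location"]] += 1
            if hot[e["location"]] == min_count:
                complex_events.append(f"overheating:{e['location']}")
    return complex_events

raw = [
    {"device": "d1", "temp": 41.0},
    {"device": "d2", "temp": None},
    {"device": "d1", "temp": 42.5},
    {"device": "d2", "temp": 20.0},
]
alerts = analyze(integrate(raw), limit=40.0, min_count=2)
print(alerts)
```

Chaining the two generators mirrors a common pipeline layout: integration prepares the events, analytics derives higher-level information from them.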
Stream Processing & Analytics Ecosystem
[Diagram: product landscape grouped into Stream Analytics (Open Source, Closed Source), Stream Data Integration, Event Hub, and Edge.]
Source: adapted from Tibco
Stream vs. Table / Static
Stream ("History")
• an unbounded sequence of structured data ("facts")
• Facts in a stream are immutable
Table / Static ("State")
• a view of a stream, or another table, and represents a collection of evolving facts
• Latest value for each key in a stream
• Facts in a table are mutable
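The relationship between a stream and a table can be sketched as a fold over the stream that keeps the latest value per key. This is a minimal illustration only; the `(key, value)` record shape and the name `to_table` are assumptions, not part of the slides.

```python
from typing import Dict, Iterable, List, Tuple

Fact = Tuple[str, int]  # (key, value); hypothetical record shape

def to_table(stream: Iterable[Fact]) -> Dict[str, int]:
    """Build a table view of a stream: the latest value for each key."""
    table: Dict[str, int] = {}
    for key, value in stream:
        table[key] = value  # table entries are mutable: newer facts overwrite
    return table

# The stream keeps every fact ("history"); facts are never changed.
stream: List[Fact] = [("alice", 1), ("bob", 5), ("alice", 3)]

# The table keeps only the current "state" per key.
table = to_table(stream)
print(table)
```

The stream still holds all three facts after the fold, while the table holds one entry per key.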
Important Capabilities for Stream Processing