Make your data science actionable: real-time machine learning inference with stream processing
Neil Stevenson, Solution Architect, Hazelcast
3rd June 2019
13:45 – 14:35 neil@hazelcast.com
Which came first? (Chicken | Egg)
Chicken
What relevance is this?!
What is this?
• You can eat them
• They lay eggs
• They can be pets
• Not just any old chicken but…
• MY CHICKEN
• A Bresse Gauloise
Stream Processing
Business Challenges for Real-time Applications
• Latency & Speed: Time is money
• Scalability: Hazelcast scales effortlessly, responding to peaks and valleys for optimal utilization
• Real-Time, Continuous Intelligence: Real-time view of constantly changing operational data
• Zero Downtime: Built for high resiliency
In-Memory Platform
[Diagram: inputs feed a Jet Cluster (Data In Motion) and an IMDG Cluster (Data at Rest), which produce outputs.]
Inputs:
• Live Streams: Kafka, JMS, Feeds
• Databases: JDBC, Relational, NoSQL, Change Events
• Files: HDFS, Flat Files, Logs, File watcher
• Applications: Sockets
• Internet of Things: Sensors, Smart Things
Outputs: Analytics, Predictions, Decisions, Alerts
Other labels on the diagram: Situational, Geospatial, Weather
Hazelcast IMDG (In-Memory Data Grid)
• Integrate: APIs, Microservices, Notifications
• Communicate: Serialization, Protocols
• Store/Update: Caching, CRUD Persistence
• Compute: Query, Process, Execute
• Scale: Clustering & Cloud, High Density
• Replicate: WAN Replication, Partitioning
• Secure: Privacy, Authentication, Authorization
• Available: Rolling Upgrades, Hot Restart
Jet (In-Memory Streams)
• Ingest & Transform: Events, Connectors, Filtering
• Combine: Join, Enrich, Group, Aggregate
• Stream: Windowing, Event-Time Processing
• Compute & Act: Distributed & Parallel Computations
• Secure: Privacy, Authentication, Authorization
• Available: Job Elasticity, Graceful shutdown
Management Center: Secure | Manage | Operate
Embeddable | Scalable | Low-Latency
Secure | Resilient | Distributed
Inputs:
• Live Streams: Kafka, JMS, Sensors, Feeds
• Databases: JDBC, Relational, NoSQL, Change Events
• Files: HDFS, Flat Files, Logs, File watcher
• Applications: Sockets
Outputs: Analytics, Visualization, Mobile, Data Lake, Apps, Social, Commerce, Communities
Hazelcast Jet - options
Embedded (each application runs Jet through the Java API in its own process)
• No separate process to manage
• Great for microservices
• Great for OEM
• Simplest for Ops: nothing extra
Client-Server (applications connect through the Java Client to a Jet cluster)
• Separate Jet Cluster
• Scale Jet independent of applications
• Isolate Jet from application server lifecycle
• Managed by Ops
Hazelcast Jet & IMDG
[Diagram: a Message Broker (Kafka) and HDFS alongside Jet and Hazelcast IMDG, which acts as source, sink, and data enrichment.]
Jet Compute Cluster (source, sink, and enrichment in one cluster). Good when:
• Source and sink are primarily Hazelcast
• Jet and Hazelcast have equivalent sizing needs
Jet Cluster with a separate Hazelcast IMDG Cluster (enrichment and sink). Good when:
• Source and sink are primarily Hazelcast
• You want isolation of the Jet cluster
Streaming Use Cases
ETL/Ingest
• Supports common sources such as HDFS, File, Directory, Sockets
• Custom sources can be easily created
• Batch and streaming
• Streaming ingest from Oracle, SQL Server, MySQL using Striim
• Sink to Hazelcast or other operational data stores
Real-time Stream processing
• Big Data in near real-time
• Distributed, in-memory computation
• Aggregating, joining multiple sources, filtering, transforming, enriching
• Elastic scalability
• Super fast
• High availability
• Fault tolerant
Data-Processing Microservices
• Data-processing microservices
• Isolation of services with many, small clusters
• Service registry
• Network discovery
• Inter-process messaging
• Fully embeddable
• Spring Cloud, Boot Data Services
Edge Processing
• Low-latency analytics and decision making
• Saves bandwidth and keeps data private by processing it locally
• Lightweight: runs on restricted hardware
• Both processing and storage
• Fully embeddable for simple packaging
• Zero dependencies for simple deployment
Hazelcast Jet?
High performance | Industry Leading Performance
Stream Processing & Data Grid | Source, Sink, Enrichment
Very simple to program | Leverages existing standards
Very simple to deploy | Embed 14MB jar or Client-Server
Works in every Cloud | Same as Hazelcast IMDG
The Evolution of Stream Processing
Generations
1st Gen (2000s): Hadoop (batch) or Apama (CEP). Hard choices.
• Distributed Batch Compute: MapReduce; scaled, parallelized, distributed, resilient, but not real-time
• or Siloed, Real-time: Complex Event Processing; specialized languages, not resilient, not distributed (single instance), hard to scale, fast but brittle, proprietary
2nd Gen (2014): Spark. Hard to manage.
• Micro-batch distributed: heavyweight, complex to manage, not elastic, requires large dedicated environments with many moving parts, not Cloud-friendly, not low-latency
3rd Gen (2017): Jet & Flink. Flexible & scalable.
• Distributed, real-time streaming: highly parallel, true streams, advanced techniques (Directed Acyclic Graph) enabling reliable distributed job execution
• Flexible deployment: Cloud-native, elastic, embeddable, lightweight, supports serverless, fog & edge
• Low-latency streaming, ETL, and fast-batch processing, built on a proven data grid
• True "Fast Data"
Streams … hiding in plain sight
Unix: ls | tr 'A-Z' 'a-z' | grep txt | wc
Pipe == directed acyclic graph! As in pipeline: mainly linear, no routing or collation
• ls: source
• tr: intermediate "infinite" stage
• grep: intermediate "infinite" stage
• wc: sink
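The same source, intermediate stage, sink shape can be sketched with Python generators. This is only an illustration of the pipe idea, not Hazelcast Jet code; the stage names (`source`, `to_lower`, `keep_matching`, `count_lines`) and the sample file names are made up for the example, and the sink counts lines only, like `wc -l`.

```python
from typing import Iterable, Iterator


def source(names: Iterable[str]) -> Iterator[str]:
    """Source stage, standing in for `ls`: emits one name at a time."""
    yield from names


def to_lower(lines: Iterable[str]) -> Iterator[str]:
    """Intermediate stage, standing in for `tr 'A-Z' 'a-z'`."""
    for line in lines:
        yield line.lower()


def keep_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Intermediate stage, standing in for `grep`: passes matching lines on."""
    for line in lines:
        if needle in line:
            yield line


def count_lines(lines: Iterable[str]) -> int:
    """Sink stage, standing in for `wc -l`: consumes the stream."""
    return sum(1 for _ in lines)


def run_pipeline(names: Iterable[str]) -> int:
    # Each stage pulls items from the previous one, one at a time,
    # so the intermediate stages never need the whole input in memory.
    return count_lines(keep_matching(to_lower(source(names)), "txt"))


# Hypothetical directory listing used only for illustration.
FILES = ["README.TXT", "notes.txt", "photo.JPG", "Report.Txt"]

if __name__ == "__main__":
    print(run_pipeline(FILES))
```

Because each intermediate stage handles one item at a time, it would work unchanged on an unbounded source; only the sink needs the stream to end.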
Performance
AI
Computers… they’re out there…
AI Techniques Continue to Expand & Evolve
Each Innovation Introduces New Challenges in Scalability of Compute & Storage
[Diagram: AI > Machine Learning, branching as follows.]
• Supervised Learning
  • Classification: Image/Video Processing, Unstructured Data, Fraud & Anomaly Detection
  • Regression: Predicting Trends, Structured Numeric Data
• Unsupervised Learning
  • Dimensionality Reduction: Image Processing, Unstructured Data, Feature Extraction
  • Clustering: Data Exploration, Feature Extraction
• Reinforcement Learning: Simulation
• Deep Learning (Advanced Machine Learning & AI): Images, Video, Audio, Time-Series Analysis
Machine Learning
Information Flow for Machine Learning
Real-time ML Demands In-Memory
• Online: Continuous Stream Processing. Ingest (Messaging) > Transform > Enrichment > Predict (Inference, Serving) > Validation & Verification
• Offline: ETL Processing. Ingest > Data Wrangling & Exploration > Training & Testing (ML Tools)
• Production Models pass from the offline flow to the online flow
Online Machine Learning within an In-Memory Platform
• Low-Latency Stream Processing, Data in Motion (Hazelcast Jet): Ingest > Enrich > Classify > Predict > Pro-Act
• Low-Latency Data Grid, Data at Rest (Hazelcast IMDG): Models, Context, Meaning
• Offline, Slow Data, Data at Rest: Batch ML, NoSQL, Model Training, Data Lake
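A minimal sketch of the online flow, assuming the context and the model are both held in memory next to the stream. Plain Python dicts stand in for data grid maps here; this is not the Hazelcast API, and the customer records, the `ratio_threshold` parameter, and the labels are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# In-memory "data at rest": context per customer, plus model parameters
# that would have been trained offline. All values are made up.
CONTEXT: Dict[str, Dict[str, float]] = {
    "cust-1": {"avg_spend": 40.0},
    "cust-2": {"avg_spend": 500.0},
}
MODEL: Dict[str, float] = {"ratio_threshold": 3.0}


@dataclass
class Event:
    customer: str
    amount: float


def enrich(events: Iterable[Event]) -> Iterator[Tuple[Event, Optional[Dict[str, float]]]]:
    """Enrich stage: attach the customer's context from the in-memory map."""
    for event in events:
        yield event, CONTEXT.get(event.customer)


def predict(enriched: Iterable[Tuple[Event, Optional[Dict[str, float]]]]) -> Iterator[Tuple[str, str]]:
    """Predict stage: score each event against the in-memory model."""
    for event, ctx in enriched:
        if ctx is None:
            yield event.customer, "unknown"
            continue
        ratio = event.amount / ctx["avg_spend"]
        label = "unusual" if ratio > MODEL["ratio_threshold"] else "normal"
        yield event.customer, label


def run(events: Iterable[Event]) -> List[Tuple[str, str]]:
    return list(predict(enrich(events)))


if __name__ == "__main__":
    print(run([Event("cust-1", 200.0), Event("cust-2", 200.0)]))
```

The point of the sketch is that neither lookup leaves the process: the same payment amount is scored differently per customer without a round trip to a slower store.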
Advantages of In-Memory Platform for ML
Fast
• Data Held in Memory for Low Latency Processing
• Models also held in-memory
• Compute with Data Locality Further Reduces Latency
Elastic
• Job Elasticity: Leveraging Directed Acyclic Graph & Cooperative Work Sharing
• Compute & Data Layers Easy to Scale: Not Bound to Disks
• Supports Microservices and Serverless Architectures
Resilient
• Multi-Data Center Architectures Enable 99.999% Uptime at Scale
• Lossless Job Recovery and Exactly-Once Processing Achieved with In-Memory Replicated State
Feature Engineering
• Low-Latency Stream Processing, Data in Motion (Hazelcast Jet): Ingest > Enrich > Classify
• Low-Latency Data Grid, Data at Rest (Hazelcast IMDG): Models, Context, Meaning
• Offline, Slow Data: Batch ML, NoSQL, Data Exploration, Model Training & Data Science, Data Lake
Speed Matters
E.g. Credit Card fraud analysis
Business Challenge
• Performance at massive scale
• Increase in fraud attempts
[Charts, both plotted over Time: Payment Evolution (Traditional, eCommerce, Square Terminals, iPhones) and # of Card Transactions.]
Tiny Window of Time For Accurate Processing
[Diagram: Swipe > Card-Processing Infrastructure > Fraud Detection Algorithm > Response, within a Time-Based SLA of Milliseconds.]
• Majority of Time Consumed in Network Transit
• Initial Processing: Microseconds
• Performance At Scale gives time for Multiple Algorithms
E.g. Credit Card fraud analysis: What If?
[Diagram: Customer History, made up of Payment Values, Payment Locations, Account Balance, Payment History, Customer Actions, and Personalized Payment Instructions.]
Payment "What Ifs?"
• What are their balances?
• What is their history?
- Risk > Payment > Identify fraud > Block payment
- Opportunity > Real-time Offers > Upsell
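The two "what if" paths can be sketched as a single decision over balance and history. The rules and thresholds below are hypothetical and far simpler than a real fraud model; the function name `what_if` and the fallback outcome "approve" are assumptions, not something the slides specify.

```python
from typing import List


def what_if(amount: float, balance: float, history: List[float]) -> str:
    """Pick a path for one payment from the customer's balance and history."""
    largest_seen = max(history, default=0.0)

    # Risk path: the payment exceeds the balance, or is far larger than
    # anything this customer has paid before (hypothetical 10x rule).
    if amount > balance or (history and amount > 10 * largest_seen):
        return "block payment"

    # Opportunity path: an established customer with headroom left
    # after this payment (hypothetical rule).
    if len(history) >= 3 and balance - amount > largest_seen:
        return "real-time offer"

    return "approve"


if __name__ == "__main__":
    print(what_if(5000.0, 1000.0, [20.0, 30.0]))
    print(what_if(50.0, 1000.0, [20.0, 30.0, 40.0]))
```

Both questions are answered from data that is already in memory, which is what lets the decision fit inside the payment's response window.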
E.g. Real-time offers in e-commerce
Consumer Shopping Flow ("Directed Acyclic Graph"): Product Search > Product Views > Adding to Cart > PAUSE to Compare > Check Out
Cart at Risk > Dynamic Offer
[Diagram: clickstream from eCommerce App Servers feeds a Jet Cluster, which produces Insights, Decisions, Predictions, and Alerts; IMDG nodes Write Through to DB.]
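One way to read "cart at risk" is a shopper who added to the cart and then paused without checking out. A minimal sketch of that rule over a clickstream follows; the event shape `(user, action, timestamp)`, the action names, and the 60-second threshold are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

PAUSE_SECONDS = 60  # hypothetical threshold for "pause to compare"

Click = Tuple[str, str, int]  # (user, action, timestamp in seconds)


def carts_at_risk(clicks: Iterable[Click], now: int,
                  pause_seconds: int = PAUSE_SECONDS) -> List[str]:
    """Return users whose last add-to-cart is old and not yet checked out."""
    last_add: Dict[str, int] = {}
    for user, action, ts in clicks:
        if action == "add_to_cart":
            last_add[user] = ts          # remember the most recent add
        elif action == "check_out":
            last_add.pop(user, None)     # the cart is no longer at risk
    return sorted(user for user, ts in last_add.items()
                  if now - ts >= pause_seconds)


if __name__ == "__main__":
    clicks = [
        ("ann", "add_to_cart", 0),
        ("bob", "add_to_cart", 10),
        ("bob", "check_out", 30),
        ("cat", "add_to_cart", 100),
    ]
    print(carts_at_risk(clicks, now=120))
```

Each user returned here would be a candidate for the dynamic offer step on the slide.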
Demo time!