
Make your data science actionable, real-time machine learning inference with stream processing



  1. Make your data science actionable, real-time machine learning inference with stream processing. Neil Stevenson, Solution Architect, Hazelcast. 3rd June 2019

  2. 13:45 – 14:35 neil@hazelcast.com

  3. Which came first? (Chicken | Egg)

  4. Chicken

  5. What relevance is this?! What is this? • You can eat them • They lay eggs • They can be pets • Not just any old chicken but… • MY CHICKEN • A Bresse Gauloise

  6. Stream Processing

  7. Business Challenges for Real-time Applications
     • Latency & Speed: time is money
     • Scalability: Hazelcast scales effortlessly, responding to peaks and valleys for optimal utilization
     • Real-Time, Continuous Intelligence: a real-time view of constantly changing operational data
     • Zero Downtime: built for high resiliency

  8. In-Memory Platform [diagram]
     Data in motion is processed by a Jet cluster and data at rest is held in IMDG clusters, fed by: Live Streams (Kafka, JMS, Feeds) • Databases (JDBC, Relational, NoSQL, Change Events) • Files (HDFS, Flat Files, Logs, File watcher) • Applications (Sockets) • Internet of Things (Sensors, Smart Things)
     Outputs: Analytics (Situational, Geospatial, Weather), Predictions, Decisions, Alerts

  9. Hazelcast IMDG In-Memory Data Grid: • Integrate (APIs, Microservices, Serialization, Protocols) • Communicate (Notifications) • Store/Update (Caching, CRUD, Persistence) • Compute (Query, Process, Execute) • Scale (Clustering & Cloud, High Density) • Replicate (WAN Replication, Partitioning) • Secure (Privacy, Authentication, Authorization) • Available (Rolling Upgrades, Hot Restart)
     Jet In-Memory Streams: • Ingest & Transform (Events, Connectors, Filtering) • Combine (Join, Enrich, Group, Aggregate) • Stream Processing (Windowing, Event-Time) • Compute & Act (Distributed & Parallel Computations) • Available (Job Elasticity, Graceful shutdown) • Secure (Privacy, Authentication, Authorization)
     Inputs: Live Streams (Kafka, JMS, Sensors, Feeds) • Databases (JDBC, Relational, NoSQL, Change Events) • Files (HDFS, Flat Files, Logs, File watcher) • Applications (Sockets)
     Outputs: Analytics, Visualization, Mobile, Data Lake, Apps, Social, Commerce, Communities
     Management Center: Secure | Manage | Operate
     Embeddable | Scalable | Low-Latency • Secure | Resilient | Distributed

  10. Hazelcast Jet: deployment options [diagram: embedded, where each application runs Jet and IMDG in-process through the Java API; client-server, where applications connect through the Java Client to a separate Jet cluster]
     Embedded: • No separate process to manage • Great for microservices • Great for OEM • Simplest for Ops – nothing extra
     Client-Server: • Separate Jet Cluster • Scale Jet independently of applications • Isolate Jet from the application server lifecycle • Managed by Ops

  11. Hazelcast Jet & IMDG [diagram: Jet and Hazelcast IMDG in one cluster, with IMDG as source/sink and for enrichment; or a separate Jet compute cluster that reads from a message broker (Kafka) or HDFS and uses a Hazelcast IMDG cluster for data enrichment and as a sink]
     Single cluster, good when: • source and sink are primarily Hazelcast • Jet and Hazelcast have equivalent sizing needs
     Separate clusters, good when: • source and sink are primarily Hazelcast • you want isolation of the Jet cluster

  12. Streaming Use Cases
     ETL/Ingest: • Supports common sources such as HDFS, File, Directory, Sockets • Custom sources can be easily created • Batch and streaming • Streaming ingest from Oracle, SQL Server, MySQL using Striim • Sink to Hazelcast or other operational data stores
     Real-time Stream Processing: • Big Data in near real-time • Distributed, in-memory computation • Aggregating, joining multiple sources, filtering, transforming, enriching • Elastic scalability • Super fast • High availability • Fault tolerant
     Data-Processing Microservices: • Data-processing microservices • Isolation of services with many small clusters • Service registry • Network discovery • Inter-process messaging • Fully embeddable for simple packaging • Spring Cloud, Boot, Data Services
     Edge Processing: • Low-latency analytics and decision making • Saves bandwidth and keeps data private by processing it locally • Lightweight – runs on restricted hardware • Both processing and storage • Fully embeddable • Zero dependencies for simple deployment

  13. Hazelcast Jet? • High performance | Industry-leading performance • Stream Processing & Data Grid | Source, Sink, Enrichment • Very simple to program | Leverages existing standards • Very simple to deploy | Embed a 14MB jar or Client-Server • Works in every Cloud | Same as Hazelcast IMDG

  14. The Evolution of Stream Processing

  15. Generations
     1st Gen (2000s): Hadoop (batch) or Apama (CEP), hard choices
     • Distributed Batch Compute – MapReduce – scaled, parallelized, distributed, resilient, but not real-time
     • Siloed, Real-time – Complex Event Processing – specialized languages, not resilient, not distributed (single instance), hard to scale; fast, but brittle and proprietary
     2nd Gen (2014): Spark, hard to manage
     • Micro-batch distributed – heavyweight, complex to manage, not elastic, requires large dedicated environments with many moving parts, not Cloud-friendly, not low-latency
     3rd Gen (2017: Jet & Flink): flexible & scalable, true "Fast Data"
     • Distributed, real-time streaming – highly parallel, true streams, advanced techniques (Directed Acyclic Graph) enabling reliable distributed job execution
     • Flexible deployment – Cloud-native, elastic, embeddable, lightweight, supports serverless, fog & edge
     • Low-latency streaming, ETL, and fast-batch processing, built on a proven data grid

  16. Streams… hiding in plain sight. Unix: ls | tr 'A-Z' 'a-z' | grep txt | wc. Pipe == directed acyclic graph! As in a pipeline: mainly linear, with no routing or collation. • ls – source • tr – intermediate "infinite" stage • grep – intermediate "infinite" stage • wc – sink
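
To make the pipe analogy concrete, here is the same four-stage chain written with plain `java.util.stream`. This is not the Jet API. The directory listing is hypothetical input standing in for the output of `ls`, and `wc` is modelled as `wc -l` (a line count).

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

public class PipeAsPipeline {
    // Models `ls | tr 'A-Z' 'a-z' | grep txt | wc -l` as a linear chain of stages.
    static long countTxtEntries(List<String> directoryListing) {
        Stream<String> source = directoryListing.stream();          // ls: the source
        return source
                .map(name -> name.toLowerCase(Locale.ROOT))          // tr: per-item transform
                .filter(name -> name.contains("txt"))                // grep: per-item filter
                .count();                                            // wc -l: sink, emits once input ends
    }

    public static void main(String[] args) {
        // Hypothetical listing standing in for the output of `ls`.
        List<String> listing = List.of("README.TXT", "notes.txt", "build.gradle", "Data.Txt.bak");
        System.out.println(countTxtEntries(listing)); // prints 3
    }
}
```

The `map` and `filter` stages handle each item as it arrives, so they work on unbounded input. `count()`, like `wc`, can only answer once its input has ended.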

  17. Performance

  18. AI

  19. Computers… they're out there…

  20. AI Techniques Continue to Expand & Evolve. Each innovation introduces new challenges in scalability of compute & storage.
     Machine Learning, Supervised Learning: • Classification: Image/Video Processing (unstructured data), Fraud & Anomaly Detection • Regression: Predicting Trends (structured numeric data)
     Machine Learning, Unsupervised Learning: • Dimensionality Reduction: Image Processing (unstructured data), Feature Extraction • Clustering: Data Exploration, Feature Extraction
     Advanced Machine Learning & AI: • Reinforcement Learning: Simulation • Deep Learning: Images, Video, Audio; Time-Series Analysis

  21. Machine Learning

  22. Information Flow for Machine Learning [diagram]
     Online – continuous stream processing (real-time ML demands in-memory): Ingest → Transform → Validation & Enrichment → Predict (Inference, using production models) → Verification → Messaging / Serving
     Offline – ETL processing (ML tools): Ingest → Data Wrangling & Exploration → Training & Testing → Production Models

  23. Online Machine Learning within an In-Memory Platform [diagram]
     Low-latency stream processing, data in motion (Hazelcast Jet): Ingest → Enrich → Classify → Predict → Pro-Act
     Low-latency data grid, data at rest (Hazelcast IMDG): Models, Context, Meaning
     Offline – slow data: batch ML model training over data at rest (NoSQL, Data Lake)
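
The online half of this flow can be sketched in plain Java: ingest an event, enrich it from context held at rest, predict with a model held in memory, then act. This is an illustration, not Hazelcast code. A `ConcurrentHashMap` stands in for an IMDG map. The event fields, the logistic-regression weights and the 0.5 threshold are all made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OnlineInference {
    // Hypothetical event and context types, for illustration only.
    record Event(String customerId, double amount) {}
    record Context(double avgAmount, int accountAgeDays) {}

    // Stand-in for a grid map holding context ("data at rest").
    static final Map<String, Context> CONTEXT = new ConcurrentHashMap<>();

    // Toy logistic-regression model held in memory; the weights are invented.
    static final double BIAS = -3.0, W_RATIO = 1.5, W_NEW_ACCOUNT = 1.0;

    // Enrich: join the event with its context, falling back to neutral defaults.
    static Context enrich(Event e) {
        return CONTEXT.getOrDefault(e.customerId(), new Context(e.amount(), 0));
    }

    // Predict: score the enriched event with the in-memory model (sigmoid of a linear score).
    static double predict(Event e, Context c) {
        double ratio = e.amount() / Math.max(c.avgAmount(), 1.0);
        double newAccount = c.accountAgeDays() < 30 ? 1.0 : 0.0;
        double z = BIAS + W_RATIO * ratio + W_NEW_ACCOUNT * newAccount;
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Classify and pro-act: turn the score into an action.
    static String act(double score) {
        return score >= 0.5 ? "REVIEW" : "ALLOW";
    }

    static String process(Event e) {
        return act(predict(e, enrich(e)));
    }

    public static void main(String[] args) {
        CONTEXT.put("alice", new Context(50.0, 400));
        System.out.println(process(new Event("alice", 40.0)));  // ALLOW: close to her usual amount
        System.out.println(process(new Event("alice", 500.0))); // REVIEW: ten times her usual amount
    }
}
```

The model is held as plain fields and the context sits in a map next to the stream. This mirrors the slide's point that inference needs no call to an external model server or disk-backed store.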

  24. Advantages of In-Memory Platform for ML
     Fast: • Data held in memory for low-latency processing • Models also held in memory • Compute with data locality further reduces latency
     Elastic: • Job elasticity – leveraging Directed Acyclic Graph & cooperative work sharing • Compute & data layers easy to scale – not bound to disks • Supports microservices and serverless architectures
     Resilient: • Multi-data center architectures enable 99.999% uptime at scale • Lossless job recovery and exactly-once processing achieved with in-memory replicated state

  25. Feature Engineering [diagram]
     Low-latency stream processing, data in motion (Hazelcast Jet): Ingest → Enrich → Classify
     Low-latency data grid, data at rest (Hazelcast IMDG): Models, Context, Meaning
     Offline – slow data (NoSQL, Data Lake): data exploration, batch ML model training & data science
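
Feature engineering in the stream often means computing per-key aggregates over a recent time window, so they can be fed to the model alongside the raw event. One example is how many transactions a card has made in the last minute. Below is a minimal plain-Java sketch of such a sliding-window count, with hypothetical class and key names. It assumes events arrive in timestamp order. A production stream processor must also cope with out-of-order events, which this sketch deliberately ignores.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class SlidingWindowFeature {
    private final long windowMillis;
    // Per-key timestamps still inside the window (assumes in-order arrival).
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    SlidingWindowFeature(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records an event and returns how many events this key has had in the
    // window (timestampMillis - windowMillis, timestampMillis], including this one.
    int countInWindow(String key, long timestampMillis) {
        Deque<Long> times = recent.computeIfAbsent(key, k -> new ArrayDeque<>());
        times.addLast(timestampMillis);
        // Evict timestamps that have fallen out of the window; the current one never does.
        while (times.peekFirst() <= timestampMillis - windowMillis) {
            times.pollFirst();
        }
        return times.size();
    }

    public static void main(String[] args) {
        SlidingWindowFeature txPerMinute = new SlidingWindowFeature(60_000); // 60-second window
        System.out.println(txPerMinute.countInWindow("card-1", 0));      // 1
        System.out.println(txPerMinute.countInWindow("card-1", 10_000)); // 2
        System.out.println(txPerMinute.countInWindow("card-2", 20_000)); // 1
        System.out.println(txPerMinute.countInWindow("card-1", 65_000)); // 2: the event at 0 has expired
    }
}
```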

  26. Speed Matters

  27. E.g. Credit card fraud analysis
     Business challenge: • Performance at massive scale • Increase in fraud attempts
     [Charts: payment evolution (traditional, eCommerce, Square, iPhones); # of terminals and # of card transactions growing over time]
     Tiny window of time, from card swipe to card processing, for accurate processing: • The majority of time is consumed in network transit • Infrastructure SLA response: milliseconds • Initial processing: microseconds • Fraud detection algorithm performance at scale gives time for multiple algorithms
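
"Gives time for multiple algorithms" amounts to running several checks inside a fixed latency budget. This is a minimal sketch using the JDK's `CompletableFuture.completeOnTimeout` (Java 9+). The checks and their scores are hypothetical, as are the fallback value and the choice to combine them by taking the maximum.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class FraudCheckBudget {
    // Starts every check concurrently. Any check not finished within budgetMillis
    // of being submitted is completed with the fallback score instead.
    // The combined risk is the highest score seen.
    static double maxRiskWithinBudget(List<Supplier<Double>> checks, long budgetMillis, double fallback) {
        List<CompletableFuture<Double>> futures = checks.stream()
                .map(check -> CompletableFuture.supplyAsync(check)
                        .completeOnTimeout(fallback, budgetMillis, TimeUnit.MILLISECONDS))
                .toList();
        return futures.stream()
                .mapToDouble(CompletableFuture::join)
                .max()
                .orElse(fallback);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Supplier<Double> rules = () -> 0.2;  // hypothetical fast rule-based check
        Supplier<Double> model = () -> 0.7;  // hypothetical fast model score
        Supplier<Double> slow = () -> {      // hypothetical check that overruns the budget
            sleepQuietly(2_000);
            return 0.99;
        };
        // With a 200 ms budget the slow check contributes the 0.0 fallback, not its 0.99.
        System.out.println(maxRiskWithinBudget(List.of(rules, model, slow), 200, 0.0));
    }
}
```

`completeOnTimeout` only stops waiting. It does not cancel the slow check, which keeps running in the background, so a real system would also bound the work itself.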

  28. E.g. Credit card fraud analysis: What If?
     [Diagram: the payment in context: customer history, personalized payment instructions, payment values, payment locations, account balance, customer actions, payment history]
     Payment "What Ifs?": • What are their balances? Risk > Payment > Identify fraud > Block payment • What is their history? Opportunity > Real-time offers > Upsell
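
The two "what if" questions can be read as two branches of one decision over an enriched payment. A risk branch (balance, locations) can block the payment, and an opportunity branch (purchase history) can attach an offer. Below is a sketch in plain Java. The record fields and every threshold are hypothetical, not rules from the talk.

```java
import java.util.Map;
import java.util.Set;

public class PaymentWhatIf {
    // Hypothetical in-memory customer state: balance, usual locations, purchase history.
    record Account(double balance, Set<String> usualLocations, Map<String, Integer> purchasesByCategory) {}
    record Payment(String customerId, double amount, String location, String category) {}

    enum Decision { BLOCK, APPROVE_WITH_OFFER, APPROVE }

    static Decision decide(Payment p, Map<String, Account> accounts) {
        Account a = accounts.get(p.customerId());
        if (a == null) {
            return Decision.BLOCK; // no context available: fail safe in this sketch
        }
        // Risk branch: balance and location history flag a likely fraudulent payment.
        boolean overBalance = p.amount() > a.balance();
        boolean unusualLocation = !a.usualLocations().contains(p.location());
        if (overBalance || (unusualLocation && p.amount() > 0.5 * a.balance())) {
            return Decision.BLOCK;
        }
        // Opportunity branch: repeated purchases in this category justify a real-time offer.
        if (a.purchasesByCategory().getOrDefault(p.category(), 0) >= 3) {
            return Decision.APPROVE_WITH_OFFER;
        }
        return Decision.APPROVE;
    }

    public static void main(String[] args) {
        Map<String, Account> accounts = Map.of(
                "bob", new Account(1000.0, Set.of("London"), Map.of("books", 5)));
        System.out.println(decide(new Payment("bob", 20.0, "London", "books"), accounts));  // APPROVE_WITH_OFFER
        System.out.println(decide(new Payment("bob", 600.0, "Paris", "travel"), accounts)); // BLOCK
    }
}
```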

  29. E.g. Real-time offers in e-commerce
     Consumer shopping flow: Product Search → Product Views → Adding to Cart → PAUSE to Compare → Check Out
     [Diagram: eCommerce app servers send the clickstream to a Jet cluster, whose "Directed Acyclic Graph" detects a cart at risk and triggers a dynamic offer, producing insights, decisions, predictions and alerts; IMDG cluster with write-through to DB]
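
The "cart at risk" step can be sketched as a small per-session state machine. It remembers when each session last added to its cart, forgets that on checkout, and periodically flags carts that have been idle longer than a pause threshold. This is plain Java with a hypothetical 30-second threshold and event type names of our own. A real deployment would keep this state in the stream processor or the data grid, not in a local `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CartAtRisk {
    enum Type { SEARCH, VIEW, ADD_TO_CART, CHECKOUT }

    private final long pauseMillis;
    // Session id -> time of the latest add-to-cart not yet followed by checkout.
    private final Map<String, Long> openCarts = new HashMap<>();

    CartAtRisk(long pauseMillis) {
        this.pauseMillis = pauseMillis;
    }

    void onEvent(String sessionId, Type type, long timestampMillis) {
        switch (type) {
            case ADD_TO_CART -> openCarts.put(sessionId, timestampMillis);
            case CHECKOUT -> openCarts.remove(sessionId);
            default -> { } // searches and views do not change cart state in this sketch
        }
    }

    // Returns sessions whose cart has been idle longer than the pause threshold and
    // stops tracking them, so each is offered at most once until it adds to the cart again.
    List<String> sessionsNeedingOffer(long nowMillis) {
        List<String> atRisk = new ArrayList<>();
        openCarts.entrySet().removeIf(e -> {
            boolean idle = nowMillis - e.getValue() > pauseMillis;
            if (idle) {
                atRisk.add(e.getKey());
            }
            return idle;
        });
        return atRisk;
    }

    public static void main(String[] args) {
        CartAtRisk detector = new CartAtRisk(30_000); // hypothetical 30-second pause threshold
        detector.onEvent("s1", Type.ADD_TO_CART, 0);
        detector.onEvent("s2", Type.ADD_TO_CART, 5_000);
        detector.onEvent("s2", Type.CHECKOUT, 20_000);
        System.out.println(detector.sessionsNeedingOffer(40_000)); // [s1]
    }
}
```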

  30. Demo time!
