In-mem DB Performance, Flash Cost
Enabling Real-time AI
June 2018
The Data-Driven Business Challenge: From Reactive to Proactive
[Chart: value of data vs. time to action (real-time, minutes, days) for AI/event-driven, interactive, and batch workloads]
Big and Slow or Small and Fast
Batch layer: big data but slow
• Too slow, not up to date, complex
• Data sources feed ETL tools and a change log into a data lake; batch processing produces views (View 1, View 2) and reports
OR
Real-time layer: small amounts of data
• Expensive, limited context (lacks context)
• Data sources feed stream processing backed by in-memory and NoSQL stores, serving real-time dashboards and streams
Traditional Approach: DB over File over Flash
Traditional layered approach (rigid APIs): Database, File System, VM, Hypervisor over a 10 GbE fabric; HCI / Storage Stack over a 10-100 GbE fabric; External (NVMeOF / Object)
• Slow
• Complex
• Expensive
For every file IOs conducted by the DB (Record, Redo/Undo, Metadata, ..)
New Cloud Databases Are Built to Scale Ops & Capacity
Layers: API & Transaction; Distributed Processing & Cache; Capacity (Object)
Decouple access, processing, and capacity and eliminate storage serialization
Breaking the Volume and Velocity Barrier
• Apps, APIs, and Functions: support many standard APIs on a common DB engine
• 100 GbE fabric, real-time firewall
• Real-Time DB: unique architecture which uses NVMe Flash as an extension of OS memory
• 100TB NVMe Flash (direct attached)
Re-engineer the stack to deliver memory speed with Flash density
Breaking Performance Barriers: Design Principles
Never blocking, never locking, 100% parallelism
• Latency optimized, QoS-aware data scheduler
• Lockless, preemptless memory management
• True scale out through parallelism
Zero processing wastes
• CPU cache optimization and prediction
• E2E zero buffer data flow (NIC to Disk, accelio)
• Complete OS bypass
HW awareness
• RDMA, NVMe (3DXP)
• Vector processing operations
• IRQ balancing and throttling
OK, any other challenges on the way to real-time AI?
90% of AI Today
Cycle: build feature vectors, inspect, improve (using batch and CSVs)
vs.
• How do we form complex feature vectors in real-time?
• How do we visualize or act on the results in real-time?
Moving to Continuous Ingest + AI + Serve Flow
[Diagram: continuous ingest, AI, and serve flow, connected to external data lakes]
From Silos and ETLs to All-in-one DBs
Traditional: unique model per store
• File: Dir (tree), Name (tree); simple metadata (size, time, type, owner); data extents
• Object: Path, Name (random hash); extended metadata; data blob (immutable)
• K/V: Key (random hash); simple metadata; value blob (immutable)
• Table (fixed): Key (seq tree); typed values
• Document: Key (seq tree); flexible value attributes
• Stream / Metric: Shard, Topic; timestamped (ts) value blobs
Multi-model store: index plus metadata & data (column families); Key (hash), base attributes, encoded flexible value attributes
• Multiple indexes: random, sequential and hierarchical
• Any data type: nested attributes (encoded), flexible value types
• Can be organized and viewed as extents, rows, cols or logs
Independent tiering logic for indexes, metadata and data
Time Series Data Example
Raw time series sample data: thousands of samples, ingested and compressed in real-time, filtered based on labels
Optimized TSDB layout (per unique metric):
• Labels
• Pre-aggregation arrays (to accelerate queries)
• Data: T/V chunks with 10:1 Gorilla compression
Results: real-time consistency, 50:1 compression, 10–100x faster queries
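The pre-aggregation arrays on this slide can be illustrated with a minimal sketch. It assumes fixed-width time buckets that each keep a count, sum, min, and max, so that a range query reads a few aggregates instead of thousands of raw samples. The function names, the bucket layout, and the 60-second width are illustrative assumptions, not the product's actual TSDB format.

```python
from collections import defaultdict

def pre_aggregate(samples, bucket_seconds):
    """Fold (timestamp, value) samples into per-bucket count/sum/min/max."""
    buckets = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for ts, value in samples:
        start = ts - ts % bucket_seconds  # align the timestamp to its bucket start
        b = buckets[start]
        b["count"] += 1
        b["sum"] += value
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
    return dict(buckets)

def average(buckets, start, end):
    """Average over [start, end) from the aggregates only (bucket-aligned range)."""
    selected = [b for t, b in buckets.items() if start <= t < end]
    total = sum(b["count"] for b in selected)
    return sum(b["sum"] for b in selected) / total if total else None

# Hypothetical samples: (timestamp in seconds, value)
samples = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (130, 9.0)]
buckets = pre_aggregate(samples, 60)
print(average(buckets, 0, 120))
```

The query cost here depends on the number of buckets in the range rather than the number of raw samples, which is the general reason pre-aggregation accelerates queries.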
Serverless, The New Stored Procedure
Traditional Dev and Ops Model:
• Write code + local testing
• Build code and Docker image
• CI/CD pipeline
• Add logging and monitoring
• Harden security
• Provision servers + OS
• Handle data/event feed
• Handle failures/auto-scaling
• Handle rolling upgrades
• Configuration management
"Serverless" Development Model:
• Write code + local testing
• Provide spec, push deploy
• The rest (80%): 1. Automated by the serverless platform 2. Pay for what you use
Addressing Serverless Limitations With Nuclio
Performance
• Non-blocking, parallel
• Zero copy, buffer reuse
• Up to 400K events/sec/proc
Streaming and Batch
• Auto-rebalance, checkpoints
• Any source: Kafka, NATS, Kinesis, event-hub, iguazio, pub/sub, RabbitMQ, Cron
Statefulness
• Data bindings
• Shared volumes
• Context cache
[Diagram: nuclio processor with event listeners feeding Shards 1-4, each with its own workers running functions; data bindings to DB, MQ, File]
Serverless for compute and data intensive tasks
100x faster than AWS Lambda!
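As a rough illustration of what a function running inside the nuclio processor might look like, here is a minimal Python sketch. The `handler(context, event)` entry point, `event.body`, and `context.logger.info` follow Nuclio's Python runtime convention as generally documented. The JSON record layout (`sensor`, `value`, `threshold`) and the stub objects used to call the handler locally are hypothetical and exist only for this example.

```python
import json
from types import SimpleNamespace

def handler(context, event):
    """Flag a sensor reading that exceeds its threshold (hypothetical payload)."""
    body = event.body
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")  # the body may arrive as raw bytes
    record = json.loads(body)
    alert = record["value"] > record["threshold"]
    context.logger.info("processed event")
    return json.dumps({"sensor": record["sensor"], "alert": alert})

# Local stand-ins for the runtime-provided objects (assumptions, not Nuclio classes)
class _StubLogger:
    def info(self, message):
        print(message)

stub_context = SimpleNamespace(logger=_StubLogger())
stub_event = SimpleNamespace(body=b'{"sensor": "s1", "value": 7.5, "threshold": 5.0}')
result = handler(stub_context, stub_event)
print(result)
```

On the platform, the event would come from one of the listed sources (Kafka, Kinesis, Cron, and so on) rather than from a stub, and the handler code would stay the same.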
Delivering Intelligent Decisions in Real-Time
Flow: Ingest, Enrich, AI, Act, Serve
• External context and ML models
• Real-time triggers
• 500TB of raw data ingested in real-time
• Unified Real-Time DB (compressed to 10TB)
• Real-time and historical dashboards
Cyber and Network Ops
A leading telco needs to predict network behavior in real-time:
• Processing high message throughput from multiple streams at a rate of > 50K events/sec
• Cross correlating with historical and external data in real-time
• AI predictions/inferencing conducted on live data
• Small footprint to fit network locations
Build and Operationalize Proactive Systems Faster
Traditional: separate pipelines. Data sources (Spectrum, Netcool, SMOD) feed streaming into data stores with real-time visualization, and ETL into data stores with batch visualization.
• Complex, skill gaps, slow to productize
• No single view of ops, real-time, history
• Reactive (no actions)
Continuous Analytics: data sources (Spectrum, Netcool, SMOD) arrive via streaming and REST API, with real-time and batch visualization + actions.
• Simple, just a few weeks to a working app
• Unified view across ALL data
• AI driven, proactive
Predictive Maintenance Based on Real-time + Historical + Ops Data
[Diagram: devices & machines stream sensor data to the intelligent edge, which processes it, runs AI predictions, and raises real-time alerts and predicted alerts via web hook. Data is aggregated using Time Series APIs and uploaded to Azure ML in the cloud, which updates the ML model. Labeled intervals: every 15 minutes, every 6 hours.]
Data and APIs:
• NoSQL & Time Series: machine metadata, environmental data, time series vectors (Avg, Min/Max, Stdev per sensor)
• ML models
• Query APIs
• Real-time API, dashboard
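The per-sensor time series vectors (Avg, Min/Max, Stdev) could be computed along the following lines. This is a minimal sketch using Python's standard `statistics` module rather than the platform's Time Series APIs. The function name, the `(sensor, value)` reading format, and the choice of population standard deviation (`pstdev`, which also works for a single sample) are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def feature_vectors(readings):
    """Group (sensor, value) readings and compute avg/min/max/stdev per sensor."""
    by_sensor = defaultdict(list)
    for sensor, value in readings:
        by_sensor[sensor].append(value)
    return {
        sensor: {
            "avg": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            # Population standard deviation: an assumption, the slide only says "Stdev"
            "stdev": statistics.pstdev(values),
        }
        for sensor, values in by_sensor.items()
    }

# Hypothetical readings from one aggregation window
readings = [("temp", 20.0), ("temp", 22.0), ("temp", 24.0), ("vib", 0.5), ("vib", 0.5)]
vectors = feature_vectors(readings)
print(vectors["temp"])
```

Vectors like these, computed per aggregation window, are the kind of compact input that could be sent to a model for prediction instead of the raw sensor stream.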
Demo: Voice Driven Real-Time Analytics
[Diagram: a smart home device sends a voice query to AI, which issues a SQL query through the SQL API; the Google Map service updates locations; results are shown in a web UI (React)]
Demo Video
Summary
• Build continuous, data-driven and proactive apps
• Deliver real-time analytics on fresh, historical and operational data
• Optimize Flash usage to deliver in-memory speed at much lower costs
• Create a unified data layer for stream processing, AI and serving
• Adopt cloud-native and serverless approaches to gain agility
Thank You
info@iguazio.com | www.iguazio.com