How-to for real-time streaming and analytics at scale with Apache Kafka and Apache Ignite
Viktor Gamov, Confluent, @gamussa
Denis Magda, GridGain, @denismagda
Digital Transformation Challenges
• 10-100x more queries and transactions (per second)
• 50x as much data today as a decade ago (big data storage)
• Overnight analytics becomes real-time: 10-1000x faster analytics (hours to seconds)
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
Data Layer: RDBMS, NoSQL, Hadoop
In-Memory Computing and Real-Time Streaming To Solve the Challenges
• Performance increases 10x to 1,000x
• Act faster by analyzing streams of data
• Scalability up to petabytes of data
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
GridGain In-Memory Computing Platform, Confluent Streaming Platform
Transactional Persistence
Pre-Streaming Era
Streaming-First World
Origins in Stream Processing
• Java apps with Kafka Streams or KSQL
• Serving layer: Apache Ignite, GridGain, etc.
• Diagram labels: high-throughput streaming platform, API-based clustering, continuous computation
Diagram labels: Apps, Search, Stream Processing, Real-Time Analytics, Monitoring, DW, RDBMS, KV
Producer Application and Consumer Application
• Where to restart?
• How to handle failure & retries?
• How to scale and parallelize?
• How to properly use the producer / consumer API?
• What metrics to capture?
Kafka Connect
Source side: Connector -> SMTs -> Converter -> producer
Sink side: consumer -> Converter -> SMTs -> Connector
• Offset management
• Task distribution
• Configuration management
• Elastic scalability
• Metrics
• Parallelization
• Failure & retries
• REST API
• Schemas & data types
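The source-side flow on this slide (connector, then SMTs, then converter, with offsets tracked so a restart can resume) can be sketched in a few lines. This is a toy model in plain Python, not the Kafka Connect API, which is Java. The function names `mask_field`, `insert_field`, `json_converter`, and `run_source_pipeline` are hypothetical, as are the sample rows, and the two transforms are only loosely inspired by the idea of single message transforms.

```python
import json
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def mask_field(field: str) -> Transform:
    """Hypothetical SMT: replace one field's value with a fixed mask."""
    def apply(record: Record) -> Record:
        masked = dict(record)  # copy so the input record is not mutated
        if field in masked:
            masked[field] = "****"
        return masked
    return apply


def insert_field(field: str, value: object) -> Transform:
    """Hypothetical SMT: add a static field to every record."""
    def apply(record: Record) -> Record:
        enriched = dict(record)
        enriched[field] = value
        return enriched
    return apply


def json_converter(record: Record) -> bytes:
    """Converter step: serialize the record to bytes for the topic."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_source_pipeline(
    rows: List[Record],
    transforms: List[Transform],
    start_offset: int = 0,
) -> List[Tuple[int, bytes]]:
    """Return (offset, payload) pairs for rows at or after start_offset.

    start_offset stands in for offset management: after a restart,
    the pipeline skips rows that were already delivered.
    """
    out = []
    for offset, row in enumerate(rows):
        if offset < start_offset:
            continue
        for transform in transforms:  # SMTs are applied in order
            row = transform(row)
        out.append((offset, json_converter(row)))
    return out


# Illustrative data only.
rows = [{"id": 1, "card": "4111"}, {"id": 2, "card": "5500"}]
smts = [mask_field("card"), insert_field("source", "orders-db")]

first_run = run_source_pipeline(rows, smts)
resumed = run_source_pipeline(rows, smts, start_offset=1)
```

In the first run both rows are transformed and serialized. In the resumed run only the row at offset 1 is emitted, which is the "where to restart?" question answered by stored offsets rather than by application code.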
Discover connectors, SMTs, and converters
Descriptions, licensing, support, and more
Lower the Bar to Enter the World
• Core developers who use Java/Scala streams
• Core developers who don’t use Java/Scala
• Data engineers, architects, DevOps/SRE
• BI analysts
Axes: Coding Sophistication, User Population
GridGain and Kafka Connect 💶
GridGain: Real-time Streaming and Analytics
Essential GridGain APIs
• Distributed memory-centric storage: combines the performance and scale of in-memory computing with disk durability and strong consistency in one system
• Co-located Computations: brings the computations to the servers where the data actually resides, eliminating the need to move data over the network
• Distributed Key-Value: read, write, and transact with fast key-value APIs
• Distributed SQL: horizontally scalable, fault-tolerant distributed SQL database that treats memory and disk as active storage tiers
• ACID Transactions: supports distributed ACID transactions for key-value as well as SQL operations
• Machine and Deep Learning: set of simple, scalable, and efficient tools for building predictive machine learning models without costly data transfers (ETL)
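The co-located computation idea on this slide can be illustrated with a small stdlib-only model: each key has an owning node, and a task is sent to that node instead of the data being pulled to the caller. This is a sketch, not the Ignite or GridGain API. The `Node` and `Cluster` classes and the `affinity_call` method are hypothetical names, and the CRC32-modulo placement is a simplifying assumption rather than the partitioning scheme a real cluster uses.

```python
import zlib
from typing import Callable, Dict, List


class Node:
    """One server holding a slice of the key-value data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, object] = {}


class Cluster:
    """Toy cluster that maps every key to exactly one owning node."""

    def __init__(self, node_names: List[str]) -> None:
        self.nodes = [Node(name) for name in node_names]

    def owner(self, key: str) -> Node:
        # Deterministic hash so the same key always lands on the same node.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: object) -> None:
        self.owner(key).store[key] = value

    def get(self, key: str) -> object:
        return self.owner(key).store.get(key)

    def affinity_call(self, key: str, task: Callable[[object], object]) -> object:
        # Run the task against the owner's local copy of the value;
        # only the (small) result travels back to the caller.
        node = self.owner(key)
        return task(node.store[key])


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("sensor-1", [1, 2, 3])  # illustrative readings

total = cluster.affinity_call("sensor-1", sum)
```

Here `total` is computed where the readings live, so the list itself never leaves its node. The same `owner` lookup also backs `put` and `get`, which is what lets key-value access and computation agree on data placement.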
GridGain SQL For Real-Time Analytics
Ignite Node (Canada): Toronto, Montreal, Ottawa, Calgary
Ignite Node (India): Mumbai, New Delhi
1. Initial query
2. Query execution over local data
3. Reduce multiple results into one
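The three steps on this slide can be sketched with two in-memory SQLite databases standing in for the two Ignite nodes. This is an illustration of the map and reduce flow only, not how GridGain SQL is implemented. The `city` table, the `make_node` and `distributed_count` helpers, and the query are assumptions; the city and country names are taken from the slide.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple


def make_node(cities: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create one 'node': an in-memory database holding its local rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE city (name TEXT, country TEXT)")
    db.executemany("INSERT INTO city VALUES (?, ?)", cities)
    return db


def distributed_count(nodes: List[sqlite3.Connection]) -> Dict[str, int]:
    # Step 1: the initial query is sent to every node.
    query = "SELECT country, COUNT(*) FROM city GROUP BY country"
    totals: Counter = Counter()
    for node in nodes:
        # Step 2: each node executes the query over its local data only.
        for country, count in node.execute(query):
            # Step 3: partial results are reduced into one result set.
            totals[country] += count
    return dict(totals)


nodes = [
    make_node([
        ("Toronto", "Canada"),
        ("Montreal", "Canada"),
        ("Ottawa", "Canada"),
        ("Calgary", "Canada"),
    ]),
    make_node([
        ("Mumbai", "India"),
        ("New Delhi", "India"),
    ]),
]

result = distributed_count(nodes)
```

Each node returns only its partial counts, so the reduce step merges a handful of rows rather than the full tables. The merge also stays correct if a country's cities are spread across several nodes, because the partial counts are summed per country.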
Demo
Q&A