Simpler, Smarter and Faster Insights
Big Data Analytics Processing on streaming, hot and historical data
Rajiv Shah, Director of Solution Architect and Professional Services
2019
About GigaSpaces
We deliver the fastest big data analytics processing platform to run your analytics & machine learning in production, at scale
• 300+ direct customers
• 50+ / 500+ Fortune / Organizations
• 5,000+ large installations in production (OEM)
• 25+ ISVs
GigaSpaces Select Customers OEMs / ISVs / Partners
AnalyticsXtreme: Accelerating Your Data Lake by 100X for Real-time Analytics
Your data is immediately searchable, queryable, and available for analytics
• Single logical view for hot, warm and cold data
• Hot data resides on the in-memory data grid and historical data on HDFS/Object Store
• Hot data is mutable and historical data is immutable (Parquet)
Fast Access
• Fast access to frequently used historical data
Access any data through a unified layer
• Analytics (Spark ML)
• Query (Spark SQL)
Automatic lifecycle management
• Automatically handles the underlying data movement, optimization and deletion
[Diagram labels: STREAMING DATA INGEST; HOT & WARM DATA; COLD & ARCHIVED DATA]
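The single logical view and the automatic lifecycle described on this slide can be pictured with a small sketch. This is not the AnalyticsXtreme API: `TieredStore`, `Event`, `hot_window` and `age_out` are hypothetical names, a dict stands in for the in-memory data grid, and a list stands in for immutable Parquet files on HDFS or an object store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    key: str
    timestamp: int  # seconds; illustrative unit
    value: float


@dataclass
class TieredStore:
    """Hypothetical two-tier store: mutable hot tier, append-only cold tier."""
    hot_window: int  # how many seconds of data stay in the hot tier
    hot: Dict[str, Event] = field(default_factory=dict)  # stand-in for the in-memory grid
    cold: List[Event] = field(default_factory=list)      # stand-in for Parquet on HDFS/object store

    def write(self, event: Event) -> None:
        # Hot data is mutable: a later write for the same key replaces the earlier one.
        self.hot[event.key] = event

    def age_out(self, now: int) -> int:
        """Lifecycle step: move events older than the hot window to the cold tier."""
        cutoff = now - self.hot_window
        expired = [k for k, e in self.hot.items() if e.timestamp < cutoff]
        for k in expired:
            self.cold.append(self.hot.pop(k))
        return len(expired)

    def query(self, since: int, until: int) -> List[Event]:
        """Single logical view: the caller gives a time range and never names a tier."""
        hits = [e for e in self.cold if since <= e.timestamp <= until]
        hits += [e for e in self.hot.values() if since <= e.timestamp <= until]
        return sorted(hits, key=lambda e: e.timestamp)


store = TieredStore(hot_window=60)
store.write(Event("a", 90, 1.0))
store.write(Event("a", 100, 1.5))   # overwrites the earlier hot record for "a"
store.write(Event("b", 190, 2.0))
moved = store.age_out(now=200)      # "a" is older than the 60 s window and moves to cold
results = store.query(0, 200)       # spans both tiers transparently
```

In this toy version the caller sees one `query` method regardless of where a record lives, which is the property the slide calls a unified layer.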
GigaSpaces Coverage
GigaSpaces Competitive Edge
• SPEED
• ANALYTICS
• SCALE
• Any Data: Live, Transactional & Historical Data
• Deploy Anywhere
Data Analytics: Undeniable Value to your Business
• Dynamic Pricing: Helps grow sales by 30% annually
• Predictive Maintenance: Reduces maintenance costs by up to 75% per mile (transportation example)
• Optimized Operations: Saves hundreds of thousands of dollars annually (banking example)
• Personalized Recommendation: Increases conversions by up to 20X for brick & mortar stores via location-based promotions
• Risk Analysis: Reduces loan losses by 10-30%
• Fraud Analytics: Reduces losses by 3 to 5% in mature environments and by over 30% in evolving contexts
• Call Center Automation: Increases efficiency by over 90%
The Velocity of Business
• ECOMMERCE: “A typical e-commerce website will experience 40% bounce if it loads in more than 3 seconds, including personalization offers”
• TELCO: “A call center receives 450,000 calls/day, each call needs to be routed in less than 60 milliseconds”
• FINANCIAL SERVICES: “To prevent fraud, anomaly detection needs to happen against 500,000 txn/sec in less than 200 milliseconds”
Use Cases Spanning Industries Benefit from Near Real-time AI Decision Support Systems Built on GigaSpaces
FINANCIAL SERVICES
• Fraud
• Credit risk scoring
• Customer 360
• Customer churn
INSURANCE
• Usage based insurance
• Customer 360
• Customer churn
• Claims management
RETAIL / ECOMMERCE
• Personal recommendations
• Intelligent inventory mgmt.
• Customer 360
• Location-based promotions
INDUSTRIAL IOT, MEDIA/TELCO, TRANSPORTATION
• Predictive maintenance
• Inventory planning
• Customer 360 (incl. churn)
• Intelligent call center routing
• Fleet management
• Data Center Infrastructure Monitoring (DCIM)
InsightEdge: Unifying Real-Time Analytics, AI and Transactional Processing in One Platform
• Rich ML & DL support
• Extreme performance
• Fully Transactional
• ACID Compliance
• Enterprise-grade (Security, High Availability)
• Co-located Apps and Services
• Seamless integration with Big Data ecosystem
  • Data sources (Kafka/Nifi/Talend/etc.)
  • Data lakes (S3/Hadoop/etc.)
  • BI tools (Tableau/Looker/etc.)
[Diagram labels: Machine Learning & Deep Learning; In-Memory Multi Model Store (KEY-VALUE, GEO SPATIAL, DOCUMENT, TABLE, COLUMNAR, STREAMING); Intelligent Multi-tier Storage Management; STORAGE; ORCHESTRATION; CLOUD/HYBRID/ON-PREMISE]
Traditional vs. Unified “Translytical” Processing
TRADITIONAL TRANSACTIONAL/ANALYTICAL PROCESSING
• Transactional processing and analytics are separate, linked by data replication
• SLOW FEEDBACK LOOP
UNIFIED TRANSACTIONAL/ANALYTICAL PROCESSING
• Transactional processing and analytics share one IN-MEMORY DATA GRID
• FAST FEEDBACK LOOP
IMPACTS
• Real-time analytics
• Greater situation awareness
• Simplified architecture
UNIFYING Analytics and Transactional Processing at SCALE & SPEED
[Architecture diagram, top to bottom]
• Consumers: BI tools, data lake, database & data warehouse, applications, mobile, web, IoT
• Workloads: analytics, machine & deep learning; apps & microservices; BI & visualization
• Interfaces: Spark jobs, machine learning, deep learning, event streaming, microservices (REST), event processing, SQL/JDBC, notebook, REST orchestration
• Core: RPC & map/reduce, CDC Engine, CR8, multi model store (objects, JSON, key-value, tables, text, geospatial, graph), in-memory data grid, event processing
• MemoryXtend multi-tiered storage: RAM, persistent memory, SSD, data lake
• WAN Gateway: multi site replication
• Cross-cutting: security and auditing, management and monitoring, cluster management & service discovery
• Deployment: on-premise, hybrid, cloud
Ultra-low latency and high throughput transactional processing (IMDG)
• Partitioned In-Memory Grid: shared-nothing, linear scalability, elastic capacity
• Co-Location of Data and Business Logic: co-located ops, event-driven, fast indexing
• Event-Driven Processing and Map/Reduce
• No Downtime: auto-healing, multi-data center replication, fault tolerance
• Fast Indexing
• Multi-Data Model: POJO, .NET, Document/JSON, Geospatial, Time-series
• Seamless Integration with Java/Scala ecosystem
• Cloud, Kubernetes, Docker Native
[Diagram labels: mobile, web, IoT; analytics & big data, apps & microservices, search, BI & query; security and auditing, management and monitoring; event streaming, machine learning, Spark SQL, microservices (REST), event processing, Java, .NET, SQL/JDBC, search; RPC & map/reduce, data models (spatial, POJO, JSON), web containers; in-memory data grid; RAM, persistent memory, SSD storage; data replication & persistence; cluster management & service discovery; on-premise, hybrid, cloud]
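Partitioning, co-location and map/reduce fit together as in the toy model below. It is not the GigaSpaces API. `PartitionedGrid`, `execute_colocated` and `map_reduce` are hypothetical names, and the partitions are plain dicts visited one after another, whereas a real grid would run them in parallel on separate processes or hosts.

```python
import zlib
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class PartitionedGrid:
    """Hypothetical shared-nothing grid: each partition owns a disjoint set of keys."""

    def __init__(self, partitions: int) -> None:
        self.partitions: List[Dict[str, dict]] = [{} for _ in range(partitions)]

    def _route(self, routing_key: str) -> int:
        # Stable hash, so the same key always lands on the same partition.
        return zlib.crc32(routing_key.encode("utf-8")) % len(self.partitions)

    def write(self, routing_key: str, record: dict) -> None:
        self.partitions[self._route(routing_key)][routing_key] = record

    def execute_colocated(self, routing_key: str,
                          task: Callable[[Dict[str, dict]], T]) -> T:
        """Run the task against the one partition that owns the key (logic goes to the data)."""
        return task(self.partitions[self._route(routing_key)])

    def map_reduce(self, mapper: Callable[[Dict[str, dict]], T],
                   reducer: Callable[[List[T]], T]) -> T:
        """Run the mapper on every partition, then combine the partial results."""
        return reducer([mapper(p) for p in self.partitions])


grid = PartitionedGrid(partitions=4)
for i in range(100):
    grid.write(f"acct-{i}", {"balance": i})

# Single-key operation: only the owning partition is touched.
balance = grid.execute_colocated("acct-7", lambda p: p["acct-7"]["balance"])

# Aggregate: each partition sums locally, the caller adds four partial sums.
total = grid.map_reduce(lambda p: sum(r["balance"] for r in p.values()), sum)
```

Because no partition shares state with another, adding partitions spreads both data and work, which is the intuition behind the slide's "linear scalability" claim.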
Co-located Analytics and AI with Transactional Processing
• Distributed SQL-99
• Spark for ML and leading DL frameworks
• Push-down predicate for ultra-low latency filter (30x faster)
• Real-time integration with Tableau and Business Intelligence tools
• Shared RDDs/DataFrames
• Streaming with 99.999% availability
• JDBC driver
• Deep Learning with Intel BigDL
• Graph processing, text mining, geospatial
[Diagram labels: mobile, web, IoT; analytics & big data, apps & microservices, search, BI & query; security and auditing, management and monitoring; event streaming, machine learning, Spark SQL, microservices (REST), event processing, Java, .NET, SQL/JDBC, search; RPC & map/reduce, data models (spatial, POJO, JSON), web containers; in-memory data grid; RAM, SSD, storage-class memory; data replication & persistence; cluster management & service discovery; on-premise, hybrid, cloud]
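Predicate push-down means the filter runs where the data lives, so only matching rows travel to the analytics engine. The sketch below illustrates the idea with a hypothetical `Partition` class that counts how many rows leave it. It does not reproduce the Spark or InsightEdge implementation and makes no attempt to reproduce the 30x figure quoted on the slide.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class Partition:
    """Hypothetical data partition that tracks how many rows it ships out."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows: List[Row] = list(rows)
        self.rows_shipped = 0

    def scan(self) -> List[Row]:
        """No push-down: every row is shipped to the caller."""
        self.rows_shipped += len(self.rows)
        return list(self.rows)

    def scan_filtered(self, predicate: Predicate) -> List[Row]:
        """Push-down: the predicate runs next to the data; only matches are shipped."""
        matches = [r for r in self.rows if predicate(r)]
        self.rows_shipped += len(matches)
        return matches


def query_without_pushdown(parts: List[Partition], predicate: Predicate) -> List[Row]:
    # The caller receives everything and filters afterwards.
    return [r for p in parts for r in p.scan() if predicate(r)]


def query_with_pushdown(parts: List[Partition], predicate: Predicate) -> List[Row]:
    # The caller receives only rows that already satisfy the predicate.
    return [r for p in parts for r in p.scan_filtered(predicate)]


def make_partitions() -> List[Partition]:
    # Two partitions of 1,000 rows each; illustrative data only.
    return [Partition({"amount": a} for a in range(1000)) for _ in range(2)]


def is_large(row: Row) -> bool:
    return row["amount"] >= 990


naive_parts = make_partitions()
naive_result = query_without_pushdown(naive_parts, is_large)
naive_shipped = sum(p.rows_shipped for p in naive_parts)

pushed_parts = make_partitions()
pushed_result = query_with_pushdown(pushed_parts, is_large)
pushed_shipped = sum(p.rows_shipped for p in pushed_parts)
```

Both queries return the same rows; the difference is the volume moved between the data tier and the caller, which is where the latency saving comes from.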
Benchmark (in IOPS)
• Persistent Memory: +249% vs. SSD
• Persistent Memory: +159% vs. SSD
• RAM (off-heap): +180% vs. SSD
• RAM (off-heap): +350% vs. SSD
Cost Analysis for 5GB usable data
• CAPEX reduction of up to 50% with RAM off-heap vs. on-heap
• CAPEX reduction of up to 75% with AEP vs. RAM on-heap
• OPEX reduction of 10X
Tiered Storage Architecture
Higher Performance – Optimized TCO
• 10X less expensive than only RAM, maintaining in-memory performance
• Define which data resides on which layer per class and per field
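One way to picture "per class and per field" placement is a policy with a default tier for each class and optional overrides for individual fields. The sketch below is an assumption about how such a policy could look, not the product's configuration format. `TierPolicy`, the tier names, and the sample `Trade` and `AuditLog` classes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

RAM, SSD = "RAM", "SSD"  # illustrative tier names


@dataclass
class TierPolicy:
    """Hypothetical policy: a default tier per class, with per-field overrides."""
    class_default: Dict[str, str] = field(default_factory=dict)
    field_override: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def tier_for(self, class_name: str, field_name: str) -> str:
        overrides = self.field_override.get(class_name, {})
        if field_name in overrides:
            return overrides[field_name]
        # Assumption: classes without an explicit rule go to the cheaper tier.
        return self.class_default.get(class_name, SSD)

    def place(self, class_name: str, record: Dict[str, object]) -> Dict[str, Dict[str, object]]:
        """Split one record into per-tier fragments according to the policy."""
        placed: Dict[str, Dict[str, object]] = {}
        for name, value in record.items():
            placed.setdefault(self.tier_for(class_name, name), {})[name] = value
        return placed


policy = TierPolicy(
    class_default={"Trade": RAM, "AuditLog": SSD},
    field_override={"Trade": {"notes": SSD}},  # bulky, rarely queried field
)
layout = policy.place("Trade", {"id": 1, "price": 9.5, "notes": "free text"})
```

Keeping only the frequently accessed fields in RAM is one plausible reading of how a tiered design can cut cost while keeping hot-path latency close to in-memory performance.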
Kubernetes and Docker
LAMBDA ARCHITECTURE IS COMPLICATED
[Architecture diagram]
• DATA SOURCES: files, databases, events, sensor data, social
• DATA CAPTURE / MESSAGE BUS LAYER: capture files, CDC-enabled app, message bus (Kafka, Kinesis, Event Hubs, Google Pub/Sub)
• BATCH LAYER: batch analytics (EMR), storage
• SPEED LAYER: event-driven analytics (serverless, e.g. AWS Lambda; Kafka consumers), storage & cache (Azure Cosmos DB)
• MANAGEMENT LAYER
• APPLICATIONS: public cloud (GCP, AWS, Azure), private cloud
• CONTROL LAYER (Management, Orchestration, and Security)
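Part of the complexity is that a Lambda architecture keeps two code paths for the same logic and merges their results at query time. The sketch below shows that pattern for a simple per-key event count. The function names and the `(timestamp, key)` event shape are illustrative assumptions, not taken from the deck.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key); illustrative shape


def build_batch_view(events: Iterable[Event], batch_cutoff: int) -> Counter:
    """Batch layer: periodic recomputation over everything up to the cutoff."""
    return Counter(key for ts, key in events if ts <= batch_cutoff)


def build_speed_view(events: Iterable[Event], batch_cutoff: int) -> Counter:
    """Speed layer: incremental counts for events the last batch run has not seen."""
    view: Counter = Counter()
    for ts, key in events:
        if ts > batch_cutoff:
            view[key] += 1
    return view


def serve(batch_view: Counter, speed_view: Counter, key: str) -> int:
    """Serving step: every query has to merge the two views."""
    return batch_view[key] + speed_view[key]


events = [(1, "a"), (2, "b"), (3, "a"), (11, "a"), (12, "b"), (13, "a")]
cutoff = 10  # timestamp of the last completed batch run
batch_view = build_batch_view(events, cutoff)
speed_view = build_speed_view(events, cutoff)
count_a = serve(batch_view, speed_view, "a")
```

The counting logic exists twice and both layers must agree on the cutoff; any drift between them shows up as double-counted or missing events, and that is before adding the separate storage, deployment and monitoring each layer needs.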
LAMBDA ARCHITECTURE MADE SIMPLE
Smart access to historical context
• No ETL, reduced complexity
• Built-in integration with external Hadoop/Data Lakes (S3-like)
• Fast access to historical data
• Automated life-cycle management
[Architecture diagram: data sources; data capture / message bus layer; batch layer; speed layer; management layer; applications; control layer (Management, Orchestration, and Security)]
Leverage leading BI Platforms
• Tableau
• Looker
• Qlik
• Power BI