Real-time Analytics Powered by GPU-Accelerated Databases Chris Prendergast and Woody Christy GTC, May 8, 2017
Kinetica Background
Timeline: 2009, 2012, 2014, 2016
• United States Army Intelligence seeks a means to assess terrorist and other national security threats. No database in the market was fast or flexible enough to meet their needs.
• Founders Amit Vij and Nima Negahban start on the pioneering use of GPUs while building a GPU-accelerated database from the ground up.
• GPUdb goes live with the US Army Intelligence.
• Patent granted for “Method and system for improving computational concurrency using a multi-threaded GPU calculation engine”.
• Wins IDC HPC Innovation Excellence Award for work with US Army.
• Commercialization: entered production with USPS.
• Wins IDC HPC Innovation Excellence Award for work with US Postal Service.
• Rebranded to Kinetica. Seed funding. Moved HQ to San Francisco. Expanded management team. Hired field team.
Evolution of Analytics
GPU Acceleration
• Simple Reporting: List customer energy consumption in the past 3 years, monthly? Per household? Residential vs. Commercial?
• Standard Analytics: What is the average consumption by region? How does that compare to historic averages? How does it compare to other regions?
• Real-time Analytics: What is the current energy consumption by a region / household?
• Machine Learning: Given location, history, demographic, usage, what is the likelihood of service issues/outage?
• Deep Learning: Deduce from unspecified signals across a wide range of datasets the likelihood this customer will consume more/less energy? Have service interruption?
GPU Acceleration Overcomes Processing Bottlenecks
GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated similar instructions in parallel. This makes them well suited to the compute-intensive workloads required of large data sets.
• 4,000+ cores per device in many cases, versus 16 to 32 cores per typical CPU-based device.
• Parallel processing is ideal for scanning the entire dataset & brute force compute.
• High performance computing is trending toward using GPUs to solve massive processing challenges.
• GPU acceleration brings high performance compute to commodity hardware.
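The "scan the entire dataset in parallel" idea can be sketched as a partition, scan, reduce pattern. This is a minimal illustration only: it uses Python worker threads as a stand-in for GPU cores, and the names `partial_sum_count` and `parallel_average` are hypothetical, not part of any Kinetica API.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(chunk):
    """Scan one partition and return its (sum, row count)."""
    return sum(chunk), len(chunk)


def parallel_average(values, workers=4):
    """Brute-force average: split the data, scan partitions concurrently, reduce."""
    if not values:
        raise ValueError("values must be non-empty")
    size = -(-len(values) // workers)  # ceiling division: rows per partition
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Assumed toy data: 100,000 readings cycling through 0..99.
readings = [float(i % 100) for i in range(100_000)]
avg = parallel_average(readings)
```

No index is consulted here; every row is read once. On a GPU the same pattern would spread the partitions across thousands of cores rather than a handful of threads.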
Kinetica: A Distributed, In-Memory Database
• GPU-accelerated database operations
• Natural language processing based full-text search
• Native GIS and IP-address object support
• Real time data handlers to ingest structured and unstructured data
• Deep integration with open source and commercial frameworks and applications: Hadoop, Spark, NiFi, Accumulo, H2O, Tableau, Kibana and Caravel
• No typical tuning, indexing, and tweaking
• Predictable scale out for data ingestion and querying
• Distributed visualization pipeline built in
Kinetica: Unique Strengths & Capabilities Fast, Distributed, OLAP Engine for Fast OLAP Moving, Large Scale Data Performance, Scalability, Fast Data Stability Converged AI and BI API for GPU Interactive Native Geospatial and Geospatial In-Database Powered Data Location-Based Processing & Visualization Pipeline & Compute Analytics Visualization Analytics Orchestration 6
Challenges with Lambda and Kappa Architectures
What is the main problem?
• A database or cache system serves up pre-computed aggregates.
• It also takes a lot of effort to re-compute aggregates and to load the serving database or cache.
CASE STUDY : Performance BI
LARGE TELCO
Query 1: Simple average calculation on the 1.8B row table. Times shown: 0.09s, 0.65s, and 345s (Leading Enterprise Database).
Query 2: Sum aggregation with a subquery aggregation joining both tables. Times shown: 2.5s, 0.68s, and 44s (Leading Enterprise Database).
Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle
• Parallel ingestion of events
• Lambda-type architecture for Teradata or Oracle
• Kinetica is speed layer with real-time analytic capabilities for millisecond SLAs
• Converge Machine Learning, Deep Learning, NLP, streaming and location analytics and fast Query, Reporting & Analytics with Kinetica & Teradata/Oracle
Diagram labels: Amazon Kinesis; DATA IN MOTION AND REST; STREAM / ETL PROCESSING; DATA WAREHOUSE / TRANSACTIONAL; Kinetica Connectors; Fast GPU accelerated, in-Memory Database; Converge ML, DL, Streaming, Location, and QR&A; ANALYSTS; MOBILE USERS; ALERTING SYSTEMS; DASHBOARDS & APPLICATIONS
Speed Layer for Hadoop
• Parallel ingestion of events
• Kinetica is speed layer with real-time analytic capabilities
• HDFS for archival store
• Much looser coupling than traditional lambda architecture
• Execute complex analytics on the fly
• Batch mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from HDFS
Diagram labels: Amazon Kinesis; EVENTS; MESSAGE BROKERS; STREAM PROCESSING; Kinetica Connectors; Put, get, scan; Parallel Ingestion; HDFS (Hadoop Distributed File System); ANALYSTS; MOBILE USERS; ALERTING SYSTEMS; DASHBOARDS & APPLICATIONS
SIMPLIFY YOUR ARCHITECTURE
• No need to regularly recompute aggregates.
• No need to load and manage a separate serving system or cache to make deep historical aggregates available to your stream processing code.
• Aggregates are always up to date, as they are computed on demand; the latest events are always included.
• Better performance with significantly reduced operational complexity, hardware footprint and cost.
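The difference between a pre-computed serving table and on-demand aggregation can be sketched in a few lines. This sketch uses SQLite from the Python standard library purely as a stand-in; it is not Kinetica's API, and the table names, column names, and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("north", 10.0), ("north", 20.0), ("south", 5.0)],
)

# Lambda-style serving table: aggregates computed once by a batch job.
conn.execute(
    "CREATE TABLE serving AS "
    "SELECT region, SUM(kwh) AS total_kwh FROM events GROUP BY region"
)

# A new event arrives after the batch job has run.
conn.execute("INSERT INTO events VALUES ('north', 30.0)")


def precomputed_total(region):
    """Read the aggregate from the serving table (stale until the next batch)."""
    row = conn.execute(
        "SELECT total_kwh FROM serving WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def on_demand_total(region):
    """Compute the aggregate at query time, so the latest event is included."""
    row = conn.execute(
        "SELECT SUM(kwh) FROM events WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


stale = precomputed_total("north")  # misses the newest event
fresh = on_demand_total("north")    # includes the newest event
```

The on-demand path only pays off if the full scan is fast enough to run per query, which is the role the deck assigns to GPU acceleration.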
STREAMING ANALYTICS, SIMPLIFIED
Execute complex analytics on the fly
Diagram labels: Amazon Kinesis; EVENTS; MESSAGE BROKERS; STREAM PROCESSING; Kinetica Connectors; PUT, GET, SCAN; ANALYSTS; MOBILE USERS; ALERTING SYSTEMS; DASHBOARDS & APPLICATIONS
CASE STUDY : LOCATION BASED ANALYTICS
INTELLIGENCE: US Army - INSCOM
U.S. Army INSCOM: Shift from Oracle to GPUdb
• US Army’s in-memory computational engine for any data with a geospatial or temporal attribute for a major joint cloud initiative within the Intelligence Community (IC ITE).
• Intel analysts are able to conduct near real-time analytics and fuse SIGINT, ISR, and GEOINT streaming big data feeds and visualize in a web browser.
• First time in history military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment.
• Major executive military and congressional visibility.
1 GPUdb server vs 42 servers with Oracle 10gR2 (2011): 42x Lower Space, 28x Lower Cost, 38x Lower Power Cost. GPUdb (20ms) vs Oracle Spatial (92 Minutes).
CASE STUDY : LOCATION BASED ANALYTICS
LOGISTICS: Route optimization
USPS is the single largest logistics entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year.
DISTRIBUTED ANALYSIS AT SCALE
• 15,000 simultaneous sessions
• 200,000 USPS devices emitting location each minute → 250+ million events captured and analyzed daily, tracked on 10 nodes.
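A quick back-of-the-envelope check of the event volume, assuming every device reports once per minute around the clock and load is spread evenly across nodes (both are simplifying assumptions, not figures from the slide):

```python
DEVICES = 200_000          # from the slide
REPORTS_PER_MINUTE = 1     # "emitting location each minute"
NODES = 10                 # from the slide
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

# Upper bound on daily events if every device reports all day.
events_per_day = DEVICES * REPORTS_PER_MINUTE * MINUTES_PER_DAY

# Average ingest rate across the cluster, and the assumed even per-node share.
events_per_second = events_per_day / SECONDS_PER_DAY
events_per_node_per_day = events_per_day // NODES

print(f"{events_per_day:,} events/day")
print(f"{events_per_second:,.0f} events/s cluster-wide")
print(f"{events_per_node_per_day:,} events/day per node")
```

Under these assumptions the ceiling is 288 million events per day, which is consistent with the "250+ million" figure quoted above.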
CASE STUDY : LOCATION BASED ANALYTICS PREDICTIVE INFRASTRUCTURE MANAGEMENT LARGE UTILITY COMPANY Kinetica operates as a speed-layer with ESRI to monitor, manage, and predict infrastructure health. 15
CASE STUDY : LOCATION BASED ANALYTICS
LARGE LOGISTICS & FLEET MANAGEMENT RETAILER
Kinetica enables agile tracking of shipments to help store managers track inventory and arrival times.
• Visibility and tracking of deliveries & trucks for store managers.
• ETA & Notifications: provide estimated time of delivery, notifications and custom location based alerting.
• Route Optimization based on truck size, and whether cargo is perishable or contains hazardous materials.
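Location based alerting of this kind can be sketched as a simple geofence check. The sketch below is an assumption-laden illustration, not the retailer's or Kinetica's implementation: it uses straight-line (haversine) distance rather than road distance, and the function names, coordinates, radius, and speed are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def should_alert(truck, store, radius_km):
    """True when the truck is inside the store's circular geofence."""
    return haversine_km(*truck, *store) <= radius_km


def eta_minutes(truck, store, speed_kmh):
    """Naive ETA: straight-line distance divided by an assumed average speed."""
    return haversine_km(*truck, *store) / speed_kmh * 60


# Hypothetical positions as (latitude, longitude).
store = (37.7749, -122.4194)
truck = (37.8044, -122.2712)

distance = haversine_km(*truck, *store)
alert_now = should_alert(truck, store, radius_km=15.0)
eta = eta_minutes(truck, store, speed_kmh=40.0)
```

A production system would replace the straight-line distance with a routed distance and evaluate the geofence inside the database as each location event arrives.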
CASE STUDY : LOCATION BASED ANALYTICS
PIPELINE & WELL ANALYTICS
ENERGY RESEARCH
Kinetica enables interactive query and geospatial visualization of large numbers of upstream and midstream assets.
• Complex joins across several tables with 300m rows of data. Approx 100GB in size.
• Create custom visualizations, charts.
• Visualization of wells by land ownership, region, etc.
CASE STUDY : ADVANCED IN-DATABASE ANALYTICS
LIFE SCIENCES : GENOMICS RESEARCH
GPU-acceleration on Kinetica enables processing of transcriptomics to run simulations for drug research.
• Seeking out signals from massive collection of drug targets combined with historical data.
• Accelerate simulations of chemical reactions.
• In-database processing to develop models, leveraging GPU acceleration for performance, and direct access to CUDA APIs via UDFs deployed within Kinetica.
"One of the things I like about Kinetica is it gives us more of a general-purpose use of the technology. There has been a lot of software created to answer certain questions [but] highly specialized tools have limited functionality and are tuned to do a certain workload."
Mark Ramsey, Chief Data Officer at GSK
CASE STUDY : ADVANCED IN-DATABASE ANALYTICS
RISK MANAGEMENT
MULTINATIONAL BANK
Large financial institution moves counterparty risk analysis from overnight to real-time.
• Data collected by xVA library which computes risk metrics for each trade.
• Risk computations are becoming more complex and computationally heavy. xVA analysis needs to project years into the future.
• Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management.