How we run SQL queries in-memory when available memory is - PowerPoint PPT Presentation

How we run SQL queries in-memory when available memory is constrained with Kognitio analytical query streaming
Roger Gaskell – CEO
Andrew Maclean – CTO

The problem with in-memory is… there is never enough memory.

Who is Kognitio
Originally founded in 1988 as White Cross Systems (later merged with Kognitio), the company focused on developing a database that could support high-speed data analytics… in a shared-nothing MPP (Massively Parallel Processing) architecture… where data would be held in computer memory.

Quick intro to Kognitio
In-memory analytical platform
• Provides ultra-fast, high-concurrency SQL for big data
• Sophisticated support for embedding non-SQL programs in any language
• High concurrency, mixed workloads
• Sits between where the data is stored and the data analysis tools and applications
Massively parallel processing
• Architected as scalable, shared-nothing, massively parallel processing
• Data of interest held in-memory – queries satisfied exclusively in memory
Many deployment options
• Standalone Linux compute cluster or existing Hadoop cluster
• On-premise or in the cloud

Architecture
• Application & client layer: Analytics; Queries, Results
• Kognitio analytical platform layer: Query coordinator; Kognitio Processing; Persistent memory images
• Persistence layer: Hive tables / HDFS file system; Local attached disk or NAS / Kognitio Linear File System
• External data sources: Data warehouses and legacy systems; Data feeds; Other Hadoop clusters; Cloud storage

When is Kognitio used?
Large data volumes
• 0.5TB – 100TB
• 100 million – trillions of records
• Conventional technologies struggling to provide the required performance
Need for speed
• Client needs high-speed, interactive, ad-hoc analytics, often using visualization tools like Qlik, Tableau, Power BI, MicroStrategy
High concurrency, mixed workload
• High query throughput – data as a service
• Pervasive or self-serve BI & analytics
• Data-as-a-service applications

Never enough memory
[Diagram: Available memory, divided between Data and Work Space, shown with the example query]
    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1

Early customer feedback
“We love the speed but the ‘out of memory’ errors (when the system is busy or the query involves too much data) are very frustrating”

Possible approaches
Page to disk
• Very slow
• Can slow down queries even when there is plenty of work-space
• Requires available disk space
Statically divide workspace
[Diagram: workspace divided among Session 1 to Session 8]
• Limits concurrency
• Inefficient use of workspace
• Individual work-space can be exhausted while others are unused
Kognitio query streaming
• Dynamic allocation of workspace
• Dynamic re-sizing as load changes
• In-memory makes re-computation of intermediate results very fast
• Re-compute from raw data used to cope with constrained work-space
• Never return out of memory errors
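
The idea of re-computing from raw data instead of failing when work-space runs short can be illustrated with a generic multi-pass (block) hash join. This is only a sketch under stated assumptions, not Kognitio's actual algorithm: the function name `join_aggregate`, the `max_build_rows` budget, and the sample rows are all hypothetical. The work-space never holds more than `max_build_rows` customer rows; the price paid is that the in-memory orders data is re-scanned once per pass.

```python
# Hypothetical sample data: (customer id, region) and (customer id, price).
CUSTOMERS = [(1, "North"), (2, "South"), (3, "North")]
ORDERS = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)]


def join_aggregate(customers, orders, max_build_rows):
    """Count and total order prices per region with a bounded lookup table.

    Returns (totals, passes), where totals maps region -> (count, sum).
    """
    if max_build_rows < 1:
        raise ValueError("max_build_rows must be at least 1")
    totals = {}
    passes = 0
    for start in range(0, len(customers), max_build_rows):
        # Work-space for this pass: at most max_build_rows customer rows.
        build = dict(customers[start:start + max_build_rows])
        passes += 1
        # Re-scan the raw orders rather than keeping intermediate results.
        for customer_id, price in orders:
            region = build.get(customer_id)
            if region is not None:
                count, total = totals.get(region, (0, 0.0))
                totals[region] = (count + 1, total + price)
    return totals, passes


if __name__ == "__main__":
    print(join_aggregate(CUSTOMERS, ORDERS, max_build_rows=2))
```

A smaller budget means more passes over the orders but the same answer, which is the trade-off the bullets describe: slower under pressure, but no out-of-memory error. The per-region totals are assumed to be small enough to stay resident.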

Kognitio Query Streaming
    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1
Customer table distributed on customer.id
[Diagrams: Conventional Plan and Streaming Plan]
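
For readers who want to see what the example query returns, here is a minimal sketch that runs it against Python's built-in `sqlite3` module rather than Kognitio. The table definitions and sample rows are assumptions made for illustration; only the query text comes from the slide, with an `order by 1` added so the output order is stable.

```python
import sqlite3

# Hypothetical sample data: (id, region_name) and (customer_id, price).
CUSTOMERS = [(1, "North"), (2, "South"), (3, "North")]
ORDERS = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)]


def region_totals(customers, orders):
    """Load the rows into an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (id integer, region_name text)")
    conn.execute("create table orders (customer_id integer, price real)")
    conn.executemany("insert into customers values (?, ?)", customers)
    conn.executemany("insert into orders values (?, ?)", orders)
    rows = conn.execute(
        """
        select c.region_name, count(*), sum(o.price)
        from customers c, orders o
        where c.id = o.customer_id
        group by 1
        order by 1
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in region_totals(CUSTOMERS, ORDERS):
        print(row)
```

The comma-separated `from` list with the `where` condition is an inner join on the customer id, and `group by 1` groups by the first selected column, `c.region_name`.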

Kognitio Query Streaming
    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1
Customer table NOT distributed on customer.id
[Diagrams: Conventional Plan and Streaming Plan]
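
Why the distribution of the customer table matters can be sketched with generic shared-nothing reasoning. The plan diagrams are not recoverable from this transcript, so the following is not Kognitio's planner: the helpers `distribute_by_key`, `distribute_round_robin`, and `local_matches` are hypothetical, and the keys are assumed to be integers. When both tables are placed by the join key, matching rows sit on the same node and each node can join on its own; when customers are placed some other way, rows have to be moved between nodes before the join gives the right answer.

```python
# Hypothetical sample data: (customer id, region) and (customer id, price).
CUSTOMERS = [(1, "North"), (2, "South"), (3, "North")]
ORDERS = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)]


def distribute_by_key(rows, num_nodes):
    """Place each row on the node chosen by its integer key (first field)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[0] % num_nodes].append(row)
    return nodes


def distribute_round_robin(rows, num_nodes):
    """Place rows by arrival order, ignoring the key entirely."""
    nodes = [[] for _ in range(num_nodes)]
    for position, row in enumerate(rows):
        nodes[position % num_nodes].append(row)
    return nodes


def local_matches(customer_nodes, order_nodes):
    """Count order rows that find their customer on the same node."""
    matched = 0
    for customer_rows, order_rows in zip(customer_nodes, order_nodes):
        local_ids = {customer_id for customer_id, _ in customer_rows}
        matched += sum(1 for customer_id, _ in order_rows if customer_id in local_ids)
    return matched


if __name__ == "__main__":
    orders = distribute_by_key(ORDERS, 2)
    print(local_matches(distribute_by_key(CUSTOMERS, 2), orders))
    print(local_matches(distribute_round_robin(CUSTOMERS, 2), orders))
```

With this sample data the key-distributed layout finds all four matches locally, while the round-robin layout finds none until the customer rows are redistributed by id.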

How this looks
[Figure not reproduced]

Each node optimising locally
[Figure not reproduced]

Example use case
Clients pay to perform interactive ad-hoc retail analytics on billions of POS transactions
[Diagram labels:]
• Inmar Hadoop Cluster
• Kognitio on Hadoop: SQL with embedded R processing
• Data pinned in memory
• Retail data in Hive ORC files

Product Evolution
• 1990 – 1st Gen: In-memory Database Appliance, “Transputer” based
• 1996 – 2nd Gen: In-memory Database Appliance, “x86” based
• 2003 – 3rd Gen: Software only, Commodity Servers

Hadoop is the only BI platform you need, with ultra-fast, high-concurrency SQL
kognitio.com
USA: +1 855 KOGNITIO
UK: +44 1344 300770
linkedin.com/company/kognitio
twitter.com/kognitio
facebook.com/kognitio
youtube.com/kognitio
