how we run sql queries in memory when available memory is
play

How we run SQL queries in-memory when available memory is - PowerPoint PPT Presentation

How we run SQL queries in-memory when available memory is constrained with Kognitio analytical query streaming Roger Gaskell CEO Andrew Maclean - CTO 1 The problem with in-memory is there is never enough memory. 2 Who is Kognitio


  1. How we run SQL queries in-memory when available memory is constrained with Kognitio analytical query streaming Roger Gaskell – CEO Andrew Maclean - CTO 1

  2. The problem with in-memory is… …there is never enough memory. 2

  3. Who is Kognitio Originally founded in 1988 as White Cross Systems (later merged with Kognitio), focused on developing a database that could support high speed data analytics… …in a Shared nothing MPP (Massively Parallel Processing) …where data would be held in computer memory… 3

  4. Quick intro to Kognitio In-memory analytical Massively parallel Many deployment options platform processing • Provides ultra-fast high • Architected as scalable, • Standalone Linux compute concurrency SQL for big shared nothing, massively cluster or existing Hadoop data parallel processing cluster • Sophisticated support for • Data of interest held in- • On-premise or in the cloud embedding Non-SQL memory – queries programs in any language satisfied exclusively in memory • High concurrency, mixed work loads • Sits between where the data is stored and the data analysis tools and applications 4

  5. Architecture Application & client layer Queries Results Analytics Kognitio analytical platform layer Query coordinator Kognitio Processing Persistent memory images Persistence layer Hive tables / HDFS file system Local attached disk or NAS / Kognitio Linear File System External data sources Data warehouses and Data Other Hadoop Cloud storage legacy systems feeds clusters 5

  6. When is Kognitio used? • 0.5TB – 100TB • 100million – trillions of records Large data volumes • Conventional technologies struggling to provide the required performance • Client needs high-speed, interactive, ad-hoc analytics often using Need for speed visualization tools like Qlik, Tableau, PowerBI, Microstrategy • High query throughput – data as a service High concurrency, • Pervasive or Self-serve BI & analytics mixed workload • Data-as-a-service applications 6

  7. Never enough memory Available memory select c.region_name, count(*), sum(o.price) from customers c, orders o Work where c.id = o.customer_id Space group by 1 Data 7

  8. Early customer feedback “We love the speed but the ‘out of memory’ errors (when the system is busy or the query involves too much data) are very frustrating” 8

  9. Possible approaches Session 1 Session 2 Session 3 Session 4 Session 5 Session 6 Session 7 Session 8 Page to disk Statically divide Kognitio query workspace streaming • Very slow • Limits concurrency • Dynamic allocation of workspace • Can slow down queries even when • Inefficient use of workspace • Dynamic re-sizing as load changes there is plenty of work-space • Individual work-space can be exhausted • In-memory makes re-computation of • Requires available disk space while others are unused intermediate results very fast • Re-compute from raw data used to cope with constrained work-space • Never return out of memory errors 9

  10. Kognitio Query Streaming select c.region_name, count(*), sum(o.price) from customers c, orders o where c.id = o.customer_id group by 1 Customer table distributed on customer.id Conventional Plan Streaming Plan 10

  11. Kognitio Query Streaming select c.region_name, count(*), sum(o.price) from customers c, orders o where c.id = o.customer_id group by 1 Customer table NOT distributed on customer.id Conventional Plan Streaming Plan 11

  12. How this looks 12

  13. Each node optimising locally 13

  14. 14

  15. Example use case Clients pay to perform interactive ad-hoc retail analytics on billions of POS transactions Inmar Hadoop Cluster Kognitio on Hadoop SQL with embedded R processing Retail data pinned in memory data data in Hive ORC files 15

  16. Product Evolution 1990 – 1 st Gen 1996 – 2 nd Gen 2003 – 3rd Gen Software only In-memory Database Appliance In-memory Database Appliance Commodity Servers “Transputer” based “x86” based 16

  17. Hadoop is the only BI platform you need, with ultra-fast, high-concurrency SQL þ kognitio.com USA: +1 855 KOGNITIO UK: +44 1344 300770 linkedin.com/company/kognitio twitter.com/kognitio facebook.com/kognitio youtube.com/kognitio 17

Recommend


More recommend