Getting Started With Apache Ignite SQL Denis Magda, GridGain - PowerPoint PPT Presentation

Getting Started With Apache Ignite SQL Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group

Topics • Ignite SQL Basics: DML, DDL, connectivity, configuration • Affinity Co-Location and Distributed JOINs • Beyond Memory Capacity: Disk Tier Usage and Memory Quotas • Ignite SQL Evolution With Apache Calcite

Ignite SQL Basics

Ignite SQL = ANSI SQL at Scale • ANSI-99 DML and DDL syntax – SELECT, UPDATE, CREATE… • Distributed joins, grouping, sorting • Schema changes in runtime – ALTER TABLE, CREATE/DROP INDEX • Works with in-memory and disk-only records – If Ignite Persistence is used as a disk tier

Connectivity Options • Thick Client APIs – Java, C#/.NET, C++ • JDBC and ODBC drivers • Thin Client APIs – Multi-language support

Configuration Option #1: Programmatically With Annotations Usage Scenario: • Spring-style development by annotating POJOs • DDL can be used to apply changes in runtime.

Configuration Option #2: Spring XML With Query Entities Usage Scenario: • Ignite as a cache that writes-through changes to an external database. • DDL can be used to apply changes in runtime.

Configuration Option #3: In Pure SQL With DDL Usage Scenario: • SQL-driven applications • Green-field applications using Ignite as a database with its native persistence

Demo Time Cluster Startup and Database Creation

Affinity Co-Location and Distributed JOINs

Ignite SQL Engine Internals Ignite SQL Ignite SQL H2 Engine H2 Engine Data & Indexes Data & Indexes Ignite SQL H2 Engine Data & Indexes

Query Execution Phases Map City Reduce Map Map Thick Client City Reduce

Default Data Distribution France Canada Marseille Toronto Montreal Paris Ottawa Calgary City Country Table Table

SQL JOIN With Data Shuffling 2 Canada Toronto Paris Calgary Paris 3 Ottawa Thick Client Montreal 2 France 1 & 4 Marseille Ottawa 1. Initiating Execution Montreal 2. Execution on Servers (map phase) 3. Data Shuffling 4. Reduce Phase

Co-Located Distribution (aka. Affinity Co-Location) France Canada Toronto Marseille Montreal Paris Ottawa Calgary City Country Table Table

All You Need is to Configure Affinity Key

Affinity Key to Node Mapping Process Application Process Network Call City Record Affinity Key Partition Node

High-Performance SQL JOIN 2 Canada Toronto Ottawa Calgary Thick Client 2 France 1 & 3 Marseille 1. Initiating Execution Paris 2. Execution on Servers (map phase) 3. Reduce Phase

Demo Time Queries With JOINs

Beyond Memory Capacity: Disk-Tier and Memory Quotas

Multi-Tier Storage architecture 1. In-Memory - General in-memory caching, high-performance computing 2. In-Memory + Native Persistence - Ignite as an in-memory database 3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

Multi-Tier Storage Architecture Memory segment Index Page Data page Leaf page Metadata page Leaf page Data page Metadata page (root) Data page Index Page (root) Key-Value Index page Key-Value Inner page 2 (inner) Key-Value Index Page Leaf page 2 Leaf page 3 (leaf) Key-Value Index Data

Multi-Tier Storage Architecture Memory segment Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page Data page #0 Data page #1 Data page #2 Data page #3 Data page #4 Data page #5 Data page #6 Partition file with Data Pointer in a memory segment Pages map (read/write ops) PageId Position in a file (load page/checkpoint)

Java off-heap vs Java heap Heap Planning Parsing Heap SQL query processing Scanning (index or table scan) Computing Off-heap (filters, joins, expressions) More Heap

Java off-heap vs Java heap τ max_pop Sorting Here we need full set in heap ρ name, name0, max_pop Renaming country.name, city.name ℱ MAX(city.population) Aggregation π country.name, city.name, city.population Projection Here we need full set in heap too ⋈ country.code = city.countrycode Join σ code in (‘CAN’, ‘FRA’) Filtering CITY COUNTRY Scanning

Query memory quotas How to configure:

Interim results offloading τ max_pop Why don’t you flush result Sorting sets to disk? ρ name, name0, max_pop Renaming And it country.name, city.name ℱ MAX(city.population) Aggregation π country.name, city.name, city.population Projection ⋈ country.code = city.countrycode Join σ code in (‘CAN’, ‘FRA’) Filtering CITY COUNTRY Scanning

Intermediate results offloading How to configure:

When you need quotas/offloading enabled ● Sorting (ORDER BY) ● Grouping (DISTINCT, GROUP BY) ● Complex subqueries

Demo Time Running SQL Over Disk-Only Records

Apache Ignite SQL Evolution With Apache Calcite

Why do we need it? Here we need Map-Reduce phase too Here we need Map-Reduce phase

Typical execution flow Result Schema User SQL Query Validation Validator Parser Dictionary Query AST Statistics RBO CBO Rule-based Cost-based Optimizer mode optimizer optimizer Query plan Row source generator Execution

Apache Calcite JDBC Client JDBC Server Metadata SPI Optional SQL Parser/Validator Need to implement: Core Query optimizer ● Splitter ● Runtime Pluggable 3rd party ops 3rd party ops Plugable ● Indexes support rules ● DML support ● DDL support 3rd party 3rd party data data

Query Parser and Transformer Relational operators tree (query plan) π p.id, p.name τ p.name DESC Query AST Select expr= p.id, d.name σ p.id > 10 OR p.id < 10000 from= person, dep ⋈ p.depId = d.id cond= p.depId = d.id AND (p.id > 10 OR p.id < 10000) person(p) dep(d) order = p.name DESC offset= 10

Cost-Based Optimizer Query AST (from parser) Query transformer Relational tree Statistics Dictionary Estimator Equivalent Relational tree relational + costs tree Plan generator Rules Query plan (to Row source generator)

Cost-Based Splitter Root π p.id, p.name τ p.name DESC σ p.id > 10 OR p.id < 10000 ⋈ p.depId = d.id person(p) dep(d)

Reactive Execution Flow Network communication Push Push Send Push Data flow Scan Filter Sender Receiver Client cursor Backpressure Request Request Acknowledge Request Node buffer Node buffer Node buffer Node buffer

Demo Time Calcite Prototype Demo With Sub-Queries

Learn More • Apache Ignite SQL https://apacheignite-sql.readme.io/docs – • Memory Quotas (available in GridGain Community Edition): https://www.gridgain.com/docs/latest/developers-guide/me – mory-configuration/memory-quotas • Demos shown in this webinar https://github.com/GridGain-Demos/ignite-sql-intro-samples – • New Apache Calcite-based engine https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+ – query+execution+engine

Getting Started With Apache Ignite SQL Denis Magda, GridGain - PowerPoint PPT Presentation

Getting Started With Apache Ignite SQL Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group Topics Ignite SQL Basics: DML, DDL, connectivity, configuration Affinity Co-Location and Distributed JOINs

Ingesting Streaming Data for Analysis in Apache Ignite Pat Patterson StreamSets

Apache Calcite for Enabling SQL Access to NoSQL Data Systems such as Apache Geode Christian

Ultimate Guide to Successful Cross-Platform Deployments with Apache Ignite Pavel Petroshenko

Sergey Beryozkin, T alend Sergey Beryozkin, T alend Apache CXF Apache CXF Practical JOSE

SQL SQL SQL = Structured Query Language Standard query language for relational

Using Distributed Tracing to Resolve Performance Issues in Apache Ignite Greg Stachnick, Director

Apache Felix Web Console Carsten Ziegeler | cziegeler@apache.org ApacheCon NA 2014 About

The Apache Way The Apache Way Nick Burch Nick Burch CTO, Quanticate CTO, Quanticate The

Apache Ignite as MPP Accelerator Alexander Ermakov, CTO Agenda About us Why do

Apache Ignite Extensions - Modularization Saikat Maitra Twitter @samaitra Github samaitra

A1 (Part 2): Injection SQL Injection SQL injection is prevalent SQL injection is impactful Why a

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate

BASIC SQL CHAPTER 4 (6/E) CHAPTER 8 (5/E) 1 CHAPTER 4 OUTLINE SQL Data Definition and

This Lecture SQL The SQL language SQL, the relational model, and E/R diagrams SQL Data

Intermezzo: A typical database architecture 136 A typical database architecture SQL SQL SQL

Basic SQL Lecture 2 1 Outline Data in SQL Simple Queries in SQL Queries with more

Rectified Linear Score Normalization and Weighted Integration for Ad-hoc Video Search Y. Yan,

First passage fluctuation relations ruled by cycles affinities F. Cornu joint work with M.

Continuous Improvement Solving Problems That Change Lives CI Skills Development Problem Solving

Neu Neural C al Collab ollabor orativ ive e Subspace ce Clustering Tong Zhang, Pan Ji ,

Relax and Let the Database do the Partitioning Online Alekh Jindal, Jens Dittrich - presented by

AFFINITY GROUP TO EXTEND AND DEEPEN ANTI-RACISM WORK Presented at The 8 th Annual CNM Conference

Sterimol parameters [4]. A satisfactory MLR (multiple linear regression) equation was obtained (r

ACDIS Partnership Opportunities If you are experiencing any technical difficulties, please