Apache Calcite for Enabling SQL Access to NoSQL Data Systems such as Apache Geode Christian Tzolov
Whoami
Christian Tzolov
Engineer at Pivotal: Big-Data, Hadoop, Spring Cloud Dataflow, Apache Geode, Apache HAWQ
Apache Committer, Apache Crunch PMC member
ctzolov@pivotal.io, blog.tzolov.net, twitter: @christzolov, https://nl.linkedin.com/in/tzolov
Disclaimer
This talk expresses my personal opinions. It is not read or approved by Pivotal and does not necessarily reflect the views and opinions of Pivotal nor does it constitute any official communication of Pivotal. Pivotal does not support any of the code shared here.
Big Data Landscape 2016 • Volume • Velocity • Variety • Scalability • Latency • Consistency vs. Availability (CAP)
Data Access
• {Old | New} SQL
• Custom APIs
  – Key / Value
  – Fluent APIs
  – REST APIs
• {My} Query Language
Unified Data Access? At What Cost?
SQL? • Apache Apex • SQL-Gremlin • Apache Drill … • Apache Flink • Apache Geode • Apache Hive • Apache Kylin • Apache Phoenix • Apache Samza • Apache Storm • Cascading • Qubole Quark
Geode Adapter - Overview
[Diagram: layered stack, top to bottom]
• SQL/JDBC/ODBC
• Apache Calcite – parses SQL, converts it into relational expressions and optimizes them
• Geode Adapter, next to the Enumerable Adapter – pushes down the relational expressions supported by Geode OQL and falls back to the Calcite Enumerable Adapter for the rest; converts SQL relational expressions into OQL queries
• Spring Data Geode (Geode Client) – Spring Data API for interacting with Geode
• Geode API and OQL
• Geode Servers, each holding Data
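The "convert relational expressions into OQL queries" step can be illustrated with a minimal sketch. It assumes the pushed-down parts of a query (projected fields, a filter predicate, a limit) have already been extracted as plain strings; the class `OqlQuerySketch` and the method `toOql` are hypothetical names for illustration and are not part of the Geode adapter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Geode adapter: assembles an OQL string
// from the pieces of a query that were pushed down to Geode.
class OqlQuerySketch {

    /**
     * @param region     Geode region name (becomes "/region" in OQL)
     * @param projection fields to select; an empty list selects everything
     * @param filter     predicate text, or null for no WHERE clause
     * @param limit      maximum row count, or a negative value for no LIMIT
     */
    static String toOql(String region, List<String> projection, String filter, int limit) {
        StringBuilder oql = new StringBuilder("SELECT ");
        oql.append(projection.isEmpty() ? "*" : String.join(", ", projection));
        oql.append(" FROM /").append(region);
        if (filter != null) {
            oql.append(" WHERE ").append(filter);
        }
        if (limit >= 0) {
            oql.append(" LIMIT ").append(limit);
        }
        return oql.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT totalPrice, customerNumber FROM /BookOrder WHERE totalPrice > 0 LIMIT 10
        System.out.println(toOql("BookOrder",
                Arrays.asList("totalPrice", "customerNumber"), "totalPrice > 0", 10));
    }
}
```

Anything that cannot be expressed this way would stay in the Calcite plan and be evaluated by the Enumerable Adapter, as the slide describes.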
SQL Relational Expressions
SELECT b."totalPrice", c."firstName"
FROM "BookOrder" as b
INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber"
WHERE b."totalPrice" > 0;

Initial plan:
Project (c.firstName, b.totalPrice)
  Filter (b.totalPrice > 0)
    Join (on customerNumber)
      Scan Customer [c]
      Scan BookOrder [b]

Optimized plan:
Project (c.firstName, b.totalPrice)
  Join (on customerNumber)
    Project (totalPrice, customerNumber)
      Filter (totalPrice > 0)
        Scan BookOrder [b]
    Project (firstName, customerNumber)
      Scan Customer [c]
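The filter part of this rewrite can be sketched in a few lines of plain Java. This is an illustration of the idea only, not Calcite's planner: the `Node` class and `pushFilterBelowJoin` are hypothetical, the Project nodes are left out, and "does the predicate belong to this input?" is decided by a crude alias-prefix check.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy plan tree, not Calcite's RelNode API.
class FilterPushDownSketch {

    /** A plan node: operator name, free-text detail, and input nodes. */
    static final class Node {
        final String op;
        final String detail;
        final List<Node> inputs;

        Node(String op, String detail, Node... inputs) {
            this.op = op;
            this.detail = detail;
            this.inputs = Arrays.asList(inputs);
        }

        @Override
        public String toString() {
            String head = op + "(" + detail + ")";
            return inputs.isEmpty() ? head : head + inputs;
        }
    }

    /**
     * If node is a Filter directly above a Join and the predicate starts with
     * the alias of one scanned input (for example "b."), moves the Filter
     * onto that input. Otherwise returns the node unchanged.
     */
    static Node pushFilterBelowJoin(Node node) {
        if (!node.op.equals("Filter") || node.inputs.size() != 1) {
            return node;
        }
        Node join = node.inputs.get(0);
        if (!join.op.equals("Join")) {
            return node;
        }
        Node[] newInputs = new Node[join.inputs.size()];
        boolean pushed = false;
        for (int i = 0; i < newInputs.length; i++) {
            Node input = join.inputs.get(i);
            boolean matches = input.op.equals("Scan")
                    && node.detail.startsWith(input.detail + ".");
            if (!pushed && matches) {
                newInputs[i] = new Node("Filter", node.detail, input);
                pushed = true;
            } else {
                newInputs[i] = input;
            }
        }
        return pushed ? new Node("Join", join.detail, newInputs) : node;
    }

    public static void main(String[] args) {
        Node plan = new Node("Filter", "b.totalPrice > 0",
                new Node("Join", "on customerNumber",
                        new Node("Scan", "b"), new Node("Scan", "c")));
        System.out.println(pushFilterBelowJoin(plan));
    }
}
```

After the rewrite the filter sits directly on the BookOrder scan, so fewer rows reach the join.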
Geode Push Down Candidates

Relational Operator    Geode Support
LIMIT                  YES (without OFFSET)
PROJECT                YES
FILTER                 YES
JOIN                   For collocated Regions only
AGGREGATE              YES for GROUP BY, DISTINCT, MAX, MIN, SUM, AVG, COUNT (http://bit.ly/2eKApd0)
SORT                   YES
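The table reads naturally as a decision function. The sketch below encodes it in plain Java; `PushDownSupportSketch`, `Op`, `canPushDown` and `isSupportedAggregate` are hypothetical names, and the real adapter would make this decision inside Calcite planner rules rather than in a single method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical encoding of the push-down table, not adapter code.
class PushDownSupportSketch {

    enum Op { LIMIT, PROJECT, FILTER, JOIN, AGGREGATE, SORT }

    // Aggregate functions the table lists as supported by Geode.
    private static final Set<String> AGGREGATES =
            new HashSet<>(Arrays.asList("MAX", "MIN", "SUM", "AVG", "COUNT"));

    /**
     * @param hasOffset         true if a LIMIT comes with an OFFSET
     * @param regionsCollocated true if the joined Regions are collocated
     */
    static boolean canPushDown(Op op, boolean hasOffset, boolean regionsCollocated) {
        switch (op) {
            case LIMIT:
                return !hasOffset;          // YES, but only without OFFSET
            case JOIN:
                return regionsCollocated;   // collocated Regions only
            case PROJECT:
            case FILTER:
            case AGGREGATE:
            case SORT:
                return true;
            default:
                return false;
        }
    }

    static boolean isSupportedAggregate(String functionName) {
        return AGGREGATES.contains(functionName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(canPushDown(Op.LIMIT, true, false));
        System.out.println(isSupportedAggregate("avg"));
    }
}
```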
Apache Geode? “ … in-memory, distributed database with strong consistency built to support low latency transactional applications at extreme scale”
Why Apache Geode?
China Railway 5,700 train stations 7,000 stations 4.5 million tickets per day 72,000 miles of track 20 million daily users 23 million passengers daily 1.4 billion page views per day 120,000 concurrent users 40,000 visits per second 10,000 transactions per minute
https://pivotal.io/big-data/case-study/distributed-in-memory-data-management-solution
https://pivotal.io/big-data/case-study/scaling-online-sales-for-the-largest-railway-in-the-world-china-railway-corporation
Apache Geode Features
• In-Memory Data Storage
  – Over 100TB Memory
  – JVM Heap + Off Heap
• Any Data Format
  – Key-Value/Object Store
• ACID and JTA Compliant Transactions
• HA and Linear Scalability
• Strong Consistency
• Streaming and Event Processing
  – Listeners
  – Distributed Functions
  – Continuous OQL Queries
• Multi-site / Inter-cluster
• Full Text Search (Lucene indexes)
• Embedded and Standalone
• Top Level Apache Project
Apache Geode Concepts
• Locator (member) – tracks system members and provides membership information
• Client (member) – reads and modifies the content of the distributed system
• CacheServer (member) – process connected to the distributed system with a created Cache
• Cache – in-memory collection of Regions
• Region – consistent, distributed Map (key-value), Partitioned or Replicated
• Listener – event handler. Registers for one or more events and is notified when they occur
• Functions – distributed, concurrent data processing
Geode Topology
[Diagram: three topologies]
• Peer-to-Peer – Cache Servers, each holding Cache Data
• Client-Server – a Client with a Local Cache connects through a pool to the Cache Servers
• Multi-Site WAN – clusters of Cache Servers linked across the multi-site boundary by Gateway Senders and Gateway Receivers
Geode Client API • Client Cache • Key / Value - Region GET, PUT, REMOVE • OQL – QueryService
Geode Data Types & Serialization
• Key-Value with complex value formats
• Portable Data eXchange (PDX) Serialization
  – Delta propagation, schema evolution, polyglot support …
• Object Query Language (OQL)

Example value:
{ id: 1, name: "Fred", age: 42, pet: { name: "Barney", type: "dino" } }

Example query (nested fields, single field deserialization):
SELECT p.name FROM /Person p WHERE p.pet.type = "dino"
Geode Demo (GFSH and OQL) • Connect to Geode cluster • List available Regions • Run OQL query
Apache Calcite?
Java framework that provides a SQL interface and advanced query optimization for virtually any data system
• Query Parser, Validator and Optimizer(s)
• JDBC drivers - local and remote
• SQL Streaming
• Agnostic to data storage and processing
• SQL completeness vs. NoSQL integrity
Calcite Data Types
• Catalog – namespaces accessed in queries
• Schema – collection of schemas and tables
• Table – single data set, collection of rows
• RelDataType – SQL field types in a Table
Example: SELECT title, author FROM test.BookMaster
[Diagram: JDBC on top of the Calcite SQL Engine, which holds a Schema with Tables; a Data Type Mapping connects each Table and its Data Type Fields to the Schema, Tables and Data Types of Your Data System]
Calcite Data Types: RelDataType
Type of a scalar expression or row
• RelDataTypeFactory – RelDataType factory
• JavaTypeFactory – registers Java classes as record types
• JavaTypeFactoryImpl – Java Reflection to build RelDataTypes
• SqlTypeFactoryImpl – Default implementation with all SQL types
Geode to Calcite Data Types Mapping
• Geode Cache is mapped into Calcite Schema
• Regions are mapped into Tables (Region 1 … Region K become Table 1 … Table K)
• Geode Key/Value is mapped into Table Row
• Create Column Types (RelDataType) from the Geode Value class (JavaTypeFactoryImpl)
[Diagram: a Geode Cache with Regions of key/value pairs (k1/v1, k2/v2, …) next to a Calcite Schema with Tables of rows Row1 … RowM and columns Col1 … ColN]
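The value-class-to-columns idea can be sketched with plain Java reflection. The slide names Calcite's JavaTypeFactoryImpl for this step; the sketch below does not use it and only illustrates the mapping. `ValueClassMappingSketch`, the nested `BookMaster` class and both methods are hypothetical, and columns are sorted by name because reflection does not guarantee field order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of "value class -> columns, value -> row".
class ValueClassMappingSketch {

    /** Hypothetical region value class with two fields. */
    static final class BookMaster {
        final String title;
        final String author;

        BookMaster(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    /** One column per instance field, sorted by name for a stable order. */
    static List<Field> columns(Class<?> valueClass) {
        List<Field> cols = new ArrayList<>();
        for (Field f : valueClass.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            cols.add(f);
        }
        cols.sort(Comparator.comparing(Field::getName));
        return cols;
    }

    /** Turns one region value into a row with one cell per column. */
    static Object[] toRow(Object value, List<Field> columns) {
        Object[] row = new Object[columns.size()];
        for (int i = 0; i < row.length; i++) {
            try {
                row[i] = columns.get(i).get(value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Field> cols = columns(BookMaster.class);
        Object[] row = toRow(new BookMaster("Some Title", "Some Author"), cols);
        for (int i = 0; i < row.length; i++) {
            System.out.println(cols.get(i).getName() + " = " + row[i]);
        }
    }
}
```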
Calcite Bootstrap Flow
Typical Calcite initialization flow:
Model (JSON), which configures Calcite → creates SchemaFactory → creates Schema → creates Tables
Calcite Model
The path to <my-model>.json is passed as a JDBC connection argument:

!connect jdbc:calcite:model=target/test-classes/<my-model-path>.json

{
  version: '1.0',
  defaultSchema: 'TEST',
  schemas: [
    {
      name: 'TEST',
      type: 'custom',
      factory: 'org.apache.calcite.adapter.geode.simple.GeodeSchemaFactory',
      operand: {
        locatorHost: 'localhost',
        locatorPort: '10334',
        regions: 'BookMaster',
        pdxSerializablePackagePath: 'net.tzolov.geode.bookstore.domain.*'
      }
    }
  ]
}

• name – Schema Name
• factory – reference to your adapter schema factory implementation class
• operand – parameters to be passed to your adapter schema factory implementation
Geode Calcite Schema and Schema Factory

public class GeodeSchemaFactory implements SchemaFactory {

  public Schema create(SchemaPlus parentSchema, String schemaName, Map<String, Object> operand) {
    // Retrieve the parameters set in the model.json
    String locatorHost = (String) operand.get("locatorHost");
    int locatorPort = …
    String[] regionNames = …
    String pdxPackagePath = …

    // Create an Adapter Schema instance with the provided parameters
    return new GeodeSchema(locatorHost, locatorPort, regionNames, pdxPackagePath);
  }
}

public class GeodeSchema extends AbstractSchema {

  private String regionName = ..

  protected Map<String, Table> getTableMap() {
    // Create a GeodeScannableTable instance for each Geode Region
    final ImmutableMap.Builder<String, Table> builder = ImmutableMap.builder();

    Region region = …       // Get Geode Region by region name
    …
    Class valueClass = …    // Find region's value type
    …
    builder.put(regionName, new GeodeScannableTable(regionName, valueClass, clientCache));

    // Return the map assembled by the builder
    return builder.build();
  }
}