Cypher for Apache Spark: Graph processing workloads on OLAP and OLTP

Aug 23, 2022

Cypher for Apache Spark
Graph processing workloads on OLAP and OLTP
Mats Rydberg
mats@neotechnology.com
opencypher.org | opencypher@googlegroups.com

Cypher for Apache Spark
● Apache Spark: computational platform (OLAP)
● Neo4j: transactional graph database (OLTP)
  ○ Query language: Cypher
Wouldn't it be lovely to be able to execute a Spark job on a Neo4j graph?
How do we integrate?
What is a graph when it isn't in Neo4j anymore?
==> Cypher is the bridge!

Schematic dataflow
[Diagram not preserved; surviving labels: :Cypher, :Cypher]

Example use case
● Graph of financial transactions
● Snapshot a subgraph of the transactions made during the last month
● Do computationally heavy graph analytics on transaction patterns
  ○ Consume results as a report (for humans)
  ○ Feed results back as new data to the original graph
  ○ Deploy results as a new graph
● Neo4j stays operational for incoming transactions because the analytics are off-loaded to Spark
● Fully integrated OLTP + OLAP

Apache Spark -- overview / characteristics
● DataFrames are abstractions of tables
  ○ Based on RDDs (Resilient Distributed Datasets)
  ○ SQL type system deployed in a non-type-safe way (Scala code)
● SQL and API that compile to lazily executed plans
  ○ Catalyst plan optimiser
● Distributed architecture for scalability

Key developments
● Extend Cypher with the ability to return graphs
  ○ Cypher becomes closed over graphs
  ○ True compositionality of queries
● Modelling dynamic Cypher type system on strict table-based, SQL-aligned Spark DataFrames
  ○ Using DataFrames to make use of Catalyst optimiser
  ○ No support for type inheritance (compare Cypher's ANY type)

Key developments -- type system
● Represent entities as flat maps
  ○ One column per property and label / rel type
  ○ Requires exact type information of all properties
    ➢ Acquired during import of graph
    ➢ Read-only setting allows immutable schema
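
The flat-map representation above can be illustrated with a minimal sketch in plain Python. It is not the prototype's code: the node tuples, the `infer_schema` and `flatten` helpers, and the `:Label` column naming are all assumptions made for illustration. The sketch collects the schema in one pass (standing in for "acquired during import") and rejects a property whose values have conflicting types, since a strict table has no catch-all column type.

```python
# Hypothetical in-memory nodes: (id, labels, properties).
NODES = [
    (1, {"Person"}, {"name": "Alice", "age": 34}),
    (2, {"Person", "Customer"}, {"name": "Bob"}),
    (3, {"Account"}, {"balance": 120.5}),
]


def infer_schema(nodes):
    """Collect every label and the exact type of every property key."""
    labels = set()
    prop_types = {}
    for _node_id, node_labels, props in nodes:
        labels |= node_labels
        for key, value in props.items():
            seen = prop_types.setdefault(key, type(value))
            if seen is not type(value):
                # A strict table column cannot hold two different types.
                raise TypeError(
                    f"property {key!r} has conflicting types: "
                    f"{seen.__name__} and {type(value).__name__}"
                )
    return sorted(labels), dict(sorted(prop_types.items()))


def flatten(nodes):
    """Turn nodes into rows: one boolean column per label, one column per property."""
    labels, prop_types = infer_schema(nodes)
    columns = ["id"] + [f":{label}" for label in labels] + list(prop_types)
    rows = []
    for node_id, node_labels, props in nodes:
        row = [node_id]
        row += [label in node_labels for label in labels]
        row += [props.get(key) for key in prop_types]  # None where a property is absent
        rows.append(tuple(row))
    return columns, rows


columns, rows = flatten(NODES)
print(columns)
for row in rows:
    print(row)
```

Because the schema is computed once and never changes afterwards, the sketch mirrors the read-only setting: every row has the same columns, and absent properties are simply null.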

Key developments -- return graphs
● Interpret query results as a graph rather than table
  ○ Round-trip: graph to graph; can execute another query
  ○ No focus on syntax
● Pipeline of queries lazily evaluated on top of one another
  ○ Maximum utilisation of Catalyst to reorder operations
● Complementary API for injecting other operations in-between queries
  ○ Based on Spark DataFrame API
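
The graph-to-graph round trip and the lazily evaluated pipeline can be sketched in plain Python as follows. Everything here is hypothetical: `Graph`, `LazyPipeline`, `only_rel_type`, and `drop_node` are illustrative names, not part of Cypher for Apache Spark, and the sketch does no plan optimisation of the kind Catalyst performs. It only shows that when each query maps a graph to a graph, queries compose freely and nothing needs to run until a result is requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    nodes: frozenset  # node ids
    rels: frozenset   # (source, rel_type, target) triples


class LazyPipeline:
    """Records graph-to-graph steps; executes them only when run() is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def then(self, query):
        # Returns a new pipeline; no query is executed here.
        return LazyPipeline(self._source, self._steps + (query,))

    def run(self):
        graph = self._source
        for step in self._steps:
            graph = step(graph)
        return graph


def only_rel_type(rel_type):
    """Query: keep relationships of one type and the nodes they touch."""
    def query(g):
        rels = frozenset(r for r in g.rels if r[1] == rel_type)
        nodes = frozenset(n for s, _, t in rels for n in (s, t))
        return Graph(nodes, rels)
    return query


def drop_node(node_id):
    """Query: remove one node and every relationship attached to it."""
    def query(g):
        rels = frozenset(r for r in g.rels if node_id not in (r[0], r[2]))
        return Graph(g.nodes - {node_id}, rels)
    return query


source = Graph(
    frozenset({1, 2, 3}),
    frozenset({(1, "PAID", 2), (2, "PAID", 3), (1, "KNOWS", 3)}),
)
pipeline = LazyPipeline(source).then(only_rel_type("PAID")).then(drop_node(3))
result = pipeline.run()  # both steps execute only at this point
print(sorted(result.nodes), sorted(result.rels))
```

Any function from `Graph` to `Graph` can be passed to `then`, which is the role the complementary API plays for injecting other operations between queries.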

Demo of prototype
