GraphGen : Adaptive Graph Processing using Relational Databases - PowerPoint PPT Presentation

GraphGen: Adaptive Graph Processing using Relational Databases
Department of Computer Science, University of Maryland

Graph Analytics / Querying

Graph datasets can provide value in many domains: protein interaction networks, email networks, social networks, stock trading networks.

Many different ways to manage graph data:
● Graph databases (Neo4j, OrientDB, RDF stores)
● Distributed batch analytics systems (Giraph, GraphX, GraphLab)
● In-memory systems (Ligra, Green-Marl, X-Stream)
● Many research prototypes / custom indexes

RDBMS-based Graph Systems vs GraphGen DECLARATIVE

Example: TPC-H
Which customers bought the same item?

Customer
c_key   name
c_1     John
c_2     Jane

Orders
order_key   customer_key
o1          c1
o2          c2
o3          c3

LineItem
order_key   part_key
o1          p1
o1          p2
o2          p1
o2          p3
o3          p1
o3          p2
o3          p2

Orders joined with LineItem on order_key (which customer bought which product?)
c_key   p_key
c1      p1
c1      p2
c3      p2
c4      p1
c6      p1

Self-join of that result on p_key
cust1   cust2
c1      c4
c1      c6
c1      c3
c4      c6

[Figure: resulting customer graph over nodes c1, c3, c4, c6]

Example: TPC-H (continued)

Many other graphs of potential interest:
● Suppliers that sell a common item
● Employees working under the same manager
● Parts that were ordered together
● Bipartite graph between Part and Supplier
● ...
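The "customers who bought the same item" graph can be sketched as two joins. This is a minimal illustration using Python's sqlite3 module, not GraphGen itself; the table layout (`orders`, `lineitem`) is a simplified, hypothetical stand-in for the TPC-H schema, and the rows loosely follow the slide. The `cust1 < cust2` filter, which keeps one direction per pair and drops self-pairs, is an assumption added here.

```python
import sqlite3


def co_purchase_edges(orders, lineitems):
    """Return sorted (cust1, cust2) pairs of customers who bought a common part."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_key TEXT, customer_key TEXT)")
    conn.execute("CREATE TABLE lineitem (order_key TEXT, part_key TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO lineitem VALUES (?, ?)", lineitems)
    rows = conn.execute(
        """
        WITH bought AS (              -- join on order_key: who bought what
            SELECT o.customer_key AS c_key, l.part_key AS p_key
            FROM orders o JOIN lineitem l ON o.order_key = l.order_key
        )
        SELECT DISTINCT b1.c_key AS cust1, b2.c_key AS cust2
        FROM bought b1 JOIN bought b2 ON b1.p_key = b2.p_key   -- join on p_key
        WHERE b1.c_key < b2.c_key     -- assumption: one direction, no self-pairs
        ORDER BY cust1, cust2
        """
    ).fetchall()
    conn.close()
    return rows


ORDERS = [("o1", "c1"), ("o2", "c2"), ("o3", "c3")]
LINEITEMS = [("o1", "p1"), ("o1", "p2"), ("o2", "p1"),
             ("o2", "p3"), ("o3", "p1"), ("o3", "p2")]

print(co_purchase_edges(ORDERS, LINEITEMS))
```

The other graphs listed above (suppliers with a common item, parts ordered together) would follow the same shape with different join keys.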

GraphGen

[Figure: GraphGen architecture. Labels: Java Program; Graph Definition Queries; Graph Analysis Queries; Direct Graph Access; Vertex-Centric; Results; DSL Parser + Optimizer; In-Memory Engine; GraphGen; SQL Queries; Backend Relational DBMS]

GraphGenDL - Definition Language

● Definition of a GraphView over the database
  ○ User specifies how to construct the Nodes and Edges

    CREATE GRAPHVIEW CoAuthors AS
    Nodes(ID, name) :- Author(ID, name).
    Edges(ID1, ID2, wt=$COUNT(pub)) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub).

    Edge property (wt): number of publications

● Definition of a collection of graphs (Multi-Graph View) over the database
  ○ Can enable many optimizations

    Extract all ego-graphs:

    CREATE GRAPHVIEW AuthorEgoNetworks(X) WHERE Author(X) AS
    Nodes(X, name) :- Author(X, name).
    Nodes(ID, name) :- AuthorPub(X, pub), AuthorPub(ID, pub), Author(ID, name).
    Edges(ID1, ID2) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub).
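To make the CoAuthors definition concrete, here is a hedged sketch of one SQL query that the Edges rule could correspond to, run through Python's sqlite3 module. It is not the SQL that GraphGen generates; the column names (`aid`, `pub`) and the exclusion of self-loops (`ap1.aid <> ap2.aid`) are assumptions made for this illustration.

```python
import sqlite3


def coauthor_edges(author_pubs):
    """Return (id1, id2, wt) rows: wt = number of publications shared by id1 and id2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AuthorPub (aid TEXT, pub TEXT)")
    conn.executemany("INSERT INTO AuthorPub VALUES (?, ?)", author_pubs)
    rows = conn.execute(
        """
        SELECT ap1.aid AS id1, ap2.aid AS id2, COUNT(DISTINCT ap1.pub) AS wt
        FROM AuthorPub ap1 JOIN AuthorPub ap2 ON ap1.pub = ap2.pub
        WHERE ap1.aid <> ap2.aid          -- assumption: no self-loops
        GROUP BY ap1.aid, ap2.aid
        ORDER BY id1, id2
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical toy data: a1 and a2 share p1 and p2; a3 joins them on p2.
AUTHOR_PUBS = [("a1", "p1"), ("a2", "p1"),
               ("a1", "p2"), ("a2", "p2"), ("a3", "p2")]

for edge in coauthor_edges(AUTHOR_PUBS):
    print(edge)
```

Each undirected co-authorship appears in both directions here, since the rule places no ordering constraint on ID1 and ID2.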

GraphGenQL - Query Language

● Specifying graph queries over GraphViews
● Support for subgraph pattern matching languages like SPARQL, Cypher, PGQL etc.
● Datalog is a natural fit for expressing recursive computation over the Edges VIEW

Find triangles of authors whose areas follow: "ML" -> "DB" -> "AL"

    USING GRAPHVIEW CoAuthors
    Triangle(X, Y, Z) :- Nodes(X, _, "ML"), Nodes(Y, _, "DB"), Nodes(Z, _, "AL"),
                         Edges(X, Y), Edges(Y, Z), Edges(X, Z).
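For intuition, the Triangle rule can be evaluated over an in-memory adjacency structure as sketched below. This is plain Python, not GraphGen's engine; the function name, the `nodes` mapping from node ID to research area, and the toy data are all hypothetical.

```python
def find_triangles(nodes, edges, areas=("ML", "DB", "AL")):
    """Return sorted (x, y, z) with areas matching `areas` and edges x->y, y->z, x->z.

    nodes: dict mapping node id -> area label
    edges: iterable of directed (src, dst) pairs
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    area_x, area_y, area_z = areas
    result = []
    for x, area in nodes.items():
        if area != area_x:
            continue
        out_x = successors.get(x, set())
        for y in out_x:                       # Edges(X, Y)
            if nodes.get(y) != area_y:
                continue
            for z in successors.get(y, ()):   # Edges(Y, Z)
                if nodes.get(z) == area_z and z in out_x:   # Edges(X, Z)
                    result.append((x, y, z))
    return sorted(result)


NODES = {"a": "ML", "b": "DB", "c": "AL", "d": "AL"}
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d")]

print(find_triangles(NODES, EDGES))
```

Node d is rejected because there is no edge from a to d, so only one triangle remains.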

GraphGen

[Figure: GraphGen architecture diagram, repeated]

● Goal: We want to adapt the execution based on the query/analysis.
● What are some of the challenges here?

1. Where to execute Queries / Tasks

● Depends on workload, rate of updates, rate of queries …

Dataset   DBS1      DBS2
Small     0.899 s   0.22
Large     4.25 s    NA

Triangle Pattern Matching:

Dataset   In-memory   ETL       MySQL    PostgreSQL
Small     0.001 s     2.05 s    0.8 s    0.1 s
Large     0.015 s     17.52 s   4.26     0.704 s

● Key Challenge: Develop accurate cost models, tools, techniques. Decide what to compute where (in-memory execution vs. in-database execution).
● Other issues: Large-output joins [SIGMOD '17], and selectivity estimation errors associated with them.
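A back-of-the-envelope reading of the Triangle Pattern Matching timings: in-memory execution pays a load cost first and is then much faster per query. The sketch below computes how many queries are needed before loading pays off. It assumes, without confirmation from the slides, that the ETL column is a one-time cost and that per-query times stay constant; the function name is hypothetical.

```python
import math


def break_even_queries(etl_s, in_memory_s, in_db_s):
    """Smallest n with etl_s + n * in_memory_s < n * in_db_s, or None if never."""
    saving_per_query = in_db_s - in_memory_s
    if saving_per_query <= 0:
        return None  # in-database execution is never slower per query
    return math.floor(etl_s / saving_per_query) + 1


# Timings taken from the table above (PostgreSQL column for in-database).
print(break_even_queries(etl_s=17.52, in_memory_s=0.015, in_db_s=0.704))  # Large
print(break_even_queries(etl_s=2.05, in_memory_s=0.001, in_db_s=0.1))     # Small
```

A real cost model would also have to account for update rates and result sizes, which this arithmetic ignores.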

2. Query Rewriting

● Assume the execution is to be pushed to the database
● Many different ways to construct equivalent SQL queries
● Auto-generated SQL can be verbose → challenging to optimize

1) WITH vs VIEW

    WITH Nodes AS (...)            CREATE VIEW Nodes AS (...)
    WITH Edges AS (...)            CREATE VIEW Edges AS (...)
    (SQL for answering query)      (SQL for answering query)

2) Duplicate Elimination (DISTINCT)

● The costly duplicate removal might even be unnecessary if the query / analysis doesn't care about them!

2. Query Rewriting (continued)

[Figure: time for query to finish, in seconds]
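The two rewriting choices can be tried on a toy table. The sketch below, using Python's sqlite3 module, defines the same Edges relation once as a WITH clause and once as a view, then counts rows with and without DISTINCT. The schema and data are hypothetical, and SQLite stands in for whichever backend is used; it says nothing about relative running times on a real system.

```python
import sqlite3

# Edge definition shared by both variants (hypothetical column names).
EDGE_SQL = (
    "SELECT ap1.aid AS id1, ap2.aid AS id2 "
    "FROM AuthorPub ap1 JOIN AuthorPub ap2 ON ap1.pub = ap2.pub "
    "WHERE ap1.aid <> ap2.aid"
)


def edge_counts(author_pubs):
    """Count Edges rows via a CTE, via a view, and after DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AuthorPub (aid TEXT, pub TEXT)")
    conn.executemany("INSERT INTO AuthorPub VALUES (?, ?)", author_pubs)

    # 1a) WITH: the definition is inlined into the query text.
    with_count = conn.execute(
        f"WITH Edges AS ({EDGE_SQL}) SELECT COUNT(*) FROM Edges"
    ).fetchone()[0]

    # 1b) VIEW: the definition is stored in the database and referenced by name.
    conn.execute(f"CREATE VIEW EdgesView AS {EDGE_SQL}")
    view_count = conn.execute("SELECT COUNT(*) FROM EdgesView").fetchone()[0]

    # 2) DISTINCT: collapse pairs that share more than one publication.
    distinct_count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT id1, id2 FROM EdgesView)"
    ).fetchone()[0]

    conn.close()
    return {"with": with_count, "view": view_count, "distinct": distinct_count}


AUTHOR_PUBS = [("a1", "p1"), ("a2", "p1"),
               ("a1", "p2"), ("a2", "p2"), ("a3", "p2")]
print(edge_counts(AUTHOR_PUBS))
```

An analysis such as reachability gives the same answer on the duplicated and the de-duplicated edge set, which is the case where DISTINCT could be skipped.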

3. Optimizing Multi-Graph Views

● Ego Graph Analysis, Graph snapshot analysis
● Ability to refer to each graph independently → significant savings
● Opportunity: Overlap computation and storage over collections of graphs
● Key Challenge: Develop a systematic approach to optimizing the extraction of and execution against such multi-graph views.

Snapshots:

    CREATE GRAPHVIEW CoAuthorsSnapshot(X) WHERE X IN RANGE(1950, 2017, 1)
    Nodes(ID, name) :- Author(ID, name).
    Edges(ID1, ID2) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub), Publication(pub, _, Y), Y <= X.

E.g. Ego-Graph Analysis (please see full paper):
● Naive: Generate a separate SQL query for each distinct graph.
● Result-Tagging: We can extract all graphs with a single query!
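One way to picture "overlapping storage" for the snapshot view: because the rule uses Y <= X, an edge that exists in snapshot X also exists in every later snapshot, so it is enough to store each edge once with the earliest year in which it appears. The sketch below illustrates that idea in plain Python; it is not taken from GraphGen, and the function names, data layout, and exclusion of self-pairs are assumptions.

```python
def build_edge_index(author_pubs, pub_years):
    """Map each ordered co-author pair to the earliest year of a shared publication."""
    authors_by_pub = {}
    for aid, pub in author_pubs:
        authors_by_pub.setdefault(pub, set()).add(aid)

    first_year = {}
    for pub, authors in authors_by_pub.items():
        year = pub_years[pub]
        for a in authors:
            for b in authors:
                if a != b:  # assumption: no self-loops
                    first_year[(a, b)] = min(year, first_year.get((a, b), year))
    return first_year


def snapshot(first_year, x):
    """Edges of CoAuthorsSnapshot(x): pairs with a shared publication in year <= x."""
    return sorted(edge for edge, year in first_year.items() if year <= x)


AUTHOR_PUBS = [("a1", "p1"), ("a2", "p1"), ("a1", "p2"), ("a3", "p2")]
PUB_YEARS = {"p1": 1990, "p2": 2005}

index = build_edge_index(AUTHOR_PUBS, PUB_YEARS)
print(snapshot(index, 1995))
print(snapshot(index, 2017))
```

All 68 snapshots in the range 1950 to 2017 are then served from a single index instead of 68 separately extracted edge lists.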

Result-Tagging

Find the edges 1-hop away for the source (tag) and union the result with the initial Tagged Edges table. Join condition: e1.aid2 = e2.aid1. Tags show which ego-graphs involve the edge.

Tagged Edges Table (initial)
aid1   aid2   tag
a1     a2     a1
a1     a5     a1
a1     a6     a1
a6     a7     a6
a7     a8     a7
a5     a3     a5
a3     a4     a3
a2     a3     a2

After the 1-hop join, unioned with the initial table
aid1   aid2   tag
a2     a3     a1
a5     a3     a1
a6     a7     a1
a7     a8     a6
a3     a4     a5
a3     a4     a2
a1     a2     a1
a1     a5     a1
a1     a6     a1
a6     a7     a6
a7     a8     a7
a5     a3     a5
a3     a4     a3
a2     a3     a2

After Tag Aggregation
aid1   aid2   tags[]
a1     a2     [a1]
a1     a5     [a1]
a1     a6     [a1]
a6     a7     [a1,a6]
a7     a8     [a6,a7]
a5     a3     [a5,a1]
a3     a4     [a2,a3,a5]
a2     a3     [a1,a2]
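The result-tagging steps can be reproduced on the slide's edge list with a single self-join. The sketch below uses Python's sqlite3 module for the join and union, then aggregates tags in Python; the table name `tagged` and the function name are hypothetical, and this is an illustration of the idea rather than GraphGen's generated SQL.

```python
import sqlite3
from collections import defaultdict

# Edge list from the slide's initial Tagged Edges table.
EDGES = [("a1", "a2"), ("a1", "a5"), ("a1", "a6"), ("a6", "a7"),
         ("a7", "a8"), ("a5", "a3"), ("a3", "a4"), ("a2", "a3")]


def tagged_ego_edges(edges):
    """Return {(aid1, aid2): sorted list of ego-graph tags that contain the edge}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tagged (aid1 TEXT, aid2 TEXT, tag TEXT)")
    # Initial tagging: every edge belongs to the ego-graph of its source node.
    conn.executemany("INSERT INTO tagged VALUES (?, ?, ?)",
                     [(src, dst, src) for src, dst in edges])
    rows = conn.execute(
        """
        SELECT e2.aid1, e2.aid2, e1.tag            -- edges 1 hop away from the tag
        FROM tagged e1 JOIN tagged e2 ON e1.aid2 = e2.aid1
        UNION
        SELECT aid1, aid2, tag FROM tagged         -- initial tagged edges
        """
    ).fetchall()
    conn.close()

    # Tag aggregation: collect all tags per edge.
    tags = defaultdict(set)
    for aid1, aid2, tag in rows:
        tags[(aid1, aid2)].add(tag)
    return {edge: sorted(edge_tags) for edge, edge_tags in tags.items()}


for edge, edge_tags in sorted(tagged_ego_edges(EDGES).items()):
    print(edge, edge_tags)
```

The aggregated tags match the slide's final table up to ordering within each tag list, and all ego-graphs come out of one query instead of one query per author.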

Take Aways

● Need for a unified framework for extraction and analysis of graphs stored implicitly in a structured data store.
● We propose a declarative Datalog-based DSL for specifying:
  ○ GraphViews over relational schemas
  ○ Declarative graph queries
● Expose a series of APIs for defining complex graph analytics over GraphViews

There is a variety of challenges & opportunities here in terms of:
● Deciding where to execute graph queries
● Handling large-output joins and inaccuracies of query optimizers
● Rewriting SQL queries pushed to the database
● Optimizing across collections of graphs (Multi-Graph Views)

Thank you! Questions?
