Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker - PowerPoint PPT Presentation

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker CSAIL, MIT

Jena FlockDB AllegeroGraph TAO Pegasus Neo4j DEX HypergraphDB Pregel Titan GraphBase Twister Giraph Trinity HaLoop GraphLab PrItr

Can we have efficient graph analytics within a SQL Database?

Graph Analytics on SQL Databases • Graph: set of nodes, set of edges • Node table: nodeId and associated metadata • Edge table: (from,to) nodeIds and associated metadata • Undirected graph: two tuples per edge • Node/Edge access: selection, projection on node and edge tables • Graph traversal: successive joins between node and edge tables

Optimizations • Number of Joins   Parallel graph exploration • Number of queries; round trips   Nested queries; database handles the optimizations • Data movement between server and client   UDFs, Stored Procedures • Database connections   Keep connections alive between iterations • SQL query performance   Sort orders, indexes

How does the performance look like?

PageRank 10000 Graph Database SQL: Main-memory Store SQL: Row Store 1000 SQL: Column Store Time (seconds) 100 10 1 Facebook Twitter GPlus LiveJournal Nodes 4K 76K 107K 4.8M Edges 88K 2.4M 30M 69M

Shortest Paths 100000 Graph Database 18,702.2 SQL: Main-memory Store 10000 SQL: Row Store SQL: Column Store Time (seconds) 1,231.7 1000 428.4 395.6 212.7 168.1 100 21.3 20.2 8.7 10 4.7 4.4 3.2 1 Facebook Twitter GPlus LiveJournal Nodes 4K 76K 107K 4.8M Edges 88K 2.4M 30M 69M

What about the query interface? Is SQL the right choice for graph queries?

Shortest Path in SQL UPDATE NNodes AS nnode SET Estimate = new_nnode.Estimate, Predecessor = new_nnode.Predecessor FROM ( SELECT temp.Id, temp.Estimate, edge.from_node AS Predecessor FROM NNodes AS nn, edge, ( SELECT e.to_node AS Id, min(n1.Estimate+1) AS Estimate FROM NNodes AS n1, edge AS e, NNodes AS n2 WHERE n1.Id=e.from_node AND n2.Id=e.to_node GROUP BY e.to_node, n2.Estimate Tables !!! HAVING min(n1.Estimate+1) < n2.Estimate ) AS temp WHERE nn.Id=edge.from_node AND edge.to_node=temp.Id AND nn.estimate=temp.estimate-1 ) AS new_nnode WHERE nnode.Id = new_nnode.Id;

Shortest Path in Pregel void compute(vector<vfloat> messages){ � // get the minimum distance vfloat mindist = id==START_NODE ? 0 : DBL_MAX; for(vector<vfloat>::iterator it = messages.begin(); it != messages.end(); ++it) mindist = min(mindist,*it); � // send messages to all edges if new minimum is found vfloat vvalue = getVertexValue(); Graph !!! if(mindist < vvalue){ modifyVertexValue(mindist); vector<vint> edges = getOutEdges(); for(vector<vint>::iterator it = edges.begin(); it != edges.end(); ++it) sendMessage(*it, mindist+1); } � // halt voteToHalt(); }

What about having a vertex-centric interface in a SQL Database?

Vertex-centric Interface in SQL Databases • Idea: Map vertex-centric program execution to SQL queries in a SQL database • The programmer specifies what happens on each vertex in the graph • Vertices are executed as long as they are in active state or if they have an incoming message • Exact same API as in Pregel, e.g. getting incoming messages, vertex value, vertex edges, etc.

Implementation Details • Vertex (V), Edge (E), Message (M) • The vertex programs are executed in parallel (super step) as UDFs in the SQL database • Vertex Input: (V,E,M) for the vertex • Vertex Output: outgoing M from the vertex • A coordinator synchronizes between super steps, i.e. redistributes the messages from one super step to the next • The coordinator stops when there are no more messages

Optimizations • 3-way join, instead of 2-way   Table unions in place of joins • UDF call overheads   Batching several vertices in each UDF call • Too many new messages in each super step   Replace messages table, no in-place updates • SQL query performance   Sort orders, indexes

How does the performance look like?

Shortest Paths 1000 SQL: Column Store 439.8 Vertex-centric: Column Store 212.7 Vertex-centric: Apache Giraph 100.9 Time (seconds) 92.5 100 47.0 35.0 28.2 21.3 20.2 18.4 10 5.7 4.4 1 Facebook Twitter GPlus LiveJournal Nodes 4K 76K 107K 4.8M Edges 88K 2.4M 30M 69M

Vertex-centric interface allows… • Connected Components • Random Walks with Restart • Stochastic Gradient Descent • Message Passing Algorithms • Or, any other vertex-centric algorithm …. right within the SQL database system!

Summary • Graph analytics can be mapped to relational queries (plus UDFs) • SQL systems can offer very good performance over relational queries • We can extend SQL systems to provide more graph-natural query interfaces

Team Members

Non-faculty Members

Non-faculty, non-postdoc …

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker CSAIL, MIT

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker - PowerPoint PPT Presentation

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker CSAIL, MIT + = Jena FlockDB AllegeroGraph TAO Pegasus Neo4j DEX HypergraphDB Pregel Titan GraphBase Twister Giraph Trinity HaLoop GraphLab PrItr SQL ? ? Can we

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Graphs () Graphs () Graphs Graphs Graphs are collections of nodes

Weighted graphs Weighted graphs Weighted graphs Weighted graphs Graphs with numbers, called

Week 4 Kullmann Graphs and directed graphs Elementary Graph Algorithms Representing graphs

On some classes of Deza graphs Deza graphs without 3-cocliques Line graphs V.V. Kabanov 1 Deza

Graphs Graphs Examples Definitions Implementation/Representation of graphs Graphs

Module 3: Creating and Managing Databases Overview Creating Databases Creating

Graphs Graphs Simple graphs Algorithms Depth-first search Breadth-first search

Searching on Graphs November 16, 2016 CMPE 250 Graphs- Searching on Graphs November 16, 2016 1

CS200: Graphs Prichard Ch. 14 Rosen Ch. 10 CS200 - Graphs 1 Graphs A collection of What can

Today. Types of graphs. Today. Types of graphs. Complete Graphs. Trees. Hypercubes. Today.

GEMS/Food Databases and GEMS/Food Databases and GEMS/Food Databases and in the Food Supply

Image Databases Image Databases Image Databases Prof. Paolo Ciaccia Prof. Paolo Ciaccia

Lecture 10: Larger-than-Memory Databases 1 / 53 Larger-than-Memory Databases Recap

K-Anonymity & Social Networks CompSci 590.03 Instructor:

Real World Search Problems CS 331: Artificial Intelligence Uninformed Search 1 2 Assumptions

BACKEND Clien t Serve r MY BLOG API DATABASE This is my first post. ADD POST Reac t Nod e

Limit Laws for Random Spatial Graphical Models Anima Anandkumar 1 Joseph Yukich 2 Alan Willsky 1 1

Nationwide Collaborative Effort nectar cloud Nationwide Collaborative Effort The NeCTAR Research

QUINT: On Query-Specific Optimal Networks Presenter: Liangyue Li Joint work with Jie Tang

Outline Introduction 1 Algorithms 2 crtrees 3 Examples 4 Simulations 5 2 / 52

Week 4 - Friday What did we talk about last time? Linked lists You are given a

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker - PowerPoint PPT Presentation

Graphs On Databases Alekh Jindal Sam Madden Mike Stonebraker CSAIL, MIT + = Jena FlockDB AllegeroGraph TAO Pegasus Neo4j DEX HypergraphDB Pregel Titan GraphBase Twister Giraph Trinity HaLoop GraphLab PrItr SQL ? ? Can we

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Graphs () Graphs () Graphs Graphs Graphs are collections of nodes

Weighted graphs Weighted graphs Weighted graphs Weighted graphs Graphs with numbers, called

Week 4 Kullmann Graphs and directed graphs Elementary Graph Algorithms Representing graphs

On some classes of Deza graphs Deza graphs without 3-cocliques Line graphs V.V. Kabanov 1 Deza

Graphs Graphs Examples Definitions Implementation/Representation of graphs Graphs

Module 3: Creating and Managing Databases Overview Creating Databases Creating

Graphs Graphs Simple graphs Algorithms Depth-first search Breadth-first search

Searching on Graphs November 16, 2016 CMPE 250 Graphs- Searching on Graphs November 16, 2016 1

CS200: Graphs Prichard Ch. 14 Rosen Ch. 10 CS200 - Graphs 1 Graphs A collection of What can

Today. Types of graphs. Today. Types of graphs. Complete Graphs. Trees. Hypercubes. Today.

GEMS/Food Databases and GEMS/Food Databases and GEMS/Food Databases and in the Food Supply

Image Databases Image Databases Image Databases Prof. Paolo Ciaccia Prof. Paolo Ciaccia

Lecture 10: Larger-than-Memory Databases 1 / 53 Larger-than-Memory Databases Recap

K-Anonymity &amp; Social Networks CompSci 590.03 Instructor:

Real World Search Problems CS 331: Artificial Intelligence Uninformed Search 1 2 Assumptions

BACKEND Clien t Serve r MY BLOG API DATABASE This is my first post. ADD POST Reac t Nod e

Limit Laws for Random Spatial Graphical Models Anima Anandkumar 1 Joseph Yukich 2 Alan Willsky 1 1

Nationwide Collaborative Effort nectar cloud Nationwide Collaborative Effort The NeCTAR Research

QUINT: On Query-Specific Optimal Networks Presenter: Liangyue Li Joint work with Jie Tang

Outline Introduction 1 Algorithms 2 crtrees 3 Examples 4 Simulations 5 2 / 52

Week 4 - Friday What did we talk about last time? Linked lists You are given a

K-Anonymity & Social Networks CompSci 590.03 Instructor: