MIT And they are growing bigger X X -> DATA DATABASE - PowerPoint PPT Presentation

Graphs On Databases at Talking on NEDB 2014 Alekh Jindal collaborate Supervisors Amol Deshpande Sam Madden work sabbatical Mike Stonebraker work University work of Maryland MIT

And they are growing bigger

X X -> DATA DATABASE

DATABASE Relational -> DATA Relational

DATABASE Streaming -> DATA Streaming

XML XML -> DATA DATABASE

RDF RDF -> DATA DATABASE

Graph Graph -> DATA DATABASE

DATABASE DATA

APPLICATIONS Logical Data Independence DATABASE Physical Data Independence DATA

Barriers to “Graphs on Databases” • Graphs in relational model • Graph operations in SQL • Expressing iterative graph queries • Efficient graph analytics performance • Ease-of-use

Graphs in Relational Model id value Nodes 1 1 4 2 1 5 3 1 4 1 2 5 1 1 fromId toId weight 3 Edges t 1 2 1 2 4 1 2 5 1 3 2 1 5 1 1

Graphs Operations in SQL • Node access   4 Select * From Nodes Where Id=ID 5 • Neighborhood access   Select * From Edges Where fromId=ID 2 1 • Parallel neighborhood access   3 Select * From Edges Group By fromId • 1-hop neighbors   Select * From Edges e1,Edges e2 Where e1.toId=e2.fromId

Example: Shortest Paths UPDATE Nodes AS node SET value= new_node.value FROM( SELECT e.toId AS Id, min(n1.value+1) AS value FROM Nodes AS n1, Edges AS e, Nodes AS n2 WHERE n1.Id=e.fromId AND n2.Id=e.toId GROUP BY e.toId, n2.value HAVING min(n1.value+1) < n2.value ) AS new_node WHERE node.Id = new_node.Id;

Example: Shortest Paths UPDATE Nodes AS node SET value= new_node.value FROM( Nested Query SELECT e.toId AS Id, min(n1.value+1) AS value FROM Nodes AS n1, Edges AS e, Nodes AS n2 WHERE n1.Id=e.fromId AND n2.Id=e.toId Parallel GROUP BY e.toId, n2.value Graph Exploration HAVING min(n1.value+1) < n2.value ) AS new_node WHERE node.Id = new_node.Id; Sorting/Indexing

Iterative Graph Queries • Driver program:   4 UDF / Stored Procedure 5 • Three Things:   2 - initialization   1 - actual graph query (in a loop)   3 - termination condition

Example: Shortest Paths Initialization: 1. Set the value of start node to 0 2. Set the value of all other node to inf Loop: UPDATE Nodes AS node SET value=new_node.value FROM( SELECT e.toId AS Id, min(n1.value+1) AS value The shortest paths SQL FROM Nodes AS n1, Edges AS e, Nodes AS n2 WHERE n1.Id=e.fromId AND n2.Id=e.toId GROUP BY e.toId, n2.value HAVING min(n1.value+1) < n2.value ) AS new_node WHERE node.Id = new_node.Id; Termination Condition: No more nodes to Update

Efficient Graph Analytics • Three SQL Databases:   - row store   - column store   - main-memory store • Two Graph Databases:   - transactional graph database   - graph analytics system • Two queries: PageRank, Shortest Paths • Social network dataset from snap.stanford.edu/data

PageRank 10000 Graph Database 4,172.4 Main-memory Database Row Store Database Apache Giraph Column Store Database 1000 589.0 Time (seconds) 218.1 101.5 100 53.5 47.0 29.4 28.0 17.4 10 4.2 3.3 1 Twitter GPlus LiveJournal

Shortest Paths 100000 Graph Database Main-memory Database 18,702.2 Row Store Database Apache Giraph 10000 Column Store Database Time (seconds) 1000 492.9 395.6 135.1 115.5 100 74.5 50.8 43.7 29.1 9.2 10 6.7 1 Twitter GPlus LiveJournal

Ease-of-Use SQL Pregel void compute(vector<float> messages){ UPDATE Nodes AS node SET value=new_node.value // get the minimum distance FROM( SELECT e.toId AS Id, min(n1.value+1) AS value float mindist = id==START_NODE ? 0 : DBL_MAX; FROM Nodes AS n1, Edges AS e, Nodes AS n2 for(vector<float>::iterator it = messages.begin(); WHERE n1.Id=e.fromId AND n2.Id=e.toId it != messages.end(); ++it) GROUP BY e.toId, n2.value mindist = min(mindist,*it); HAVING min(n1.value+1) < n2.value // send messages to all edges if new minimum is found ) AS new_node float vvalue = getVertexValue(); WHERE node.Id = new_node.Id; if(mindist < vvalue){ modifyVertexValue(mindist); vector<int> edges = getOutEdges(); for(vector<int>::iterator it = edges.begin(); it != edges.end(); ++it) sendMessage(*it, mindist+1); } // halt voteToHalt(); }

Ease-of-Use SQL Pregel 4 Nodes Edges 5 fromId toId weight id value 1 2 1 1 1 2 1 2 4 1 2 2 5 1 3 1 3 2 1 4 1 1 5 1 5 1 1 3

Vertex-centric Query Interface Vertex Programs APPLICATION Pregel-style API: - getMessages()   - getEdges()   - sendMessages() - voteToHalt(), etc. Logical Data Independence Vertex UDF Invokes the vertex program if:   - the vertex is active, or DATABASE - the vertex has incoming messages Coordinator Synchronizes supersteps Redistributes Messages Physical Data Independence DATA Vertex (V), Edge (E), Message (M)

Vertex-centric Query Interface Vertex Programs APPLICATION Pregel-style API: - getMessages()   - getEdges()   - sendMessages() - voteToHalt(), etc. Logical Data Independence Vertex UDF Invokes the vertex program if:   Batching - the vertex is active, or DATABASE - the vertex has incoming messages Coordinator No in- place Synchronizes supersteps Updates Redistributes Messages Physical Data Independence DATA Vertex (V), Edge (E), Message (M) Union

PageRank (Vertex) 100000 Main-memory Database Apache Giraph Column Store Database 10000 2,071.0 Time (seconds) 1000 421.5 335.5 218.1 100 53.5 47.7 47.0 10.9 10 1 Twitter GPlus LiveJournal

Shortest Paths (Vertex) 10000 7,950.1 Main-memory Database Apache Giraph Column Store Database 1000 712.2 Time (seconds) 146.3 121.0 115.5 100 50.8 43.7 23.8 10.6 10 1 Twitter GPlus LiveJournal

Vertex-centric interface allows… • Connected Components • Random Walks with Restart • Stochastic Gradient Descent • Or, other message Passing Algorithms …. right within the database system!

Advantages of “Graphs on Databases” • Running arbitrary SQL queries • Pre- and post- processing of data • Updates are trivial • ACID for free • Don’t need to deal with Yet-Another-System!

Summary • Graph analytics can be mapped to relational queries (plus UDFs) • SQL systems can offer very good performance over relational queries • We can extend SQL systems to provide more graph-natural query interfaces

MIT And they are growing bigger X X -> DATA DATABASE - PowerPoint PPT Presentation

Graphs On Databases at Talking on NEDB 2014 Alekh Jindal collaborate Supervisors Amol Deshpande Sam Madden work sabbatical Mike Stonebraker work University work of Maryland MIT And they are growing bigger X X -> DATA

MIT MIT S EMINAR ON S EMINAR ON MIT ESD.69 EMINAR ON EMINAR ON MIT HST.926 H EALTH EALTH C ARE

More, bigger, better and joined More, bigger, better and joined HNV: The pros: Recognising

Bigger GPUs and Bigger Nodes Carl Pearson (pearson@illinois.edu) PhD Candidate, advised by

Where Bigger Is Where Bigger Is Jan 2016 Jan 2016 Cautionary Statement Cautionary Statement

Human Error - The Weakest link in CyberSecurity Exceptional IT. Real People. Bigger Purpose.

Precision growing: Faster, Bigger, Better! Bringing new concepts to plant growing 1 Bios Jack

Justin Solomon Sebastian Claici MIT MIT Justin Solomon Sebastian Claici MIT MIT Client

Foundations of Cryptography MIT-6.875/18.425 , UCB CS-276 Lecture 1 Shafi Goldwasser MIT,

Professional Issues Professional Issues produce bigger and better idiots. So far, the produce

CMV A Bigger Threat Much Closer to Home Sharon Wood, Project Manager, CMV Action MMB

CEO Board presentation Seth Berkley MD 2 nd December 2015 Geneva Board meeting 2-3 December

goedemiddag Overview PDAM Pontianak & TEC JANUAR BAMBANG LAJITO RULI NGADEMIN 48 x

PK/PD Study Strategies for PK/PD Study Strategies for Biopharmaceuticals: Is Bigger Better?

Growing for Success WELCOME SPONSORS PLATINUM GOLD GOLD SILVER BRONZE Growing For Success

GIS for Architecture MIT GIS Services Libguides.mit.edu/gis gishelp@mit.edu What is GIS and

KARAJAN / MIT CLASSICAL MUSIC HACKATHON October 21-23 @ MIT Music Library cuthbert@mit.edu

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

SQL-PL4OCL: an automatic code generator from OCL to SQL procedural language Marina Egea and

Database Application Development 340151 Big Databases & Cloud Services (P. Baumann) 1 SQL

SQL vs PostgreSQL 1 Checks in PostgreSQL Tuple-based checks may only refer to attributes of

Database Application Development Oracle PL/SQL CS430/630 Lecture 15 Slides based on Database

CS 251 Fall 2019 CS 240 Spring 2020 Principles of Programming Languages Foundations of

Database Systems I SQL Constraints and Triggers Ensuring business rules are met. I NTEGRITY C

1 Execution Implementation Overview Master notifies reducers about Focuses on large

MIT And they are growing bigger X X -> DATA DATABASE - PowerPoint PPT Presentation

Graphs On Databases at Talking on NEDB 2014 Alekh Jindal collaborate Supervisors Amol Deshpande Sam Madden work sabbatical Mike Stonebraker work University work of Maryland MIT And they are growing bigger X X -> DATA

MIT MIT S EMINAR ON S EMINAR ON MIT ESD.69 EMINAR ON EMINAR ON MIT HST.926 H EALTH EALTH C ARE

More, bigger, better and joined More, bigger, better and joined HNV: The pros: Recognising

Bigger GPUs and Bigger Nodes Carl Pearson (pearson@illinois.edu) PhD Candidate, advised by

Where Bigger Is Where Bigger Is Jan 2016 Jan 2016 Cautionary Statement Cautionary Statement

Human Error - The Weakest link in CyberSecurity Exceptional IT. Real People. Bigger Purpose.

Precision growing: Faster, Bigger, Better! Bringing new concepts to plant growing 1 Bios Jack

Justin Solomon Sebastian Claici MIT MIT Justin Solomon Sebastian Claici MIT MIT Client

Foundations of Cryptography MIT-6.875/18.425 , UCB CS-276 Lecture 1 Shafi Goldwasser MIT,

Professional Issues Professional Issues produce bigger and better idiots. So far, the produce

CMV A Bigger Threat Much Closer to Home Sharon Wood, Project Manager, CMV Action MMB

CEO Board presentation Seth Berkley MD 2 nd December 2015 Geneva Board meeting 2-3 December

goedemiddag Overview PDAM Pontianak &amp; TEC JANUAR BAMBANG LAJITO RULI NGADEMIN 48 x

PK/PD Study Strategies for PK/PD Study Strategies for Biopharmaceuticals: Is Bigger Better?

Growing for Success WELCOME SPONSORS PLATINUM GOLD GOLD SILVER BRONZE Growing For Success

GIS for Architecture MIT GIS Services Libguides.mit.edu/gis gishelp@mit.edu What is GIS and

KARAJAN / MIT CLASSICAL MUSIC HACKATHON October 21-23 @ MIT Music Library cuthbert@mit.edu

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

SQL-PL4OCL: an automatic code generator from OCL to SQL procedural language Marina Egea and

Database Application Development 340151 Big Databases &amp; Cloud Services (P. Baumann) 1 SQL

SQL vs PostgreSQL 1 Checks in PostgreSQL Tuple-based checks may only refer to attributes of

Database Application Development Oracle PL/SQL CS430/630 Lecture 15 Slides based on Database

CS 251 Fall 2019 CS 240 Spring 2020 Principles of Programming Languages Foundations of

Database Systems I SQL Constraints and Triggers Ensuring business rules are met. I NTEGRITY C

1 Execution Implementation Overview Master notifies reducers about Focuses on large

goedemiddag Overview PDAM Pontianak & TEC JANUAR BAMBANG LAJITO RULI NGADEMIN 48 x

Database Application Development 340151 Big Databases & Cloud Services (P. Baumann) 1 SQL