Graph Databases
Marco Serafini
COMPSCI 532, Lecture 10

Graph DB Use Cases
• Social network queries
  • E.g., Facebook stores the entire metadata in a social graph
• Network security
  • Find sequence of steps that lead to intrusion
• Fraud detection
  • Find fraud rings
• Knowledge bases
  • Answer questions, language models

Resource Description Framework
• World Wide Web Consortium specification
• Used for the Semantic Web
  • Web pages define human-readable content
  • Goal: add machine-readable metadata describing how pages relate
• Format to reuse and share data across the Web
• Examples
  • Wikipedia, census, life sciences, DBpedia
• Directed labeled multi-graph

RDF Format
• Graph is a set of triples = (Subject, Predicate, Object)
• Subject and predicate are resources
  • Associated with Uniform Resource Identifiers (URIs)
• Object can be a resource or a literal (string)
From S. Decker et al., "Framework for the Semantic Web: An RDF Tutorial"

Query Language: SPARQL
• Declarative
• Defines a query graph
  • RDF store must find all instances in data graph
• Example
  • "Return friends of user alice01 who live in Paris"

PREFIX sn: <http://socialnetwork.com/ontology/>
SELECT ?friend
WHERE {
  ?user sn:hasName "alice01" ;
        sn:isFriendOf ?friend .
  ?friend sn:livesIn sn:Paris .
}
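To make "find all instances of the query graph in the data graph" concrete, here is a minimal Python sketch that evaluates the three triple patterns of the example over a tiny in-memory set of triples. The data, the `sn:u1`-style identifiers, and the helper names are assumptions made for illustration; a real RDF store would use indexes rather than scanning every triple.

```python
# Toy data graph: a set of (subject, predicate, object) triples.
# All identifiers are hypothetical and only serve the example.
TRIPLES = {
    ("sn:u1", "sn:hasName", "alice01"),
    ("sn:u1", "sn:isFriendOf", "sn:u2"),
    ("sn:u1", "sn:isFriendOf", "sn:u3"),
    ("sn:u2", "sn:livesIn", "sn:Paris"),
    ("sn:u3", "sn:livesIn", "sn:Rome"),
}

def is_var(term):
    """Query variables start with '?', as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def evaluate(patterns, triples):
    """Match the patterns one at a time, keeping all partial bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match_pattern(pattern, t, b)) is not None
        ]
    return bindings

# The query graph from the slide, written as triple patterns.
QUERY = [
    ("?user", "sn:hasName", "alice01"),
    ("?user", "sn:isFriendOf", "?friend"),
    ("?friend", "sn:livesIn", "sn:Paris"),
]

friends = sorted({b["?friend"] for b in evaluate(QUERY, TRIPLES)})
print(friends)
```

Each pattern acts as one join against the triple set, which is why query cost is dominated by the number of partial bindings that survive each step.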

Property Graph Format
• Vertices and edges can have associated properties
  • Key-value pairs
• Vertices can be grouped by label
  • Similar to tables, e.g., employees
  • Properties are similar to columns of a table
• Not a "global" format: no URIs required
• Typically more compact than RDF
• Common in NoSQL graph databases
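As a rough illustration of the property graph model, the sketch below stores labels and key-value properties directly on vertices and edges. The class names, fields, and sample data are assumptions for this example and do not mirror any particular database's storage layout.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: int                      # local identifier, no URI needed
    label: str                    # groups vertices, similar to a table name
    props: dict = field(default_factory=dict)  # key-value pairs

@dataclass
class Edge:
    src: int
    dst: int
    label: str
    props: dict = field(default_factory=dict)

# Hypothetical sample graph.
vertices = {
    1: Vertex(1, "User", {"name": "alice01"}),
    2: Vertex(2, "User", {"name": "bob"}),
    3: Vertex(3, "City", {"name": "Paris"}),
}
edges = [
    Edge(1, 2, "isFriend", {"since": 2015}),
    Edge(2, 3, "livesIn"),
]

def by_label(label):
    """Return all vertices with the given label (like scanning one table)."""
    return [v for v in vertices.values() if v.label == label]

user_names = sorted(v.props["name"] for v in by_label("User"))
print(user_names)
```

Compared with RDF, a property such as `name` lives inside the vertex instead of being a separate triple, which is one reason the format tends to be more compact.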

Query Languages
• Cypher
  • Originally used by Neo4j
• Linear queries
• Previous example in Cypher

MATCH (u:User)-[:isFriend]->(f:User)-[:livesIn]->(:City {name: 'Paris'})
WHERE u.name = 'alice01'
RETURN f.name

Relational Representation of Graphs
• Graphs in a relational DBMS
  • Vertex table, edge table
  • Sometimes edges as triples
• Pattern matching
  • Maintain a set of partial matches
  • Extend by edge: self-join on edge table
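The self-join idea can be tried with Python's built-in `sqlite3` module. The sketch below assumes a simple schema (a `vertex` table and an `edge` table, both invented for this example) and answers the "friends of alice01 who live in Paris" query by joining the edge table with itself.

```python
import sqlite3

# In-memory database with a hypothetical vertex/edge schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (id INTEGER PRIMARY KEY, label TEXT, name TEXT);
CREATE TABLE edge (src INTEGER, dst INTEGER, label TEXT);
INSERT INTO vertex VALUES
  (1, 'User', 'alice01'), (2, 'User', 'bob'), (3, 'User', 'carol'),
  (4, 'City', 'Paris'),   (5, 'City', 'Rome');
INSERT INTO edge VALUES
  (1, 2, 'isFriend'), (1, 3, 'isFriend'),
  (2, 4, 'livesIn'),  (3, 5, 'livesIn');
""")

# Each hop of the pattern is one more join on the edge table:
# e1 extends the partial match (u) to (u, f), e2 extends it to (u, f, c).
rows = conn.execute("""
SELECT f.name
FROM vertex u
JOIN edge e1 ON e1.src = u.id AND e1.label = 'isFriend'
JOIN edge e2 ON e2.src = e1.dst AND e2.label = 'livesIn'
JOIN vertex f ON f.id = e1.dst
JOIN vertex c ON c.id = e2.dst
WHERE u.name = 'alice01' AND c.label = 'City' AND c.name = 'Paris'
""").fetchall()

friend_names = [r[0] for r in rows]
print(friend_names)
```

Longer patterns add one self-join per edge, so the optimizer has to estimate the size of every intermediate join result.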

Why are Graph Workloads Hard?
• Many joins: difficult to estimate cardinality
  • Joins require random access
  • Cardinality estimation gets harder at every join
  • Skew: few vertices have very high degree
• Indexing
  • Adjacency list scans are very frequent
  • Graph-aware databases optimize these
• Some queries have very low selectivity
  • E.g., triangle closure (potential friends)

Worst-Case Optimal Joins
• Worst-case optimality
  • O(intermediate results) <= O(final results)
• Edge-at-a-time approach is not worst-case optimal
  • Number of triangles: O(|E|^{3/2})
  • Number of wedges: O(|E|^2)
• Vertex-at-a-time (multi-way joins) are WCO
  • (v_1, v_2), (v_1, v_2, v_3), (v_1, v_2, v_3, v_4), …
  • Will not materialize all wedges
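The gap between wedges and triangles is easy to observe on a small skewed graph. The Python sketch below is an illustration under simple assumptions, not a full worst-case optimal join implementation: it compares an edge-at-a-time plan that materializes every wedge with a vertex-at-a-time plan that extends each edge by intersecting two adjacency sets. The function names and the star-shaped test graph are made up for the example.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_edge_at_a_time(adj):
    """Materialize all wedges a - center - c, then filter by the closing edge."""
    wedges = [
        (a, center, c)
        for center, nbrs in adj.items()
        for a, c in combinations(sorted(nbrs), 2)
    ]
    closed = {tuple(sorted(w)) for w in wedges if w[2] in adj[w[0]]}
    return closed, len(wedges)

def triangles_vertex_at_a_time(adj):
    """Extend each edge (u, v) by one vertex via set intersection."""
    found, candidates = set(), 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                common = adj[u] & adj[v]   # only vertices adjacent to both
                candidates += len(common)
                found.update((u, v, w) for w in common if w > v)
    return found, candidates

# Star with hub 0 and 50 leaves, plus a single edge between leaves 1 and 2.
edges = [(0, leaf) for leaf in range(1, 51)] + [(1, 2)]
adj = build_adj(edges)

tri_edge, n_wedges = triangles_edge_at_a_time(adj)
tri_vertex, n_candidates = triangles_vertex_at_a_time(adj)
print(n_wedges, n_candidates, tri_vertex)
```

Both plans find the same single triangle, but the first one builds more than a thousand wedges around the high-degree hub while the second only ever touches vertices that are adjacent to both endpoints of an edge.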

Subgraph Isomorphism (TurboISO)
• SubTask 1: match spanning tree from one starting vertex
• SubTask 2: match cross-edges
[Figure: example contrasting lightweight and heavyweight matching, with counts of matched subgraphs and edge lookups per starting vertex]

TurboISO: Flexible Join Order

Hard to Parallelize
[Figure: running time (ms)]

Subgraph Enumeration
• Count all instances of an unlabeled pattern
  • E.g., triangles, squares, cliques
• Important to rule out permutations
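One common way to rule out permutations is symmetry breaking: only extend a partial match with vertices whose id is larger than the last one added. The sketch below applies this to k-cliques and contrasts it with a naive count of ordered matches. It is a minimal illustration with hypothetical function names, not a tuned enumeration algorithm.

```python
from itertools import permutations

def cliques(adj, k):
    """Yield each k-clique exactly once, as a tuple of increasing vertex ids."""
    def extend(partial, candidates):
        if len(partial) == k:
            yield tuple(partial)
            return
        for v in sorted(candidates):
            # Keep only common neighbours with a larger id (symmetry breaking).
            yield from extend(partial + [v],
                              {w for w in candidates & adj[v] if w > v})
    yield from extend([], set(adj))

def ordered_matches(adj, k):
    """Naive count: every ordering of a clique is counted as a separate match."""
    return sum(
        1
        for combo in permutations(adj, k)
        if all(b in adj[a] for i, a in enumerate(combo) for b in combo[i + 1:])
    )

# Complete graph on 4 vertices: 4 triangles, each with 3! = 6 orderings.
k4 = {v: {w for w in range(4) if w != v} for v in range(4)}
unique_triangles = list(cliques(k4, 3))
print(len(unique_triangles), ordered_matches(k4, 3))
```

Without the ordering constraint every triangle would be reported six times, and the blow-up grows as k! for larger cliques.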

Reachability Queries
• Given two vertices v and u
  • Find (and/or rank) paths connecting them
• Simplest approach: parallel BFS from both vertices
  • Expensive
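A sequential sketch of the "BFS from both vertices" idea is shown below. It assumes an undirected graph stored as adjacency lists and only returns the length of a shortest connecting path. Ranking paths or running the two searches in parallel is left out, and the function name is an assumption for this example.

```python
from collections import deque

def bidirectional_bfs(adj, source, target):
    """Length in edges of a shortest source-target path, or None if unreachable."""
    if source == target:
        return 0
    dist_s, dist_t = {source: 0}, {target: 0}
    frontier_s, frontier_t = deque([source]), deque([target])
    while frontier_s and frontier_t:
        # Expand the smaller frontier by one full BFS level.
        if len(frontier_s) <= len(frontier_t):
            frontier, dist, other = frontier_s, dist_s, dist_t
        else:
            frontier, dist, other = frontier_t, dist_t, dist_s
        best = None
        for _ in range(len(frontier)):
            u = frontier.popleft()
            for v in adj.get(u, ()):
                if v in other:
                    # The two searches meet on edge (u, v).
                    cand = dist[u] + 1 + other[v]
                    best = cand if best is None else min(best, cand)
                elif v not in dist:
                    dist[v] = dist[u] + 1
                    frontier.append(v)
        if best is not None:
            return best
    return None

# Path 0 - 1 - 2 - 3 - 4 plus an isolated vertex 9.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 9: []}
print(bidirectional_bfs(path, 0, 4), bidirectional_bfs(path, 0, 9))
```

Searching from both ends keeps each frontier shallower than a single BFS would, but on skewed graphs one high-degree vertex can still make a frontier very large, which is part of why the approach is expensive.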

Dynamic Graphs
• Temporal analysis → deal with multiple snapshots
• Real-time analytics → work on live graph data
• Storage implications
[Figure: a transactional system (dynamic data structure + transactions, e.g., B-Tree, LSMT) receives updates; an analytical system (read-only data structure, no transactions, e.g., CSR) loads the data and produces results]

Graph Storage for RT Analytics
• Sequential adjacency list scan is important
• CSR: sequential scan but read-only
• TEL: log-based adjacency list
[Figure: three panels comparing TEL, B+Tree, LSMT, and Linked List as the graph scale V grows from 2^20 to 2^26: cache misses (cache miss/edge), seek time (µs/vertex), and edge scan (ns/edge)]
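For readers who have not seen CSR (compressed sparse row), the sketch below builds the two arrays from an edge list and scans one adjacency list. It uses plain Python lists for clarity, and the function names are assumptions for this example. A real system would use packed integer arrays.

```python
def build_csr(num_vertices, edges):
    """Build (offsets, targets) from directed (src, dst) edges."""
    # Count the out-degree of each vertex, shifted by one position.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum: offsets[v] is where v's neighbours start in `targets`.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Place every destination into its vertex's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def neighbors(offsets, targets, v):
    """Adjacency list of v: one contiguous slice, hence a sequential scan."""
    return targets[offsets[v]:offsets[v + 1]]

edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets, targets, neighbors(offsets, targets, 0))
```

Inserting a single edge would shift every later entry in `targets`, which is why CSR is treated as read-only and why update-friendly layouts such as log-based adjacency lists are of interest.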

Open Issues
• Graph analytics algorithms are diverse
  • Still looking for good APIs
  • There is no "SQL for graphs"
• Hard to leverage hardware characteristics
  • Scale out to distributed systems: hard because of edge cut
  • SIMD: hard because of skew and random access
  • Caching: hard because of random access
