Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation
Matteo Lissandrini, Martin Brugnara, Yannis Velegrakis
Universiteit Utrecht
Graphs are Everywhere
• Knowledge Graph
• Protein Interaction Network
• Road Network
• Social Network
Graph Databases Evaluation – Matteo Lissandrini
PROPERTY GRAPHS
Example:
• node01: Name: Matteo; Role: Post-doc; Interests: Graphs
• node02: Title: Beyond….; Topic: GraphDB
• node03: name: VLDB’19; year: 2019
• edge01: Presents; On: 2019-08-26
• edge02: references
• edge03: in
Edge-labelled Multigraphs G : ⟨V, E, L, ℓ⟩
• ID : V / E ↦ ℕ
• Labeling ℓ : E ↦ L
• Properties: V/E ↦ { <key,value>, … }
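The definition above can be made concrete with a minimal in-memory sketch. The class names (`Node`, `Edge`, `PropertyGraph`) are hypothetical, and the edge endpoints chosen below are assumptions made only to show parallel edges; the slide does not state which nodes each edge connects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    id: int                                    # ID : V -> N
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int                                    # ID : E -> N
    src: int                                   # id of the source node
    dst: int                                   # id of the target node
    label: str                                 # labeling: E -> L
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def labels(self) -> Set[str]:
        """L: the set of labels used by at least one edge."""
        return {e.label for e in self.edges.values()}


g = PropertyGraph()
g.nodes[1] = Node(1, {"Name": "Matteo", "Role": "Post-doc", "Interests": "Graphs"})
g.nodes[2] = Node(2, {"Topic": "GraphDB"})
# Multigraph: two edges between the same pair of nodes, told apart by id.
# The endpoints (1 -> 2) are an assumption for illustration.
g.edges[1] = Edge(1, 1, 2, "Presents", {"On": "2019-08-26"})
g.edges[2] = Edge(2, 1, 2, "references")
```

Keeping edges in a dictionary keyed by id, rather than as a set of (source, target) pairs, is what allows several edges between the same two nodes.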
GRAPH DATABASES
Oracle Graph, CosmosDB, Neptune
WHERE TO STORE A GRAPH?
OLAP*: Graph Processing
• Systems: GraphLab, Giraph/Pregel, GraphX
• Business-intelligence, Batch Processing, Algorithms, Statistics, Mining
OLTP: Graph Databases (Our Focus)
• Systems: ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, Titan/Janus
• Complex Queries, Pathfinding, Connectivity, Export/Import, Updates, Transaction, Selectivity, Indices, User-interaction, Concurrency, Availability
* [Ammar and Özsu, VLDB’18]
HOW TO CHOOSE THE RIGHT SYSTEM?
What solution works best?
OLTP: Graph Databases
• Systems: ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, Titan/Janus
• Complex Queries, Pathfinding, Connectivity, Export/Import, Updates, Transaction, Selectivity, Indices, User-interaction, Concurrency, Availability
THERE IS NO SILVER BULLET
• Different Data Characteristics
• Different Query Types
• Different Use-cases
• Different Data Organization
• Different Indexing/Optimizations
• Different Query Processing Strategies
GRAPH DATABASE ARCHITECTURES
How to implement a Graph Database
• Query Processing: Native (Specialized Query-processing & Algorithms)
• Storage: Native (Specialized Data-structures & Indexes) or Non Native
GOAL: UNDERSTAND 9 GRAPH DATABASES
1. PERFORMANCE FACTORS
• System Architecture
• Query Workload
• Data Characteristics
2. OUTCOME
• Evaluate Pros/Cons of each design decision
• Identify the cause of underperforming operations
Macro-Benchmarks
Goals
• Predefined realistic(?) Domain & Application
• Study specific Use-Cases
Techniques
• Test Complex Operations
• Queries based on the structure of the data and output of previous queries
Limitations
• Test query-planner but hides single operator performance
• Domain Specific

Micro-Benchmark (Our Proposal)
Goals
• Applicable over different Domains/Datasets
• Test Basic & Common Operations
Techniques
• Decompose Complex Queries
• Identify Ubiquitous Operators
• Test Same Operations under Different Conditions
Advantages
• Domain/Data Independent
• Generalizable
• Allow identification of Weak Operators
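The idea of testing the same operation under different conditions can be sketched as a tiny timing harness. This is an illustration only and not the authors' framework: `time_operation`, `make_graph`, and `count_edges` are hypothetical names, and a synthetic ring graph stands in for a real dataset.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_operation(op: Callable[[], object], repetitions: int = 5) -> float:
    """Return the median wall-clock time (seconds) of calling op()."""
    samples: List[float] = []
    for _ in range(repetitions):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def make_graph(n: int) -> Dict[int, List[int]]:
    """Toy adjacency list: a directed ring of n nodes."""
    return {v: [(v + 1) % n] for v in range(n)}


def count_edges(graph: Dict[int, List[int]]) -> int:
    """One basic operator: total number of edges."""
    return sum(len(targets) for targets in graph.values())


# Same operator, different conditions (here: graph size).
results: Dict[int, float] = {}
for size in (1_000, 10_000, 100_000):
    graph = make_graph(size)
    results[size] = time_operation(lambda: count_edges(graph))

for size, seconds in results.items():
    print(f"count_edges on {size} nodes: {seconds:.6f}s")
```

Reporting the median of several repetitions is one simple way to dampen outliers; a real evaluation would also need to control caching and warm-up effects.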
MICRO-BENCHMARKING GRAPH OPERATIONS
CRUD: Create Read Update Delete
Insertions, updates, and retrievals, both for values stored on nodes and edges and for structural elements (add/remove/retrieve nodes/edges)
Graph Queries: Edges & Traversals
Access the local structure around a node, verify reachability, and search for nodes with specific structural characteristics
MICRO-BENCHMARKING GRAPH OPERATIONS
CRUD: Create Read Update Delete
Create, Update, Delete
• Create new node with property P { Name : Value }
• Add edge from v1 to v2 (plus some properties P)
• Add property P { Name : Value } to node v or to edge e
• Add a new node, and then edges from it to other nodes
• Update Value for property P { Name : Value }
• Delete Node/Edge
• Delete property P from node/edge
Read
• Find node/edge with specific ID
• Find nodes/edges with property P { Name : Value }
• Find edges with a specific label
• Count edges/nodes
• Count distinct edge labels
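A few of the CRUD operations listed above, sketched over plain dictionaries. `ToyGraph` and its method names are hypothetical, and dropping incident edges when a node is deleted is a design assumption of this sketch, not something the slides specify.

```python
from itertools import count


class ToyGraph:
    def __init__(self):
        self.nodes = {}            # node id -> properties
        self.edges = {}            # edge id -> (src, dst, label, properties)
        self._ids = count(1)       # shared id generator for nodes and edges

    # Create
    def add_node(self, **props):
        nid = next(self._ids)
        self.nodes[nid] = dict(props)
        return nid

    def add_edge(self, src, dst, label, **props):
        eid = next(self._ids)
        self.edges[eid] = (src, dst, label, dict(props))
        return eid

    # Read
    def find_nodes(self, name, value):
        return [n for n, p in self.nodes.items() if p.get(name) == value]

    def find_edges_by_label(self, label):
        return [e for e, (_, _, lbl, _) in self.edges.items() if lbl == label]

    def distinct_labels(self):
        return {lbl for (_, _, lbl, _) in self.edges.values()}

    # Update
    def set_node_property(self, nid, name, value):
        self.nodes[nid][name] = value

    # Delete (assumption: edges touching the node are removed as well)
    def remove_node(self, nid):
        del self.nodes[nid]
        self.edges = {e: t for e, t in self.edges.items()
                      if nid not in (t[0], t[1])}


g = ToyGraph()
a = g.add_node(Name="a")
b = g.add_node(Name="b")
e = g.add_edge(a, b, "knows", since=2019)
g.set_node_property(a, "Name", "alice")
```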
MICRO-BENCHMARKING GRAPH OPERATIONS
Graph Queries: Edges & Traversals
• Find nodes directly connected (find all incoming/outgoing edges)
• Find only certain connections (filter by label)
• Degree-based search: e.g., high-degree nodes, only inbound connections
• Find all nodes reachable in K or fewer steps (BFS)
• Find a list of shortest paths between two nodes
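The two traversal operations above can be sketched with a plain breadth-first search. This is a minimal illustration under stated assumptions: the graph is a dictionary of outgoing adjacency lists, only outgoing edges are followed, and `shortest_path` returns a single shortest path rather than the full list. The function names and the toy graph are hypothetical.

```python
from collections import deque


def reachable_within(adj, start, k):
    """Nodes reachable from start in at most k steps (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue                      # do not expand beyond k steps
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def shortest_path(adj, source, target):
    """One unweighted shortest path as a list of nodes, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(reachable_within(adj, "a", 2))
print(shortest_path(adj, "a", "e"))
```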
OUR FRAMEWORK
Selected Operations: 35 distinct Concrete Operators
• Coverage of all the required operations
• Complex queries can be composed through those
• Domain agnostic

 #  Cat  Query                                    Description
 1  L    g.loadGraphSON("/path")                  Load dataset into the graph 'g'
 2  C    g.addVertex(p[])                         Create new node with properties p
 3  C    g.addEdge(v1, v2, l)                     Add edge l from v1 to v2
 4  C    g.addEdge(v1, v2, l, p[])                Same as Q.3, but with properties p
 5  C    v.setProperty(Name, Value)               Add property Name=Value to node v
 6  C    e.setProperty(Name, Value)               Add property Name=Value to edge e
 7  C    g.addVertex(...); g.addEdge(...)         Add a new node, and then edges to it
 8  R    g.V.count()                              Total number of nodes
 9  R    g.E.count()                              Total number of edges
10  R    g.E.label.dedup()                        Existing edge labels (no duplicates)
11  R    g.V.has(Name, Value)                     Nodes with property Name=Value
12  R    g.E.has(Name, Value)                     Edges with property Name=Value
13  R    g.E.has('label', l)                      Edges with label l
14  R    g.V(id)                                  The node with identifier id
15  R    g.E(id)                                  The edge with identifier id
16  U    v.setProperty(Name, Value)               Update property Name for vertex v
17  U    e.setProperty(Name, Value)               Update property Name for edge e
18  D    g.removeVertex(id)                       Delete node identified by id
19  D    g.removeEdge(id)                         Delete edge identified by id
20  D    v.removeProperty(Name)                   Remove node property Name from v
21  D    e.removeProperty(Name)                   Remove edge property Name from e
22  T    v.in()                                   Nodes adjacent to v via incoming edges
23  T    v.out()                                  Nodes adjacent to v via outgoing edges
24  T    v.both('l')                              Nodes adjacent to v via edges labeled l
25  T    v.inE.label.dedup()                      Labels of incoming edges of v (no dupl.)
26  T    v.outE.label.dedup()                     Labels of outgoing edges of v (no dupl.)
27  T    v.bothE.label.dedup()                    Labels of edges of v (no dupl.)
28  T    g.V.filter{it.inE.count()>=k}            Nodes of at least k-incoming-degree
29  T    g.V.filter{it.outE.count()>=k}           Nodes of at least k-outgoing-degree
30  T    g.V.filter{it.bothE.count()>=k}          Nodes of at least k-degree
31  T    g.V.out.dedup()                          Nodes having an incoming edge
32  T    v.as('i').both().except(vs)              Nodes reached via breadth-first traversal from v
           .store(vs).loop('i')
33  T    v.as('i').both(*ls).except(vs)           Nodes reached via breadth-first traversal from v on labels ls
           .store(vs).loop('i')
34  T    v1.as('i').both().except(j).store(j)     Unweighted Shortest Path from v1 to v2
           .loop('i'){!it.object.equals(v2)}
           .retain([v2]).path()
35  T    Shortest Path on 'l'                     Same as Q.34, but only following label l

Cat: L = Load, C = Create, R = Read, U = Update, D = Delete, T = Traversal.
[] denotes a Hash Map; g is the graph; v and e are nodes/edges.
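The claim that complex queries can be composed from these operators can be illustrated with rough Python analogues of Q.23, Q.25 and Q.28 over a toy edge list. The data, the function names, and the composite query `popular_neighbors` are all hypothetical; they only show the composition idea and are not part of the framework.

```python
from collections import Counter

# Toy edge list of (source, target, label) triples; values are made up.
EDGES = [
    ("a", "b", "knows"),
    ("a", "c", "likes"),
    ("b", "c", "knows"),
    ("d", "c", "knows"),
]


def out_neighbors(v):
    """Analogue of Q.23, v.out(): nodes reached via outgoing edges."""
    return [t for s, t, _ in EDGES if s == v]


def in_edge_labels(v):
    """Analogue of Q.25, v.inE.label.dedup(): labels of incoming edges."""
    return {lbl for _, t, lbl in EDGES if t == v}


def nodes_with_in_degree(k):
    """Analogue of Q.28: nodes with at least k incoming edges."""
    indeg = Counter(t for _, t, _ in EDGES)
    return {v for v, d in indeg.items() if d >= k}


def popular_neighbors(v, k):
    """Composite query: out-neighbours of v whose in-degree is >= k."""
    hubs = nodes_with_in_degree(k)
    return [n for n in out_neighbors(v) if n in hubs]


print(popular_neighbors("a", 2))
```

If the composite query is slow on some system, timing the two primitives separately shows which operator is responsible.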
OUR FRAMEWORK
Experimental Environment: Batteries Included
• Various Sizes & Domains: Real and Synthetic Datasets
• Ready-to-go Systems & Configurations: most popular systems already integrated and ready to use

                               Connected Component                               Degree
Dataset  |V|     |E|     |L|   #      Maxim   Density       Modularity    Avg   Max    Δ
Yeast    2.3K    7.1K    167   101    2.2K    1.34 × 10^-3  3.66 × 10^-2  6.1   66     11
MiCo     100K    1.1M    106   1.3K   93K     1.10 × 10^-6  5.45 × 10^-3  21.6  1.3K   23
Frb-O    1.9M    4.3M    424   133K   1.6M    1.19 × 10^-6  9.82 × 10^-1  4.3   92K    48
Frb-S    0.5M    0.3M    1814  0.16M  20K     1.20 × 10^-6  9.91 × 10^-1  1.3   13K    4
Frb-M    4M      3.1M    2912  1.1M   1.4M    1.94 × 10^-7  7.97 × 10^-1  1.5   139K   37
Frb-L    28.4M   31.2M   3821  2M     23M     3.87 × 10^-8  2.12 × 10^-1  2.2   1.4M   33
ldbc     184K    1.5M    15    1      184K    4.43 × 10^-5  0             16.6  48K    10

PREVIOUS TESTS: ONLY 1M NODES
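Several columns of the dataset table can be computed directly from an edge list. The sketch below covers |V|, |E|, |L|, connected components, and degree. It assumes that components ignore edge direction and that a node's degree counts both incoming and outgoing edges; the slides do not state these definitions, and density, modularity and Δ are left out for the same reason. `graph_stats` is a hypothetical name.

```python
from collections import defaultdict


def graph_stats(edges):
    """edges: iterable of (source, target, label) triples."""
    edges = list(edges)
    neighbors = defaultdict(set)
    degree = defaultdict(int)
    for s, t, _ in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)           # components ignore edge direction
        degree[s] += 1
        degree[t] += 1
    nodes = set(neighbors)            # isolated nodes cannot appear here

    # Connected components via iterative depth-first search.
    seen, sizes = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, size = [v], 0
        seen.add(v)
        while stack:
            u = stack.pop()
            size += 1
            for w in neighbors[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)

    return {
        "V": len(nodes),
        "E": len(edges),
        "L": len({lbl for _, _, lbl in edges}),
        "components": len(sizes),
        "max_component": max(sizes, default=0),
        "avg_degree": sum(degree.values()) / len(nodes) if nodes else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


stats = graph_stats([("a", "b", "x"), ("b", "c", "y"), ("d", "e", "x")])
print(stats)
```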
OUR FRAMEWORK
Extensibility
• Reproducible!
• Common Query Language
• Easy to add: New Queries, New Systems, New Datasets
• Plug and Play setup & Controlled Environment