Towards a property graph generator for benchmarking
Arnau Prat-Pérez, Davide Basilio Bartolini, Joan Guisado-Gámez, Siegfried Depner, Xavier Fernández-Salas, Petr Koupy

Why a property graph generator?
● Graph-based analysis is becoming more and more popular (slide logos: GraphMat, TOTEM)

Why a property graph generator?
● For the field to advance, many benchmarking initiatives have appeared: gMark, Graphalytics, Social Network Benchmark, LinkBench, LUBM

Why a property graph generator?
● Benchmarks need datasets, preferably real ones

Why a property graph generator?
● But ... [slide images not recovered]

Why a property graph generator?
● Synthetic graph generators
● However, each benchmark has specific data needs
  – each benchmark designer implements their own
  – a time-consuming task, sometimes reinventing the wheel

Why a property graph generator?
● A tool that, given some “graph specification”, produces a synthetic graph with the specified characteristics
● DataSynth
  – https://github.com/DAMA-UPC/DataSynth
  – Written in Scala
  – Uses Apache Spark

Architecture Overview
● Frontend: DSL Parser
  – Scala-based DSL with extensive use of code generation
● Execution Plan Optimizer
  – Optimizations possible for certain types of graphs
● Backend: Apache Spark Runtime
  – State-of-the-art Big Data framework

What features should DataSynth have?
● But what characteristics should a property graph generator be able to reproduce?
  – Properties and correlations/dependencies between them
    - e.g. name is correlated with country
  – Varied structure
    - degree distributions
    - community structure
    - low diameter
    - large connected component
    - etc.
  – Property-structure correlations/dependencies
    - e.g. Chinese people tend to connect to Chinese people
    - represented as a joint probability P(X,Y) of observing values X and Y on the endpoints of a randomly picked edge
  – Scale
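
The P(X,Y) representation mentioned on this slide can be illustrated with a small Python sketch that measures it on an existing graph. The function name, the toy `country` and `knows` data, and the choice to treat (X,Y) as an unordered pair are assumptions made for illustration; they are not taken from DataSynth.

```python
from collections import Counter

def edge_joint_distribution(edges, prop):
    """Empirical P(X, Y): fraction of edges whose endpoints carry the
    unordered value pair {X, Y} (assumption: edges are undirected)."""
    counts = Counter()
    for u, v in edges:
        # Sort the pair so (China, Japan) and (Japan, China) share one key.
        pair = tuple(sorted((prop[u], prop[v])))
        counts[pair] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical toy data: four persons and five "knows" edges.
country = {1: "China", 2: "Japan", 3: "China", 4: "Germany"}
knows = [(1, 3), (1, 2), (3, 4), (2, 3), (1, 4)]

p = edge_joint_distribution(knows, country)
for pair, prob in sorted(p.items()):
    print(pair, prob)
```

In this toy graph one of the five edges connects two Chinese persons, so the measured P(China,China) is 0.2. A generator would be given such a distribution as a target rather than measuring it afterwards.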

But...
● Having a single algorithm for generating so many things seems too complex
  – Properties and property correlations
  – Realistic graph structure
  – Property-structure correlations
● There are tens of metrics to measure the structure of a graph; which ones should we take (this possibly depends on the algorithms used)?

DataSynth's approach
Schema: Person (properties: Country, Name), connected by "knows" edges (property: date)
Steps along the TIME axis:
● node property generation

  Id   Name        Id   Country
  1    Lee         1    China
  2    Hiroshi     2    Japan
  3    Yang        3    China
  ...  ...         ...  ...
  17   Wolfgang    17   Germany

● structure generation (the slide figure shows a graph over nodes 1 to 17)
● Matching preserving given joint probability distributions
  – e.g. P(China,China) ≈ 0.2
● edge property generation

  Id   date
  1    30/01/2015
  2    4/06/2016
  3    12/11/2016
  ...  ...
  30   03/03/2017

DataSynth's Approach
● Pros:
  – Accurate distributions of property values and correlations between properties
  – Does not limit us to a single way of generating the structure of a graph
    ● We can use existing techniques and leave the door open to new contributions
  – Pay for what we get
● Cons:
  – Relies heavily on a sophisticated matching approach to achieve accurate property-structure correlation

Property Generation
● We have a “Property Table” for each <type,property> pair
● We use a technique similar to that proposed by Myriad [1]
  – Highly parallel
  – Allows in-place data generation
    ● Given the Id of an entity, we can generate its properties

[1] Alexander Alexandrov, Kostas Tzoumas, and Volker Markl. 2012. Myriad: scalable and expressive data generation. PVLDB 5, 12 (2012), 1890–1893.
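
The idea of generating a property directly from an entity Id can be sketched in Python as follows. This is a minimal illustration, not Myriad's or DataSynth's actual scheme: seeding a generator from a hash of the Id is a stand-in technique, and `COUNTRIES`, `NAMES_BY_COUNTRY`, `country_of` and `name_of` are hypothetical names and data.

```python
import hashlib
import random

# Hypothetical dictionaries; a real generator would load these from data files.
COUNTRIES = ["China", "Japan", "Germany"]
NAMES_BY_COUNTRY = {
    "China": ["Lee", "Yang"],
    "Japan": ["Hiroshi", "Akira"],
    "Germany": ["Wolfgang", "Hans"],
}

def _rng(type_name, prop_name, entity_id, seed=0):
    """Random generator that depends only on (seed, type, property, Id)."""
    key = f"{seed}/{type_name}/{prop_name}/{entity_id}".encode()
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def country_of(person_id):
    # Any worker can compute this value without seeing other rows.
    return _rng("Person", "country", person_id).choice(COUNTRIES)

def name_of(person_id):
    # The name depends on the country, which is regenerated on demand
    # instead of being looked up in a materialised table.
    country = country_of(person_id)
    return _rng("Person", "name", person_id).choice(NAMES_BY_COUNTRY[country])

for pid in (1, 2, 3, 17):
    print(pid, country_of(pid), name_of(pid))
```

Because every value is a pure function of the Id, rows can be produced in any order and on any machine, which is what makes the approach highly parallel.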

Structure Generation
● We can use existing scalable graph generation techniques: BTER [1], Darwini [2], etc.
● A Hadoop implementation of BTER has been implemented:
  – https://github.com/DAMA-UPC/BTERonH

[1] Tamara G. Kolda et al. 2014. A scalable generative graph model with community structure. SISC 36, 5 (2014), C424–C452.
[2] Sergey Edunov et al. 2016. Darwini: Generating realistic large-scale social graphs. arXiv:1610.00664 (2016).

Property-to-Structure Matching
Input: P(X,Y)

  0.3    0.067  0.067
  0.067  0.33   0.067
  0.067  0.067  0.17

Block Model (entries as shown on the slide)

  6,9    2      2
         7,10   2
                4,5

Graph Partitioning
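
One way to read the slide is that P(X,Y) is turned into a block model of target edge counts, which the partitioning step then tries to reproduce. The Python sketch below shows that conversion under stated assumptions: each unordered value pair is counted once (the diagonal plus three off-diagonal entries sum to about 1), the graph has 30 edges (the last Id in the edge table of the earlier example), and the labels A, B, C are placeholders. The partitioning algorithm itself is not described on the slide and is not sketched here.

```python
def target_block_model(p, num_edges):
    """Expected number of edges per unordered value pair, rounded to integers."""
    return {pair: round(prob * num_edges) for pair, prob in p.items()}

# P(X,Y) from the slide, with placeholder labels A, B, C (assumption).
p_xy = {
    ("A", "A"): 0.3,   ("A", "B"): 0.067, ("A", "C"): 0.067,
    ("B", "B"): 0.33,  ("B", "C"): 0.067,
    ("C", "C"): 0.17,
}

blocks = target_block_model(p_xy, num_edges=30)  # 30 edges is an assumption
for pair, count in blocks.items():
    print(pair, count)
```

Under these assumptions the counts come out as 9, 10 and 5 edges inside the three groups and 2 edges between each pair of groups. This agrees with the second number of each diagonal entry and with the off-diagonal 2s on the slide, which suggests that an entry such as "6,9" means a group of 6 nodes with 9 internal edges.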

Next Steps
● Investigate further the performance/quality of our matching approach
  – Multithreaded/distributed
  – Efficient for high-cardinality values
  – Understand when it works well and when it does not
● Push for the DSL
● Integrate more existing structure generators
  – bipartite graphs
● Long term: work towards “DGaaS” (Data Generation as a Service)
