GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic - PowerPoint PPT Presentation

GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks
Rui Wang†, Yongkun Li†, Hong Xie‡, Yinlong Xu†, John C.S. Lui*
†University of Science and Technology of China, ‡Chongqing University, *The Chinese University of Hong Kong
USENIX ATC 2020

Social networks, webpage links, recommendation systems.
Graph analytics is one of the top 10 data and analytics technology trends [1].
[1] Gartner's top 10 data and analytics technology trends for 2019.

Random walks can realize approximate computation on large graphs.
[2] The 2012 common crawl graph. http://webdatacommons.org.

Massive walks situations!

① Load a sub-graph from disk into memory.
② Update the walks in the sub-graph.
Loop until all walks are finished.
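The load-then-update loop on this slide could be sketched as follows. This is only an illustration under assumptions that are not in the slides: the graph is an in-memory dictionary standing in for the on-disk blocks, the names `run_walks`, `adj`, and `block_of` are hypothetical, every vertex has at least one out-neighbour, and the block to load is simply the one holding the first pending walk.

```python
import random

def run_walks(adj, block_of, walks, walk_len, rng):
    """Iteration-based loop: load one sub-graph, advance its walks, repeat.

    adj      -- dict: vertex -> list of out-neighbours (stand-in for the disk graph)
    block_of -- dict: vertex -> id of the sub-graph (block) that stores it
    walks    -- list of (current_vertex, steps_taken) pairs
    Returns the finished walks and the number of simulated sub-graph loads.
    """
    loads = 0
    finished = []
    while walks:
        # Step 1: "load" a sub-graph (assumption: the block of the first pending walk).
        block = block_of[walks[0][0]]
        loads += 1
        remaining = []
        for cur, steps in walks:
            if block_of[cur] == block:
                # Step 2: move each walk in the loaded sub-graph by one step.
                cur, steps = rng.choice(adj[cur]), steps + 1
            if steps >= walk_len:
                finished.append((cur, steps))
            else:
                remaining.append((cur, steps))
        walks = remaining
    return finished, loads

if __name__ == "__main__":
    # Toy ring of 6 vertices split into two blocks of 3 vertices each.
    ring = {v: [(v + 1) % 6] for v in range(6)}
    blocks = {v: v // 3 for v in range(6)}
    done, n_loads = run_walks(ring, blocks, [(0, 0)] * 5, 4, random.Random(0))
    print(len(done), "walks finished after", n_loads, "loads")
```

Each load moves a walk by at most one step in this sketch, so the number of loads is at least the walk length.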

How many walkers are in each loaded sub-graph? How many steps can each walker update?

Start 10^6 walks from a source in Friendster (68.3M vertices) on DrunkardMob.
[Figure: walk distribution among subgraphs after 4 steps; labels: "More than 200000 walks", "Only 12 walks", "0.03%".]
$\text{I/O utilization} = \dfrac{\#\text{used edges}}{\#\text{total edges of a subgraph}}$
Some loaded blocks contain only a few walkers, which results in low I/O utilization.
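As a small worked example of the I/O utilization ratio (used edges over total edges of a loaded subgraph), the sketch below counts each traversed edge once. Counting distinct edges is an assumption made here for illustration, and the function name `io_utilization` is hypothetical.

```python
def io_utilization(block_edges, traversed_edges):
    """Fraction of a loaded subgraph's edges that walks actually used.

    block_edges     -- set of (u, v) edges stored in the loaded subgraph
    traversed_edges -- iterable of (u, v) edges followed by walks while it was loaded
    Assumption: an edge used by several walks is counted once.
    """
    used = set(traversed_edges) & block_edges
    return len(used) / len(block_edges)

if __name__ == "__main__":
    edges = {(0, 1), (0, 2), (1, 2), (2, 0)}
    steps = [(0, 1), (0, 1), (2, 0)]  # two walks share edge (0, 1)
    print(io_utilization(edges, steps))
```

A block that holds only a handful of walkers touches few of its edges, so most of the bytes read for that block are wasted.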

[Figure: example graph with vertices 0-9 split across blocks b0, b1, and b2, comparing the number of walks w in each block.]
Let the most walkers get moved by an I/O.
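One way to read "let the most walkers get moved by an I/O" is to load the block that currently holds the most walks. A minimal sketch, assuming per-block walk counts are kept in a list (the name `pick_block` is hypothetical, and ties go to the lowest block id):

```python
def pick_block(walk_counts):
    """Return the index of the block that currently holds the most walks."""
    return max(range(len(walk_counts)), key=walk_counts.__getitem__)

if __name__ == "__main__":
    # Blocks b0, b1, b2 holding 3, 7 and 1 walks: b1 is loaded next.
    print(pick_block([3, 7, 1]))
```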

Start 10^6 walks of 10 steps from a source in Friendster (68.3M vertices) on DrunkardMob.
[Figures: walk steps update rate in each I/O (avg: 0.2%); walks in the 1st subgraph after each iteration.]
Many walks remain in the current subgraph after walking one step.
Synchronized walk updating leads to a low walk updating rate.

Successively update each walk until it moves out of the current subgraph.
This maximizes the I/O utilization of a loaded subgraph.
Straggler problem: use a probabilistic approach. With probability p, we choose to load the subgraph with the shortest walker.
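The two ideas on this slide can be sketched together. The code below is illustrative only: the names are hypothetical, "shortest walker" is read as the walk with the fewest steps taken so far, and the fallback used with probability 1 - p (load the block holding the most walks) is an assumption, not something stated on this slide.

```python
import random

def advance_until_exit(adj, block_of, block, cur, steps, walk_len, rng):
    """Keep moving one walk while it stays inside the loaded block and is unfinished."""
    while steps < walk_len and block_of[cur] == block:
        cur = rng.choice(adj[cur])
        steps += 1
    return cur, steps

def choose_block(walks, block_of, p, rng):
    """Pick the next block to load.

    walks -- list of (current_vertex, steps_taken) pairs
    With probability p: the block of the walk with the fewest steps (the straggler).
    Otherwise (assumption): the block that holds the most walks.
    """
    if rng.random() < p:
        cur, _ = min(walks, key=lambda w: w[1])
        return block_of[cur]
    counts = {}
    for cur, _ in walks:
        counts[block_of[cur]] = counts.get(block_of[cur], 0) + 1
    return max(counts, key=counts.get)

if __name__ == "__main__":
    rng = random.Random(0)
    adj = {0: [1], 1: [0, 2], 2: [3], 3: [2]}
    block_of = {0: 0, 1: 0, 2: 1, 3: 1}
    print(advance_until_exit(adj, block_of, 0, 0, 0, 10, rng))
    print(choose_block([(0, 5), (2, 1), (3, 6)], block_of, 0.2, rng))
```

Compared with moving every walk by a single step per load, a walk here keeps advancing for as long as the loaded block can serve it.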

[Figure: a bucket array (entries 0, 1, …, P−1) pointing to walk arrays indexed by vertex ranges (0, 128, 256, …, |V|−128, |V|).]
Too many walk arrays.
The number of walks is limited, as it is hard to flush all walks to disk with so many files.

[Figure: the same bucket array with dynamic walk arrays.]
Dynamic walk arrays: frequent memory re-allocation and memory waste.
Dynamic arrays bring a high cost for storing walk data.

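Block walk pool: a fixed-length array in memory, backed by a disk file.
[Figure: a block array (entries 0, 1, …, P) pointing to the walk pools of block 0, block 1, …, block P.]
Each walk is encoded in 64 bits: source (bits 63-40), current (bits 39-14), step (bits 13-0).
Low storage cost, low I/O cost, less memory re-allocation.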
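The 64-bit walk layout read off the figure labels (source in bits 63-40, current in bits 39-14, step in bits 13-0) can be sketched with plain shifts and masks. The function names are hypothetical, and what exactly "current" holds (for example a vertex id or an offset inside the block) is not specified on the slide.

```python
SOURCE_BITS, CURRENT_BITS, STEP_BITS = 24, 26, 14  # 24 + 26 + 14 = 64 bits

def pack_walk(source, current, step):
    """Pack one walk into a single 64-bit word: source | current | step."""
    if not (0 <= source < 1 << SOURCE_BITS):
        raise ValueError("source does not fit in 24 bits")
    if not (0 <= current < 1 << CURRENT_BITS):
        raise ValueError("current does not fit in 26 bits")
    if not (0 <= step < 1 << STEP_BITS):
        raise ValueError("step does not fit in 14 bits")
    return (source << (CURRENT_BITS + STEP_BITS)) | (current << STEP_BITS) | step

def unpack_walk(word):
    """Inverse of pack_walk: return (source, current, step)."""
    step = word & ((1 << STEP_BITS) - 1)
    current = (word >> STEP_BITS) & ((1 << CURRENT_BITS) - 1)
    source = word >> (CURRENT_BITS + STEP_BITS)
    return source, current, step

if __name__ == "__main__":
    w = pack_walk(source=12, current=345, step=6)
    print(hex(w), unpack_walk(w))
```

With one fixed-size word per walk, a block's walk pool can be a preallocated array that is written to its disk file in a single pass.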

Other optimizations:
Data conflict optimization in multi-threads
Graph block size configuration
Light-weighted blocking
Walk-conscious cache strategy
More details are in the paper.
Prototype system: GraphWalker

Largest dataset

DrunkardMob: optimize the walk management.
Graphene, GraFSoft: fine-grained I/O management.
KnightKing: optimize the walk forwarding process.

Performance of random walks with different numbers of walks. Fix the walk length as 10.
GraphWalker achieves a 16x-70x speedup. GraphWalker is also able to support huge graphs and massive walks. GraphWalker finishes running 10^10 walks on the largest dataset, CrawlWeb, within around one hour.

Performance of random walks with different walk lengths. Fix the number of walks as 10^5.
GraphWalker achieves a speedup of more than three orders of magnitude in the best case. GraphWalker also achieves a 7-10x speedup for Kron30.

I/O utilization and walk updating rate. RWD (|V| * 6 walks) in YahooWeb (1.6B vertices).
DrunkardMob needs 150 I/Os and GraphWalker needs only 46 I/Os. GraphWalker achieves 2-4x higher I/O utilization.

Comparison with the single-machine systems Graphene and GraFSoft. R * 10
GraphWalker achieves a 2-40x speedup compared to Graphene. GraphWalker achieves a 1-37x speedup compared to GraFSoft.

Comparison with the distributed system KnightKing. Run |V| walks, with one walk starting from each vertex. Each walk terminates with probability 0.15 at each step.
GraphWalker (1 node) achieves performance comparable to KnightKing (8 nodes).
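The workload in this comparison (one walk per vertex, terminating with probability 0.15 at each step) can be illustrated with a short sketch. The function name and the toy graph are hypothetical, and the convention that the termination check happens before each move is an assumption; under it a walk makes (1 - 0.15) / 0.15 ≈ 5.7 moves on average.

```python
import random

def walk_with_termination(adj, start, stop_prob, rng):
    """Random walk that stops with probability stop_prob before each move."""
    path = [start]
    while rng.random() >= stop_prob:
        path.append(rng.choice(adj[path[-1]]))
    return path

if __name__ == "__main__":
    rng = random.Random(42)
    adj = {0: [1, 2], 1: [2], 2: [0, 1]}
    # Run |V| walks, one starting from each vertex.
    paths = [walk_with_termination(adj, v, 0.15, rng) for v in adj]
    for p in paths:
        print(p)
```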

https://github.com/ustcadsl/graphwalker
