Efficient Batched Distance and Centrality Computation in Unweighted - PowerPoint PPT Presentation

Efficient Batched Distance and Centrality Computation in Unweighted and Weighted Graphs
Manuel Then, Stephan Günnemann, Alfons Kemper, Thomas Neumann
Technische Universität München, Chair for Database Systems

Graph Centrality
Goal: Find the most central vertices
• Influencers in social networks
• Critical routers in computer networks
Centrality measures
• Degree: degree centrality, PageRank
• Distances: closeness centrality
• Paths: betweenness centrality
Challenges
• Algorithmic complexity
• Random data access
• Redundant computation, hard to vectorize
Manuel Then | Efficient Batched Distance and Centrality Computation 2

Challenges Visualized
Unweighted closeness centrality builds on BFSs
Goal: Run multiple BFSs concurrently and share common traversals

Background: Multi-Source BFS
BFS traversals using bit operations:
∀ v ∈ V: ∀ n ∈ neighbors(v): next[n] |= visit[v] & ~seen[n]
Used to win the SIGMOD 2014 programming contest
[1] Then et al., The More the Merrier: Efficient Multi-source Graph Traversal, VLDB 2015
[2] Kaufmann et al., Parallel Array-Based Single- and Multi-Source Breadth First Searches on Large Dense Graphs, EDBT 2017
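The bit-parallel update can be sketched in C++ as follows. This is a minimal illustration, not the implementation from [1]. The function name `msbfs`, the adjacency-list `Graph` type, and the limit of 64 sources (one bit of a `uint64_t` per BFS) are assumptions made for this example. Each pass over the current frontiers advances all BFSs by one level. `next[w] |= visit[v] & ~seen[w]` collects, for each neighbor, exactly the sources that reach it for the first time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Adjacency lists; for undirected graphs store each edge in both directions.
using Graph = std::vector<std::vector<int>>;

// Runs one BFS per entry of `sources` (at most 64) concurrently.
// Bit s of a mask stands for the BFS started at sources[s].
// Returns levels[s][v] = hop distance from sources[s] to v, or -1 if unreachable.
std::vector<std::vector<int>> msbfs(const Graph& g, const std::vector<int>& sources) {
    const std::size_t n = g.size();
    const std::size_t k = sources.size();
    assert(k <= 64);
    std::vector<uint64_t> seen(n, 0), visit(n, 0), next(n, 0);
    std::vector<std::vector<int>> levels(k, std::vector<int>(n, -1));
    for (std::size_t s = 0; s < k; ++s) {
        const uint64_t bit = uint64_t{1} << s;
        seen[sources[s]] |= bit;
        visit[sources[s]] |= bit;
        levels[s][sources[s]] = 0;
    }
    for (int depth = 1;; ++depth) {
        // One pass over the current frontiers advances all BFSs by one level.
        for (std::size_t v = 0; v < n; ++v) {
            if (visit[v] == 0) continue;
            for (int w : g[v]) next[w] |= visit[v] & ~seen[w];
        }
        // Record newly discovered (vertex, source) pairs and swap frontiers.
        bool any = false;
        for (std::size_t v = 0; v < n; ++v) {
            const uint64_t fresh = next[v];
            if (fresh != 0) {
                any = true;
                seen[v] |= fresh;
                for (std::size_t s = 0; s < k; ++s)
                    if ((fresh >> s) & 1) levels[s][v] = depth;
            }
            visit[v] = fresh;
            next[v] = 0;
        }
        if (!any) break;
    }
    return levels;
}
```

Every source is a separate bit, so all concurrent BFSs share the neighbor loop. Wider bitsets, or arrays of words, extend the same idea beyond 64 sources.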

Overview
• Motivation: Graph Centrality
• Background: MS-BFS
• Centrality in unweighted graphs
• Centrality in weighted graphs
• Evaluation
• Summary and Future Work

Unweighted Closeness Centrality
Distance-based centrality metric
• Central vertices have a low average geodesic distance to all other vertices
MS-BFS from all vertices
• No need to store distances
Efficient batch incrementer
• Significantly improves the performance of counting discovered vertices
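The sketch below shows one way to accumulate closeness during MS-BFS without ever storing a distance array. It again assumes at most 64 sources and adjacency lists. The closeness definition used here is an assumption for illustration: the number of reached vertices minus 1, divided by the sum of their distances. The function name `closeness_batch` is hypothetical. The per-bit loop that adds the current depth to each source's running sum is a plain incrementer built on the GCC/Clang builtin `__builtin_ctzll`; the paper's batch incrementer is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Closeness of each source (at most 64), computed during MS-BFS.
// Assumed definition: (reached - 1) / sum of distances, 0 if nothing is reached.
std::vector<double> closeness_batch(const Graph& g, const std::vector<int>& sources) {
    const std::size_t n = g.size(), k = sources.size();
    assert(k <= 64);
    std::vector<uint64_t> seen(n, 0), visit(n, 0), next(n, 0);
    std::vector<uint64_t> farness(k, 0), reached(k, 1);  // each source reaches itself
    for (std::size_t s = 0; s < k; ++s) {
        seen[sources[s]] |= uint64_t{1} << s;
        visit[sources[s]] |= uint64_t{1} << s;
    }
    for (uint64_t depth = 1;; ++depth) {
        for (std::size_t v = 0; v < n; ++v)
            if (visit[v] != 0)
                for (int w : g[v]) next[w] |= visit[v] & ~seen[w];
        bool any = false;
        for (std::size_t v = 0; v < n; ++v) {
            uint64_t fresh = next[v];
            seen[v] |= fresh;
            visit[v] = fresh;
            next[v] = 0;
            // No distances are stored: each newly discovered (vertex, source)
            // pair only adds the current depth to that source's sum.
            while (fresh != 0) {
                any = true;
                const int s = __builtin_ctzll(fresh);  // index of lowest set bit
                farness[s] += depth;
                reached[s] += 1;
                fresh &= fresh - 1;  // clear lowest set bit
            }
        }
        if (!any) break;
    }
    std::vector<double> result(k, 0.0);
    for (std::size_t s = 0; s < k; ++s)
        if (farness[s] > 0)
            result[s] = static_cast<double>(reached[s] - 1) / static_cast<double>(farness[s]);
    return result;
}
```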

Unweighted Betweenness Centrality
Path-based centrality metric
• Central vertices are part of many shortest paths
Naïve computation is very costly, so we use Brandes’s algorithm [3]
Forward step can leverage MS-BFS
• Batching improves locality
• Allows vectorization of numeric computations
Challenges: the backward step requires
• Reverse MS-BFS
• Vertex predecessor calculation
[3] Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology, 2001
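For reference, the sequential (non-batched) Brandes algorithm [3] on an unweighted graph can be sketched as follows. The function name `brandes` and the adjacency-list input are illustrative choices. The forward step is a BFS that counts shortest paths (`sigma`) and records predecessors. The backward step pops vertices from a stack in reverse BFS order and accumulates dependencies (`delta`). The stack and the explicit predecessor lists are the two pieces the slide identifies as challenges for a batched version.

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Sequential Brandes betweenness centrality for an unweighted graph.
std::vector<double> brandes(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<double> bc(n, 0.0);
    for (int s = 0; s < n; ++s) {
        // Forward step: BFS that counts shortest paths and records predecessors.
        std::vector<int> dist(n, -1);
        std::vector<double> sigma(n, 0.0), delta(n, 0.0);
        std::vector<std::vector<int>> preds(n);
        std::stack<int> order;  // vertices in BFS order, popped in reverse
        std::queue<int> q;
        dist[s] = 0;
        sigma[s] = 1.0;
        q.push(s);
        while (!q.empty()) {
            const int v = q.front();
            q.pop();
            order.push(v);
            for (int w : g[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; preds[w].push_back(v); }
            }
        }
        // Backward step: reverse BFS order, dependency accumulation.
        while (!order.empty()) {
            const int w = order.top();
            order.pop();
            assert(sigma[w] > 0.0);  // every popped vertex was reached
            for (int v : preds[w]) delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != s) bc[w] += delta[w];
        }
    }
    return bc;
}
```

For an undirected graph stored with both edge directions, every pair is counted twice. Halve the results if the undirected convention is wanted.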

Reverse MS-BFS and Vertex Predecessors
Reverse BFS: traverse the graph in inverse BFS order
• Stacks are unsuited for MS-BFS
Reconstruct the traversal order from the forward iteration frontiers
Batched vertex predecessor computation
Correctness proof and full batched betweenness centrality algorithm in the paper
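The sketch below shows one possible way to run a reverse MS-BFS without stacks. It assumes the forward pass keeps every level's frontier bitmasks, so memory grows with the number of levels; the paper's actual reconstruction and its correctness argument may differ. `frontier[d][x]` holds the sources that discovered `x` at depth `d`. For an edge `v -> w`, the single AND `frontier[d - 1][v] & frontier[d][w]` gives the set of sources for which `v` is a predecessor of `w`. The function `batched_bc`, the limit of 64 sources, and the vertex-major `[x * k + s]` layout for `sigma` and `delta` are illustrative assumptions. The result is the betweenness contribution of the given batch of sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Betweenness contributions of a batch of sources (at most 64), Brandes-style.
// frontier[d][x] has bit s set iff sources[s] discovered x at depth d.
std::vector<double> batched_bc(const Graph& g, const std::vector<int>& sources) {
    const std::size_t n = g.size(), k = sources.size();
    assert(k <= 64);
    std::vector<uint64_t> seen(n, 0);
    std::vector<std::vector<uint64_t>> frontier(1, std::vector<uint64_t>(n, 0));
    std::vector<double> sigma(n * k, 0.0), delta(n * k, 0.0), bc(n, 0.0);
    for (std::size_t s = 0; s < k; ++s) {
        frontier[0][sources[s]] |= uint64_t{1} << s;
        seen[sources[s]] |= uint64_t{1} << s;
        sigma[sources[s] * k + s] = 1.0;
    }
    // Forward step (MS-BFS): p marks the sources for which v precedes w,
    // so shortest-path counts flow from v to w for exactly those sources.
    for (std::size_t d = 1;; ++d) {
        std::vector<uint64_t> next(n, 0);
        for (std::size_t v = 0; v < n; ++v) {
            const uint64_t vis = frontier[d - 1][v];
            if (vis == 0) continue;
            for (int w : g[v]) {
                const uint64_t p = vis & ~seen[w];
                next[w] |= p;
                for (std::size_t s = 0; s < k; ++s)
                    if ((p >> s) & 1) sigma[w * k + s] += sigma[v * k + s];
            }
        }
        bool any = false;
        for (std::size_t v = 0; v < n; ++v) {
            seen[v] |= next[v];
            any = any || next[v] != 0;
        }
        if (!any) break;
        frontier.push_back(next);
    }
    // Backward step: walk the stored levels in reverse instead of popping stacks.
    for (std::size_t d = frontier.size() - 1; d >= 1; --d)
        for (std::size_t v = 0; v < n; ++v) {
            const uint64_t prev = frontier[d - 1][v];
            if (prev == 0) continue;
            for (int w : g[v]) {
                const uint64_t p = prev & frontier[d][w];  // batched predecessor test
                for (std::size_t s = 0; s < k; ++s)
                    if ((p >> s) & 1)
                        delta[v * k + s] += sigma[v * k + s] / sigma[w * k + s] * (1.0 + delta[w * k + s]);
            }
        }
    for (std::size_t v = 0; v < n; ++v)
        for (std::size_t s = 0; s < k; ++s)
            if (static_cast<int>(v) != sources[s]) bc[v] += delta[v * k + s];
    return bc;
}
```

Summing `batched_bc` over batches that partition all vertices gives the same values as the sequential Brandes algorithm. The inner per-source loops are the numeric computations that batching makes amenable to vectorization.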

Batched Algorithm Execution
Problem: MS-BFS cannot be used for distance computation in weighted graphs, since BFS levels count hops rather than summing edge weights
Batched algorithm execution
• Run an algorithm from multiple vertices at the same time
• Synchronize the algorithm executions
• Share common computations and data accesses
• Adapt the memory layout

Batched Algorithm Execution: Example
Batched Bellman-Ford algorithm for weighted all-pairs shortest paths
[Figure: non-batched execution vs. batched execution]
Batched algorithm execution
• … improves temporal and spatial locality
• … facilitates vectorized computation
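The batched Bellman-Ford idea can be sketched as follows. The names `Edge`, `kInf`, and `batched_bellman_ford` are hypothetical, and the graph is assumed to have no negative cycles. All `k` sources share each pass over the edge list. Distances use a vertex-major layout `dist[v * k + s]`, so the relaxation loop over sources reads and writes contiguous memory. Each edge is loaded once per round instead of once per source, and the inner loop is a candidate for compiler auto-vectorization.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Edge {
    int from, to;
    int64_t weight;
};

// Marks "no path found yet"; never used in arithmetic below.
constexpr int64_t kInf = std::numeric_limits<int64_t>::max();

// Runs Bellman-Ford from every vertex in `sources` at once.
// Result layout is vertex-major: dist[v * k + s] is the distance from sources[s] to v.
std::vector<int64_t> batched_bellman_ford(std::size_t n, const std::vector<Edge>& edges,
                                          const std::vector<int>& sources) {
    const std::size_t k = sources.size();
    std::vector<int64_t> dist(n * k, kInf);
    if (k == 0) return dist;
    for (std::size_t s = 0; s < k; ++s) {
        assert(sources[s] >= 0 && static_cast<std::size_t>(sources[s]) < n);
        dist[sources[s] * k + s] = 0;
    }
    for (std::size_t round = 0; round + 1 < n; ++round) {  // at most n - 1 rounds
        bool changed = false;
        for (const Edge& e : edges) {
            // One load of the edge serves all k sources; du and dv are contiguous.
            const int64_t* du = &dist[e.from * k];
            int64_t* dv = &dist[e.to * k];
            for (std::size_t s = 0; s < k; ++s) {
                if (du[s] == kInf) continue;  // source s has not reached e.from yet
                const int64_t cand = du[s] + e.weight;
                if (cand < dv[s]) {
                    dv[s] = cand;
                    changed = true;
                }
            }
        }
        if (!changed) break;  // all k runs have converged
    }
    return dist;
}
```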

Batched Weighted Distances
Comparison of common weighted distance algorithms:
[Figure: runtime (in milliseconds) vs. graph size (number of vertices, 10k to 1M) for Bellman-Ford and Dijkstra, batched vs. non-batched execution; panels: Kronecker and LDBC graphs, each with 5, 10, and 100 weights]

Weighted Centralities
Closeness Centrality
• Batched execution allows vectorizing the CC computation from the distances
Betweenness Centrality
• Requires global distance ordering
• Implicit predecessor computation
• Vectorized numeric computations
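The sketch below gives a single-source reading of the weighted betweenness bullets. Vertices are placed in an explicit order by distance instead of on a stack. Predecessors are never stored, since `u` precedes `v` exactly when `dist[u] + w(u, v) == dist[v]`. Several things are assumptions of this example: the function name `weighted_dependencies`, the `WGraph` type, and the use of `-1` for unreachable vertices. It also requires strictly positive integer weights. Positivity ensures that vertices at equal distance are never predecessors of each other; integer weights keep the equality test exact. The distances could come from any shortest-path algorithm, such as a batched Bellman-Ford. Batching over sources, as done in the paper, is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// g[u] holds (v, w) for every edge u -> v with integer weight w > 0.
using WGraph = std::vector<std::vector<std::pair<int, int64_t>>>;

// Brandes dependencies delta[v] of one source in a weighted graph, given that
// source's shortest-path distances (dist[v] == -1 marks unreachable vertices).
// Predecessors are implicit: u precedes v iff dist[u] + w(u, v) == dist[v].
std::vector<double> weighted_dependencies(const WGraph& g, const std::vector<int64_t>& dist,
                                          int source) {
    const std::size_t n = g.size();
    assert(dist.size() == n && dist[source] == 0);
    // Global distance ordering replaces the stack of the unweighted algorithm.
    std::vector<int> order;
    for (std::size_t v = 0; v < n; ++v)
        if (dist[v] >= 0) order.push_back(static_cast<int>(v));
    std::sort(order.begin(), order.end(),
              [&dist](int a, int b) { return dist[a] < dist[b]; });

    std::vector<double> sigma(n, 0.0), delta(n, 0.0);
    sigma[source] = 1.0;
    // Forward: with positive weights, every predecessor of v comes earlier in `order`.
    for (int u : order)
        for (const auto& edge : g[u])
            if (dist[u] + edge.second == dist[edge.first]) sigma[edge.first] += sigma[u];
    // Backward: accumulate dependencies in decreasing distance order.
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        const int u = *it;
        for (const auto& edge : g[u]) {
            const int v = edge.first;
            if (dist[u] + edge.second == dist[v])
                delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v]);
        }
    }
    return delta;
}
```

The betweenness contribution of this source to vertex `v` is `delta[v]` for `v != source`. Running the function once per source and summing the contributions gives Brandes's result for weighted graphs.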

Evaluation: Setup
Algorithms implemented as stand-alone programs
• C++14, GCC 5.2.1
• No framework dependencies
Synthetic datasets
• LDBC Social Network friendships graph
• Kronecker graph, edge factor 32
Real-world datasets
• Citeseer (384k vertices), DBLP (1.3M vertices), Wikipedia (1.9M vertices), and Hudong (3M vertices)
• From the KONECT repository
Evaluated on a dual Intel Xeon E5-2660 v2 (20 cores at 2.2 GHz), 256 GB RAM

Evaluation: Number of Concurrent Executions
[Figure: batched algorithm execution speedup vs. number of concurrent executions (1 to 256); panels: closeness centrality and betweenness centrality, each unweighted and weighted; datasets: LDBC 100, Kronecker S21, Citeseer, DBLP, Hudong, Wikipedia]

Evaluation: Graph Size Scalability
[Figure: batched algorithm execution speedup vs. graph size (number of vertices, 10k to 1M); panels: LDBC unweighted and LDBC weighted; algorithms: closeness centrality, and betweenness centrality vs. Brandes's BC; weighted series by weight count]

Evaluation: Number of Edge Weights
[Figure: batched algorithm execution speedup vs. graph size (number of vertices, 10k to 1M) on weighted LDBC graphs, for closeness and betweenness centrality with 5, 10, and 100 edge weights]

Summary
Batched algorithm execution
• Shares common data accesses,
• Avoids/vectorizes computations, and
• Significantly reduces graph algorithm execution times
Improved centrality computation performance
• Unweighted by up to 20x (closeness) and 6x (betweenness)
• Weighted by up to 7x (closeness) and 3x (betweenness)
Details and all algorithms are listed in the paper
Future work: Apply batched execution to further classes of algorithms
