

  1. Approximate Graph Operations on Parallel Platforms

  2. Overview
     - Computing similarity of nodes in two graphs: essentially ranking pairs of nodes
     - Network Similarity Decomposition (NSD): algorithm; sequential implementation, experiments and applications
     - Parallel NSD-based computation of node similarity scores: algorithm, parallel implementation, experiments
     - The alignment graph and parallel NSD: algorithm, parallel implementation, auction matching
     - Large-scale experiments: strong and weak scaling results
     - Conclusions and future work

  3. Graph similarity in figures
     Protein-Protein Interaction (PPI) networks for two species; Wikipedia categories and Library of Congress subject headings.
     How similar are any two nodes of these networks?
     (Figures from M. Bayati, M. Gerritsen, D. Gleich, A. Saberi, and Y. Wang, Algorithms for Large, Sparse Network Alignment, ICDM 2009.)

  4. An example of similarity of a graph with itself (self-similarity)
     Applying the similarity pipeline presented here to a 10-node example graph matched against itself; a square at position (i, j) denotes a matching of node i to node j.
     Two variants are shown: allowing a node to be matched to itself, and not allowing it.
     [Figure: the example graph and the two resulting 10 x 10 match matrices.]

  5. Rank-inspired definition of similarity
     Node ranking: a node is important if it is linked by other important nodes.
     Graph similarity: two nodes are similar if they are linked by other similar node pairs.
     V.D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren. A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching. SIAM Review, 46(4):647-666, 2004.
     R. Singh, J. Xu, and B. Berger. Global Alignment of Multiple Protein Interaction Networks with Application to Functional Orthology Detection. Proceedings of the National Academy of Sciences, 105(35):12763, 2008.
     The approach of Singh et al., IsoRank, is our focus; it has typically been applied to undirected graphs.

  6. Notation
     A, B: the adjacency matrices of the input graphs G_A, G_B.
     \tilde{A} is A^T normalized by columns, i.e. (\tilde{A})_{ij} = a_{ji} / \sum_{i=1}^{n_A} a_{ji}; similarly for \tilde{B}.
     h = vec(H): a normalized vector, where H_{ij} are independently known similarity scores between nodes i \in V_B and j \in V_A.
     vec(\cdot): the operation building a vector from a matrix by stacking its columns; unvec(\cdot) is the inverse operation.
     \alpha: percentage of network data contribution in the algorithm.
     \tilde{C} = \tilde{A} \otimes \tilde{B}.
     The computed matrix X contains the similarity scores: entry X_{ij} denotes how "similar" nodes i \in V_B and j \in V_A are.
     Reminder: the Kronecker product A \otimes B of two matrices is
       A \otimes B = \begin{bmatrix} a_{1,1} B & a_{1,2} B & \cdots \\ a_{2,1} B & a_{2,2} B & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}
                   = \begin{bmatrix} a_{1,1} b_{1,1} & a_{1,1} b_{1,2} & \cdots & a_{1,2} b_{1,1} & a_{1,2} b_{1,2} & \cdots \\ a_{1,1} b_{2,1} & a_{1,1} b_{2,2} & \cdots & a_{1,2} b_{2,1} & a_{1,2} b_{2,2} & \cdots \\ \vdots & & & \vdots & & \ddots \end{bmatrix}
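     The vec/Kronecker identity that connects the vector and matrix forms of the iteration can be checked numerically. Below is a minimal sketch in Python/NumPy; sizes and values are arbitrary and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))   # small arbitrary matrices, shaped so that A @ X @ B is defined
X = rng.random((4, 5))
B = rng.random((5, 2))

# vec stacks the columns of a matrix; in NumPy this is column-major (Fortran) flattening.
vec = lambda M: M.flatten(order="F")
unvec = lambda v, shape: v.reshape(shape, order="F")

lhs = A @ X @ B
rhs = unvec(np.kron(B.T, A) @ vec(X), lhs.shape)
print(np.allclose(lhs, rhs))   # True: A X B = unvec((B^T kron A) vec(X))
```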

  7. IsoRank algorithm
     IsoRank iteration: x \leftarrow \alpha \tilde{C} x + (1 - \alpha) h, until convergence.
     Alternatively, X \leftarrow \alpha \tilde{B} X \tilde{A}^T + (1 - \alpha) H as the iteration kernel, because A X B = unvec((B^T \otimes A) x) with x = vec(X) (a property of Kronecker products).
     MAT3: triple-matrix-product implementation of the IsoRank idea.
     In the IsoRank kernel x \leftarrow \alpha \tilde{C} x + (1 - \alpha) h, set x^{(0)} = h. Expanding, after n steps
       x^{(n)} = (1 - \alpha) \sum_{k=0}^{n-1} \alpha^k \tilde{C}^k h + \alpha^n \tilde{C}^n h
     Alternatively,
       X^{(n)} = (1 - \alpha) \sum_{k=0}^{n-1} \alpha^k \tilde{B}^k H (\tilde{A}^T)^k + \alpha^n \tilde{B}^n H (\tilde{A}^T)^n
     Similarity scores are sums of contributions from all k-hop neighbors (similarity score aggregation).
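     As a concrete illustration of the triple-matrix-product kernel, here is a minimal sketch with SciPy sparse matrices. The helper and function names (normalize_columns, isorank_mat3) are illustrative, not the authors' code, and H is assumed to be a dense n_B x n_A array that is already normalized as required:

```python
import numpy as np
import scipy.sparse as sp

def normalize_columns(M):
    """Scale each column of sparse matrix M to sum to 1 (all-zero columns are left unchanged)."""
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    col_sums[col_sums == 0.0] = 1.0
    return M @ sp.diags(1.0 / col_sums)

def isorank_mat3(A, B, H, alpha=0.8, n_iter=20):
    """Triple-matrix-product (MAT3) form of the IsoRank iteration:
    X <- alpha * B_tilde @ X @ A_tilde^T + (1 - alpha) * H."""
    A_t = normalize_columns(sp.csc_matrix(A.T))   # \tilde{A}: A^T normalized by columns
    B_t = normalize_columns(sp.csc_matrix(B.T))   # \tilde{B}: B^T normalized by columns
    X = np.array(H, dtype=float)
    for _ in range(n_iter):
        # X @ A_t.T is computed as (A_t @ X.T).T so that both products are sparse-times-dense.
        X = alpha * (B_t @ (A_t @ X.T).T) + (1 - alpha) * H
    return X
```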

  8. An example of similarity score aggregation in IsoRank
     Let G_A and G_B be graphs with node sets {a, b, c, d, e, f, g} and {1, 2, 3, 4, 5, 6}.
     Suppose the node pairs (b, 1), (f, 4) and (g, 6) are somewhat similar (the H information).
     Sum the contributions of all 1-hop, 2-hop, 3-hop, ... neighbors in the two networks (along paths such as c-b-1-2, c-d-f-4-3-2, c-a-e-g-6-5-4-2) with known or previously computed similarity.

  9. Decomposing graphs: the NSD idea
     In IsoRank, using H as the initial condition (X^{(0)} = H), after n steps we get
       X^{(n)} = (1 - \alpha) \sum_{k=0}^{n-1} \alpha^k \tilde{B}^k H (\tilde{A}^T)^k + \alpha^n \tilde{B}^n H (\tilde{A}^T)^n
     Let H = u v^T (1 component, i.e. one outer vector product). Two phases for computing X then:
       1. u^{(k)} = \tilde{B}^k u and v^{(k)} = \tilde{A}^k v (preprocess/compute iterates)
       2. X^{(n)} = (1 - \alpha) \sum_{k=0}^{n-1} \alpha^k u^{(k)} v^{(k)T} + \alpha^n u^{(n)} v^{(n)T} (construct X)
     This extends to the case where H is approximated by s components (a sum of s outer vector products): H \approx \sum_{i=1}^{s} w_i z_i^T.
     NSD key points: instead of triple matrix products we compute sums of outer products of vectors; these vectors in turn are sparse matrix-vector iterates that can be computed independently. (A minimal sketch follows.)
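     A minimal sketch of the two phases for a single component H = u v^T, assuming the normalized matrices \tilde{A} and \tilde{B} are already available (dense or sparse); the names are illustrative. Iterating the MAT3 kernel n times starting from X^{(0)} = np.outer(u, v) should give the same X^{(n)}, up to round-off:

```python
import numpy as np

def nsd_one_component(A_t, B_t, u, v, alpha, n):
    """NSD with a single component H = u v^T."""
    # Phase 1: matrix-vector iterates, computed independently for each graph.
    us, vs = [u], [v]
    for _ in range(n):
        us.append(B_t @ us[-1])   # u^(k) = B_tilde^k u
        vs.append(A_t @ vs[-1])   # v^(k) = A_tilde^k v
    # Phase 2: damped sum of outer products.
    X = sum(alpha**k * np.outer(us[k], vs[k]) for k in range(n))
    return (1 - alpha) * X + alpha**n * np.outer(us[n], vs[n])
```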

  10. NSD-based similarity matrix construction
     Algorithm:
       1: Input: A, B, {w_i, z_i | i = 1, ..., s}, \alpha and n; Output: X = X^{(n)}
       2: compute \tilde{A}, \tilde{B}
       3: for i = 1 to s do                          {phase 1}
       4:   w_i^{(0)} \leftarrow w_i, z_i^{(0)} \leftarrow z_i
       5:   for k = 1 to n do
       6:     w_i^{(k)} \leftarrow \tilde{B} w_i^{(k-1)}
       7:     z_i^{(k)} \leftarrow \tilde{A} z_i^{(k-1)}
       8:   end for
       9:   X_i^{(n)} \leftarrow 0                    {phase 2 start}
      10:   for k = 0 to n - 1 do
      11:     X_i^{(n)} \leftarrow X_i^{(n)} + \alpha^k w_i^{(k)} z_i^{(k)T}
      12:   end for
      13:   X_i^{(n)} \leftarrow (1 - \alpha) X_i^{(n)} + \alpha^n w_i^{(n)} z_i^{(n)T}
      14: end for
      15: X^{(n)} \leftarrow \sum_{i=1}^{s} X_i^{(n)}
     Remarks:
       The decomposition happens along the set of paths of successively larger length k, however with increasing "damping" (because of the (1 - \alpha) \alpha^k factor, with \alpha \in [0, 1]).
       The computation involves neither explicitly building \tilde{C}, related to the product graph, nor computing triple matrix products of the form \tilde{B} X \tilde{A}^T at each step; only outer vector products are computed at the end.
       The (sparse) matrix-vector products for the two graphs can be computed quite independently. (A Python sketch of this pseudocode follows.)
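     The pseudocode translates almost line for line into Python. The sketch below handles s components and reuses the normalize_columns helper from the IsoRank sketch above; all names are illustrative:

```python
import numpy as np
import scipy.sparse as sp

def nsd(A, B, ws, zs, alpha, n):
    """Network Similarity Decomposition for H ~ sum_i w_i z_i^T.
    A, B: adjacency matrices; ws, zs: lists of the s component vectors."""
    A_t = normalize_columns(sp.csc_matrix(A.T))   # line 2: compute \tilde{A}, \tilde{B}
    B_t = normalize_columns(sp.csc_matrix(B.T))
    X = np.zeros((B.shape[0], A.shape[0]))
    for w, z in zip(ws, zs):                      # line 3: loop over the s components
        w_k = [np.asarray(w, dtype=float)]        # line 4: w^(0), z^(0)
        z_k = [np.asarray(z, dtype=float)]
        for _ in range(n):                        # lines 5-8: phase 1 iterates
            w_k.append(B_t @ w_k[-1])
            z_k.append(A_t @ z_k[-1])
        # lines 9-13: phase 2, damped sum of outer products for this component
        X_i = sum(alpha**k * np.outer(w_k[k], z_k[k]) for k in range(n))
        X += (1 - alpha) * X_i + alpha**n * np.outer(w_k[n], z_k[n])
    return X                                      # line 15: sum over components
```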

  11. Block diagram of the decomposition approach
     Input: a network pair and its elemental similarities H, as component vectors.
     1. Preprocess/compute vector iterates.
     2. Construct the similarity matrix X by summing outer products of vectors.
     3. Produce a set of pairs (matches) of nodes from one network that are "most similar" to nodes from the other: the rows and columns of X are the nodes of a weighted bipartite graph, with X_ij its weights.
     Matching algorithms used in the experiments for this 3rd phase: Primal-Dual Matching (PDM), Greedy Matching (GM, a 1/2 approximation), Hungarian, auction. (A matching sketch follows below.)
     In the sequel, IsoRank refers to the implementation of the IsoRank idea followed by the application of the Hungarian and PDM algorithms to the resulting X (as available in Singh's binary code).
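     For the matching phase, as a stand-in for the PDM/GM/auction codes used in the experiments, the Hungarian algorithm available in SciPy can extract a one-to-one matching from a dense similarity matrix; this is only an illustration, not the implementation used here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_from_similarity(X):
    """Maximum-weight bipartite matching on similarity matrix X
    (rows: nodes of G_B, columns: nodes of G_A)."""
    rows, cols = linear_sum_assignment(X, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))   # pairs (i in V_B, j in V_A)
```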

  12. Protein-Protein Interaction (PPI) networks (sequential)

     Species            Nodes   Edges
     celeg (worm)        2805    4572
     dmela (fly)         7518   25830
     ecoli (bacterium)   1821    6849
     hpylo (bacterium)    706    1414
     hsapi (human)       9633   36386
     mmusc (mouse)        290     254
     scere (yeast)       5499   31898

     Species pair   NSD (secs)   MAT3 (secs)   PDM (secs)   GM (secs)   IsoRank (secs)
     celeg-dmela          3.15         64.20       152.12        7.29           783.48
     celeg-hsapi          3.28         69.74       163.05        9.54          1209.28
     celeg-scere          1.97         44.61       127.70        4.16           949.58
     dmela-ecoli          1.86         37.79        86.80        4.78           807.93
     dmela-hsapi          8.61        211.19       590.16       28.10          7840.00
     dmela-scere          4.79        131.22       182.91       12.97          4905.00
     ecoli-hsapi          2.41         47.48        79.23        4.76          2029.56
     ecoli-scere          1.49         35.86        69.88        2.60          1264.24
     hsapi-scere          6.09        152.02       181.17       15.56          6714.00

     We computed the similarity matrices X for the various possible pairs of species using only PPI data (network data); \alpha = 0.80, uniform initial conditions (the outer product of suitably normalized all-ones vectors for each pair), 20 iterations, 1 component.
     Finding: 1-3 orders of magnitude speedup of NSD-based approaches compared to comparable MAT3-based ones (with PDM, GM) and IsoRank (no parallelization yet).
     G. Kollias, S. Mohammadi, and A. Grama. Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment. IEEE Transactions on Knowledge and Data Engineering, 2011.
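     The uniform initial condition mentioned above can be written as a single rank-1 component. The exact scaling is not spelled out on the slide, so the sketch below simply makes all entries of H equal and sum to 1 (an assumption):

```python
import numpy as np

def uniform_component(n_B, n_A):
    """One component (w, z) with H = w z^T constant and summing to 1."""
    w = np.full(n_B, 1.0 / n_B)
    z = np.full(n_A, 1.0 / n_A)
    return w, z   # np.outer(w, z) has all entries 1/(n_B*n_A) and sums to 1
```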
