Approximate Graph Operations on Parallel Platforms
Overview
- Computing similarity of nodes in two graphs: essentially ranking pairs of nodes
- Network Similarity Decomposition (NSD): algorithm, sequential implementation, experiments and applications
- Parallel NSD-based computation of node similarity scores: algorithm, parallel implementation, experiments
- The alignment graph
- Parallel NSD: algorithm, parallel implementation, auction matching
- Large-scale experiments: strong and weak scaling results
- Conclusions and future work
Graph similarity in figures
- Protein-Protein Interaction (PPI) networks for two species
- Wikipedia categories and Library of Congress subject headings
How similar are any two nodes of these networks?
Figures from M. Bayati, M. Gerritsen, D. Gleich, A. Saberi, and Y. Wang, Algorithms for Large, Sparse Network Alignment, ICDM 2009.
An example of similarity of a graph with itself (self-similarity)
Applying the similarity pipeline presented here to a 10-node example graph.
[Figure: the 10-node example graph next to two 10 x 10 match grids; a square at (i, j) denotes a matching of node i to node j.]
- Case 1: allow matching a node to itself.
- Case 2: do not allow matching a node to itself.
Rank-inspired definition of similarity
Node ranking: a node is important if it is linked to by other important nodes.
Graph similarity: two nodes are similar if they are linked to by other similar node pairs.
V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren. A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching. SIAM Review, 46(4):647-666, 2004.
R. Singh, J. Xu, and B. Berger. Global Alignment of Multiple Protein Interaction Networks with Application to Functional Orthology Detection. Proceedings of the National Academy of Sciences, 105(35):12763, 2008.
Singh et al.'s approach, IsoRank, is our focus; it has typically been applied to undirected graphs.
Notation
- A, B: the adjacency matrices of the input graphs G_A, G_B. Ã is A^T normalized by columns, i.e. (Ã)_{ij} = a_{ji} / Σ_{i=1}^{n_A} a_{ji}; similarly for B̃.
- h = vec(H): a normalized vector, where H_{ij} are independently known similarity scores between nodes i ∈ V_B and j ∈ V_A.
- vec(·): operation for building a vector from a matrix (stacking its columns); unvec(·) is the inverse operation.
- α: fraction of the network data contribution in the algorithm.
- C̃ = Ã ⊗ B̃.
- The computed matrix X contains similarity scores: entry X_{ij} denotes how "similar" nodes i ∈ V_B and j ∈ V_A are.
Reminder: the Kronecker product A ⊗ B of two matrices is

  A ⊗ B = [ a_{1,1} B   a_{1,2} B   ···
            a_{2,1} B   a_{2,2} B   ···
            ···                        ]

        = [ a_{1,1} b_{1,1}   a_{1,1} b_{1,2}   ···   a_{1,2} b_{1,1}   a_{1,2} b_{1,2}   ···
            a_{1,1} b_{2,1}   a_{1,1} b_{2,2}   ···   a_{1,2} b_{2,1}   a_{1,2} b_{2,2}   ···
            a_{2,1} b_{1,1}   a_{2,1} b_{1,2}   ···
            ···                                                                              ]
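To make the normalization and the Kronecker/vec identity concrete, here is a minimal numpy sketch with hypothetical toy adjacency matrices; the function and variable names are ours, not taken from the original code.

```python
import numpy as np

def normalize(A):
    """Return A~ = A^T with each column scaled to sum to 1 (column-stochastic),
    matching (A~)_{ij} = a_{ji} / sum_i a_{ji}."""
    At = A.T.astype(float)
    colsums = At.sum(axis=0)
    colsums[colsums == 0] = 1.0          # avoid division by zero for isolated nodes
    return At / colsums

# Toy adjacency matrices (hypothetical 3- and 2-node graphs).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
B = np.array([[0, 1],
              [1, 0]])

A_t, B_t = normalize(A), normalize(B)
C_t = np.kron(A_t, B_t)                  # C~ = A~ ⊗ B~, of size (n_A n_B) x (n_A n_B)

# Kronecker/vec identity used on the next slide:
# (A~ ⊗ B~) vec(X) == vec(B~ X A~^T), with vec stacking columns (Fortran order).
X = np.random.rand(B.shape[0], A.shape[0])   # rows indexed by V_B, columns by V_A
lhs = C_t @ X.flatten(order="F")
rhs = (B_t @ X @ A_t.T).flatten(order="F")
assert np.allclose(lhs, rhs)
```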
IsoRank algorithm
- IsoRank iteration: x ← α C̃ x + (1 − α) h, until convergence.
- Alternatively, X ← α B̃ X Ã^T + (1 − α) H as the iteration kernel, because A X B = unvec((B^T ⊗ A) vec(X)) (a property of Kronecker products).
- MAT3: the triple-matrix-product implementation of the IsoRank idea.
- In the IsoRank kernel x ← α C̃ x + (1 − α) h, set x^(0) = h. Expanding, after n steps:
  x^(n) = (1 − α) Σ_{k=0}^{n−1} α^k C̃^k h + α^n C̃^n h
- Alternatively:
  X^(n) = (1 − α) Σ_{k=0}^{n−1} α^k B̃^k H (Ã^T)^k + α^n B̃^n H (Ã^T)^n
- Similarity scores are sums of contributions from all k-hop neighbors (similarity score aggregation).
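A hedged sketch of the MAT3-style kernel, reusing `normalize` and the toy A, B from the previous sketch; the parameter values mirror the experiments later in the deck (α = 0.8, 20 iterations), everything else is illustrative.

```python
import numpy as np

def isorank_mat3(A, B, H, alpha=0.8, n_iter=20):
    """Triple-matrix-product form of the IsoRank iteration:
    X <- alpha * B~ X A~^T + (1 - alpha) * H, starting from X^(0) = H.
    Rows of H and X are indexed by V_B, columns by V_A."""
    A_t, B_t = normalize(A), normalize(B)
    H = H / H.sum()                       # h = vec(H) is a normalized vector
    X = H.copy()
    for _ in range(n_iter):
        X = alpha * (B_t @ X @ A_t.T) + (1 - alpha) * H
    return X

# Uniform prior H (no independently known similarities), as in the PPI experiments.
H = np.ones((B.shape[0], A.shape[0]))
X = isorank_mat3(A, B, H)
```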
An example of similarity score aggregation in IsoRank
Let G_A and G_B be graphs with node sets { a, b, c, d, e, f, g } and { 1, 2, 3, 4, 5, 6 }.
Suppose the node pairs (b, 1), (f, 4) and (g, 6) are somewhat similar (H information).
To score a pair such as (c, 2), sum the contributions of all 1-hop, 2-hop, 3-hop, ... neighbors in the two networks with known or previously computed similarity (along paths like c−b−1−2, c−d−f−4−3−2, c−a−e−g−6−5−4−2).
Decomposing graphs: the NSD idea
In IsoRank, using H as the initial condition (X^(0) = H), after n steps we get
  X^(n) = (1 − α) Σ_{k=0}^{n−1} α^k B̃^k H (Ã^T)^k + α^n B̃^n H (Ã^T)^n
Let H = u v^T (1 component, i.e. one outer vector product). Two phases for computing X then:
1. u^(k) = B̃^k u and v^(k) = Ã^k v (preprocess/compute iterates)
2. X^(n) = (1 − α) Σ_{k=0}^{n−1} α^k u^(k) (v^(k))^T + α^n u^(n) (v^(n))^T (construct X)
This extends to the case where H is approximated by s components (a sum of s outer vector products): H ≈ Σ_{i=1}^{s} w_i z_i^T.
NSD key points
- Instead of triple matrix products we compute sums of outer products of vectors.
- These vectors are in turn sparse matrix-vector iterates that can be computed independently.
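When H is available explicitly, one possible way (our choice for illustration, not necessarily the authors') to obtain the s components w_i, z_i is a truncated SVD; the rank-1 uniform H used in the sequential experiments is also shown, sized to match the toy example on the previous slide.

```python
import numpy as np

def low_rank_components(H, s):
    """Return (w_i, z_i) pairs with H ≈ sum_{i=1}^{s} w_i z_i^T via truncated SVD.
    Any rank-s approximation of H works here; SVD is just one convenient choice."""
    U, sigma, Vt = np.linalg.svd(H, full_matrices=False)
    w = [U[:, i] * sigma[i] for i in range(s)]   # left factors, scaled by singular values
    z = [Vt[i, :] for i in range(s)]             # right factors
    return list(zip(w, z))

# Rank-1 uniform prior, as in the sequential PPI experiments: H = u v^T.
n_B, n_A = 6, 7                                  # hypothetical graph sizes
u = np.ones(n_B) / n_B
v = np.ones(n_A) / n_A
components = [(u, v)]
```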
NSD-based similarity matrix construction
 1: Input: A, B, {w_i, z_i | i = 1, ..., s}, α and n;  Output: X = X^(n)
 2: compute Ã, B̃                                     {phase 1}
 3: for i = 1 to s do
 4:   w_i^(0) ← w_i,  z_i^(0) ← z_i
 5:   for k = 1 to n do
 6:     w_i^(k) ← B̃ w_i^(k−1)
 7:     z_i^(k) ← Ã z_i^(k−1)
 8:   end for
 9:   zero X_i^(n)                                    {phase 2 start}
10:   for k = 0 to n − 1 do
11:     X_i^(n) ← X_i^(n) + α^k w_i^(k) (z_i^(k))^T
12:   end for
13:   X_i^(n) ← (1 − α) X_i^(n) + α^n w_i^(n) (z_i^(n))^T
14: end for
15: X^(n) ← Σ_{i=1}^{s} X_i^(n)
Remarks
- Decomposition happens along the set of paths of successively larger length k, however with increasing "damping" (because of the (1 − α) α^k factor, with α ∈ [0, 1]).
- The computation involves neither explicitly building C̃ (related to the product graph) nor computing triple matrix products of the form B̃ X Ã^T at each step; outer vector products are computed only at the end.
- The (sparse) matrix-vector products for the two graphs can be computed quite independently.
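A compact Python rendering of the algorithm above, using sparse matrices for the matrix-vector iterates; this is a sketch of the listed steps (reusing `normalize`, A and B from the earlier sketches), not the authors' released code.

```python
import numpy as np
from scipy.sparse import csr_matrix

def nsd(A, B, components, alpha=0.8, n=20):
    """NSD: phase 1 computes the iterates w_i^(k) = B~^k w_i and z_i^(k) = A~^k z_i;
    phase 2 assembles X^(n) from damped outer products of those iterates."""
    A_t = csr_matrix(normalize(A))        # column-stochastic A^T (see earlier sketch)
    B_t = csr_matrix(normalize(B))
    X = np.zeros((B.shape[0], A.shape[0]))
    for w, z in components:
        W, Z = [np.asarray(w, float)], [np.asarray(z, float)]
        for k in range(1, n + 1):         # phase 1: sparse matrix-vector iterates
            W.append(B_t @ W[-1])
            Z.append(A_t @ Z[-1])
        X_i = np.zeros_like(X)            # phase 2: sum of damped outer products
        for k in range(n):
            X_i += (alpha ** k) * np.outer(W[k], Z[k])
        X += (1 - alpha) * X_i + (alpha ** n) * np.outer(W[n], Z[n])
    return X

# Example: one uniform component (H = u v^T), alpha = 0.8, 20 iterations.
u = np.ones(B.shape[0]) / B.shape[0]
v = np.ones(A.shape[0]) / A.shape[0]
X = nsd(A, B, [(u, v)])
```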
Block diagram of the decomposition approach
Input a network pair and its elemental similarities H as component vectors; then:
1. Preprocess/compute vector iterates.
2. Construct the similarity matrix X by summing outer products of vectors.
3. Produce a set of pairs (matches) of nodes from one network that are "most similar" to nodes from the other: rows and columns of X are treated as the nodes of a weighted bipartite graph, with X_ij as its edge weights.
Matching algorithms used in the experiments for this third phase: Primal-Dual Matching (PDM), Greedy Matching (GM, a 1/2-approximation), Hungarian, auction.
In the sequel, IsoRank refers to the implementation of the IsoRank idea followed by the application of the Hungarian and PDM algorithms to the resulting X (as available in Singh's binary code).
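For the matching phase, a hedged sketch using SciPy's Hungarian-style solver and a simple greedy 1/2-approximation; the PDM and auction implementations from the experiments are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def extract_matches(X):
    """Treat X as the weight matrix of a bipartite graph (rows: V_B, columns: V_A)
    and extract a maximum-weight matching with a Hungarian-style solver."""
    rows, cols = linear_sum_assignment(X, maximize=True)
    return list(zip(rows, cols))          # (node in G_B, node in G_A) pairs

def greedy_matches(X):
    """Greedy 1/2-approximation (GM): repeatedly pick the heaviest remaining entry
    and discard its row and column."""
    X = np.array(X, dtype=float)          # work on a copy we can overwrite
    matches = []
    for _ in range(min(X.shape)):
        i, j = np.unravel_index(np.argmax(X), X.shape)
        if X[i, j] <= 0:
            break
        matches.append((i, j))
        X[i, :] = -np.inf
        X[:, j] = -np.inf
    return matches
```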
Protein-Protein Interaction (PPI) networks (sequential)

Species              Nodes   Edges
celeg (worm)          2805    4572
dmela (fly)           7518   25830
ecoli (bacterium)     1821    6849
hpylo (bacterium)      706    1414
hsapi (human)         9633   36386
mmusc (mouse)          290     254
scere (yeast)         5499   31898

Species pair    NSD (secs)   MAT3 (secs)   PDM (secs)   GM (secs)   IsoRank (secs)
celeg-dmela           3.15         64.20       152.12        7.29           783.48
celeg-hsapi           3.28         69.74       163.05        9.54          1209.28
celeg-scere           1.97         44.61       127.70        4.16           949.58
dmela-ecoli           1.86         37.79        86.80        4.78           807.93
dmela-hsapi           8.61        211.19       590.16       28.10          7840.00
dmela-scere           4.79        131.22       182.91       12.97          4905.00
ecoli-hsapi           2.41         47.48        79.23        4.76          2029.56
ecoli-scere           1.49         35.86        69.88        2.60          1264.24
hsapi-scere           6.09        152.02       181.17       15.56          6714.00

We computed the similarity matrices X for various pairs of species using only PPI data (network data): α = 0.80, uniform initial conditions (the outer product of suitably normalized all-ones vectors for each pair), 20 iterations, 1 component.
Finding: 1-3 orders of magnitude speedup of NSD-based approaches compared to comparable MAT3-based (with PDM, GM) and IsoRank ones (no parallelization yet).
G. Kollias, S. Mohammadi, and A. Grama. Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment. IEEE Transactions on Knowledge and Data Engineering, 2011.