
CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs



  1. CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs. Mo Li (1,2), Farhana M. Choudhury (2), Renata Borovica-Gajic (2), Zhiqiong Wang (1), Junchang Xin (1,*), Jianxin Li (3). Affiliations: (1) Northeastern University, CN; (2) University of Melbourne, AU; (3) Deakin University, AU. April 22, 2020

  2. Outline • Background: SimRank Overview; Motivation • Problem Definition: Preliminaries; Problem Definition • Our Approach: CrashSim Algorithm --- static graphs; CrashSim-T Algorithm --- temporal graphs • Experiments and Conclusion: Experimental Evaluation; Conclusion

  3. Background • SimRank Overview • Motivation

  4. Background • Similarity assessment plays a vital role in our lives. Examples (figure): recommender system, citation graph, collaboration network

  5. Background • SimRank • Node-to-node similarity measure based on the topology of graphs (KDD’02) • Basic assumption • Two nodes are similar if they are both highly relevant to similar nodes • Two forms • Original definition (KDD’02) • √c-walk (SIGMOD’16)
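As a point of reference for the original definition, the recursion s(a, b) = c / (|I(a)| · |I(b)|) · Σ s(x, y), summed over in-neighbor pairs (x, y) with s(a, a) = 1, can be iterated directly. The following is a minimal sketch of that naive iteration, not the algorithm presented in these slides. The function name `simrank`, the decay value c = 0.6, and the toy graph are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive fixed-point iteration of the original SimRank recursion.

    in_neighbors maps every node to the list of its in-neighbors.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph with edges A -> B and A -> C.
toy_graph = {"A": [], "B": ["A"], "C": ["A"]}
scores = simrank(toy_graph, c=0.6)
```

In this toy graph B and C share their only in-neighbor A, so their score is c · s(A, A) = 0.6. The cost is quadratic in the number of nodes per iteration, which is why sampling-based estimators are used on larger graphs.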

  6. Background • Temporal Graph (figure: three successive snapshots of a graph over nodes A to H) • Temporal SimRank queries: threshold and trend

  7. Problem Definition • Preliminaries • Problem Definition

  8. Preliminaries • SLING algorithm (SIGMOD’16) • ProbeSim algorithm (VLDB’17)

  9. Problem Definition • Problem (Temporal SimRank Queries) Given: G, u, [t_1, t_2]. Return: node set Ω such that the SimRank of u and each node v ∈ Ω continuously meets a certain query requirement during the entire query interval [t_1, t_2]. • Problem (Temporal SimRank Trend Query) Given: G, u, [t_1, t_2]. Return: node set Ω such that the SimRank of u and each node v ∈ Ω is continuously increasing (or decreasing) during the entire query interval [t_1, t_2]. • Problem (Temporal SimRank Threshold Query) Given: G, u, [t_1, t_2], θ. Return: node set Ω such that the SimRank of u and each node v ∈ Ω is greater than θ during the entire query interval [t_1, t_2].
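The two query types can be read as predicates over a per-node series of SimRank scores, one score per time instant of the query interval. The sketch below assumes those scores are already available. The names `threshold_query` and `trend_query`, the strict comparison for "continuously increasing", and the sample scores are illustrative assumptions, not part of the slides.

```python
def threshold_query(scores_over_time, theta):
    """Nodes whose score stays above theta at every time instant."""
    return {
        v for v, series in scores_over_time.items()
        if all(s > theta for s in series)
    }


def trend_query(scores_over_time, increasing=True):
    """Nodes whose score strictly increases (or decreases) between instants."""
    result = set()
    for v, series in scores_over_time.items():
        steps = list(zip(series, series[1:]))
        if increasing:
            ok = all(a < b for a, b in steps)
        else:
            ok = all(a > b for a, b in steps)
        if ok:
            result.add(v)
    return result


# Hypothetical scores of three nodes w.r.t. a source u over three instants.
scores_over_time = {
    "B": [0.30, 0.35, 0.40],
    "C": [0.50, 0.20, 0.45],
    "D": [0.10, 0.08, 0.05],
}
above = threshold_query(scores_over_time, theta=0.25)
rising = trend_query(scores_over_time, increasing=True)
falling = trend_query(scores_over_time, increasing=False)
```

Computing every score at every instant is the brute-force baseline; a node that fails the requirement at one instant can never re-enter the answer, which is what makes a shrinking candidate set possible.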

  10. Our Approach • CrashSim Algorithm --- static graphs • CrashSim-T Algorithm --- temporal graphs

  11. CrashSim Algorithm • Motivation • ProbeSim (VLDB’17) is the state-of-the-art algorithm for SimRank computation over static graphs (figure: example graph over nodes A to H) • Drawbacks • Redundant computations • The length of the √c-walk determines the computation cost

  12. CrashSim Algorithm • Key idea • Constrain the length of the √c-walk to l_max • Build a reverse reachable tree of source u with the √c-walk length limited to l_max • Still obtain SimRank estimators with the same guaranteed error bound as ProbeSim • Problem (Approximation Guarantee) Given: G, u, ε, δ. Return: s(u, v) such that |s(u, v) − sim(u, v)| ≤ ε with probability at least 1 − δ
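A √c-walk, as used by SLING and ProbeSim, stops at each step with probability 1 − √c and otherwise moves to a uniformly chosen in-neighbor of the current node. The sketch below samples such a walk and truncates it after l_max steps. The function name `sample_sqrt_c_walk`, the default values, and the two-node graph are assumptions for illustration; how l_max is chosen to preserve the error bound is not shown on the slide and is not modeled here.

```python
import math
import random


def sample_sqrt_c_walk(in_neighbors, start, c=0.6, l_max=10, rng=None):
    """Sample one sqrt(c)-walk from `start`, truncated after l_max steps."""
    rng = rng or random.Random()
    walk = [start]
    current = start
    while len(walk) - 1 < l_max:
        preds = in_neighbors.get(current, [])
        # Stop when there is no in-neighbor or with probability 1 - sqrt(c).
        if not preds or rng.random() >= math.sqrt(c):
            break
        current = rng.choice(preds)
        walk.append(current)
    return walk


# Hypothetical two-node graph with edges A -> B and B -> A.
graph = {"A": ["B"], "B": ["A"]}
example_walk = sample_sqrt_c_walk(graph, "A", c=0.9, l_max=3, rng=random.Random(0))
```

Without the cap, a walk has an unbounded (geometrically distributed) length; with the cap, every walk visits at most l_max + 1 nodes.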

  13. CrashSim Algorithm (worked example: reverse reachable tree of source A in the example graph over nodes A to H) • Level 0: A • Level 1: B, C • Level 2: E, B, D • Level 3: H, A, E, B • U(0, A) = 1 • U(1, B) = U(0, A) × 0.5 / |I(B)| = 1 × 0.5 / 2 = 0.25 • U(1, C) = U(0, A) × 0.5 / |I(C)| = 1 × 0.5 / 3 = 0.167 • U(2, E) = 0.0625, U(2, B) = 0.0417, U(2, D) = 0.0417 • U(3, H) = 0.0156, U(3, A) = 0.0104, U(3, E) = 0.0104, U(3, B) = 0.0104 • In the k-th trial, W(C) = (C, D, B, A), so s_k(A, C) = U(0, C) + U(1, D) + U(2, B) + U(3, A) = 0 + 0 + 0.0417 + 0.0104 = 0.0521
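The arithmetic of this worked example can be checked with a few lines of code. The sketch below takes the per-level values as read from the slide (the node labels are reconstructed from the garbled export, and the symbol U is only a placeholder name) and sums, level by level, the value of the node that the sampled walk from C visits at that level. The function name `trial_score` is hypothetical.

```python
# Per-level values of the reverse reachable tree of source A,
# as read from the slide (labels reconstructed, so treat them as assumptions).
levels = [
    {"A": 1.0},
    {"B": 0.25, "C": 0.167},
    {"E": 0.0625, "B": 0.0417, "D": 0.0417},
    {"H": 0.0156, "A": 0.0104, "E": 0.0104, "B": 0.0104},
]


def trial_score(levels, walk):
    """Sum the tree value of the walk's l-th node at level l (0 if absent)."""
    return sum(level.get(node, 0.0) for level, node in zip(levels, walk))


# The walk sampled from C in the k-th trial of the example.
score = trial_score(levels, ["C", "D", "B", "A"])
print(round(score, 4))  # 0.0521
```

Only levels 2 and 3 contribute, because C does not appear at level 0 and D does not appear at level 1 of the tree.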

  14. CrashSim Algorithm Time Complexity:

  15. CrashSim-T Algorithm • Two opportunities • It is unnecessary to compute the SimRank between u and the candidate node set Ω at each time instant • The size of node set Ω can only shrink over time • CrashSim naturally supports computing the SimRank between the source u and a partial set of nodes

  16. CrashSim-T Algorithm --- Delta Pruning • Affected area of a changed edge x → y • The altered nodes in the reverse reachable tree of u • The (l_max − 1)-length reachable nodes of y • Delta pruning: ignore the nodes in an unaffected area • Example (figure): after an edge is deleted from the graph, the reverse reachable tree of A remains unchanged

  17. CrashSim-T Algorithm --- Difference Pruning • Related area: the l_max-length reverse reachable trees of u and v • Difference pruning: filter out those nodes whose related area is unchanged • Example (figure): after an edge is added to the graph, the reverse reachable trees of A and E remain unchanged

  18. CrashSim-T Algorithm • Main idea • Check whether the conditions of delta pruning and difference pruning are satisfied • If so, disregard those nodes as part of the candidate node set • Invoke the CrashSim algorithm to compute the SimRank between u and the residual nodes • Filter out the nodes that do not satisfy the query requirement
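The control flow of these steps can be sketched for a threshold query as follows. This is a loose illustration under stated assumptions, not the authors' implementation: `partial_simrank` is a hypothetical callback standing in for CrashSim on a partial node set, `is_unaffected` is a hypothetical callback standing in for the delta and difference pruning tests, and each snapshot is assumed to be a dict keyed by node.

```python
def temporal_threshold_query(snapshots, source, theta,
                             partial_simrank, is_unaffected=None):
    """Shrink the candidate set snapshot by snapshot.

    partial_simrank(graph, source, nodes) -> {node: score}
    is_unaffected(prev_graph, graph, source, node) -> bool
    Both callbacks are placeholders supplied by the caller.
    """
    candidates = None
    prev_graph, prev_scores = None, {}
    for graph in snapshots:
        if candidates is None:
            candidates = set(graph) - {source}
        # Nodes whose relevant area did not change keep their previous score.
        reused = set()
        if is_unaffected is not None and prev_graph is not None:
            reused = {v for v in candidates
                      if is_unaffected(prev_graph, graph, source, v)}
        scores = {v: prev_scores[v] for v in reused}
        # Only the residual nodes are recomputed.
        scores.update(partial_simrank(graph, source, candidates - reused))
        # A node that fails at this instant is dropped for good.
        candidates = {v for v in candidates if scores[v] > theta}
        prev_graph, prev_scores = graph, scores
    return candidates if candidates is not None else set()


# Hypothetical usage with a stub scorer that replays fixed scores.
table = [{"B": 0.5, "C": 0.4, "D": 0.1}, {"B": 0.6, "C": 0.2, "D": 0.9}]
calls = []


def stub_simrank(graph, source, nodes):
    calls.append(set(nodes))
    return {v: table[len(calls) - 1][v] for v in nodes}


g = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"]}
result = temporal_threshold_query([g, g], "A", 0.3, stub_simrank)
```

In the usage example D is dropped after the first snapshot, so its later score is never requested; the second call only covers B and C.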

  19. Experiments and Conclusion • Experimental Evaluation • Conclusion

  20. Experimental Evaluation • Datasets • Comparison baselines • SLING (SIGMOD’16), ProbeSim (VLDB’17), READS (VLDB’17) • Setting and metrics • ε takes the values 0.0125, 0.025, 0.05, and 0.1 • ME = max_{v ∈ V} |s(u, v) − sim(u, v)| • Precision = |V(k_1) ∩ V(k_2)| / max(k_1, k_2)
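Both metrics are straightforward to compute once exact and estimated scores are available. In the sketch below, the reading of k_1 and k_2 as the sizes of the returned and the ground-truth node sets is an assumption, as are the function names `max_error` and `precision` and the sample values.

```python
def max_error(exact, estimate):
    """Largest absolute difference between exact and estimated scores."""
    return max(abs(exact[v] - estimate.get(v, 0.0)) for v in exact)


def precision(result_nodes, truth_nodes):
    """Overlap of the two node sets divided by the larger set size."""
    result_nodes, truth_nodes = set(result_nodes), set(truth_nodes)
    larger = max(len(result_nodes), len(truth_nodes))
    if larger == 0:
        return 1.0  # Two empty answers agree trivially.
    return len(result_nodes & truth_nodes) / larger


# Hypothetical scores and answer sets for one source node.
exact = {"B": 0.50, "C": 0.20}
estimate = {"B": 0.45, "C": 0.22}
me = max_error(exact, estimate)
p = precision({"B", "C", "D"}, {"B", "C"})
```

Dividing by the larger of the two set sizes penalizes both missing nodes and spurious extra nodes in the returned set.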

  21. Experimental Evaluation (figure panels: (a) AS-733, (b) AS-Caida, (c) Wiki-Vote, (d) HepTh, (e) HepPh)

  22. Experimental Evaluation (figure: the impact of the query interval on the response time of the algorithms)

  23. Conclusion • Propose CrashSim, an index-free algorithm for single-source and partial SimRank computation in static graphs • Introduce CrashSim-T, an extension of CrashSim that answers SimRank queries over temporal graphs • Experiments show that both CrashSim and CrashSim-T outperform the state-of-the-art algorithms

  24. Thanks. Mo Li limo_neucse@hotmail.com
