
Mining Heavy Subgraphs in Time-Evolving Networks (Petko Bogdanov)



  1. Mining Heavy Subgraphs in Time-Evolving Networks. Petko Bogdanov (petko@cs.ucsb.edu), Misael Mongiovì, Ambuj Singh. Department of Computer Science, University of California, Santa Barbara

  2. Traffic networks (ICDM 2011; images from http://www.dot.ca.gov)

  3. Transformation to a Dynamic Network (images from http://www.dot.ca.gov)

  4. Temporal subgraph scoring (images from http://www.dot.ca.gov)

  5. Various Application Domains

  6-7. Problem definition
     • Time-evolving graph
     • Temporal subgraph:
        o Connected
        o Contiguous in time
        o Score is the sum of the scores of the involved edges
     • Values can also be on nodes instead of edges
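
To make the definition concrete, here is a minimal Python sketch of scoring a temporal subgraph. It assumes a hypothetical representation that is not taken from the slides: `weights[t]` maps each undirected edge `(u, v)` to its score in slice `t`, and a temporal subgraph is an edge set plus an inclusive interval `[i, j]`. The names `is_connected` and `temporal_score` are illustrative only.

```python
from collections import defaultdict

# Toy time-evolving graph (hypothetical data): weights[t][(u, v)] is the
# score of undirected edge (u, v) in time slice t.
weights = [
    {(0, 1): 2, (1, 2): -1, (2, 3): 3},
    {(0, 1): 1, (1, 2): 2, (2, 3): -4},
    {(0, 1): -2, (1, 2): 1, (2, 3): 1},
]


def is_connected(edges):
    """Return True if the edge set forms a single connected component."""
    if not edges:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:  # iterative depth-first search
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)


def temporal_score(weights, edges, i, j):
    """Score of the temporal subgraph (edges, [i, j]): the sum of its edge
    scores over every slice of the contiguous interval i..j (inclusive)."""
    if not is_connected(edges):
        raise ValueError("a temporal subgraph must be connected")
    return sum(weights[t][e] for t in range(i, j + 1) for e in edges)
```

For example, `temporal_score(weights, {(0, 1), (1, 2)}, 0, 1)` adds 2 - 1 + 1 + 2 = 4, while the edge set `{(0, 1), (2, 3)}` is rejected because it is not connected.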

  8. Outline
     • Motivation and problem definition
     • Complexity and previous work
     • Mining Edge Dynamic Networks (MEDEN)
     • Datasets and Results

  9. Previous Work
     • Dynamic graph mining
        o Evolutionary clustering [Lin 08, Kim 09, Yang 11, Sun 07]
        o Pattern mining [Lin 09, Oshino 10, McGlohon 07]
        o Anomaly detection [Abello 10, Akoglu 10]
     • Static graphs: Prize-Collecting Steiner Tree (PCST) [Lee 96, Johnson 00, Ljubic 05, Dittrich 08]

  10. Complexity
     • HDS (the Heaviest Dynamic Subgraph problem) is NP-hard
        o Reduction from Thumbnail Rectilinear Steiner Tree [Ganley 95]
     • The problem remains NP-hard
        o for a single time slice
        o for a simple {-1, 1} scoring

  11. Naive solution
     • Consider all time intervals:
        o Transform HDS to PCST
        o Solve PCST
     • Return the best solution
     • Complexity: $O(t^2 |V|^2 \log |V|)$
        o We have to enumerate all $O(t^2)$ sub-intervals
        o We have to apply a super-quadratic heuristic for PCST (such as [Johnson 2000])
     • Can we filter infeasible solutions fast?
        o $O(t \log^2 t \, |E|)$ filtering
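
The enumeration can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: instead of transforming each interval to PCST and running a heuristic such as [Johnson 2000], it uses a hypothetical exhaustive search over connected edge subsets (`best_static`), which is exact but only usable on tiny graphs. Per-edge prefix sums make each interval's aggregated weights an O(1) lookup per edge.

```python
from itertools import combinations

# Toy time-evolving graph (hypothetical data): weights[t][e] per slice.
weights = [
    {(0, 1): 2, (1, 2): -1, (2, 3): 3},
    {(0, 1): 1, (1, 2): 2, (2, 3): -4},
    {(0, 1): -2, (1, 2): 1, (2, 3): 1},
]


def connected(edges):
    """Union-find check that an edge set forms one connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in list(parent)}) == 1


def best_static(agg):
    """Exhaustive stand-in for a PCST heuristic: heaviest connected edge set.
    Exponential in |E|, so only for tiny illustrative graphs."""
    best = (float("-inf"), None)
    edges = list(agg)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            score = sum(agg[e] for e in subset)
            if score > best[0] and connected(subset):
                best = (score, set(subset))
    return best


def naive_hds(weights):
    """Try every interval [i, j] and keep the heaviest static solution."""
    edges, t = list(weights[0]), len(weights)
    # prefix[e][k] = w_e(0) + ... + w_e(k-1), so interval sums are O(1).
    prefix = {e: [0] for e in edges}
    for slice_w in weights:
        for e in edges:
            prefix[e].append(prefix[e][-1] + slice_w[e])
    best = (float("-inf"), None, None)
    for i in range(t):  # all O(t^2) intervals
        for j in range(i, t):
            agg = {e: prefix[e][j + 1] - prefix[e][i] for e in edges}
            score, subset = best_static(agg)
            if score > best[0]:
                best = (score, subset, (i, j))
    return best
```

On this toy input `naive_hds(weights)` returns a score of 4: the whole path in slice 0 (2 - 1 + 3).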

  12. Outline
     • Motivation and problem definition
     • Complexity and previous work
     • Mining Edge Dynamic Networks (MEDEN)
     • Datasets and Results

  13. A better solution: Basic
     • For a fixed time slice:
        o Prune by structure upper bounds (UB)
        o Use a fast and accurate heuristic, TopDown
     • Filtering solution:
        o Obtain a lower bound (LB)
        o Filter every interval based on its UB
        o Verify the unfiltered intervals
     • Filtering: $O(t^2 |E|)$
     • Can we filter multiple intervals at a time?
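
A minimal sketch of this per-interval filter follows. The slides prune with structural upper bounds; as a simpler stand-in (an assumption of this sketch), it uses the sum of an interval's positive aggregated edge weights, which is a valid but looser bound, and takes the lower bound as a given number, for example the score of any known solution. The function names are hypothetical.

```python
# Toy time-evolving graph (hypothetical data): weights[t][e] per slice.
weights = [
    {(0, 1): 2, (1, 2): -1, (2, 3): 3},
    {(0, 1): 1, (1, 2): 2, (2, 3): -4},
    {(0, 1): -2, (1, 2): 1, (2, 3): 1},
]


def sum_of_positives_ub(agg):
    """Upper bound on the score of any subgraph under these weights:
    no subgraph can do better than taking every positive edge."""
    return sum(w for w in agg.values() if w > 0)


def filter_intervals(weights, lower_bound):
    """Basic filtering: keep only intervals whose upper bound exceeds the
    lower bound. With per-edge prefix sums this costs O(t^2 |E|)."""
    edges, t = list(weights[0]), len(weights)
    prefix = {e: [0] for e in edges}
    for slice_w in weights:
        for e in edges:
            prefix[e].append(prefix[e][-1] + slice_w[e])
    survivors = []
    for i in range(t):
        for j in range(i, t):
            agg = {e: prefix[e][j + 1] - prefix[e][i] for e in edges}
            if sum_of_positives_ub(agg) > lower_bound:  # could still beat LB
                survivors.append((i, j))
    return survivors
```

With a lower bound of 4 on the toy data, only the interval [0, 0] survives and needs verification.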

  14. Grouping similar intervals
     • Combine overlapping intervals into groups
     • High overlap: similar solutions
     • Filter interval groups as a whole, without considering individual members

  15-16. Groups of minimum overlap
     • Group intervals with a common starting point
     • Ensure a minimum overlap $\alpha$ (0.5 in the example)
     • A total of $O(t \log t)$ groups when $\alpha < 1$
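
One way to realize such groups is sketched below. It assumes, as an illustration, that the overlap of two intervals sharing a start is the ratio of the shorter length to the longer one. Under that assumption the admissible lengths for each start grow geometrically by a factor of about $1/\alpha$, which gives $O(\log t)$ groups per start and $O(t \log t)$ in total. The function name `interval_groups` and the `(start, first_end, last_end)` encoding are hypothetical.

```python
import math


def interval_groups(t, alpha):
    """Partition all intervals [s, e] over slices 0..t-1 into groups that
    share a start s and whose shortest/longest length ratio is >= alpha.
    Each group is encoded as (s, first_end, last_end)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    groups = []
    for s in range(t):
        max_len = t - s  # longest interval that starts at s
        length = 1
        while length <= max_len:
            # Longest member allowed so that length / longest >= alpha.
            longest = min(max_len, math.floor(length / alpha))
            groups.append((s, s + length - 1, s + longest - 1))
            length = longest + 1  # next group starts with the next length
    return groups
```

With `t = 8` and `alpha = 0.5`, the 36 intervals collapse into 16 groups; for start 0 the member lengths are 1-2, 3-6 and 7-8.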

  17-21. MEDEN: filter and verify

  22. Filter whole groups
     • We define a Dominating Graph (DG) for each group
        o Edge weights are the maximum over all group members
        o The solution in the DG dominates the solution in any member
     • Composing one DG: $O(\log t)$ per edge
     • Time to build the index: $O(t |E|)$
     • Composition time for all dominating graphs: $O(t \log^2 t \, |E|)$
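
The slides do not specify the index, so the sketch below makes an assumption: for each edge it stores prefix sums $P[k]$ of the edge's weights, so a member $[s, e]$ has aggregated weight $P[e+1] - P[s]$, and the dominating weight for the group $\{[s, e] : lo \le e \le hi\}$ is $\max_{lo \le e \le hi} P[e+1] - P[s]$. A range-maximum segment tree per edge builds in $O(t)$ and answers each such query in $O(\log t)$, which is consistent with the costs on the slide. Names such as `MaxSegmentTree`, `build_index` and `dominating_graph` are illustrative.

```python
# Toy time-evolving graph (hypothetical data): weights[t][e] per slice.
weights = [
    {(0, 1): 2, (1, 2): -1, (2, 3): 3},
    {(0, 1): 1, (1, 2): 2, (2, 3): -4},
    {(0, 1): -2, (1, 2): 1, (2, 3): 1},
]


class MaxSegmentTree:
    """Iterative range-maximum tree: O(n) build, O(log n) query."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [float("-inf")] * self.n + list(values)
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = max(self.tree[2 * k], self.tree[2 * k + 1])

    def query(self, lo, hi):
        """Maximum of values[lo..hi], both ends inclusive."""
        result = float("-inf")
        lo, hi = lo + self.n, hi + self.n + 1
        while lo < hi:
            if lo & 1:
                result = max(result, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = max(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result


def build_index(weights):
    """One range-max tree per edge over prefix sums P[k] = w(0) + ... + w(k-1)."""
    index = {}
    for e in weights[0]:
        prefix = [0]
        for slice_w in weights:
            prefix.append(prefix[-1] + slice_w[e])
        index[e] = (prefix, MaxSegmentTree(prefix))
    return index


def dominating_graph(index, s, lo, hi):
    """Edge weights of the DG for the group {[s, e] : lo <= e <= hi}:
    the maximum over members of the aggregated weight P[e+1] - P[s]."""
    return {e: tree.query(lo + 1, hi + 1) - prefix[s]
            for e, (prefix, tree) in index.items()}
```

For the toy data, the group {[0, 0], [0, 1]} gets dominating weights {(0, 1): 3, (1, 2): 1, (2, 3): 3}, the edge-wise maxima of the two members' weights (2, -1, 3) and (3, 1, -1). Because every DG weight is at least the corresponding member weight, any upper bound computed on the DG also bounds every member.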

  23. Putting things together: MEDEN
     • Steps of MEDEN:
        o Obtain a lower bound (LB) on the solution
        o Filter groups
        o Filter members
        o Verify using TopDown
     • Running time:
        o Filtering takes $O(t \log^2 t \, |E|)$
        o Verification takes $O(|C| \, |E| \log |V|)$, where $|C|$ is the number of intervals that survive pruning
        o $|C|$ is small (linear in $t$ in our experiments)
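
The four steps can be wired together as in the following toy sketch. It is not the authors' implementation, and several substitutions are assumptions labeled in the comments: the lower bound is simply the heaviest single edge in a single slice (already a valid temporal subgraph), both filters use a sum-of-positive-weights upper bound in place of the slides' structural bounds, dominating graphs are computed by a direct maximum rather than through the $O(\log t)$ index, and an exhaustive search over connected edge sets stands in for TopDown. The grouping again assumes overlap is a shortest/longest length ratio. All function names are hypothetical.

```python
import math
from itertools import combinations

# Toy time-evolving graph (hypothetical data): weights[t][e] per slice.
weights = [
    {(0, 1): 2, (1, 2): -1, (2, 3): 3},
    {(0, 1): 1, (1, 2): 2, (2, 3): -4},
    {(0, 1): -2, (1, 2): 1, (2, 3): 1},
]


def connected(edges):
    """Union-find check that an edge set forms one connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in list(parent)}) == 1


def best_connected(agg):
    """Exhaustive stand-in for TopDown (tiny graphs only)."""
    best, edges = (float("-inf"), None), list(agg)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            score = sum(agg[e] for e in subset)
            if score > best[0] and connected(subset):
                best = (score, set(subset))
    return best


def interval_groups(t, alpha):
    """Groups (s, first_end, last_end) sharing a start, length ratio >= alpha."""
    groups = []
    for s in range(t):
        length = 1
        while length <= t - s:
            longest = min(t - s, math.floor(length / alpha))
            groups.append((s, s + length - 1, s + longest - 1))
            length = longest + 1
    return groups


def meden_sketch(weights, alpha=0.5):
    edges, t = list(weights[0]), len(weights)
    prefix = {e: [0] for e in edges}
    for slice_w in weights:
        for e in edges:
            prefix[e].append(prefix[e][-1] + slice_w[e])

    def agg(s, end):  # aggregated edge weights over interval [s, end]
        return {e: prefix[e][end + 1] - prefix[e][s] for e in edges}

    def ub(w):  # sum-of-positives bound, a stand-in for the slides' UBs
        return sum(x for x in w.values() if x > 0)

    # Step 1 (LB): the heaviest single edge in a single slice is already a
    # connected, time-contiguous subgraph.
    k0, e0 = max(((k, e) for k in range(t) for e in edges),
                 key=lambda ke: weights[ke[0]][ke[1]])
    best = (weights[k0][e0], {e0}, (k0, k0))

    candidates = []
    for s, lo, hi in interval_groups(t, alpha):
        members = [agg(s, end) for end in range(lo, hi + 1)]
        # Step 2: dominating graph = edge-wise max over the group's members.
        dominating = {e: max(m[e] for m in members) for e in edges}
        if ub(dominating) <= best[0]:
            continue  # the whole group is pruned at once
        # Step 3: filter the members of a surviving group individually.
        candidates += [(s, lo + i) for i, m in enumerate(members)
                       if ub(m) > best[0]]

    # Step 4: verify the surviving intervals C.
    for s, end in candidates:
        score, subset = best_connected(agg(s, end))
        if score > best[0]:
            best = (score, subset, (s, end))
    return best + (candidates,)
```

On the toy data, four groups cover the six intervals, two groups are pruned outright, and only [0, 0] and [0, 1] reach verification; the result is the full path in slice 0 with score 4.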

  24. Outline
     • Motivation and problem definition
     • Complexity and previous work
     • Mining Edge Dynamic Networks (MEDEN)
     • Datasets and Results

  25. Scalability with time

  26. Graph size and overlap for grouping

  27. Conclusion
     • We are the first to introduce the Heaviest Dynamic Subgraph (HDS) problem
     • Our approach, MEDEN, scales to real-world graphs and outperforms a basic approach by an order of magnitude due to interval grouping
     • Future directions:
        o Extend to scalable top-k
        o Allow smoothly changing patterns

  28. Thank you! Questions?

  29. NP-hardness: reduction from Thumbnail Rectilinear Steiner Tree (figure; edge scores of -1, 1 and 4n)

  30. Upper Bounds: $UB_{sop}$ and $UB_{str}$

  31. TopDown heuristic

  32. Results: Twitter
     Twitter sub-network
     • Nodes: 2605
     • Edges: 14871
     • Slices: 204
     • Resolution: 1 day
     • Cutoff: cosine similarity 0.004

  33. Can we improve the evaluation in time?
     • Still $O(t^2)$ sub-intervals need to be considered
     • Infeasible on long time spans
     • Combine overlapping intervals into groups, ensuring:
        1. High overlap of intervals in a group
        2. A sub-quadratic number of groups
        3. Sub-quadratic time to compute bounds for groups
     • Prune groups as a whole, without considering individual members

  34. Some References
     • Hwang J-W, Lee Y-S, Cho S-B. Structure evolution of dynamic Bayesian network for traffic accident detection. IEEE Congress on Evolutionary Computation (CEC), 5-8 June 2011, pp. 1655-1671.
     • Borgwardt KM, Kriegel H-P, Wackersreuther P. Pattern Mining in Frequent Dynamic Subgraphs. Sixth International Conference on Data Mining (ICDM '06), 18-22 Dec. 2006, pp. 818-822.
     • Johnson DS, Minkoff M, Phillips S. The Prize Collecting Steiner Tree Problem: Theory and Practice. SODA, 2000.
     • Kwon J, Murphy K. Modeling Freeway Traffic with Coupled HMMs. 2004.
     • Berlingerio M, Bonchi F. Mining graph evolution rules. Machine Learning and Knowledge Discovery in Databases, 2009, pp. 115-130.
     • Stoev SA, Michailidis G, Vaughan J. Global Modeling and Prediction of Computer Network Traffic. 2009, pp. 1-32.
     • Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM. Frequent Subgraph Discovery in Dynamic Networks. Developmental Biology, 2010, pp. 155-162.
     • Macropol K, Singh AK. Content-based Modeling and Prediction of Information Dissemination. ASONAM 2011.
