Mining Heavy Subgraphs in Time-Evolving Networks
Petko Bogdanov (petko@cs.ucsb.edu), Misael Mongiovì, Ambuj Singh
Department of Computer Science, University of California, Santa Barbara
Traffic networks ICDM 2011 2 Images from http://www.dot.ca.gov
Transformation to a Dynamic Network
Images from http://www.dot.ca.gov
Temporal subgraph scoring
Images from http://www.dot.ca.gov
Various Application Domains
Problem definition
• Time-Evolving Graph
• Temporal subgraph
  o Connected
  o Contiguous in time
  o Score is the sum of the scores of the involved edges
Values can also be on nodes instead of edges
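The scoring rule in the problem definition can be illustrated with a minimal Python sketch. The data layout (a dict from edge to a list of per-slice scores) and the function names `temporal_subgraph_score` and `is_connected` are assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

def temporal_subgraph_score(edge_scores, edges, start, end):
    """Sum the scores of `edges` over time slices start..end (inclusive).

    edge_scores maps an edge (u, v) to its list of per-slice scores
    (a hypothetical layout chosen for this sketch).
    """
    return sum(sum(edge_scores[e][start:end + 1]) for e in edges)

def is_connected(edges):
    """Return True if the undirected edge set forms a single component."""
    if not edges:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    first = next(iter(adj))
    seen, stack = {first}, [first]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)

# Toy network with three edges and three time slices (made-up values).
edge_scores = {
    ("a", "b"): [1.0, -0.5, 2.0],
    ("b", "c"): [0.5, 1.5, -1.0],
    ("c", "d"): [-1.0, -1.0, 3.0],
}
subgraph = [("a", "b"), ("b", "c")]
score = temporal_subgraph_score(edge_scores, subgraph, 0, 1)  # 2.5
```

The interval `start..end` makes the subgraph contiguous in time by construction, so only connectivity has to be checked separately.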
Outline
• Motivation and problem definition
• Complexity and previous work
• Mining Edge Dynamic Networks (MEDEN)
• Datasets and Results
Previous Work
• Dynamic graph mining
  o Evolutionary clustering [Lin 08, Kim 09, Yang 11, Sun 07]
  o Pattern mining [Lin 09, Oshino 10, McGlohon 07]
  o Anomaly detection [Abello 10, Akoglu 10]
• Static graphs: Prize Collecting Steiner Tree (PCST) [Lee 96, Johnson 00, Ljubic 05, Dittrich 08]
Complexity
• The Heaviest Dynamic Subgraph (HDS) problem is NP-hard
  o Reduction from Thumbnail Rectilinear Steiner Tree [Ganley 95]
• The problem remains NP-hard
  o For a single time slice
  o For simple {-1, 1} scores
Naive solution
• Consider all time intervals
  o Transform HDS to PCST
  o Solve PCST
• Return the best solution
• Complexity: O(t^2 |V|^2 log|V|)
  o We have to enumerate all sub-intervals
  o We have to apply a super-quadratic heuristic for PCST (such as [Johnson 00])
• Can we filter infeasible solutions fast?
  o O(t log^2(t) |E|) filtering
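The interval enumeration behind the quadratic factor can be sketched as follows. This is a minimal illustration, not the authors' code: it only enumerates the sub-intervals and aggregates edge scores per interval with prefix sums, leaving the PCST transformation and solver out. The names `prefix_sums`, `aggregated_weights`, and `all_intervals` are hypothetical.

```python
from itertools import accumulate

def prefix_sums(scores):
    """prefix[i] holds the sum of the first i per-slice scores."""
    return [0.0] + list(accumulate(scores))

def aggregated_weights(prefix, start, end):
    """Edge weights of the static graph for slices start..end (inclusive)."""
    return {e: p[end + 1] - p[start] for e, p in prefix.items()}

def all_intervals(t):
    """Yield every contiguous interval (start, end) over t slices."""
    for start in range(t):
        for end in range(start, t):
            yield start, end

# Made-up per-slice edge scores for a toy network.
edge_scores = {
    ("a", "b"): [1.0, -0.5, 2.0],
    ("b", "c"): [0.5, 1.5, -1.0],
}
prefix = {e: prefix_sums(s) for e, s in edge_scores.items()}
t = 3
# One aggregated static graph per interval: t * (t + 1) / 2 graphs in total.
graphs = {iv: aggregated_weights(prefix, *iv) for iv in all_intervals(t)}
```

Each aggregated graph would then be handed to a static solver, which is where the per-interval cost in the bound above comes from.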
A better solution: Basic
• For a fixed time slice:
  o Prune by structure upper bounds (UB)
  o Use a fast and accurate heuristic, TopDown
• Filtering solution
  o Obtain a lower bound
  o Filter every interval based on UB
  o Verify unfiltered intervals
• Filtering: O(t^2 |E|)
• Can we filter multiple intervals at a time?
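The filter-and-verify loop above can be sketched in Python under two loudly stated assumptions. First, the upper bound used here is the sum of positive edge weights in the interval, a simple valid bound that is not necessarily one of the bounds from the talk. Second, `verify` is a stand-in heuristic (the heaviest connected component of the positive-edge graph), not TopDown. All function names are hypothetical.

```python
from collections import defaultdict

def aggregate(edge_scores, start, end):
    """Edge weights summed over slices start..end (inclusive)."""
    return {e: sum(s[start:end + 1]) for e, s in edge_scores.items()}

def upper_bound(weights):
    """Assumed bound: no connected subgraph can beat the sum of positive weights."""
    return sum(w for w in weights.values() if w > 0)

def verify(weights):
    """Stand-in verifier (not TopDown): heaviest component of positive edges.

    Returns 0.0 when the interval has no positive edge.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    positive = [(e, w) for e, w in weights.items() if w > 0]
    for (u, v), _ in positive:
        parent[find(u)] = find(v)
    totals = defaultdict(float)
    for (u, v), w in positive:
        totals[find(u)] += w
    return max(totals.values(), default=0.0)

def filter_and_verify(edge_scores, t):
    """Return (best score, best interval, number of verified intervals)."""
    best_score, best_interval, verified = float("-inf"), None, 0
    for start in range(t):
        for end in range(start, t):
            weights = aggregate(edge_scores, start, end)
            if upper_bound(weights) <= best_score:
                continue  # filtered: cannot beat the current lower bound
            verified += 1
            score = verify(weights)
            if score > best_score:
                best_score, best_interval = score, (start, end)
    return best_score, best_interval, verified

# Made-up per-slice edge scores for a toy network.
edge_scores = {
    ("a", "b"): [1.0, -0.5, 2.0],
    ("b", "c"): [0.5, 1.5, -1.0],
    ("c", "d"): [-1.0, -1.0, 3.0],
}
result = filter_and_verify(edge_scores, 3)
```

On this toy input, two of the six intervals are filtered without being verified. The running best score plays the role of the lower bound, so a better initial lower bound would prune more.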
Grouping similar intervals
• Combine overlapping intervals into groups
• High overlap: similar solutions
• Filter interval groups as a whole, without considering individual members
Groups of minimum overlap
• Group intervals with a common starting point
• Ensure minimum overlap alpha (0.5 in the example)
• A total of O(t log(t)) groups, when alpha < 1
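One way to read the grouping rule is sketched below. It assumes that the overlap of a group is the ratio of its shortest to its longest interval length, which the slides do not state explicitly; `groups_for_start` is a hypothetical helper. Group lengths then grow geometrically, which gives on the order of log(t) groups per starting point for a fixed alpha.

```python
def groups_for_start(start, t, alpha):
    """Partition the intervals starting at `start` into groups.

    Each group is returned as (first_end, last_end), both inclusive.
    Assumption: overlap = shortest length / longest length >= alpha.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    groups = []
    max_len = t - start
    length = 1
    while length <= max_len:
        # Longest length that still overlaps the shortest by at least alpha.
        longest = min(max_len, int(length / alpha))
        groups.append((start + length - 1, start + longest - 1))
        length = longest + 1
    return groups

groups = groups_for_start(0, 16, 0.5)
# groups == [(0, 1), (2, 5), (6, 13), (14, 15)]
```

With alpha = 0.5 the interval lengths in consecutive groups are 1-2, 3-6, 7-14, and so on, so every interval belongs to exactly one group.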
MEDEN: filter and verify
Filter whole groups
• We define a Dominating Graph (DG) for each group
  o Edge weights are the maximum over all group members
  o A solution in the DG dominates the solution in any member
• Compose a DG: O(log(t))
• Time to build the index: O(t |E|)
• Composition time for all dominating graphs: O(t log^2(t) |E|)
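The dominating graph definition can be sketched directly: for a group of intervals sharing a start, each edge receives the maximum of its aggregated weights over the group members. The version below computes this with a plain running sum and does not reproduce the index or the composition step from the talk. `dominating_graph` and the data layout are assumptions for illustration.

```python
def dominating_graph(edge_scores, start, first_end, last_end):
    """Edge weights of the DG for intervals (start, end), first_end <= end <= last_end.

    Each edge gets the maximum aggregated weight over all group members.
    """
    dg = {}
    for edge, scores in edge_scores.items():
        running = sum(scores[start:first_end])  # slices before the first end
        best = float("-inf")
        for end in range(first_end, last_end + 1):
            running += scores[end]  # aggregated weight of interval (start, end)
            best = max(best, running)
        dg[edge] = best
    return dg

# Made-up per-slice edge scores for a toy network.
edge_scores = {
    ("a", "b"): [1.0, -0.5, 2.0],
    ("b", "c"): [0.5, 1.5, -1.0],
    ("c", "d"): [-1.0, -1.0, 3.0],
}
dg = dominating_graph(edge_scores, 0, 0, 2)
# dg == {("a", "b"): 2.5, ("b", "c"): 2.0, ("c", "d"): 1.0}
```

Because every DG edge weight is at least the weight of that edge in any member interval, any subgraph scores at least as high in the DG as in a member. An upper bound computed on the DG therefore holds for the whole group, which is what allows a group to be filtered at once.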
Putting things together: MEDEN
• Steps of MEDEN:
  o Obtain a lower bound (LB) on the solution
  o Filter groups
  o Filter members
  o Verify using TopDown
• Running time
  o Filtering takes O(t log^2(t) |E|)
  o Verification takes O(|C| |E| log|V|), where |C| is the number of unpruned intervals
  o |C| is small (linear in t in our experiments)
Scalability with time
Graph size and overlap for grouping
Conclusion
• We are the first to introduce the Heaviest Dynamic Subgraph (HDS) problem
• Our approach, MEDEN, scales to real-world graphs and outperforms a basic approach by an order of magnitude due to interval grouping
• Future directions
  o Extend to scalable top-k
  o Allow smoothly changing patterns
THANK YOU
Questions?
NP-hardness
Reduction from Thumbnail Rectilinear Steiner Tree
[Figure: reduction construction with edge scores -1, 1, and 4n]
Upper Bounds
[Figure: the upper bounds UB_sop and UB_str]
TopDown heuristic
Results - Twitter
Twitter sub-network
• Nodes: 2605
• Edges: 14871
• Slices: 204
• Resolution: 1 day
• Cutoff: cosine similarity 0.004
Can we improve the evaluation in time?
• Still O(t^2) sub-intervals need to be considered
• Infeasible on long time spans
• Combine overlapping intervals into groups
• Ensure:
  o High overlap of intervals in a group
  o Sub-quadratic number of groups
  o Sub-quadratic time to compute bounds for groups
• Prune groups as a whole, without considering individual members
Some References
• Hwang J-W, Lee Y-S, Cho S-B. Structure evolution of dynamic Bayesian network for traffic accident detection. IEEE Congress on Evolutionary Computation (CEC). 2011:1655-1671.
• Borgwardt KM, Kriegel H-P, Wackersreuther P. Pattern Mining in Frequent Dynamic Subgraphs. Sixth International Conference on Data Mining (ICDM '06). 2006:818-822.
• Johnson DS, Minkoff M, Phillips S. The Prize Collecting Steiner Tree Problem: Theory and Practice. SODA. 2000.
• Kwon J, Murphy K. Modeling Freeway Traffic with Coupled HMMs. 2004.
• Berlingerio M, Bonchi F. Mining graph evolution rules. Machine Learning and Knowledge Discovery in Databases. 2009:115-130.
• Stoev SA, Michailidis G, Vaughan J. Global Modeling and Prediction of Computer Network Traffic. 2009:1-32.
• Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM. Frequent Subgraph Discovery in Dynamic Networks. Developmental Biology. 2010:155-162.
• Macropol K, Singh AK. Content-based Modeling and Prediction of Information Dissemination. ASONAM. 2011.