Mining Large Dynamic Graphs and Tensors
Kijung Shin, Ph.D. Student (kijungs@cs.cmu.edu)
Thesis Committee: Prof. Christos Faloutsos (Chair), Prof. Tom M. Mitchell, Prof. Leman Akoglu, Prof. Philip S. Yu


  1. Goal of Sampling Step
     • To maximize the discovering probability $p_{uvw}$
     • Theorem (variance of our estimate): $\mathrm{Var}[\Delta] \approx \sum_{(u,v,w)} \left(1/p_{uvw} - 1\right)$, where the sum runs over the triangles $(u, v, w)$ that make up the true count
     • Theorem (unbiasedness of our estimate): $\mathrm{Bias}[\Delta] = \mathbb{E}[\Delta] - \text{true count} = 0$
     • Estimation error = bias (which is 0) + variance
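The two theorems can be checked exactly on a toy case. The sketch below is not WRS itself: it assumes, purely for illustration, that each triangle is discovered independently with a known probability, and it enumerates every outcome. The function name `estimator_moments` and the probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import product

def estimator_moments(probs):
    """Exact mean and variance of Delta = sum of 1/p over discovered triangles.

    Assumption (not from the slides): triangle i is discovered independently
    with probability probs[i].
    """
    mean = Fraction(0)
    second_moment = Fraction(0)
    for outcome in product([0, 1], repeat=len(probs)):
        weight = Fraction(1)   # probability of this outcome
        delta = Fraction(0)    # value of the estimate under this outcome
        for found, p in zip(outcome, probs):
            weight *= p if found else 1 - p
            if found:
                delta += 1 / p  # each discovered triangle adds 1/p
        mean += weight * delta
        second_moment += weight * delta * delta
    return mean, second_moment - mean * mean

# Three triangles with hypothetical discovering probabilities.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]
mean, variance = estimator_moments(probs)
```

Under this independence assumption the mean equals the true count (3) and the variance equals the sum of `1/p - 1`, so raising the discovering probabilities lowers the error.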

  2. Increasing Discovering Prob.
     “How can we increase the discovering probabilities of triangles?”
     • Recall temporal locality:
       ◦ new edges are more likely to form triangles with recent edges than with old edges
     • Waiting-Room Sampling (WRS)
       ◦ treats recent edges better than old edges to exploit temporal locality

  3. Waiting-Room Sampling (WRS)
     • Divides the memory space into two parts
       ◦ Waiting Room: the latest edges are always stored
       ◦ Reservoir: the remaining edges are sampled
     [Figure: a new edge $e_{80}$ arrives; the Waiting Room (FIFO, $\alpha$% of the budget) holds $e_{79}, e_{78}, e_{77}, e_{76}$; the Reservoir (random replace, $(100 - \alpha)$% of the budget) holds $e_{61}, e_{7}, e_{18}, e_{25}, e_{40}, e_{1}, e_{28}$]

  4. WRS: Sampling Steps (Step 1)
     [Figure: the new edge $e_{80}$ is pushed into the Waiting Room (FIFO), which held $e_{79}, e_{78}, e_{77}, e_{76}$; the oldest edge $e_{76}$ is popped, leaving $e_{80}, e_{79}, e_{78}, e_{77}$. The Reservoir (random replace) holds $e_{61}, e_{7}, e_{18}, e_{25}, e_{40}, e_{1}, e_{28}$]

  5. WRS: Sampling Steps (Step 2)
     [Figure: the popped edge $e_{76}$ is either stored in the Reservoir, replacing a random edge (here $e_{40}$), or discarded, leaving the Reservoir $e_{61}, e_{7}, e_{18}, e_{25}, e_{40}, e_{1}, e_{28}$ unchanged. The Waiting Room (FIFO) holds $e_{80}, e_{79}, e_{78}, e_{77}$]
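The two sampling steps can be sketched as a FIFO queue feeding a reservoir. This is a minimal sketch, not the authors' implementation: the class name `WaitingRoomSampler` is hypothetical, edges are represented by integer ids, and the store-or-discard rule is assumed to be standard reservoir sampling over the popped edges, since the slides only say "random replace".

```python
import random
from collections import deque

class WaitingRoomSampler:
    """Keeps the latest edges in a FIFO waiting room and samples the rest."""

    def __init__(self, room_size, reservoir_size, seed=0):
        self.room_size = room_size
        self.reservoir_size = reservoir_size
        self.waiting_room = deque()   # latest edges, always stored
        self.reservoir = []           # uniform sample of popped edges
        self.popped = 0               # number of edges popped so far
        self.rng = random.Random(seed)

    def add(self, edge):
        # Step 1: push the new edge; pop the oldest one if the room overflows.
        self.waiting_room.append(edge)
        if len(self.waiting_room) <= self.room_size:
            return
        old = self.waiting_room.popleft()
        self.popped += 1
        # Step 2: store the popped edge in the reservoir or discard it
        # (assumed rule: standard reservoir sampling).
        if len(self.reservoir) < self.reservoir_size:
            self.reservoir.append(old)
        elif self.rng.random() < self.reservoir_size / self.popped:
            self.reservoir[self.rng.randrange(self.reservoir_size)] = old

# Mirror the slide: 4 waiting-room slots, 7 reservoir slots, edges e_1..e_80.
sampler = WaitingRoomSampler(room_size=4, reservoir_size=7)
for edge_id in range(1, 81):
    sampler.add(edge_id)
```

After the 80th edge the waiting room holds the four latest edges, and the reservoir holds seven of the 76 older ones.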

  6. Summary of Algorithm
     (1) Arrival Step: a new edge $u - v$ arrives
     (2) Discovery Step: each triangle that $u - v$ forms with edges in memory (e.g., through a node $x$) is discovered, and $\Delta \leftarrow \Delta + 1/p_{uvx}$
     (3) Sampling Step: Waiting-Room Sampling!

  7. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
         ▪ T1.1 Waiting-Room Sampling
           ◦ Temporal Pattern
           ◦ Algorithm
           ◦ Experiments <<
         ▪ T1.2-T1.3 Related Completed Work
       ◦ T2. Anomaly Detection
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion

  8. Experimental Results: Accuracy
     • Datasets: [figure not preserved]
     • WRS is the most accurate (reduces error by up to 47%)

  9. Discovering Probability
     • WRS increases the discovering probability $p_{uvw}$
     • WRS discovers up to 3× more triangles
     [Figure: comparison of WRS, Triest-IMPR, and MASCOT]

  10. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
         ▪ T1.1 Waiting-Room Sampling
         ▪ T1.2-T1.3 Related Completed Work <<
       ◦ T2. Anomaly Detection
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion

  11. T1.2 Distributed Counting of Triangles
     • Goal: to utilize multiple machines for triangle counting in a graph stream
     [Figure: dataflows of Tri-Fly [PAKDD18] and DiSLR [submitted to KDD], each from Sources to Workers to Aggregators; one uses broadcast followed by shuffle, the other multicast followed by shuffle]
     Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos, “Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams”, PAKDD 2018

  12. T1.2 Performance of Tri-Fly and DiSLR
     • Estimation error = bias (which is 0) + variance
     [Figure: performance plots for Tri-Fly and DiSLR, annotated 30×, 40×, and 40×]

  13. T1.3 Estimation of Degeneracy
     • Goal: to estimate the degeneracy* in a graph stream
     • Core-Triangle Pattern
       ◦ 3:1 power law between the triangle count and the degeneracy
     *degeneracy: the maximum $k$ such that a subgraph where every node has degree at least $k$ exists
     Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with Applications”, KAIS 2018 (previously ICDM 2016)

  14. T1.3 Core-D Algorithm
     • Core-D: one-pass streaming algorithm for degeneracy
       $\hat{d} = \exp(\alpha \cdot \log(\hat{\Delta}) + \beta)$
       where $\hat{d}$ is the estimated degeneracy and $\hat{\Delta}$ is the estimated triangle count (obtained by WRS, etc.)
     [Figure: accuracy comparison including Core-D]
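The Core-D formula on this slide is a log-linear fit, which a few lines make concrete. In this sketch the function name `core_d_estimate` and the coefficient values are hypothetical; the coefficients are chosen only to mimic a 3:1 power law and are not fitted values from the thesis.

```python
import math

def core_d_estimate(triangle_estimate, alpha, beta):
    """Estimated degeneracy from an estimated triangle count."""
    return math.exp(alpha * math.log(triangle_estimate) + beta)

# Hypothetical coefficients: alpha = 1/3 makes the degeneracy grow as the
# cube root of the triangle count; beta = 0 sets the scale factor to 1.
estimate = core_d_estimate(1000.0, alpha=1 / 3, beta=0.0)
```

With these illustrative coefficients, an estimated 1,000 triangles maps to an estimated degeneracy of about 10.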

  15. Structure Analysis of Graphs
     Models:
       ◦ Relaxed graph stream model
       ◦ Distributed graph stream model
     Patterns:
       ◦ Temporal locality
       ◦ Core-Triangle pattern
     Algorithms:
       ◦ WRS, Tri-Fly, and DiSLR
       ◦ Core-D
     Analyses: bias and variance

  16. Completed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*
       ◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*
       ◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]
     Tensors
       ◦ T1. Structure Analysis: Summarization [WSDM17]
       ◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
       ◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
     * Duplicated

  17. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
       ◦ T2. Anomaly Detection
         ▪ T2.1 M-Zoom <<
         ▪ T2.2-T2.3 Related Completed Work
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion
     Kijung Shin, Bryan Hooi, and Christos Faloutsos, “Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining”, TKDD 2018 (previously ECML/PKDD 2016)

  18. Motivation: Review Fraud
     [Figure: review-fraud illustration labeled Alice’s, Bob’s, Carol’s, and Alice]

  19. Fraud Forms Dense Block
     [Figure: Accounts × Restaurants adjacency matrix]

  20. Problem: Natural Dense Subgraphs
     [Figure: Accounts × Restaurants adjacency matrix containing natural dense blocks (core, community, etc.) and suspicious dense blocks formed by fraudsters]
     • Question: how can we distinguish them?

  21. Solution: Tensor Modeling
     • Along the time axis…
       ◦ Natural dense blocks are sparse (formed gradually)
       ◦ Suspicious dense blocks are dense (synchronized behavior)
     • In the tensor model
       ◦ Suspicious dense blocks become denser than natural dense blocks
     [Figure: tensor with Accounts and Restaurants axes]

  22. Solution: Tensor Modeling (cont.)
     • High-order tensor modeling:
       ◦ any side information can be used additionally (e.g., IP address, keywords, number of stars)
     “Given a large-scale high-order tensor, how can we find dense blocks in it?”

  23. Problem Definition
     • Given: (1) $\mathbf{R}$: an $N$-order tensor, (2) $\rho$: a density measure, (3) $k$: the number of blocks we aim to find
     • Find: $k$ distinct dense blocks maximizing $\rho$
     [Figure: example with $k = 3$ blocks found in $\mathbf{R}$]

  24. Density Measures
     • How should we define “density” (i.e., $\rho$)?
       ◦ no one absolute answer
       ◦ depends on data, types of anomalies, etc.
     • Goal: a flexible algorithm working well with various reasonable measures
       ◦ Arithmetic avg. degree $\rho_A$
       ◦ Geometric avg. degree $\rho_G$
       ◦ Suspiciousness (KL divergence) $\rho_S$
       ◦ Traditional density: $\rho_T(B) = \mathrm{EntrySum}(B) / \mathrm{Vol}(B)$
         - maximized by a single entry with the maximum value
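Three of these measures are simple enough to write down directly. The sketch below takes a block's entry sum (`mass`) and its size in each mode (`dims`). The slide gives only the traditional density formula; the arithmetic and geometric average degrees follow the definitions I understand the M-Zoom paper to use (mass divided by the arithmetic or geometric mean of the mode sizes), so treat them as assumptions. Suspiciousness is omitted, and all function names are hypothetical.

```python
import math

def arithmetic_avg_degree(mass, dims):
    """Mass divided by the arithmetic mean of the block's mode sizes."""
    return mass / (sum(dims) / len(dims))

def geometric_avg_degree(mass, dims):
    """Mass divided by the geometric mean of the block's mode sizes."""
    return mass / math.prod(dims) ** (1 / len(dims))

def traditional_density(mass, dims):
    """Mass divided by the block's volume (number of cells)."""
    return mass / math.prod(dims)

# A 2 x 2 block with entry sum 18 versus a single entry of value 6.
block = (18, (2, 2))
single = (6, (1, 1))
```

The traditional density prefers the single entry (6 versus 4.5), which is the weakness the slide notes, while both average degrees prefer the 2 × 2 block (9 versus 6).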

  25. Clarification of Blocks (Subtensors)
     • The concept of blocks (subtensors) is independent of the orders of rows and columns
     • Entries in a block do not need to be adjacent
     [Figure: two Accounts × Restaurants matrices]

  26. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
       ◦ T2. Anomaly Detection
         ▪ T2.1 M-Zoom [PKDD 16]
           ◦ Algorithm <<
           ◦ Experiments
         ▪ T2.2-T2.3 Related Completed Work
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion

  27. Single Dense Block Detection
     • Greedy search
     • Starts from the entire tensor
     [Figure: example tensor with $\rho = 2.9$]

  28. Single Dense Block Detection (cont.)
     • Remove a slice to maximize density $\rho$
     [Figure: example tensor after the removal, with $\rho = 3$]

  29. Single Dense Block Detection (cont.)
     • Remove a slice to maximize density $\rho$
     [Figure: example tensor after the removal, with $\rho = 3.3$]

  30. Single Dense Block Detection (cont.)
     • Remove a slice to maximize density $\rho$
     [Figure: example tensor after the removal, with $\rho = 3.6$]

  31. Single Dense Block Detection (cont.)
     • Until all slices are removed
     [Figure: plot of density against iteration, ending at $\rho = 0$]

  32. Single Dense Block Detection (cont.)
     • Output: return the densest block found so far
     [Figure: the densest block, with $\rho = 3.6$]

  33. Speeding Up Process
     • Lemma 1 [Remove Minimum Sum First]: among slices in the same dimension, removing the slice with the smallest sum of entries increases $\rho$ the most
     [Figure: slice sums 12 > 9 > 2]
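The greedy search, together with the minimum-sum rule of Lemma 1, can be sketched for the 2-order (matrix) case. This is a simplified illustration rather than M-Zoom itself: it assumes the arithmetic average degree as the density measure, stores the matrix as a dictionary, and uses the hypothetical names `density` and `greedy_dense_block`.

```python
def density(mass, n_rows, n_cols):
    """Arithmetic average degree of an n_rows x n_cols block (assumed measure)."""
    if n_rows == 0 or n_cols == 0:
        return 0.0
    return mass / ((n_rows + n_cols) / 2)

def greedy_dense_block(entries):
    """entries: dict mapping (row, col) -> non-negative value."""
    rows = {r for r, _ in entries}
    cols = {c for _, c in entries}
    row_sum = {r: 0 for r in rows}
    col_sum = {c: 0 for c in cols}
    for (r, c), v in entries.items():
        row_sum[r] += v
        col_sum[c] += v
    mass = sum(entries.values())
    # Start from the entire matrix and remember the densest block seen so far.
    best = (density(mass, len(rows), len(cols)), set(rows), set(cols))
    while rows and cols:  # once one mode is empty, the density stays 0
        # Lemma 1: per dimension, only the minimum-sum slice needs checking.
        r = min(rows, key=lambda x: (row_sum[x], x))
        c = min(cols, key=lambda x: (col_sum[x], x))
        after_row = density(mass - row_sum[r], len(rows) - 1, len(cols))
        after_col = density(mass - col_sum[c], len(rows), len(cols) - 1)
        if after_row >= after_col:
            rows.remove(r)
            mass -= row_sum[r]
            for c2 in cols:
                col_sum[c2] -= entries.get((r, c2), 0)
        else:
            cols.remove(c)
            mass -= col_sum[c]
            for r2 in rows:
                row_sum[r2] -= entries.get((r2, c), 0)
        current = density(mass, len(rows), len(cols))
        if current > best[0]:
            best = (current, set(rows), set(cols))
    return best

# Small illustrative matrix (rows 0-2, columns 0-2).
matrix = [[5, 3, 0], [4, 6, 1], [2, 0, 0]]
entries = {(i, j): v for i, row in enumerate(matrix) for j, v in enumerate(row)}
best_density, best_rows, best_cols = greedy_dense_block(entries)
```

On this matrix the search drops the last column, then the last row, and returns the top-left 2 × 2 block. Checking one candidate per dimension instead of every slice is what Lemma 1 buys.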

  34. Accuracy Guarantee
     • Theorem 1 [Approximation Guarantee]: $\rho_A(B) \ge \frac{1}{N} \rho_A(B^*)$, where $B$ is the M-Zoom result, $B^*$ is the densest block, and $N$ is the order
     • Theorem 2 [Near-linear Time Complexity]: $O(NM \log L)$, where $N$ is the order, $M$ is the number of non-zeros, and $L$ is the number of entries in each mode

  35. Optional Post Process
     • Local search
       ◦ grow or shrink until a local maximum is reached
     [Figure: the result of our previous greedy search and the blocks obtained from it by growing and by shrinking, each labeled with its density $\rho$]

  36. Optional Post Process (cont.)
     • Local search
       ◦ grow or shrink until a local maximum is reached
     [Figure: the current block and the blocks obtained by shrinking and by growing, each labeled with its density $\rho$]

  37. Optional Post Process (cont.)
     • Local search
       ◦ grow or shrink until a local maximum is reached
     [Figure: the current block and the blocks obtained by growing and by shrinking, each labeled with its density $\rho$]

  38. Optional Post Process (cont.)
     • Local search
       ◦ grow or shrink until a local maximum is reached
     • Return the local maximum
     [Figure: the local maximum and the blocks obtained from it by growing and by shrinking, each labeled with its density $\rho$]

  39. Multiple Block Detection
     • Deflation: remove found blocks before finding others
     [Figure: find a block, remove it, find the next, remove it, find the next, then restore the removed blocks]

  40. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
       ◦ T2. Anomaly Detection
         ▪ T2.1 M-Zoom [PKDD 16]
           ◦ Algorithm
           ◦ Experiments <<
         ▪ T2.2-T2.3 Related Completed Work
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion

  41. Speed & Accuracy
     • Datasets: …
     [Figure: speed and accuracy plots for the density metrics $\rho_S$, $\rho_A$, and $\rho_G$, annotated 3×, 2×, and 2×]

  42. Discoveries in Practice
     • Korean Wikipedia (Accounts × Pages): 11 accounts revised 10 pages 2,305 times within 16 hours
     • English Wikipedia (Accounts × Pages): 8 accounts revised 12 pages 2.5 million times
     [Figure annotation: 100%]

  43. Discoveries in Practice (cont.)
     • App Market (4-order): 9 accounts gave 1 product 369 reviews with the same rating within 22 hours
     • TCP Dump (7-order): a block whose volume = 2 and mass = 2 million
     [Figure annotations: 100% for each dataset]

  44. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
       ◦ T2. Anomaly Detection
         ▪ M-Zoom
         ▪ T2.2-T2.3 Related Completed Work <<
       ◦ T3. Behavior Modeling
     • Proposed Work
     • Conclusion

  45. T2.2 Extension to Web-scale Tensors
     • Goal: to find dense blocks in a disk-resident or distributed tensor
     • D-Cube: gives the same accuracy guarantee as M-Zoom with far fewer iterations
     [Figure: entry sums in slices compared with the average; 100 B nonzeros in 5 hours]
     Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “D-Cube: Dense-Block Detection in Terabyte-Scale Tensors”, WSDM 2017

  46. T2.3 Extension to Dynamic Tensors
     • Goal: to maintain a dense block in a dynamic tensor that changes over time
     • DenseStream: incrementally computes a dense block with the same accuracy guarantee as M-Zoom
     Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams”, KDD 2017

  47. Anomaly Detection in Tensors
     • Algorithms:
       ◦ M-Zoom, D-Cube, and DenseStream
     • Analyses: approximation guarantees
     • Discoveries:
       ◦ Edit war, vandalism, and bot activities
       ◦ Network intrusion
       ◦ Spam reviews

  48. Completed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*
       ◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*
       ◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]
     Tensors
       ◦ T1. Structure Analysis: Summarization [WSDM17]
       ◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
       ◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
     * Duplicated

  49. Motivation
     [Figure: a path from Start (a “Welcome to …” page) to Goal through unknown steps (“? ? ? …”), with profile icons along the way]
     Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan, “Discovering Progression Stages in Trillion-Scale Behavior Logs”, WWW 2018

  50. Problem Definition
     • Given:
       ◦ behavior log
       ◦ number of desired latent stages: $k$
     • Find: $k$ progression stages
       ◦ types of actions
       ◦ frequency of actions
       ◦ transitions to other stages
     • To best describe the given behavior log
     [Figure: behavior log with Users and Action types axes]

  51. Behavior Model
     • Generative process:
       ◦ $\Theta_s$: action-type distribution in stage $s$
       ◦ $\phi_s$: time-gap distribution in stage $s$
       ◦ $\psi_s$: next-stage distribution in stage $s$
     [Figure: example sequence of actions (jobs, connect, message, connect) after the “Welcome to …” page, generated in stages 1, 2, 2, 3 from the corresponding $\Theta_s$, $\phi_s$, and $\psi_s$]
     • Constraint: “no decline” (progression but no cyclic patterns)

  52. Optimization Algorithm
     • Goal: to fit our model to given data
       ◦ parameters: distributions (i.e., $\{\Theta_s, \phi_s, \psi_s\}_s$) and latent stages
     • Repeat until convergence
       ◦ assignment step: assign latent stages while fixing prob. distributions
         ▪ “no decline” → Dynamic Programming
       ◦ update step: update prob. distributions while fixing latent stages
         ▪ e.g., $\Theta_s \leftarrow$ ratio of the types of actions in stage $s$
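The assignment step lends itself to a short dynamic program. The sketch below is a simplification, not the authors' algorithm: it scores stages with the action-type distributions only (time gaps and next-stage transitions are ignored), numbers stages from 0, and enforces “no decline” by allowing each action's stage to be at least the previous action's stage. The function name `assign_stages` and the probabilities are hypothetical.

```python
import math

def assign_stages(log_lik):
    """log_lik[t][s]: log-likelihood of the t-th action under stage s.

    Returns the non-decreasing stage sequence with the highest total
    log-likelihood.
    """
    k = len(log_lik[0])
    score = [list(log_lik[0])]  # score[t][s]: best total ending in stage s
    back = []                   # back[t-1][s]: best stage of action t-1
    for t in range(1, len(log_lik)):
        row, ptr = [], []
        best_prev, best_idx = -math.inf, 0
        for s in range(k):
            # Running maximum over previous stages s' <= s ("no decline").
            if score[-1][s] > best_prev:
                best_prev, best_idx = score[-1][s], s
            row.append(log_lik[t][s] + best_prev)
            ptr.append(best_idx)
        score.append(row)
        back.append(ptr)
    s = max(range(k), key=lambda j: score[-1][j])
    stages = [s]
    for ptr in reversed(back):  # follow back-pointers from the last action
        s = ptr[s]
        stages.append(s)
    return stages[::-1]

# Hypothetical action-type distributions for three stages.
theta = [
    {"jobs": 0.8, "connect": 0.1, "message": 0.1},
    {"jobs": 0.1, "connect": 0.8, "message": 0.1},
    {"jobs": 0.1, "connect": 0.3, "message": 0.6},
]
actions = ["jobs", "connect", "message", "connect"]
log_lik = [[math.log(dist[a]) for dist in theta] for a in actions]
stages = assign_stages(log_lik)
```

Without the constraint the final "connect" would fall back to stage 1; under “no decline” the best assignment keeps it in stage 2.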

  53. Scalability & Convergence
     • Three versions of our algorithm
       ◦ In-memory
       ◦ Out-of-core (or external-memory)
       ◦ Distributed
     [Figure: scalability and convergence plots over the number of latent stages; annotation: 1 trillion actions in 2 hours]

  54. Progression of Users in LinkedIn
     [Figure: discovered stages, labeled Join, Onboarding Process, Build one’s Profile, Poke around the service, Grow one’s Social Network, Have 30 connections, and Consume Newsfeeds]

  55. Completed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*
       ◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*
       ◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]
     Tensors
       ◦ T1. Structure Analysis: Summarization [WSDM17]
       ◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
       ◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
     * Duplicated

  56. Roadmap
     • Overview
     • Completed Work
       ◦ T1. Structure Analysis
       ◦ T2. Anomaly Detection
       ◦ T3. Behavior Modeling
     • Proposed Work <<
     • Conclusion

  57. Proposed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Stream
       ◦ T3. Behavior Modeling: P3. Polarization Modeling
     Tensors
       ◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition

  58. Proposed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Stream
       ◦ T3. Behavior Modeling: P3. Polarization Modeling
     Tensors
       ◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition

  59. P1: Problem Definition
     • Given:
       ◦ a fully dynamic graph stream,
         ▪ i.e., a list of edge insertions (+) and edge deletions (−)
       ◦ memory budget $k$
     • Estimate: the counts of global and local triangles
     • To minimize: estimation error
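To make the input and output of this problem concrete, here is an exact baseline, not the proposed method: it stores every edge and therefore ignores the memory budget that the estimation problem imposes. It assumes a well-formed stream (no duplicate insertions, deletions only of existing edges); the class name `ExactDynamicTriangleCounter` is hypothetical.

```python
from collections import defaultdict

class ExactDynamicTriangleCounter:
    """Exact global and local triangle counts under insertions and deletions."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.global_count = 0
        self.local_count = defaultdict(int)

    def update(self, u, v, sign):
        """Process one stream element; sign is '+' (insert) or '-' (delete)."""
        if sign == '-':
            self.adj[u].discard(v)
            self.adj[v].discard(u)
        delta = 1 if sign == '+' else -1
        # Every common neighbor w closes (or used to close) a triangle u-v-w.
        for w in self.adj[u] & self.adj[v]:
            self.global_count += delta
            for node in (u, v, w):
                self.local_count[node] += delta
        if sign == '+':
            self.adj[u].add(v)
            self.adj[v].add(u)

counter = ExactDynamicTriangleCounter()
stream = [(1, 2, '+'), (2, 3, '+'), (1, 3, '+'), (3, 4, '+'), (1, 4, '+')]
for u, v, sign in stream:
    counter.update(u, v, sign)
count_before_deletion = counter.global_count
counter.update(1, 3, '-')  # deleting edge 1-3 destroys both triangles
```

A sampling-based estimator for this setting has to produce approximately these counts while keeping at most $k$ edges.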

  60. P1: Goal
     Method        Accuracy   Handle Deletions?
     Triest-FD     Lowest     Yes
     MASCOT        Low        No
     Triest-IMPR   High       No
     WRS           Highest    No
     Proposed      Highest    Yes

  61. Proposed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Stream
       ◦ T3. Behavior Modeling: P3. Polarization Modeling
     Tensors
       ◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition

  62. P2: Problem Definition
     • Tucker Decomposition (a.k.a. High-order PCA)
       ◦ Given: an $N$-order input tensor $\mathbf{X}$
       ◦ Find: $N$ factor matrices $A^{(1)}, \ldots, A^{(N)}$ and a core tensor $\mathbf{Y}$
       ◦ To satisfy: $\mathbf{X} \approx \mathbf{Y} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)}$ (shown for $N = 3$)

  63. P2: Standard Algorithms
     • Input (large & sparse): 2GB
     • Intermediate data (large & dense), materialized: 400GB - 4TB (scalability bottleneck)
     • Output (small & dense), obtained by SVD: 2GB

  64. P2: Completed Work
     • Our completed work [WSDM17]
       ◦ Input (large & sparse)
       ◦ Intermediate data (large & dense): computed on the fly, which incurs repeated computation
       ◦ Output (small & dense), obtained by SVD
     Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu, “S-HOT: Scalable High-Order Tucker Decomposition”, WSDM 2017

  65. P2: Proposed Work
     • Proposed algorithm: partially materialize intermediate data!
       ◦ Input (large & sparse)
       ◦ Intermediate data (small & dense): partly materialized, partly computed on the fly
       ◦ Output (small & dense)

  66. P2: Expected Performance Gain
     • Which part of the intermediate data should we materialize?
     • Exploit skewed degree distributions!
     [Figure: % of saved computation against % of materialized data]

  67. Proposed Work by Topics
     Graphs
       ◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Stream
       ◦ T3. Behavior Modeling: P3. Polarization Modeling
     Tensors
       ◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition

  68. P3. Polarization Modeling
     • Polarization in social networks: division into contrasting groups
     [Figure: example issue “Use of marijuana should be: Legal / Illegal”; polarization arises through a change of beliefs OR a change of edges]
     “How do people choose between the two ways of polarization?”

  69. P3. Problem Definition
     • Given: a time-evolving social network with nodes’ beliefs on controversial issues
       ◦ e.g., legalizing marijuana
     • Find: an actor-based model with a utility function
       ◦ depending on network features, beliefs, etc.
     • To best describe: the polarization in data
     • Applications:
       ◦ predict future edges
       ◦ predict the cascades of beliefs
