Goal of Sampling Step
• to maximize the discovering probability p_uvw of each triangle (u, v, w)

Theorem [Variance of our estimate]: Var[ĉ] ≤ Σ_(u,v,w) (1/p_uvw − 1), summed over the true triangles

Theorem [Unbiasedness of our estimate]: Bias[ĉ] = E[ĉ] − true count = 0

Estimation Error = Bias + Variance, and the bias is 0

Completed / Proposed T1.1 / T1.2 / T1.3 32/106
Increasing Discovering Prob.
"How can we increase the discovering probabilities of triangles?"
• Recall Temporal Locality:
◦ new edges are more likely to form triangles with recent edges than with old edges
• Waiting-Room Sampling (WRS)
◦ treats recent edges better than old edges to exploit temporal locality
Waiting-Room Sampling (WRS)
• Divides memory space into two parts
◦ Waiting Room (FIFO): the latest edges are always stored (α% of the budget)
◦ Reservoir (random replace): the remaining edges are sampled ((100 − α)% of the budget)
[Figure: a new edge e_80 arrives; the Waiting Room holds the latest edges e_79, e_78, e_77, e_76, and the Reservoir holds a sample of older edges e_61, e_7, e_18, ...]
WRS: Sampling Steps (Step 1)
[Figure: the new edge e_80 is pushed into the Waiting Room (FIFO), and the oldest edge in the room, e_76, is popped]
WRS: Sampling Steps (Step 2)
[Figure: the popped edge e_76 is either stored in the Reservoir by replacing a uniformly random edge (e.g., e_40), or discarded]
Summary of Algorithm: Waiting-Room Sampling
(1) Arrival Step: a new edge (u, v) arrives
(2) Discovery Step: discover the triangles (u, v, y) that the new edge forms with edges in memory, and update the estimate: ĉ ← ĉ + 1/p_uvy
(3) Sampling Step: sample the new edge into memory
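The two-part memory scheme behind the sampling step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the list-as-FIFO representation, and the uniform random-replacement rule for popped edges are assumptions for exposition.

```python
import random

class WaitingRoomSampler:
    """Sketch of WRS edge sampling: a FIFO waiting room that always keeps
    the latest edges, plus a reservoir of uniformly sampled older edges."""

    def __init__(self, k, alpha=0.5, seed=0):
        self.room_size = int(alpha * k)      # waiting-room budget (alpha% of k)
        self.res_size = k - self.room_size   # reservoir budget
        self.room = []                       # FIFO queue of the latest edges
        self.reservoir = []                  # uniform sample of popped edges
        self.popped = 0                      # number of edges popped so far
        self.rng = random.Random(seed)

    def add(self, edge):
        self.room.append(edge)
        if len(self.room) <= self.room_size:
            return
        old = self.room.pop(0)               # oldest edge leaves the room
        self.popped += 1
        if len(self.reservoir) < self.res_size:
            self.reservoir.append(old)       # reservoir not yet full: store
        elif self.rng.random() < self.res_size / self.popped:
            # standard reservoir sampling: replace a uniformly random slot,
            # so every popped edge is kept with equal probability
            self.reservoir[self.rng.randrange(self.res_size)] = old
        # otherwise the popped edge is discarded
```

After streaming many edges, the room holds exactly the latest α-fraction of the budget, and the reservoir holds a uniform sample of everything older.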
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
▪ T1.1 Waiting-Room Sampling
◦ Temporal Pattern
◦ Algorithm
◦ Experiments <<
▪ T1.2-T1.3 Related Completed Work
◦ T2. Anomaly Detection
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion
Experimental Results: Accuracy
• Datasets: [table of real-world graph datasets]
• WRS is the most accurate (reduces error by up to __%)
Discovering Probability
• WRS increases the discovering probability p_uvw
• WRS discovers up to 3× more triangles
[Figure: discovering probability over time for WRS, Triest-IMPR, and MASCOT; higher is better]
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
▪ T1.1 Waiting-Room Sampling
▪ T1.2-T1.3 Related Completed Work <<
◦ T2. Anomaly Detection
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 41/106
T1.2 Distributed Counting of Triangles
• Goal: to utilize multiple machines for triangle counting in a graph stream
• Tri-Fly [PAKDD18] and DiSLR [submitted to KDD]
[Figure: both architectures pipeline Sources, Workers, and Aggregators; Tri-Fly broadcasts and shuffles edges, while DiSLR multicasts and shuffles them]
Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos, "Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams", PAKDD 2018
T1.2 Performance of Tri-Fly and DiSLR
• Estimation Error = Bias + Variance, and the bias is 0
[Figure: estimation error of Tri-Fly (up to 30× improvement) and DiSLR (up to 40×); lower is better]
T1.3 Estimation of Degeneracy
• Goal: to estimate the degeneracy* in a graph stream
• Core-Triangle Pattern
◦ a 3:1 power law between the triangle count and the degeneracy
*degeneracy: the maximum k such that a subgraph where every node has degree at least k exists
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, "Patterns and Anomalies in k-Cores of Real-world Graphs with Applications", KAIS 2018 (previously ICDM 2016)
T1.3 Core-D Algorithm
• Core-D: a one-pass streaming algorithm for degeneracy
d̂ = exp(β · log(t̂) + γ)
where d̂ is the estimated degeneracy and t̂ is the estimated triangle count (obtained by WRS, etc.)
[Figure: accuracy vs. speed; Core-D is better]
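The final estimate on this slide is a single expression. A sketch, with illustrative rather than fitted coefficients: β = 1/3 reflects the 3:1 Core-Triangle pattern, and γ = 0 is a placeholder (Core-D fits both by regression on the stream).

```python
import math

def estimate_degeneracy(triangle_estimate, beta=1 / 3.0, gamma=0.0):
    """Sketch of Core-D's power-law estimate: d = exp(beta * log(t) + gamma),
    i.e. degeneracy grows roughly as the cube root of the triangle count."""
    return math.exp(beta * math.log(triangle_estimate) + gamma)
```

With these defaults, a graph with an estimated 1000 triangles gets an estimated degeneracy of 1000^(1/3) = 10.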
Structure Analysis of Graphs
• Models:
◦ Relaxed graph stream model
◦ Distributed graph stream model
• Patterns:
◦ Temporal locality
◦ Core-Triangle pattern
• Algorithms:
◦ WRS, Tri-Fly, and DiSLR
◦ Core-D
• Analyses: bias and variance
Completed Work by Topics
• Graphs
◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]*[KAIS18]*
◦ T2. Anomaly Detection: Anomalous Subgraph [IJCAI17][ICDM16]*[KAIS18]*
◦ T3. Behavior Modeling: Purchase Behavior
• Tensors
◦ T1. Structure Analysis: Summarization [WSDM17]
◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
* Duplicated
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
◦ T2. Anomaly Detection
▪ T2.1 M-Zoom <<
▪ T2.2-T2.3 Related Completed Work
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion
Kijung Shin, Bryan Hooi, and Christos Faloutsos, "Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining", TKDD 2018 (previously ECML/PKDD 2016)
Motivation: Review Fraud
[Figure: fake restaurant reviews posted from multiple accounts (Alice's, Bob's, Carol's)]

Completed / Proposed T2.1 / T2.2 / T2.3 49/106
Fraud Forms Dense Block
[Figure: in the accounts × restaurants adjacency matrix, fraudulent accounts reviewing the same restaurants form a dense block]
Problem: Natural Dense Subgraphs
• The adjacency matrix also contains natural dense blocks (core, community, etc.) in addition to suspicious dense blocks formed by fraudsters
• Question: How can we distinguish them?
Solution: Tensor Modeling
• Along the time axis:
◦ natural dense blocks are sparse (formed gradually)
◦ suspicious dense blocks are dense (synchronized behavior)
• In the tensor model, suspicious dense blocks become denser than natural dense blocks
Solution: Tensor Modeling (cont.)
• High-order tensor modeling:
◦ any side information can be used additionally (IP address, keywords, number of stars)
"Given a large-scale high-order tensor, how can we find dense blocks in it?"
Problem Definition
• Given: (1) a tensor X of order N, (2) a density measure ρ, (3) k: the number of blocks we aim to find
• Find: k distinct dense blocks maximizing ρ
Density Measures
• How should we define "density" (i.e., ρ)?
◦ no one absolute answer
◦ depends on data, types of anomalies, etc.
• Goal: a flexible algorithm working well with various reasonable measures
◦ Arithmetic avg. degree ρ_ari
◦ Geometric avg. degree ρ_geo
◦ Suspiciousness (KL divergence) ρ_susp
◦ Traditional density ρ(B) = EntrySum(B)/Vol(B), which is maximized by a single entry with the maximum value
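The two average-degree measures above can be written down directly from a block's mass (its entry sum) and its side lengths. A sketch; the function names are ours, and ρ_susp is omitted since it also needs the whole tensor's mass.

```python
def ari_density(mass, dims):
    """Arithmetic average degree: block mass divided by the arithmetic
    mean of the block's side lengths."""
    return mass / (sum(dims) / len(dims))

def geo_density(mass, dims):
    """Geometric average degree: block mass divided by the geometric
    mean of the block's side lengths."""
    prod = 1.0
    for d in dims:
        prod *= d
    return mass / prod ** (1.0 / len(dims))
```

For a 3 × 3 block of mass 12, both give 4; for a skewed 2 × 8 block of the same mass, the geometric measure (12/4 = 3) penalizes the imbalance more than the arithmetic one (12/5 = 2.4).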
Clarification of Blocks (Subtensors)
• The concept of blocks (subtensors) is independent of the order of rows and columns
• Entries in a block do not need to be adjacent
[Figure: reordering rows (accounts) and columns (restaurants) gathers a block's scattered entries together]
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
◦ T2. Anomaly Detection
▪ T2.1 M-Zoom [PKDD 16]
◦ Algorithm <<
◦ Experiments
▪ T2.2-T2.3 Related Completed Work
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion
Single Dense Block Detection
• Greedy search
• Starts from the entire tensor
[Figure: an example tensor; initially ρ = 2.9]
Single Dense Block Detection (cont.)
• Remove a slice to maximize the density ρ
[Figure: after the first removal, ρ = 3]
Single Dense Block Detection (cont.)
• Remove a slice to maximize the density ρ
[Figure: after the next removal, ρ = 3.3]
Single Dense Block Detection (cont.)
• Remove a slice to maximize the density ρ
[Figure: after the next removal, ρ = 3.6]
Single Dense Block Detection (cont.)
• Until all slices are removed
[Figure: density over iterations, rising and then falling to ρ = 0]
Single Dense Block Detection (cont.)
• Output: return the densest block seen so far
[Figure: the block with ρ = 3.6]
Speeding Up the Process
• Lemma 1 [Remove Minimum Sum First]: among the slices in the same dimension, removing the slice with the smallest sum of entries increases ρ the most
[Figure: three slices with entry sums 12 > 9 > 2; the slice with sum 2 is removed first]
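Combining the greedy loop with Lemma 1 gives the following sketch for a 2-mode tensor (a matrix). It simplifies M-Zoom: it always removes the globally minimum-sum slice and hard-codes one density measure (mass over the average number of remaining rows and columns), whereas M-Zoom chooses among dimensions and supports several measures.

```python
def greedy_densest(entries):
    """Sketch of single-block greedy search on a sparse matrix given as a
    dict mapping (row, col) -> value.  Each iteration removes the row or
    column with the smallest entry sum (Lemma 1); the densest block seen
    over all iterations is returned as (density, rows, cols)."""
    density = lambda mass, rows, cols: mass / ((len(rows) + len(cols)) / 2.0)
    rows = {r for r, _ in entries}
    cols = {c for _, c in entries}
    mass = float(sum(entries.values()))
    best = (density(mass, rows, cols), set(rows), set(cols))
    while rows and cols:
        # entry sum of every remaining slice (row / column)
        row_sum = {r: 0.0 for r in rows}
        col_sum = {c: 0.0 for c in cols}
        for (r, c), v in entries.items():
            if r in rows and c in cols:
                row_sum[r] += v
                col_sum[c] += v
        r_min = min(row_sum, key=row_sum.get)
        c_min = min(col_sum, key=col_sum.get)
        if row_sum[r_min] <= col_sum[c_min]:   # remove the lightest slice
            mass -= row_sum[r_min]
            rows.discard(r_min)
        else:
            mass -= col_sum[c_min]
            cols.discard(c_min)
        if rows and cols:
            d = density(mass, rows, cols)
            if d > best[0]:                    # remember the densest block
                best = (d, set(rows), set(cols))
    return best
```

On a toy matrix with a planted 2 × 2 block of 5s plus one stray entry, the sketch recovers the planted block.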
Accuracy Guarantee
• Theorem 1 [Approximation Guarantee]: ρ(B_M-Zoom) ≥ ρ(B*)/N, where B_M-Zoom is the returned block, B* is the densest block, and N is the order
• Theorem 2 [Near-linear Time Complexity]: O(N · |X| · log L), where N is the order, |X| is the number of non-zeros, and L is the number of entries in each mode
Optional Post Process
• Local search
◦ grow or shrink until a local maximum is reached
[Figure: starting from the result of our previous greedy search, the block can grow or shrink, changing ρ]
Optional Post Process (cont.)
• Local search
◦ grow or shrink until a local maximum is reached
[Figure: successive grow/shrink steps change ρ]
Optional Post Process (cont.)
• Local search
◦ grow or shrink until a local maximum is reached
• Return the local maximum
[Figure: the search stops at a local maximum of ρ]
Multiple Block Detection
• Deflation: remove found blocks before finding the next ones
[Figure: find a block, remove it, find the next, and finally restore the removed blocks]
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
◦ T2. Anomaly Detection
▪ T2.1 M-Zoom [PKDD 16]
◦ Algorithm
◦ Experiments <<
▪ T2.2-T2.3 Related Completed Work
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion
Speed & Accuracy
• Datasets: [real-world tensor datasets]
[Figure: speed vs. accuracy under density metrics ρ_susp, ρ_ari, and ρ_geo; M-Zoom is 2-3× faster than the baselines at comparable accuracy]
Discoveries in Practice
• Korean Wikipedia (accounts × pages): 11 accounts revised 10 pages 2,305 times within 16 hours
• English Wikipedia (accounts × pages): 8 accounts revised 12 pages 2.5 million times (100%)
Discoveries in Practice (cont.)
• App Market (4-order tensor): 9 accounts gave 1 product 369 reviews with the same rating within 22 hours (100%)
• TCP Dump (7-order tensor): a block whose volume = 2 and mass = 2 million (100%)
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
◦ T2. Anomaly Detection
▪ M-Zoom
▪ T2.2-T2.3 Related Completed Work <<
◦ T3. Behavior Modeling
• Proposed Work
• Conclusion
T2.2 Extension to Web-scale Tensors
• Goal: to find dense blocks in a disk-resident or distributed tensor
• D-Cube: gives the same accuracy guarantee as M-Zoom with far fewer iterations
◦ processes 100 B non-zeros in 5 hours
[Figure: average entry sum in slices over iterations]
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, "D-Cube: Dense-Block Detection in Terabyte-Scale Tensors", WSDM 2017
T2.3 Extension to Dynamic Tensors
• Goal: to maintain a dense block in a dynamic tensor that changes over time
• DenseStream: incrementally computes a dense block with the same accuracy guarantee as M-Zoom
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, "DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams", KDD 2017
Anomaly Detection in Tensors
• Algorithms:
◦ M-Zoom, D-Cube, and DenseStream
• Analyses: approximation guarantees
• Discoveries:
◦ Edit wars, vandalism, and bot activities
◦ Network intrusion
◦ Spam reviews
Completed Work by Topics
• Graphs
◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]*[KAIS18]*
◦ T2. Anomaly Detection: Anomalous Subgraph [IJCAI17][ICDM16]*[KAIS18]*
◦ T3. Behavior Modeling: Purchase Behavior
• Tensors
◦ T1. Structure Analysis: Summarization [WSDM17]
◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
* Duplicated
Motivation
[Figure: a new user's journey from Start to Goal through profile-related actions ("Welcome to ...", profile, ...)]
Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan, "Discovering Progression Stages in Trillion-Scale Behavior Logs", WWW 2018
Problem Definition
• Given:
◦ a behavior log (users × action types over time)
◦ the number of desired latent stages: k
• Find: k progression stages, each described by
◦ types of actions
◦ frequency of actions
◦ transitions to other stages
• To best describe the given behavior log

Completed / Proposed T3.1 81/106
Behavior Model
• Generative process with, for each stage t:
◦ an action-type distribution in stage t
◦ a time-gap distribution in stage t
◦ a next-stage distribution in stage t
[Figure: an example action sequence ("Welcome to ...", jobs, connect, message, connect) generated from stages 1, 2, 2, 3]
• Constraint: "no decline" (progression but no cyclic patterns)
Optimization Algorithm
• Goal: to fit our model to the given data
◦ parameters: the distributions and the latent stage assignments
• Repeat until convergence:
◦ assignment step: assign latent stages while fixing the distributions
▪ the "no decline" constraint is handled by dynamic programming
◦ update step: update the distributions while fixing the latent stages
▪ e.g., the action-type distribution of stage t ← the ratio of the types of actions in stage t
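The assignment step's dynamic program can be sketched as follows, assuming per-action log-likelihood scores have already been computed from the fixed distributions (the `scores` layout and function name are ours, for illustration only).

```python
def assign_stages(scores):
    """Sketch of the assignment step: scores[i][s] is the log-likelihood of
    the i-th action under stage s.  Returns the non-decreasing ("no decline")
    stage sequence maximizing the total score, via dynamic programming."""
    n, k = len(scores), len(scores[0])
    best = list(scores[0])          # best[s]: best score of a prefix ending in stage s
    back = []                       # backpointers for recovering the argmax
    for i in range(1, n):
        prev_best, choice = [], []
        run_val, run_arg = float('-inf'), -1
        for s in range(k):          # running max over stages <= s (no decline)
            if best[s] > run_val:
                run_val, run_arg = best[s], s
            prev_best.append(run_val)
            choice.append(run_arg)
        best = [scores[i][s] + prev_best[s] for s in range(k)]
        back.append(choice)
    s = max(range(k), key=lambda t: best[t])
    seq = [s]
    for choice in reversed(back):   # walk backpointers to recover the sequence
        s = choice[s]
        seq.append(s)
    seq.reverse()
    return seq
```

The running-maximum trick makes each iteration O(k) instead of O(k²), so the whole assignment is O(n·k), which matters when the log holds trillions of actions.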
Scalability & Convergence
• Three versions of our algorithm
◦ In-memory
◦ Out-of-core (or external-memory)
◦ Distributed
[Figure: scalability with 5, 10, 15, and 20 latent stages; 1 trillion actions processed in 2 hours]
Progression of Users in LinkedIn
[Figure: discovered stages: Join, Build one's Profile, Onboarding Process, Poke around the service, Consume Newsfeeds, Have 30 connections, Grow one's Social Network]
Completed Work by Topics
• Graphs
◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]*[KAIS18]*
◦ T2. Anomaly Detection: Anomalous Subgraph [IJCAI17][ICDM16]*[KAIS18]*
◦ T3. Behavior Modeling: Purchase Behavior
• Tensors
◦ T1. Structure Analysis: Summarization [WSDM17]
◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]
◦ T3. Behavior Modeling: Progressive Behavior [WWW18]
* Duplicated
Roadmap
• Overview
• Completed Work
◦ T1. Structure Analysis
◦ T2. Anomaly Detection
◦ T3. Behavior Modeling
• Proposed Work <<
• Conclusion
Proposed Work by Topics
• Graphs
◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Streams
◦ T3. Behavior Modeling: P3. Polarization Modeling
• Tensors
◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition
P1: Problem Definition
• Given:
◦ a fully dynamic graph stream,
▪ i.e., a list of edge insertions and deletions: (u1, v1, +), (u2, v2, −), (u3, v3, +), ...
◦ a memory budget k
• Estimate: the counts of global and local triangles
• To Minimize: the estimation error

Completed / Proposed P1 / P2 / P3 90/106
P1: Goal

Method       | Accuracy | Handles Deletions?
Triest-FD    | Lowest   | Yes
MASCOT       | Low      | No
Triest-IMPR  | High     | No
WRS          | Highest  | No
Proposed     | Highest  | Yes
P2: Problem Definition
• Tucker Decomposition (a.k.a. high-order PCA)
◦ Given: an N-order input tensor X
◦ Find: N factor matrices A(1), ..., A(N) and a core tensor G
◦ To satisfy: X ≈ G ×1 A(1) ×2 A(2) ×3 A(3) (shown for N = 3)
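The "to satisfy" condition is just a multilinear product. A small NumPy sketch makes the shapes concrete; it builds X exactly from a random core and factors (illustrative sizes) rather than decomposing a given X, which is what HOOI/ALS-style algorithms would do.

```python
import numpy as np

# Tucker form of a 3-order tensor: X = G x1 A1 x2 A2 x3 A3.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 4))    # small dense core tensor
A1 = rng.standard_normal((10, 2))     # factor matrix for mode 1
A2 = rng.standard_normal((20, 3))     # factor matrix for mode 2
A3 = rng.standard_normal((30, 4))     # factor matrix for mode 3
# contract the core with one factor matrix per mode
X = np.einsum('abc,ia,jb,kc->ijk', G, A1, A2, A3)
```

The resulting X is 10 × 20 × 30 but is fully described by the small core plus three thin factor matrices, which is exactly the compression Tucker decomposition aims for.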
P2: Standard Algorithms
• Materialized SVD
◦ Input (large & sparse): 2GB
◦ Intermediate data (large & dense): 400GB - 4TB (the scalability bottleneck)
◦ Output (small & dense): 2GB
P2: Completed Work
• Our completed work [WSDM17]: on-the-fly SVD
◦ never materializes the large & dense intermediate data
◦ but incurs repeated computation
Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu, "S-HOT: Scalable High-Order Tucker Decomposition", WSDM 2017
P2: Proposed Work
• Proposed algorithm: partially materialize the intermediate data!
◦ mixes materialized and on-the-fly computation so that the materialized intermediate data stays small & dense
P2: Expected Performance Gain
• Which part of the intermediate data should we materialize?
• Exploit the skewed degree distributions!
[Figure: % of saved computation vs. % of materialized data]
P3. Polarization Modeling
• Polarization in social networks: division into contrasting groups
◦ e.g., "Use of marijuana should be: Legal or Illegal"
◦ nodes resolve conflicts by changing their beliefs or changing their edges
"How do people choose between the two ways of polarization?"
P3. Problem Definition
• Given: a time-evolving social network with nodes' beliefs on controversial issues
◦ e.g., legalizing marijuana
• Find: an actor-based model with a utility function
◦ depending on network features, beliefs, etc.
• To best describe: the polarization in the data
• Applications:
◦ predict future edges
◦ predict the cascades of beliefs