Fast and Accurate Mining of Evolving & Trajectory Networks - PowerPoint PPT Presentation

Fast and Accurate Mining of Evolving & Trajectory Networks Manos Papagelis York University, Toronto, Canada Current Research focus A. Network Representation Learning B. Trajectory Network Mining C. Streaming & Dynamic Graphs D.


  1. Fast and Accurate Mining of Evolving & Trajectory Networks Manos Papagelis York University, Toronto, Canada

  2. Current Research focus: A. Network Representation Learning; B. Trajectory Network Mining; C. Streaming & Dynamic Graphs; D. Social Media Mining & Analysis; E. City Science / Urban Informatics / IoT; F. Natural Language Processing

  3. EvoNRL: Evolving Network Representation Learning Based on Random Walks. Joint work with Farzaneh Heidari

  4. networks (universal language for describing complex data)

  5. Classical ML Tasks in Networks: community detection, link prediction, node classification, triangle count, graph similarity, anomaly detection. Limitations of Classical ML: • expensive computation (high-dimensional computations) • extensive domain knowledge (task-specific)

  6. Network Representation Learning (NRL): Network → Low-dimensional space. Several network structural properties can be learned/embedded (nodes, edges, subgraphs, graphs, …). Premise of NRL: • faster computations (low-dimensional computations) • agnostic to domain knowledge (task-independent)

  7. Random Walk-based NRL (StaticNRL): Input network → Obtain a set of random walks → Treat the set of random walks as sentences → Feed sentences to a Skip-gram NN model (DeepWalk, node2vec, …) → Learn a vector representation for each node. [Figure: example input network with nodes 1 to 9 and its set of random walks, numbered 1 to 90]
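A minimal sketch of the first two pipeline steps on this slide (obtain a set of random walks, treat them as sentences), assuming an unweighted, undirected graph stored as an adjacency dict and uniform neighbor sampling. The toy graph, the function name `random_walks`, and the parameters `walks_per_node`, `walk_length` and `seed` are illustrative assumptions, not taken from the slides; the Skip-gram training step is left out.

```python
import random

def random_walks(adj, walks_per_node=2, walk_length=7, seed=0):
    """Return a list of uniform random walks, each a list of node ids."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            # Extend the walk by one uniformly chosen neighbor at a time.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            walks.append(walk)
    return walks

# Hypothetical toy graph (not the one drawn on the slide).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
walks = random_walks(adj)

# Each walk becomes a "sentence" of node tokens for a Skip-gram model.
sentences = [[str(node) for node in walk] for walk in walks]
```

The resulting `sentences` list has the shape that word-embedding trainers typically expect: a list of token lists.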

  8. but real-world networks are constantly evolving

  9. Evolving Network Representations Learning

  10. Naive Approach: apply StaticNRL separately to each snapshot of the network (t = 0, t = 1, t = 2). Impractical (expensive, incomparable representations). [Figure: three network snapshots, each embedded by its own StaticNRL run]

  11. EvoNRL Key Idea: dynamically maintain a valid set of random walks for every change in the network. Pipeline: Input network → Obtain a set of random walks → Treat the set of random walks as sentences → Feed sentences to a Skip-gram NN model → Learn a vector representation for each node. [Figure: example input network and its set of random walks]

  12. Example: Edge Addition. Addition of edge (1, 4) between t = 0 and t = 1: need to update the RW set; for an affected walk, simulate the rest of the RW. Similarly for edge deletion, node addition/deletion. [Figure: network and RW set at t = 0 and at t = 1]
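The edge-addition example above can be sketched as follows. This is a simplification under stated assumptions: every walk that visits either endpoint of the new edge is kept up to its first visit and the rest is re-simulated uniformly at random. The slides do not specify EvoNRL's exact sampling policy, so the function name `update_walks_on_edge_addition`, its parameters, and the toy data are hypothetical.

```python
import random

def update_walks_on_edge_addition(walks, adj, u, v, walk_length=7, seed=0):
    """Add edge (u, v) to adj and re-simulate every walk that visits u or v."""
    rng = random.Random(seed)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    updated = []
    for walk in walks:
        hits = [i for i, node in enumerate(walk) if node in (u, v)]
        if not hits:
            updated.append(walk)        # unaffected walk stays as it is
            continue
        new_walk = walk[: hits[0] + 1]  # keep the prefix up to the first affected node
        while len(new_walk) < walk_length:
            # Simulate the rest of the walk on the updated network.
            new_walk.append(rng.choice(sorted(adj[new_walk[-1]])))
        updated.append(new_walk)
    return updated

# Hypothetical toy network and walks (not the ones on the slide).
adj = {1: {3}, 3: {1, 5}, 4: {5}, 5: {3, 4}}
walks = [[3, 5, 4, 5, 3, 1, 3], [5, 3, 5, 3, 5, 3, 5]]
updated = update_walks_on_edge_addition(walks, adj, 1, 4)
```

Only the first walk is touched here, because the second one never visits node 1 or node 4.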

  13. Efficiently Maintaining a Set of Random Walks

  14. EvoNRL Operations: + edge(n1, n2). Operations on RW: • Search a node • Delete a RW • Insert a new RW. Need for an efficient indexing data structure. [Figure: network and RW set before and after the edge addition]

  15. EvoNRL Indexing: each node is a keyword, each RW is a document, a set of RWs is a collection of documents.
      Random walks (RW ID: nodes):
        1: 3 5 8 7 6 4 5
        2: 1 3 5 8 7 6 5
        ...
        87: 8 5 4 3 5 6 7
        88: 4 5 6 7 8 9
        89: 2 1 3 5 6 7 8
        90: 7 4 2 1 3 5 6
      Inverted index (each posting is <RW ID, position>):
        Term | Frequency | Postings and Positions
        1    | 3         | <2, 1>, <89, 2>, <90, 4>
        2    | 2         | <89, 1>, <90, 3>
        3    | 5         | <1, 1>, <2, 1>, <87, 3>, <89, 3>, <90, 5>
        4    | 4         | <1, 6>, <87, 3>, <90, 2>
        5    | 9         | <1, 2>, <1, 7>, <2, 3>, <2, 7>, <87, 5>, <88, 2>, <89, 4>, <90, 6>
        6    | 6         | <1, 5>, <2, 6>, <87, 6>, <88, 3>, <89, 3>, <90, 5>
        7    | 5         | <1, 4>, <2, 5>, <87, 7>, <88, 4>, <89, 6>, <90, 7>
        8    | 5         | <1, 3>, <2, 4>, <87, 1>, <88, 6>, <89, 7>
        9    | 1         | <88, 7>
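A minimal in-memory sketch of the indexing idea on this slide, supporting the three operations named on the previous slide (search a node, delete a RW, insert a new RW). The class name `WalkIndex` and its methods are hypothetical, and a plain dictionary stands in for whatever indexing engine the authors actually use, which the slides do not name. Positions are 1-based, and the two sample walks are the ones labelled 1 and 89 on the slide.

```python
from collections import defaultdict

class WalkIndex:
    """Inverted index: node -> {walk_id: [positions]} (positions are 1-based)."""

    def __init__(self):
        self.walks = {}  # walk_id -> list of nodes
        self.postings = defaultdict(lambda: defaultdict(list))

    def insert(self, walk_id, walk):
        """Insert a new RW and index every node occurrence in it."""
        self.walks[walk_id] = list(walk)
        for pos, node in enumerate(walk, start=1):
            self.postings[node][walk_id].append(pos)

    def delete(self, walk_id):
        """Delete a RW and remove all of its postings."""
        for node in set(self.walks.pop(walk_id)):
            del self.postings[node][walk_id]
            if not self.postings[node]:
                del self.postings[node]

    def search(self, node):
        """Return sorted (walk_id, position) pairs for every occurrence of node."""
        if node not in self.postings:
            return []
        return sorted((w, p) for w, ps in self.postings[node].items() for p in ps)

index = WalkIndex()
index.insert(1, [3, 5, 8, 7, 6, 4, 5])
index.insert(89, [2, 1, 3, 5, 6, 7, 8])
hits_before = index.search(5)  # [(1, 2), (1, 7), (89, 4)]
index.delete(1)
hits_after = index.search(5)   # [(89, 4)]
```

With such an index, finding the walks affected by a network change is a lookup by node rather than a scan over the whole RW set.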

  16. Evaluation of EvoNRL

  17. Evaluation: EvoNRL vs StaticNRL. Accuracy: EvoNRL ≈ StaticNRL. Running Time: EvoNRL << StaticNRL

  18. Accuracy: edge addition EvoNRL has similar accuracy to StaticNRL (similar results for edge deletion, node addition/deletion)

  19. Time Performance (plot annotations: 100x, 𝟑𝟏𝐲): EvoNRL performs orders of magnitude faster than StaticNRL

  20. Takeaway: how can we learn representations of an evolving network? EvoNRL: a time-efficient, accurate, generic method

  21. Current Research focus: A. Network Representation Learning; B. Trajectory Network Mining; C. Streaming & Dynamic Graphs; D. Social Media Mining & Analysis; E. City Science / Urban Informatics / IoT; F. Natural Language Processing

  22. Node Importance in Trajectory Networks. Joint work with Tilemachos Pechlivanoglou

  23. Trajectories of moving objects: every moving object forms a trajectory; in 2D it is a sequence of (x, y, t). There are trajectories of moving cars, people, birds, …

  24. Trajectory data mining: trajectory similarity, trajectory clustering, trajectory anomaly detection, trajectory pattern mining, trajectory classification, ...more. We care about network analysis of moving objects

  25. Proximity networks (θ: proximity threshold)
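A minimal sketch of how one snapshot of a proximity network could be built, assuming that two objects are connected whenever their Euclidean distance is at most θ. Whether the comparison is strict is not stated on the slide, and the function name `proximity_edges` and the sample positions are hypothetical.

```python
from itertools import combinations
from math import dist

def proximity_edges(positions, theta):
    """Return the set of pairs (u, v), u < v, whose distance is at most theta."""
    return {
        (u, v)
        for u, v in combinations(sorted(positions), 2)
        if dist(positions[u], positions[v]) <= theta
    }

# Hypothetical (x, y) positions of three objects at one time unit.
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0)}
edges = proximity_edges(positions, theta=5.0)  # {("a", "b")}
```

This pairwise comparison is quadratic in the number of objects; a spatial index would be the usual refinement for large inputs.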

  26. Distance can represent: line of sight, wifi/bluetooth signal range

  27. Trajectory networks. The Problem. Input: logs of trajectories (x, y, t) in time period [0, T]. Output: node importance metrics

  28. Node Importance

  29. Node importance in static networks: Degree centrality, Betweenness centrality, Closeness centrality, Eigenvector centrality

  30. Node importance in TNs: node degree over time, triangles over time, connected components over time (connectedness)

  31. Applications: infection spreading, security in autonomous vehicles, rich dynamic network analytics

  32. Evaluation of Node Importance in Trajectory Networks

  33. Naive approach. For every discrete time unit t: 1. obtain static snapshot of the proximity network 2. run static node importance algorithms on snapshot. Aggregate results at the end
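The naive approach above can be sketched for one metric, node degree. The sketch assumes the snapshots are already available as one edge list per time unit and that "aggregate" means averaging over all time units. Both are assumptions made for illustration, as is the function name `naive_average_degree`.

```python
from collections import defaultdict

def naive_average_degree(snapshots):
    """snapshots[t] is the edge list of the proximity network at time unit t."""
    totals = defaultdict(int)
    for edges in snapshots:   # 1. one static snapshot per time unit
        for u, v in edges:    # 2. static degree computation on that snapshot
            totals[u] += 1
            totals[v] += 1
    # Aggregate at the end: average degree over all time units.
    return {node: total / len(snapshots) for node, total in totals.items()}

# Hypothetical snapshots for three time units.
snapshots = [[("a", "b")], [("a", "b"), ("b", "c")], []]
avg = naive_average_degree(snapshots)
```

The cost grows with the number of time units, even when the network barely changes between consecutive snapshots.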

  34. Streaming approach. Similar to naive, but: • no final aggregation • results calculated incrementally at every step. Still every time unit

  35. Every discrete time unit [Figure: time axis with a tick at every unit: 0, 1, 2, 3, 4, ..., T]

  36. Sweep Line Over Trajectories (SLOT)

  37. Sweep line algorithm: a computational geometry algorithm that, given line segments, computes line segment overlaps. Efficient one-pass algorithm that only processes line segments at the beginning and ending points
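A minimal one-dimensional illustration of the sweep line idea: only the beginning and ending points of the segments are processed, in sorted order, in a single pass. The sketch treats segments as half-open intervals [start, end) and reports the largest number active at once; the function name `max_overlap` and this particular output are illustrative choices, not taken from the slides.

```python
def max_overlap(intervals):
    """Return the largest number of intervals [start, end) active at the same time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # interval begins
        events.append((end, -1))    # interval ends
    events.sort()                   # at equal times, ends (-1) sort before starts (+1)
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best

peak = max_overlap([(0, 4), (2, 6), (5, 8), (6, 9)])  # 2
```

The work is dominated by sorting the 2n endpoints, regardless of how long the segments are.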

  38. SLOT: Sweep Line Over Trajectories (algorithm sketch): • represent TN edges as time intervals • apply variation of sweep line algorithm • simultaneously compute node degree, triangle membership, connected components in one pass

  39. Represent edges as time intervals [Figure: edges e1:(n1, n2), …, en of list L drawn as intervals on the time axis from 0 to T, with endpoints t1 … t13]
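One way to obtain such intervals, sketched under the assumption that the proximity network is available as one edge set per discrete time unit: an interval opens when an edge first appears and closes at the first time unit in which it is absent. The function name `edge_intervals` and the (u, v, start, end) tuple layout, with `end` exclusive, are hypothetical.

```python
def edge_intervals(snapshots):
    """Turn per-time-unit edge sets into (u, v, start, end) intervals, end exclusive."""
    open_since = {}  # edge -> time unit at which it became active
    intervals = []
    for t, edges in enumerate(snapshots):
        current = {tuple(sorted(e)) for e in edges}
        for edge in sorted(set(open_since) - current):  # edge just disappeared
            intervals.append((*edge, open_since.pop(edge), t))
        for edge in current - set(open_since):          # edge just appeared
            open_since[edge] = t
    for edge, start in sorted(open_since.items()):      # close edges still active at T
        intervals.append((*edge, start, len(snapshots)))
    return intervals

# Hypothetical snapshots for four time units.
snapshots = [{("a", "b")}, {("a", "b"), ("b", "c")}, {("b", "c")}, {("a", "b")}]
intervals = edge_intervals(snapshots)
# [("a", "b", 0, 2), ("b", "c", 1, 3), ("a", "b", 3, 4)]
```

An edge that disappears and later reappears yields several intervals, as edge ("a", "b") does here.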

  40. SLOT: Sweep Line Over Trajectories

  41. At every edge start (edge e:(u, v)): • node degree − nodes u, v now connected − increment u, v node degrees • triangle membership − did a triangle just form? − look for u, v common neighbors − increment triangle (u, v, common) • connected components − did two previously disconnected components connect? − compare old components of u, v − if no overlap, merge them

  42. At every edge stop (edge e:(u, v)): • node degree − nodes u, v now disconnected − decrement u, v degree • triangle membership − did a triangle just break? − look for u, v common neighbors − decrement triangle (u, v, common) • connected components − did a connected component separate? − BFS to see if u, v still connected − if not, split component in two
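The edge start and edge stop steps on the last two slides can be sketched as a single pass over sorted events. This is a simplified illustration, not the authors' implementation: it tracks node degree (reported as the maximum degree each node reaches) and triangle lifetimes, and leaves out connected components. The function name `slot_sketch`, the (u, v, start, end) input layout with `end` exclusive, and the assumption that intervals of the same edge never overlap are all hypothetical.

```python
from collections import defaultdict

def slot_sketch(intervals):
    """Sweep over edge start/stop events, tracking degrees and triangle lifetimes."""
    events = []
    for u, v, start, end in intervals:
        events.append((start, 1, u, v))  # edge start
        events.append((end, 0, u, v))    # edge stop (0 sorts before 1 at equal times)
    events.sort()

    neighbors = defaultdict(set)
    max_degree = defaultdict(int)
    open_triangles = {}                  # frozenset({u, v, w}) -> start time
    triangle_intervals = []

    for t, is_start, u, v in events:
        common = neighbors[u] & neighbors[v]  # common neighbors of u and v
        if is_start:
            neighbors[u].add(v)               # increment u, v degrees
            neighbors[v].add(u)
            for n in (u, v):
                max_degree[n] = max(max_degree[n], len(neighbors[n]))
            for w in common:                  # triangle (u, v, w) just formed
                open_triangles[frozenset((u, v, w))] = t
        else:
            neighbors[u].discard(v)           # decrement u, v degrees
            neighbors[v].discard(u)
            for w in common:                  # triangle (u, v, w) just broke
                tri = frozenset((u, v, w))
                triangle_intervals.append((tuple(sorted(tri)), open_triangles.pop(tri), t))
    return dict(max_degree), triangle_intervals

# Hypothetical edge intervals.
degrees, triangles = slot_sketch([("a", "b", 0, 5), ("b", "c", 1, 4), ("a", "c", 2, 6)])
# degrees == {"a": 2, "b": 2, "c": 2}; triangles == [(("a", "b", "c"), 2, 4)]
```

Each edge is touched exactly twice, at its start and at its stop, so the amount of work depends on the number of edge intervals rather than on the number of time units.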

  43. SLOT: At the end of the algorithm … Rich Analytics: − node degrees: start/end time, duration − triangles: start/end time, duration − connected components: start/end time, duration. Exact results (not approximations)

  44. Evaluation of SLOT

  45. Node degree: 1550x

  46. Triangle membership / connected components

  47. SLOT Scalability
