Fast and Accurate Mining of Evolving & Trajectory Networks Manos Papagelis York University, Toronto, Canada
Current Research Focus: A. Network Representation Learning B. Trajectory Network Mining C. Streaming & Dynamic Graphs D. Social Media Mining & Analysis E. City Science / Urban Informatics / IoT F. Natural Language Processing
EvoNRL: Evolving Network Representation Learning Based on Random Walks Joint work with Farzaneh Heidari
networks (universal language for describing complex data)
Classical ML Tasks in Networks: community detection, link prediction, node classification, triangle counting, graph similarity, anomaly detection. Limitations of Classical ML: • expensive computation (high-dimensional computations) • extensive domain knowledge (task specific)
Network Representation Learning (NRL): map the network to a low-dimensional space in which several network structural properties can be learned/embedded (nodes, edges, subgraphs, graphs, …). Premise of NRL: • faster computations (low-dimensional computations) • domain-knowledge agnostic (task independent)
Random Walk-based NRL (StaticNRL):
1. obtain a set of random walks from the input network
2. treat the set of random walks as sentences
3. feed the sentences to a Skip-gram NN model (DeepWalk, node2vec, …)
4. learn a vector representation for each node
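A minimal sketch of this StaticNRL pipeline, assuming networkx for the graph and gensim's Word2Vec as the Skip-gram model (DeepWalk-style unbiased walks; node2vec would bias the transition probabilities). Function names and parameter values are illustrative, not the ones used in the talk, and vector_size assumes gensim ≥ 4.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def simulate_walk(G, start, walk_length):
    """Unbiased (DeepWalk-style) random walk of fixed length starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

def static_nrl(G, walks_per_node=10, walk_length=80, dim=128, window=10):
    # 1. Obtain a set of random walks from the input network.
    walks = [simulate_walk(G, node, walk_length)
             for _ in range(walks_per_node) for node in G.nodes()]
    # 2.-3. Treat the walks as sentences and feed them to a Skip-gram model.
    sentences = [[str(n) for n in walk] for walk in walks]
    model = Word2Vec(sentences, vector_size=dim, window=window,
                     sg=1, min_count=0, workers=4)
    # 4. Learn one vector representation per node.
    embeddings = {n: model.wv[str(n)] for n in G.nodes()}
    return walks, embeddings
```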
but real-world networks are constantly evolving
Evolving Network Representation Learning
Naive Approach: at every time step (t = 0, 1, 2, …), rerun StaticNRL on a snapshot of the network. Impractical (expensive, incomparable representations).
EvoNRL Key Idea: dynamically maintain a valid set of random walks for every change in the network; then, as in StaticNRL, treat the walks as sentences, feed them to a Skip-gram NN model, and learn a vector representation for each node.
Example: Edge Addition. When edge (1, 4) is added at t = 1, the set of random walks needs to be updated: a walk that visits node 1 can now continue to node 4, and the rest of the RW is simulated from there. Similarly for edge deletion and node addition/deletion.
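A minimal sketch of the update step for an edge addition, continuing the StaticNRL sketch above: whenever an existing walk visits u, the next hop may be rewired to the new neighbor v and the remainder of the walk is re-simulated. The rewiring probability (1/deg(u)) and the one-rewiring-per-walk rule are simplifying assumptions, not EvoNRL's exact update rule.

```python
import random

def update_walks_on_edge_addition(G, walks, u, v, walk_length=80):
    """Illustrative update of a walk set after edge (u, v) is added to G."""
    G.add_edge(u, v)
    for walk in walks:
        for i, node in enumerate(walk[:-1]):
            if node != u:
                continue
            # v is now a candidate next step out of u; rewire with prob. 1/deg(u).
            if random.random() < 1.0 / G.degree(u):
                walk[i + 1] = v
                del walk[i + 2:]                   # discard the now-invalid suffix
                while len(walk) < walk_length:     # simulate the rest of the RW
                    nbrs = list(G.neighbors(walk[-1]))
                    if not nbrs:
                        break
                    walk.append(random.choice(nbrs))
                break  # rewire at most once per walk in this sketch
    return walks
```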
Efficiently Maintaining a Set of Random Walks
EvoNRL Operations. For every network change, e.g., + edge(n1, n2), EvoNRL performs operations on the RW set: • search for a node • delete a RW • insert a new RW. This creates the need for an efficient indexing data structure.
EvoNRL Indexing: each node is a keyword, each RW is a document, and the set of RWs is a collection of documents. An inverted index keeps, for every node (term), its frequency and its postings and positions as (walk id, position) pairs; e.g., node 1 appears in walk 2 at position 1, in walk 89 at position 2, and in walk 90 at position 4.
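A minimal sketch of this inverted-index idea: each node is a term, each walk a document, and a postings list of (walk id, position) pairs makes "which walks touch node n, and where" a single lookup. The WalkIndex class and its methods are illustrative; a production implementation could delegate this to a full-text search engine.

```python
from collections import defaultdict

class WalkIndex:
    """Inverted index over a set of random walks: node -> [(walk_id, position), ...]."""

    def __init__(self, walks):
        self.walks = walks
        self.postings = defaultdict(list)
        for walk_id, walk in enumerate(walks):
            for pos, node in enumerate(walk):
                self.postings[node].append((walk_id, pos))

    def search(self, node):
        """All (walk id, position) pairs where `node` occurs -- the walks to update."""
        return self.postings[node]

    def frequency(self, node):
        return len(self.postings[node])

    def replace_walk(self, walk_id, new_walk):
        """Delete the old RW from the index and insert the updated RW."""
        for pos, node in enumerate(self.walks[walk_id]):
            self.postings[node].remove((walk_id, pos))
        for pos, node in enumerate(new_walk):
            self.postings[node].append((walk_id, pos))
        self.walks[walk_id] = new_walk
```

With such an index, an edge addition (u, v) only touches the walks returned by search(u): those documents are updated and re-indexed, and everything else stays untouched.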
Evaluation of EvoNRL
Evaluation: EvoNRL vs StaticNRL. Accuracy: EvoNRL ≈ StaticNRL. Running Time: EvoNRL << StaticNRL.
Accuracy: edge addition EvoNRL has similar accuracy to StaticNRL (similar results for edge deletion, node addition/deletion)
Time Performance: EvoNRL runs orders of magnitude faster than StaticNRL.
Takeaway: how can we learn representations of an evolving network? EvoNRL: a time-efficient, accurate, generic method.
Current Research Focus: A. Network Representation Learning B. Trajectory Network Mining C. Streaming & Dynamic Graphs D. Social Media Mining & Analysis E. City Science / Urban Informatics / IoT F. Natural Language Processing
Node Importance in Trajectory Networks Joint work with Tilemachos Pechlivanoglou
Trajectories of moving objects: every moving object forms a trajectory – in 2D it is a sequence of (x, y, t) points. There are trajectories of moving cars, people, birds, …
Trajectory data mining: trajectory similarity, trajectory clustering, trajectory anomaly detection, trajectory pattern mining, trajectory classification, … and more. We care about network analysis of moving objects.
Proximity networks: two moving objects are connected whenever their distance is within a proximity threshold θ.
The distance threshold can represent line of sight or WiFi/Bluetooth signal range.
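A minimal sketch of how a single proximity-network snapshot could be derived from the positions at one timestamp, using a naive all-pairs distance check against the threshold θ; the positions dictionary layout is an assumption for illustration.

```python
import math
from itertools import combinations

def proximity_edges(positions, theta):
    """Edges of the proximity network for a single timestamp.

    positions: {object_id: (x, y)} at a fixed time t
    theta: proximity threshold (e.g., line of sight or radio signal range)
    """
    edges = []
    for (u, (xu, yu)), (v, (xv, yv)) in combinations(positions.items(), 2):
        if math.hypot(xu - xv, yu - yv) <= theta:
            edges.append((u, v))
    return edges
```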
Trajectory networks – The Problem. Input: logs of trajectories (x, y, t) in a time period [0, T]. Output: node importance metrics.
Node Importance
Node importance in static networks: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality.
Node importance in TNs: node degree over time, triangles over time, connected components over time (connectedness).
Applications: infection spreading, security in autonomous vehicles, rich dynamic network analytics.
Evaluation of Node Importance in Trajectory Networks
Naive approach. For every discrete time unit t:
1. obtain a static snapshot of the proximity network
2. run static node importance algorithms on the snapshot
Aggregate the results at the end.
Streaming approach. Similar to naive, but:
− no final aggregation
− results are calculated incrementally at every step
Still processes every time unit.
Every discrete time unit on the time axis 0, 1, 2, 3, 4, …, T has to be processed.
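A minimal sketch of this naive per-time-unit loop, reusing the hypothetical proximity_edges helper above and using node degree as the example importance metric; the trajectory_log layout is an assumption.

```python
import networkx as nx

def naive_node_importance(trajectory_log, theta, T):
    """trajectory_log: {t: {object_id: (x, y)}} for t = 0, ..., T."""
    degree_over_time = {}
    for t in range(T + 1):
        # 1. Obtain a static snapshot of the proximity network at time t.
        G = nx.Graph()
        G.add_nodes_from(trajectory_log[t])
        G.add_edges_from(proximity_edges(trajectory_log[t], theta))
        # 2. Run a static node importance algorithm on the snapshot.
        degree_over_time[t] = dict(G.degree())
    # Aggregate the results at the end (here: just keep the per-step degrees).
    return degree_over_time
```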
Sweep Line Over Trajectories (SLOT)
Sweep line algorithm: a computational geometry algorithm that, given line segments, computes line segment overlaps. Efficient: a one-pass algorithm that only processes line segments at their beginning and ending points.
SLOT: Sweep Line Over Trajectories (algorithm sketch):
• represent TN edges as time intervals
• apply a variation of the sweep line algorithm
• simultaneously compute node degree, triangle membership, and connected components in one pass
Represent edges as time intervals: each edge e1:(n1, n2), …, en of the TN corresponds to time intervals within [0, T] during which the edge is active.
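A minimal sketch of turning these edge intervals into a time-ordered event list, so the sweep only touches each edge at its beginning and ending points; the (u, v, t_start, t_end) tuple layout is an assumption for illustration.

```python
def edge_events(edge_intervals):
    """edge_intervals: list of (u, v, t_start, t_end) tuples within [0, T].

    Returns a time-ordered event list with one 'start' and one 'end' event
    per interval -- the only points the sweep line has to process.
    """
    events = []
    for u, v, t_start, t_end in edge_intervals:
        events.append((t_start, "start", u, v))
        events.append((t_end, "end", u, v))
    events.sort(key=lambda e: e[0])
    return events
```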
SLOT: Sweep Line Over Trajectories
At every edge start e:(u, v):
• node degree − nodes u, v are now connected − increment u, v node degrees
• triangle membership − did a triangle just form? − look for u, v common neighbors − increment triangle (u, v, common)
• connected components − did two previously disconnected components connect? − compare the old components of u, v − if no overlap, merge them
At every edge stop e:(u, v):
• node degree − nodes u, v are now disconnected − decrement u, v node degrees
• triangle membership − did a triangle just break? − look for u, v common neighbors − decrement triangle (u, v, common)
• connected components − did a connected component separate? − BFS to see if u, v are still connected − if not, split the component in two
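A minimal sketch of these per-event updates, building on the hypothetical edge_events helper above. It maintains node degrees, a running triangle count, and the current adjacency structure; connected-component merging/splitting is only indicated with comments and a naive BFS test, and the bookkeeping of start/end times is omitted to keep the sketch short.

```python
from collections import defaultdict, deque

def slot_sweep(edge_intervals):
    events = edge_events(edge_intervals)   # hypothetical helper sketched earlier
    adj = defaultdict(set)                 # adjacency of the current proximity graph
    degree = defaultdict(int)
    triangles = 0

    def still_connected(u, v):
        """Naive BFS over the current graph to test whether u and v stay connected."""
        seen, queue = {u}, deque([u])
        while queue:
            node = queue.popleft()
            if node == v:
                return True
            for nbr in adj[node] - seen:
                seen.add(nbr)
                queue.append(nbr)
        return False

    for t, kind, u, v in events:
        common = adj[u] & adj[v]                    # u, v common neighbors
        if kind == "start":                         # edge start
            degree[u] += 1; degree[v] += 1          # u, v now connected
            triangles += len(common)                # each common neighbor forms a triangle
            adj[u].add(v); adj[v].add(u)
            # connected components: if u and v were in different components, merge them
        else:                                       # edge stop
            adj[u].discard(v); adj[v].discard(u)
            degree[u] -= 1; degree[v] -= 1          # u, v now disconnected
            triangles -= len(common)                # each common neighbor loses a triangle
            if not still_connected(u, v):
                pass  # connected components: the component splits in two (bookkeeping omitted)
        # recording of start/end times and durations is omitted in this sketch
    return degree, triangles
```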
SLOT: at the end of the algorithm … Rich analytics:
− node degrees: start/end time, duration
− triangles: start/end time, duration
− connected components: start/end time, duration
Exact results (not approximations).
Evaluation of SLOT
Node degree: up to 1550x speedup.
Triangle membership / connected components
SLOT Scalability