CSE 6240: Web Search and Text Mining, Spring 2020
Node Representation Learning
Prof. Srijan Kumar (http://cc.gatech.edu/~srijan)
Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
Administrivia
• Project midterm rubric released
  – Discussion at the end
• Proposal regrades done
Today’s Lecture
• Introduction
• Node embedding setup
• Random walk approaches for node embedding
• Project midterm rubric
These slides are inspired by Prof. Jure Leskovec’s CS224W lecture.
Machine Learning in Networks
• Networks are complex
• Need a uniform language to process various networks
[Figure: many kinds of networks feeding into a machine learning pipeline]
Example: Node Classification
• Classifying the function of proteins in the interactome
Image from: Ganapathiraju et al. 2016. Schizophrenia interactome with 504 novel protein–protein interactions. Nature.
Example: Link Prediction
• Which links exist in the network?
[Figure: a network with unobserved edges marked “?” fed into a machine learning model]
Machine Learning Lifecycle
• Typical machine learning lifecycle requires feature engineering every single time!
• Goal: avoid task-specific feature engineering
[Pipeline: Raw Data → (Feature Engineering) → Structured Data → Learning Algorithm → Model → Downstream task; goal: automatically learn the features instead of hand-engineering them]
Feature Learning in Graphs
• Goal: Efficient task-independent feature learning for machine learning with graphs!
• Learn a mapping g: v → ℝ^d that takes a node v to a d-dimensional feature representation (its embedding)
Why Network Embedding?
• Task: Map each node in a network into a low-dimensional space
  – Distributed representations for nodes
  – Similarity of embeddings between nodes indicates their network similarity
  – Encode network information and generate node representations
Example Node Embedding
• 2D embeddings of the nodes of Zachary’s Karate Club network:
[Figure: the karate club graph (left) and its 2D node embeddings (right)]
Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.
Why Is It Hard?
• The modern deep learning toolbox is designed for simple sequences or grids
  – CNNs for fixed-size images/grids
  – RNNs or word2vec for text/sequences
Why Is It Hard?
• But networks are far more complex!
  – Complex topological structure (i.e., no spatial locality like grids)
  – No fixed node ordering or reference point (i.e., the isomorphism problem)
  – Often dynamic and have multimodal features
Today’s Lecture
• Introduction
• Node embedding setup
• Random walk approaches for node embedding
• Project midterm rubric
Framework Setup
• Assume we have a graph G:
  – V is the vertex set
  – A is the adjacency matrix (assume binary)
  – No node features or extra information is used!
Embedding Nodes
• Goal: Encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network
Embedding Nodes
• Goal: similarity(u, v) ≈ z_v^T z_u
  – Left side: similarity of u and v in the original network (need to define!)
  – Right side: similarity of the embeddings (dot product)
Learning Node Embeddings
1. Define an encoder (i.e., a mapping from nodes to embeddings)
2. Define a node similarity function (i.e., a measure of similarity in the original network)
3. Optimize the parameters of the encoder so that similarity(u, v) in the original network ≈ z_v^T z_u, the similarity of the embeddings
Two Key Components
• Encoder: maps each node in the input graph to a low-dimensional vector, ENC(v) = z_v, the d-dimensional embedding of v
• Similarity function: specifies how relationships in vector space map to relationships in the original network: similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings
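The two components above can be sketched in a few lines. This is a minimal illustration, not a learned model: the toy node set, dimension d, and the randomly initialized embedding table Z are all made up for the example; a real method would optimize Z.

```python
import random

# Shallow encoder sketch: ENC(v) is just a lookup of node v's row in an
# embedding table Z. The nodes, dimension, and values are placeholders.
d = 8
nodes = ["u", "v", "w"]
Z = {n: [random.gauss(0.0, 0.1) for _ in range(d)] for n in nodes}

def enc(v):
    # embedding lookup: node -> d-dimensional vector
    return Z[v]

def similarity(u, v):
    # dot product between the two node embeddings
    return sum(a * b for a, b in zip(enc(u), enc(v)))

print(similarity("u", "v"))
```

Training then amounts to adjusting the entries of Z so that this dot product matches the chosen notion of network similarity.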
How to Define Node Similarity?
• The key choice across methods is how they define node similarity
• E.g., should two nodes have similar embeddings if they…
  – are connected?
  – share neighbors?
  – have similar “structural roles”?
  – …?
Today’s Lecture
• Introduction
• Node embedding setup
• Random walk approaches for node embedding
• Project midterm rubric
Random Walk
• Given a graph and a starting point, we select one of its neighbors at random and move to it; then we select a neighbor of that node at random and move to it, and so on.
• The (random) sequence of nodes selected this way is a random walk on the graph.
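The definition above translates directly into code. A minimal sketch, with a made-up toy adjacency list:

```python
import random

# Unbiased random walk: from the current node, repeatedly hop to a
# uniformly chosen neighbor. The toy graph below is hypothetical.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def random_walk(graph, start, length):
    walk = [start]
    for _ in range(length - 1):
        # pick a neighbor of the last node uniformly at random
        walk.append(random.choice(graph[walk[-1]]))
    return walk

walk = random_walk(graph, "a", 5)
print(walk)
```

Each run produces a different sequence; the distribution of these sequences is what the embedding methods below exploit.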
Random-Walk Node Similarity
• z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network
Random-Walk Embeddings
• Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R
• Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space
  – Similarity here: dot product
Unsupervised Feature Learning
• Given a node u, how do we define nearby nodes?
  – N_R(u) = neighborhood of u obtained by some random-walk strategy R
• Different neighborhood definitions give different algorithms
  – We will look at DeepWalk and node2vec
Random Walk Optimization
1. Run short fixed-length random walks starting from each node on the graph, using some strategy R
2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u
   – *N_R(u) can have repeated elements, since nodes can be visited multiple times on random walks
3. Optimize embeddings
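Steps 1 and 2 can be sketched as follows. The graph, walk count, and walk length are made-up example values; a Counter is used to represent the multiset N_R(u) with repeats kept as counts.

```python
import random
from collections import Counter

# Toy graph for illustration only.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

def random_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

def neighborhood(u, num_walks=10, walk_length=5):
    # N_R(u): multiset of nodes visited on walks started from u.
    visited = Counter()
    for _ in range(num_walks):
        # exclude the start node itself; repeats are counted
        for node in random_walk(u, walk_length)[1:]:
            visited[node] += 1
    return visited

print(neighborhood("a"))
```

Step 3, optimizing the embeddings against these neighborhoods, is what the loss on the next slide defines.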
Random Walk Optimization
• Want: high score (embedding similarity, here the dot product) for pairs of nodes appearing together on random walks; low probability for other nodes
• Loss:
  L = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )
  – Outer sum: over all nodes u
  – Inner sum: over nodes v seen on random walks starting from u
  – The fraction: predicted probability of u and v co-occurring on a random walk
• The softmax denominator is expensive to compute for all node pairs → use negative sampling
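With negative sampling, the per-pair softmax term is approximated by −log σ(z_u·z_v) − Σ_i log σ(−z_u·z_{n_i}) over k sampled negatives n_i. A minimal sketch of evaluating that approximated term, with placeholder (not learned) embedding vectors:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(z_u, z_v, negatives):
    # positive pair: push z_u . z_v up
    loss = -math.log(sigmoid(dot(z_u, z_v)))
    # k sampled negatives: push z_u . z_n down
    for z_n in negatives:
        loss -= math.log(sigmoid(-dot(z_u, z_n)))
    return loss

d = 4
rand_vec = lambda: [random.gauss(0, 0.1) for _ in range(d)]
print(neg_sampling_loss(rand_vec(), rand_vec(), [rand_vec() for _ in range(5)]))
```

A training loop would evaluate this for every (u, v) pair drawn from the walks and update the embeddings by gradient descent.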
DeepWalk [Perozzi et al., 2014]
• What strategies should we use to run these random walks?
• Simplest idea: Just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk from Perozzi et al., 2014)
  – The issue is that this notion of similarity is too constrained
  – node2vec generalizes this
DeepWalk Example
• 2D embeddings of the nodes of Zachary’s Karate Club network:
[Figure: the karate club graph (left) and its DeepWalk 2D embeddings (right)]
Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.
node2vec [Grover et al., 2016]
• Goal: Embed nodes with similar network neighborhoods close together in the feature space
  – Frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task
• Key idea: Develop a biased 2nd-order random walk R to generate the network neighborhood N_R(u) of node u
node2vec: Biased Walks
• Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016)
[Figure: from node u, BFS explores the local neighborhood (s1, s2, s3) while DFS reaches nodes farther away (s4–s9)]
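A sketch of node2vec's biased 2nd-order walk: given the previous node and the current node, the next hop is weighted 1/p for returning, 1 for staying within the previous node's neighborhood (BFS-like), and 1/q for moving outward (DFS-like). The return parameter p and in-out parameter q are from Grover and Leskovec (2016); the toy graph and parameter values here are made up.

```python
import random

# Hypothetical toy graph (symmetric adjacency list).
graph = {
    "t": ["v", "x1"],
    "v": ["t", "x1", "x2"],
    "x1": ["t", "v"],
    "x2": ["v"],
}

def biased_walk(start, length, p=1.0, q=2.0):
    # First step is an ordinary uniform hop; later steps are 2nd-order.
    walk = [start, random.choice(graph[start])]
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        neighbors = graph[cur]
        weights = []
        for x in neighbors:
            if x == prev:               # return to previous node
                weights.append(1.0 / p)
            elif x in graph[prev]:      # stays close to prev (BFS-like)
                weights.append(1.0)
            else:                       # moves farther out (DFS-like)
                weights.append(1.0 / q)
        walk.append(random.choices(neighbors, weights=weights)[0])
    return walk

print(biased_walk("t", 6))
```

Small q favors outward (DFS-like) exploration and large q keeps the walk local (BFS-like); the walks are then fed into the same neighborhood-collection and optimization steps as before.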