

  1. CS224W: Machine Learning with Graphs. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. [Figure: an example network in which a machine learning model classifies the unlabeled ("?") nodes. Task shown: node classification.]

  3. [Figure: a machine learning setup on a graph, with unknown ("?") values to be predicted from a known input x.]

  4. The (supervised) machine learning lifecycle requires feature engineering every single time! Pipeline: Raw Data → Structured Data → Learning Algorithm → Model → Downstream task. Feature engineering is what turns raw data into structured data; the goal here is to automatically learn the features instead.

  5. Goal: efficient, task-independent feature learning for machine learning with graphs! Map each node u to a vector, f: u → ℝ^d, its feature representation (embedding).

  6. Task: map each node in a network into a low-dimensional space.
- Distributed representations for nodes.
- Similarity of embeddings between nodes indicates their network similarity.
- Encodes network information and generates node representations.

  7. Example: 2D embeddings of the nodes of Zachary's Karate Club network. Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

  8. The modern deep learning toolbox is designed for simple sequences or grids:
- CNNs for fixed-size images/grids,
- RNNs or word2vec for text/sequences.

  9. But networks are far more complex!
- Complex topological structure (i.e., no spatial locality like grids).
- No fixed node ordering or reference point (i.e., the isomorphism problem).
- Often dynamic, with multimodal features.

  10. Assume we have a graph G:
- V is the vertex set.
- A is the adjacency matrix (assume binary).
- No node features or extra information is used!

  11. The goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  12. Goal: similarity(u, v) ≈ z_v^T z_u, where similarity(u, v) is measured in the original network and z_v^T z_u is the similarity of the embeddings. The similarity function still needs to be defined!

  13. 1. Define an encoder (i.e., a mapping from nodes to embeddings). 2. Define a node similarity function (i.e., a measure of similarity in the original network). 3. Optimize the parameters of the encoder so that similarity(u, v) in the original network ≈ z_v^T z_u, the similarity of the embeddings.

  14. Encoder: maps each node to a low-dimensional vector, ENC(v) = z_v, where v is a node in the input graph and z_v is its d-dimensional embedding. Similarity function: specifies how the relationships in vector space map to the relationships in the original network, similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings.

  15. Simplest encoding approach: the encoder is just an embedding lookup, ENC(v) = Z v, where Z ∈ ℝ^(d×|V|) is a matrix with one node embedding per column (this is what we learn!) and v ∈ 𝕀^|V| is an indicator vector, all zeroes except a one in the position indicating node v.

  16. Simplest encoding approach: the encoder is just an embedding lookup. Z is the embedding matrix: one column per node (the embedding vector for that node), with d rows (the dimension/size of the embeddings).

  17. Simplest encoding approach: the encoder is just an embedding lookup. Each node is assigned a unique embedding vector. Many methods use this approach: DeepWalk, node2vec, TransE.
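A minimal sketch of this lookup encoder, assuming NumPy; the names (`Z`, `encode`) and toy sizes are illustrative, not from the slides:

```python
import numpy as np

num_nodes = 10      # |V| (toy value)
embedding_dim = 4   # d (toy value)

# Z in R^(d x |V|): one node embedding per column. These are the learned parameters.
Z = np.random.randn(embedding_dim, num_nodes)

def encode(v: int) -> np.ndarray:
    """ENC(v) = Z @ one_hot(v), i.e., just column v of the embedding matrix."""
    one_hot = np.zeros(num_nodes)
    one_hot[v] = 1.0
    return Z @ one_hot  # equivalent to Z[:, v]

print(encode(3).shape)  # (4,)
```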

  18. The key choice that distinguishes these methods is how they define node similarity. E.g., should two nodes have similar embeddings if they:
- are connected?
- share neighbors?
- have similar "structural roles"?
- ...?

  19. Material based on:
- Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
- Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

  20. Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points selected this way is a random walk on the graph. [Figure: a random walk traced on a small 12-node graph.]
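A minimal sketch of such a uniform random walk, assuming the graph is stored as a plain adjacency dict; `random_walk` and the toy graph are illustrative:

```python
import random

# Toy undirected graph as an adjacency list.
graph = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5], 5: [4],
}

def random_walk(graph, start, length):
    """From the current node, repeatedly move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

print(random_walk(graph, start=1, length=5))  # e.g. [1, 3, 4, 3, 2, 1]
```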

  21. z_u^T z_v ≈ the probability that u and v co-occur on a random walk over the network.

  22. 1. Estimate the probability of visiting node v on a random walk starting from node u under some random walk strategy R. 2. Optimize embeddings to encode these random walk statistics: similarity (here: the dot product, i.e., cos(θ)) encodes random walk "similarity".

  23. 1. Expressivity: a flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. 2. Efficiency: we do not need to consider all node pairs when training; we only need to consider pairs that co-occur on random walks.

  24. Intuition: find an embedding of nodes into d dimensions that preserves similarity. Idea: learn node embeddings such that nodes that are nearby in the network end up close together. Given a node u, how do we define nearby nodes? N_R(u) is the neighbourhood of u obtained by some strategy R.

  25. Given G = (V, E), our goal is to learn a mapping z: u → ℝ^d. Log-likelihood objective: max_z Σ_(u ∈ V) log P(N_R(u) | z_u), where N_R(u) is the neighborhood of node u under strategy R. Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

  26. 1. Run short fixed-length random walks starting from each node u in the graph using some strategy R. 2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u. 3. Optimize embeddings according to: given node u, predict its neighbors N_R(u), i.e., max_z Σ_(u ∈ V) log P(N_R(u) | z_u). (*N_R(u) can have repeated elements since nodes can be visited multiple times on random walks.)
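A minimal sketch of steps 1 and 2, using short fixed-length uniform walks as the strategy R; `collect_neighborhoods` and the walk parameters are illustrative names and values:

```python
import random
from collections import Counter

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy undirected graph

def collect_neighborhoods(graph, walks_per_node=10, walk_length=5):
    """Return N_R(u) for every u as a multiset (Counter) of visited nodes."""
    neighborhoods = {u: Counter() for u in graph}
    for u in graph:
        for _ in range(walks_per_node):
            cur = u
            for _ in range(walk_length):
                cur = random.choice(graph[cur])
                neighborhoods[u][cur] += 1  # repeats kept: N_R(u) is a multiset
    return neighborhoods

print(collect_neighborhoods(graph)[0])  # e.g. Counter({2: 21, 1: 17, 0: 8, 3: 4})
```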

  27. L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log P(v | z_u). Intuition: optimize embeddings to maximize the likelihood of random walk co-occurrences. Parameterize P(v | z_u) using the softmax: P(v | z_u) = exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n). Why the softmax? We want node v to be the most similar to node u (out of all nodes n). Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i).
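A minimal sketch of this softmax parameterization in NumPy; `softmax_prob` is an illustrative name:

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u . z_v) / sum over all nodes n of exp(z_u . z_n)."""
    scores = Z.T @ Z[:, u]   # z_u . z_n for every node n
    scores -= scores.max()   # shift for numerical stability (cancels in the ratio)
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[v]

Z = np.random.randn(4, 10)   # toy embedding matrix: d = 4, |V| = 10
print(softmax_prob(Z, u=0, v=3))
```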

  28. Putting it all together: L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log ( exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n) ). The outer sum runs over all nodes u, the inner sum over the nodes v seen on random walks starting from u, and the term inside the log is the predicted probability of u and v co-occurring on a random walk. Optimizing random walk embeddings = finding the embeddings z_u that minimize L.

  29. But doing this naively is too expensive! In L = − Σ_(u ∈ V) Σ_(v ∈ N_R(u)) log ( exp(z_u^T z_v) / Σ_(n ∈ V) exp(z_u^T z_n) ), the nested sum over nodes gives O(|V|^2) complexity!
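A minimal sketch of the naive loss to make that cost concrete; the toy `neighborhoods` values are illustrative stand-ins for the N_R(u) multisets collected above:

```python
import numpy as np

Z = np.random.randn(4, 5)  # toy embeddings: d = 4, |V| = 5
# N_R(u) as {node: co-occurrence count}, e.g. collected from random walks.
neighborhoods = {0: {1: 3, 2: 1}, 1: {0: 2}, 2: {0: 1, 3: 2}, 3: {2: 2}, 4: {}}

def naive_loss(Z, neighborhoods):
    """L = -sum over u and v in N_R(u) of log softmax(z_u . z_v).

    Every softmax denominator sums over all |V| nodes, and there is one
    such sum per node u: this nesting is the O(|V|^2) cost on the slide.
    """
    loss = 0.0
    for u, multiset in neighborhoods.items():
        scores = Z.T @ Z[:, u]                            # z_u . z_n for all n
        m = scores.max()
        log_denom = m + np.log(np.exp(scores - m).sum())  # stable log-sum-exp
        for v, count in multiset.items():
            loss -= count * (scores[v] - log_denom)
    return loss

print(naive_loss(Z, neighborhoods))
```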
