Project Proposal deadline: tonight, 11:59pm. Course Notes: https://snap-stanford.github.io/cs224w-notes/ – help us write the course notes, we will give generous bonuses! CS224W: Machine Learning with Graphs. Jure Leskovec, Stanford University. http://cs224w.stanford.edu
¡ Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together. [Figure: an input graph mapped by f(·) to 2D node embeddings] How do we learn the mapping function f?
¡ Goal: Map nodes so that similarity in the embedding space (e.g., dot product) approximates similarity (e.g., proximity) in the network. [Figure: input network mapped into a d-dimensional embedding space]
Goal: similarity(u, v) ≈ z_v^⊤ z_u. Need to define similarity! [Figure: input network mapped into a d-dimensional embedding space]
¡ Encoder: Maps a node in the input graph to a low-dimensional vector: ENC(v) = z_v, the d-dimensional embedding of node v. ¡ Similarity function: Defines how relationships in the input network map to relationships in the embedding space: similarity(u, v) ≈ z_v^⊤ z_u, i.e., the similarity of u and v in the network approximates the dot product between their node embeddings.
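A minimal sketch of the dot-product similarity used above (not from the slides; the dimension and the random embeddings are illustrative placeholders):

```python
# Illustrative sketch: dot-product similarity between two node embeddings.
import numpy as np

d = 64                       # embedding dimension (illustrative choice)
z_u = np.random.randn(d)     # embedding of node u
z_v = np.random.randn(d)     # embedding of node v

similarity = z_v @ z_u       # z_v^T z_u: similarity in the embedding space
```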
¡ So far we have focused on “shallow” encoders, i.e., embedding lookups: the encoder simply reads off a column of an embedding matrix Z, which has one column per node (each column is the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings).
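As a hedged illustration of what an embedding lookup means in code (the matrix Z and the helper enc() below are hypothetical, not part of the course material):

```python
# Sketch of a "shallow" encoder: encoding a node is just a column lookup in Z.
import numpy as np

num_nodes, d = 1000, 64
Z = np.random.randn(d, num_nodes)   # embedding matrix: one column per node (the learned parameters)

def enc(v):
    """Return the d-dimensional embedding stored for node index v."""
    return Z[:, v]

z_5 = enc(5)    # embedding of node 5
```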
Shallow encoders: § One layer of data transformation § A single hidden layer maps node v to its embedding z_v via a function f(), e.g., z_v = f(z_u, u ∈ N(v))
¡ Limitations of shallow embedding methods: § O(|V|) parameters are needed: § No sharing of parameters between nodes § Every node has its own unique embedding § Inherently “transductive”: § Cannot generate embeddings for nodes that are not seen during training § Do not incorporate node features: § Many graphs have features that we can and should leverage
¡ Today: We will now discuss deep methods based on graph neural networks: ENC(v) = multiple layers of non-linear transformations of graph structure. ¡ Note: All these deep encoders can be combined with the node similarity functions defined in the last lecture.
Output: Node embeddings. We can also embed larger network structures: subgraphs, entire graphs.
Images, Text/Speech: the modern deep learning toolbox is designed for simple sequences & grids.
But networks are far more complex! § Arbitrary size and complex topological structure (i.e., no spatial locality like grids), unlike images and text § No fixed node ordering or reference point § Often dynamic and have multimodal features
CNN on an image: Goal is to generalize convolutions beyond simple lattices and leverage node features/attributes (e.g., text, images).
Single CNN layer with 3×3 filter (animation by Vincent Dumoulin). Image vs. graph: transform information at the neighbors and combine it: § Transform “messages” h_i from neighbors: W_i h_i § Add them up: ∑_i W_i h_i
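A small sketch of the “transform messages and add them up” step; for simplicity it assumes a single shared weight matrix W rather than one W_i per neighbor, and all names and shapes are illustrative:

```python
# Transform each neighbor message h_i with W, then sum the results: sum_i W h_i.
import numpy as np

d_in, d_out = 16, 32
W = np.random.randn(d_out, d_in)                           # shared transform (an assumption here)
neighbor_msgs = [np.random.randn(d_in) for _ in range(3)]  # messages h_i from 3 neighbors

combined = sum(W @ h for h in neighbor_msgs)               # aggregated message, shape (d_out,)
```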
But what if your graphs look like this? Or this? Or this? ¡ Examples: Biological networks, Medical networks, Social networks, Information networks, Knowledge graphs, Communication networks, Web graph, …
¡ Join adjacency matrix and features ¡ Feed them into a deep neural net: Done?
      A B C D E | Feat
  A   0 1 1 1 0 | 1 0
  B   1 0 0 1 1 | 0 0
  C   1 0 0 1 0 | 0 1
  D   1 1 1 0 1 | 1 1
  E   0 1 0 1 0 | 1 0
¡ Issues with this idea: § O(N) parameters § Not applicable to graphs of different sizes § Not invariant to node ordering
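To make these issues concrete, here is a sketch (assumed, not from the slides) of the naive construction: each node's input is its adjacency-matrix row concatenated with its features, so the input width grows with |V| and depends on the node ordering:

```python
# Naive idea: feed [adjacency row, features] for each node into a standard neural net.
import numpy as np

A = np.array([[0, 1, 1, 1, 0],    # adjacency matrix of the 5-node example above
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]])
X = np.array([[1, 0],             # the two binary features per node from the table
              [0, 0],
              [0, 1],
              [1, 1],
              [1, 0]])

inputs = np.concatenate([A, X], axis=1)   # shape (|V|, |V| + 2)
# Problems: the first |V| columns tie the model to this graph size and this
# particular node ordering, and the first layer needs O(N) parameters.
```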
1. Basics of deep learning for graphs 2. Graph Convolutional Networks 3. Graph Attention Networks (GAT) 4. Practical tips and demos
¡ Local network neighborhoods: § Describe aggregation strategies § Define computation graphs ¡ Stacking multiple layers: § Describe the model, parameters, training § How to fit the model? § Simple example for unsupervised and supervised training
¡ Assume we have a graph G: § V is the vertex set § A is the adjacency matrix (assume binary) § X ∈ ℝ^(m×|V|) is a matrix of node features § Node features: § Social networks: user profile, user image § Biological networks: gene expression profiles, gene functional information § When there are no features: § Indicator vectors (one-hot encoding of a node) § Vector of constant 1: [1, 1, …, 1]
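A quick, illustrative sketch of the two fallback choices when a graph has no node features:

```python
# Fallback node features: one-hot indicator vectors, or a constant 1 per node.
import numpy as np

num_nodes = 5
X_onehot = np.eye(num_nodes)          # indicator vector (one-hot encoding) per node
X_const  = np.ones((num_nodes, 1))    # constant feature [1] for every node
```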
[Kipf and Welling, ICLR 2017] Idea: A node’s neighborhood defines a computation graph. [Figure: for a node i, (1) determine the node's computation graph, (2) propagate and transform information.] Learn how to propagate information across the graph to compute node features.
¡ Key idea: Generate node embeddings based on local network neighborhoods. [Figure: target node A and its neighborhood in the input graph]
¡ Intuition: Nodes aggregate information from their neighbors using neural networks. [Figure: the computation graph of target node A, with neural networks aggregating messages from its neighbors]
¡ Intuition: Network neighborhood defines a computation graph. Every node defines a computation graph based on its neighborhood!
¡ Model can be of arbitrary depth: § Nodes have embeddings at each layer § Layer-0 embedding of node v is its input feature x_v § Layer-K embedding gets information from nodes that are K hops away. [Figure: the layer-0/layer-1/layer-2 computation graph of target node A over the input graph]
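As a hedged sketch of the “K hops away” statement: the nodes that can influence a target node's layer-K embedding are exactly its K-hop neighborhood, which a simple breadth-first expansion finds (the adjacency-list format and example edges below are assumptions for illustration):

```python
# Which nodes feed into the layer-K embedding of `target`? Its K-hop neighborhood.
def k_hop_neighborhood(neighbors, target, K):
    """neighbors: dict mapping each node to a list of adjacent nodes (assumed format)."""
    frontier, reached = {target}, {target}
    for _ in range(K):
        frontier = {u for v in frontier for u in neighbors[v]} - reached
        reached |= frontier
    return reached

# Example on a small graph (edges are illustrative, not taken from the figures):
adj = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B", "E", "F"],
       "D": ["A"], "E": ["C"], "F": ["C"]}
print(k_hop_neighborhood(adj, "A", K=2))   # all nodes within 2 hops of A
```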
¡ Neighborhood aggregation: Key distinctions are in how different approaches aggregate information across the layers. [Figure: computation graph of target node A with the aggregation operators left as boxes: what is in the box?]
¡ Basic approach: Average information from neighbors and apply a neural network: (1) average messages from neighbors, (2) apply a neural network. [Figure: computation graph of target node A]
¡ Basic approach: Average neighbor messages and apply a neural network:
h_v^0 = x_v   (initial 0-th layer embeddings are equal to the node features)
h_v^k = σ( W_k Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)| + B_k h_v^(k−1) ),  ∀ k ∈ {1, …, K}
   (average of the neighbors' previous-layer embeddings, plus a transform of v's own previous-layer embedding, passed through a non-linearity σ, e.g., ReLU)
z_v = h_v^K   (embedding after K layers of neighborhood aggregation)
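A minimal NumPy sketch of the update rule above, assuming an adjacency-list input and per-layer weight matrices W_k and B_k (this is an illustration of the math, not a reference implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gnn_embed(X, neighbors, Ws, Bs):
    """X: (|V|, d0) node features; neighbors: dict node index -> list of neighbor indices;
    Ws, Bs: per-layer weight matrices of shape (d_k, d_{k-1}), one pair for each k = 1..K."""
    H = X.copy()                                       # h_v^0 = x_v
    for W_k, B_k in zip(Ws, Bs):                       # layers k = 1..K
        H_next = np.zeros((H.shape[0], W_k.shape[0]))
        for v, nbrs in neighbors.items():
            avg = H[nbrs].mean(axis=0) if nbrs else np.zeros(H.shape[1])
            H_next[v] = relu(W_k @ avg + B_k @ H[v])   # average neighbors, transform, add self-term, ReLU
        H = H_next
    return H                                           # row v is z_v = h_v^K
```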
How do we train the model to generate embeddings z_v? We need to define a loss function on the embeddings.