Project Proposal deadline: tonight, 11:59pm. Course Notes: https://snap-stanford.github.io/cs224w-notes/ – help us write the course notes, we will give generous bonuses! CS224W: Machine Learning with Graphs. Jure Leskovec, Stanford University. http://cs224w.stanford.edu
¡ Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together. [Figure: an input graph mapped by f(·) to 2D node embeddings] How do we learn the mapping function f?
¡ Goal: Map nodes so that similarity in the embedding space (e.g., dot product) approximates similarity (e.g., proximity) in the network. [Figure: input network mapped into a d-dimensional embedding space]
Goal: similarity(u, v) ≈ z_v^⊤ z_u. Need to define similarity! [Figure: input network mapped into a d-dimensional embedding space]
¡ Encoder: Maps a node in the input graph to a low-dimensional vector: ENC(v) = z_v, the d-dimensional embedding of node v. ¡ Similarity function: Defines how relationships in the input network map to relationships in the embedding space: similarity(u, v) ≈ z_v^⊤ z_u, i.e., the similarity of u and v in the network approximates the dot product between their node embeddings.
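A minimal sketch of the dot-product similarity used above (not from the slides; the dimension and the random embeddings are illustrative placeholders):

```python
# Illustrative sketch: dot-product similarity between two node embeddings.
import numpy as np

d = 64                       # embedding dimension (illustrative choice)
z_u = np.random.randn(d)     # embedding of node u
z_v = np.random.randn(d)     # embedding of node v

similarity = z_v @ z_u       # z_v^T z_u: similarity in the embedding space
```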
¡ So far we have focused on “shallow” encoders, i.e., embedding lookups: the encoder simply reads off a column of an embedding matrix Z, which has one column per node (each column is the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings).
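As a hedged illustration of what an embedding lookup means in code (the matrix Z and the helper enc() below are hypothetical, not part of the course material):

```python
# Sketch of a "shallow" encoder: encoding a node is just a column lookup in Z.
import numpy as np

num_nodes, d = 1000, 64
Z = np.random.randn(d, num_nodes)   # embedding matrix: one column per node (the learned parameters)

def enc(v):
    """Return the d-dimensional embedding stored for node index v."""
    return Z[:, v]

z_5 = enc(5)    # embedding of node 5
```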
Shallow encoders: § One layer of data transformation § A single hidden layer maps node v to its embedding z_v via a function f(), e.g., z_v = f(z_u, u ∈ N(v))
¡ Limitations of shallow embedding methods: § O(|V|) parameters are needed: § No sharing of parameters between nodes § Every node has its own unique embedding § Inherently “transductive”: § Cannot generate embeddings for nodes that are not seen during training § Do not incorporate node features: § Many graphs have features that we can and should leverage
¡ Today: We will now discuss deep methods based on graph neural networks: ENC(v) = multiple layers of non-linear transformations of graph structure. ¡ Note: All these deep encoders can be combined with the node similarity functions defined in the last lecture.
Output: Node embeddings. We can also embed larger network structures: subgraphs, entire graphs.
Images, Text/Speech: the modern deep learning toolbox is designed for simple sequences & grids.
But networks are far more complex! § Arbitrary size and complex topological structure (i.e., no spatial locality like grids), unlike images and text § No fixed node ordering or reference point § Often dynamic and have multimodal features
CNN on an image: Goal is to generalize convolutions beyond simple lattices and leverage node features/attributes (e.g., text, images).
Single CNN layer with 3×3 filter (animation by Vincent Dumoulin). Image vs. graph: transform information at the neighbors and combine it: § Transform “messages” h_i from neighbors: W_i h_i § Add them up: ∑_i W_i h_i
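A small sketch of the “transform messages and add them up” step; for simplicity it assumes a single shared weight matrix W rather than one W_i per neighbor, and all names and shapes are illustrative:

```python
# Transform each neighbor message h_i with W, then sum the results: sum_i W h_i.
import numpy as np

d_in, d_out = 16, 32
W = np.random.randn(d_out, d_in)                           # shared transform (an assumption here)
neighbor_msgs = [np.random.randn(d_in) for _ in range(3)]  # messages h_i from 3 neighbors

combined = sum(W @ h for h in neighbor_msgs)               # aggregated message, shape (d_out,)
```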
But what if your graphs look like this? Or this? Or this? ¡ Examples: Biological networks, Medical networks, Social networks, Information networks, Knowledge graphs, Communication networks, Web graph, …
¡ Join adjacency matrix and features ¡ Feed them into a deep neural net: Done?
      A B C D E | Feat
  A   0 1 1 1 0 | 1 0
  B   1 0 0 1 1 | 0 0
  C   1 0 0 1 0 | 0 1
  D   1 1 1 0 1 | 1 1
  E   0 1 0 1 0 | 1 0
¡ Issues with this idea: § O(N) parameters § Not applicable to graphs of different sizes § Not invariant to node ordering
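To make these issues concrete, here is a sketch (assumed, not from the slides) of the naive construction: each node's input is its adjacency-matrix row concatenated with its features, so the input width grows with |V| and depends on the node ordering:

```python
# Naive idea: feed [adjacency row, features] for each node into a standard neural net.
import numpy as np

A = np.array([[0, 1, 1, 1, 0],    # adjacency matrix of the 5-node example above
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]])
X = np.array([[1, 0],             # the two binary features per node from the table
              [0, 0],
              [0, 1],
              [1, 1],
              [1, 0]])

inputs = np.concatenate([A, X], axis=1)   # shape (|V|, |V| + 2)
# Problems: the first |V| columns tie the model to this graph size and this
# particular node ordering, and the first layer needs O(N) parameters.
```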
1. Basics of deep learning for graphs 2. Graph Convolutional Networks 3. Graph Attention Networks (GAT) 4. Practical tips and demos
¡ Local network neighborhoods: § Describe aggregation strategies § Define computation graphs ¡ Stacking multiple layers: § Describe the model, parameters, training § How to fit the model? § Simple example for unsupervised and supervised training
¡ Assume we have a graph G: § V is the vertex set § A is the adjacency matrix (assume binary) § X ∈ ℝ^(m×|V|) is a matrix of node features § Node features: § Social networks: user profile, user image § Biological networks: gene expression profiles, gene functional information § When there are no features: § Indicator vectors (one-hot encoding of a node) § Vector of constant 1: [1, 1, …, 1]
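A quick, illustrative sketch of the two fallback choices when a graph has no node features:

```python
# Fallback node features: one-hot indicator vectors, or a constant 1 per node.
import numpy as np

num_nodes = 5
X_onehot = np.eye(num_nodes)          # indicator vector (one-hot encoding) per node
X_const  = np.ones((num_nodes, 1))    # constant feature [1] for every node
```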
[Kipf and Welling, ICLR 2017] Idea: A node’s neighborhood defines a computation graph. [Figure: for a node i, (1) determine the node's computation graph, (2) propagate and transform information.] Learn how to propagate information across the graph to compute node features.
¡ Key idea: Generate node embeddings based on local network neighborhoods. [Figure: target node A and its neighborhood in the input graph]
¡ Intuition: Nodes aggregate information from their neighbors using neural networks. [Figure: the computation graph of target node A, with neural networks aggregating messages from its neighbors]
¡ Intuition: Network neighborhood defines a computation graph. Every node defines a computation graph based on its neighborhood!
¡ Model can be of arbitrary depth: § Nodes have embeddings at each layer § Layer-0 embedding of node v is its input feature x_v § Layer-K embedding gets information from nodes that are K hops away. [Figure: the layer-0/layer-1/layer-2 computation graph of target node A over the input graph]
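As a hedged sketch of the “K hops away” statement: the nodes that can influence a target node's layer-K embedding are exactly its K-hop neighborhood, which a simple breadth-first expansion finds (the adjacency-list format and example edges below are assumptions for illustration):

```python
# Which nodes feed into the layer-K embedding of `target`? Its K-hop neighborhood.
def k_hop_neighborhood(neighbors, target, K):
    """neighbors: dict mapping each node to a list of adjacent nodes (assumed format)."""
    frontier, reached = {target}, {target}
    for _ in range(K):
        frontier = {u for v in frontier for u in neighbors[v]} - reached
        reached |= frontier
    return reached

# Example on a small graph (edges are illustrative, not taken from the figures):
adj = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B", "E", "F"],
       "D": ["A"], "E": ["C"], "F": ["C"]}
print(k_hop_neighborhood(adj, "A", K=2))   # all nodes within 2 hops of A
```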
¡ Neighborhood aggregation: Key distinctions are in how different approaches aggregate information across the layers. [Figure: computation graph of target node A with the aggregation operators left as boxes: what is in the box?]
¡ Basic approach: Average information from neighbors and apply a neural network: (1) average messages from neighbors, (2) apply a neural network. [Figure: computation graph of target node A]
¡ Basic approach: Average neighbor messages and apply a neural network:
h_v^0 = x_v   (initial 0-th layer embeddings are equal to the node features)
h_v^k = σ( W_k Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)| + B_k h_v^(k−1) ),  ∀ k ∈ {1, …, K}
   (average of the neighbors' previous-layer embeddings, plus a transform of v's own previous-layer embedding, passed through a non-linearity σ, e.g., ReLU)
z_v = h_v^K   (embedding after K layers of neighborhood aggregation)
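A minimal NumPy sketch of the update rule above, assuming an adjacency-list input and per-layer weight matrices W_k and B_k (this is an illustration of the math, not a reference implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gnn_embed(X, neighbors, Ws, Bs):
    """X: (|V|, d0) node features; neighbors: dict node index -> list of neighbor indices;
    Ws, Bs: per-layer weight matrices of shape (d_k, d_{k-1}), one pair for each k = 1..K."""
    H = X.copy()                                       # h_v^0 = x_v
    for W_k, B_k in zip(Ws, Bs):                       # layers k = 1..K
        H_next = np.zeros((H.shape[0], W_k.shape[0]))
        for v, nbrs in neighbors.items():
            avg = H[nbrs].mean(axis=0) if nbrs else np.zeros(H.shape[1])
            H_next[v] = relu(W_k @ avg + B_k @ H[v])   # average neighbors, transform, add self-term, ReLU
        H = H_next
    return H                                           # row v is z_v = h_v^K
```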
How do we train the model to generate embeddings z_v? We need to define a loss function on the embeddings.