CSE 6240: Web Search and Text Mining, Spring 2020
Graph Neural Networks
Prof. Srijan Kumar
http://cc.gatech.edu/~srijan
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks (GCN)
• GraphSAGE
Goal: Node Embeddings
• Goal: map nodes from the input network into a d-dimensional embedding space such that similarity(u, v) ≈ z_v^T z_u
• The similarity function on the graph is what we need to define!
Deep Graph Encoders
• Encoder: map a node to a low-dimensional vector: enc(v) = z_v
• Deep encoder methods are based on graph neural networks: enc(v) = multiple layers of non-linear transformations of the graph structure
• The graph encoder idea is inspired by CNNs on images
[Figure: convolution on an image vs. on a graph; animation credit: Vincent Dumoulin]
Idea from Convolutional Networks
• In a CNN, a pixel’s representation is created by transforming the representations of neighboring pixels
  – In a GNN, a node’s representation is created by transforming the representations of neighboring nodes
• But graphs are irregular, unlike images
  – So, generalize convolutions beyond simple lattices, and leverage node features/attributes
• Solution: deep graph encoders
Deep Graph Encoders
• Once an encoder is defined, multiple layers of encoders can be stacked
• Output: node embeddings, which can also be used to embed larger network structures: subgraphs, whole graphs
Graph Encoder: A Naïve Approach
• Join the adjacency matrix and the node features
• Feed them into a deep neural network:

      A B C D E   Feat
  A [ 0 1 1 1 0 | 1 0 ]
  B [ 1 0 0 1 1 | 0 0 ]
  C [ 1 0 0 1 0 | 0 1 ]
  D [ 1 1 1 0 1 | 1 1 ]
  E [ 0 1 0 1 0 | 1 0 ]

• Done? Issues with this idea:
  – O(|V|) parameters
  – Not applicable to graphs of different sizes
  – Not invariant to node ordering
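The node-ordering issue is easy to see in code. Below is a minimal sketch (not from the slides; the model and names are illustrative) showing that an MLP over the flattened [A | X] matrix produces different outputs for two orderings of the same graph:

```python
import torch

torch.manual_seed(0)

# Adjacency matrix and 2-dim features for the 5-node example graph above
A = torch.tensor([[0, 1, 1, 1, 0],
                  [1, 0, 0, 1, 1],
                  [1, 0, 0, 1, 0],
                  [1, 1, 1, 0, 1],
                  [0, 1, 0, 1, 0]], dtype=torch.float)
X = torch.tensor([[1, 0], [0, 0], [0, 1], [1, 1], [1, 0]], dtype=torch.float)

# O(|V|) parameters, and the input size is fixed to this graph size
mlp = torch.nn.Linear(5 * (5 + 2), 1)

def naive_encode(A, X):
    # Flatten [A | X] into one long vector and feed it to the MLP
    return mlp(torch.cat([A, X], dim=1).flatten())

# Relabel the nodes with a permutation P: same graph, different node ordering
P = torch.eye(5)[[2, 0, 4, 1, 3]]
A_perm, X_perm = P @ A @ P.T, P @ X

print(naive_encode(A, X), naive_encode(A_perm, X_perm))  # different outputs in general
```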
Graph Encoders: Two Instantiations
1. Graph convolutional networks (GCN): one of the first frameworks to learn node embeddings in an end-to-end manner
   – Different from random walk methods, which are not end-to-end
2. GraphSAGE: generalizes GCNs to various neighborhood aggregations
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks (GCN)
• GraphSAGE

Main paper: “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf and Welling, ICLR 2017
Content
• Local network neighborhoods:
  – Describe aggregation strategies
  – Define computation graphs
• Stacking multiple layers:
  – Describe the model, parameters, training
  – How to fit the model?
  – Simple example for unsupervised and supervised training
Setup
• Assume we have a graph G:
  – V is the vertex set
  – A is the adjacency matrix (assume binary)
  – X ∈ ℝ^{m×|V|} is a matrix of node features
  – Social networks: user profiles, user images
  – Biological networks: gene expression profiles
  – If there are no features, use:
    » Indicator vectors (one-hot encoding of a node)
    » A vector of constant 1s: [1, 1, …, 1]
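A quick sketch (assuming PyTorch; not from the slides) of the two featureless fallbacks mentioned above:

```python
import torch

n = 5                        # number of nodes
X_onehot = torch.eye(n)      # indicator vectors: one-hot row per node
X_const  = torch.ones(n, 1)  # constant feature: [1] for every node
```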
Graph Convolutional Networks
• Idea: generate node embeddings based on local network neighborhoods
  – A node’s neighborhood defines its computation graph
• Learn how to aggregate information from the neighborhood to learn node embeddings
  – Transform information from the neighbors and combine it: transform “messages” h_u from neighbors as W h_u
Idea: Aggregate Neighbors
• Intuition: generate node embeddings based on local network neighborhoods
• Nodes aggregate information from their neighbors using neural networks
[Figure: input graph and the two-level computation graph of target node A; each box is a neural network aggregating neighbor messages]
Idea: Aggregate Neighbors
• Intuition: the network neighborhood defines a computation graph
• Every node defines its own computation graph based on its neighborhood
Deep Model: Many Layers
• The model can be of arbitrary depth:
  – Nodes have embeddings at each layer
  – The layer-0 embedding of node v is its input feature x_v
  – The layer-K embedding gets information from nodes that are at most K hops away
[Figure: layer-0/layer-1/layer-2 computation graph for target node A; layer-0 inputs are the feature vectors x_A, x_B, x_C, x_E, x_F]
Neighborhood Aggregation
• Key distinctions between approaches lie in how they aggregate information across the layers
[Figure: computation graph of node A with each aggregation step shown as a box; what is in the box?]
Neighborhood Aggregation
• Basic approach: average information from neighbors and apply a neural network
  (1) Average messages from neighbors
  (2) Apply a neural network
[Figure: computation graph of node A; each box averages neighbor messages and applies a neural network]
The Math: Deep Encoder
• Basic approach: average neighbor messages and apply a neural network
  – Note: apply L2 normalization to each node embedding at every layer

  h_v^0 = x_v                                         (initial layer-0 embedding = node features)
  h_v^k = σ( W_k · Σ_{u ∈ N(v)} h_u^{k−1} / |N(v)| + B_k · h_v^{k−1} ),   ∀k ∈ {1, …, K}
  z_v = h_v^K                                         (embedding after K layers of neighborhood aggregation)

  – Σ_{u ∈ N(v)} h_u^{k−1} / |N(v)|: average of the neighbors’ previous-layer embeddings
  – σ: non-linearity (e.g., ReLU)
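As a concrete illustration, here is a minimal PyTorch sketch of this update (not the lecture’s reference code; function and variable names are assumptions). It implements mean aggregation over neighbors, the self-transformation, ReLU, and the per-layer L2 normalization noted above:

```python
import torch
import torch.nn.functional as F

def gcn_layer(A, H_prev, W, B):
    """One layer: h_v^k = sigma(W_k * mean_{u in N(v)} h_u^{k-1} + B_k * h_v^{k-1})."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)   # |N(v)|, guard against isolated nodes
    neigh_mean = (A @ H_prev) / deg                 # average of neighbor embeddings
    H = torch.relu(neigh_mean @ W + H_prev @ B)     # transform and apply non-linearity
    return F.normalize(H, p=2, dim=1)               # L2-normalize each node embedding

def encode(A, X, Ws, Bs):
    """Stack K layers: h^0 = X, z_v = h_v^K."""
    H = X
    for W, B in zip(Ws, Bs):
        H = gcn_layer(A, H, W, B)
    return H
```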
GCN: Matrix Form
• H^(l) is the matrix of node representations at layer l
• W_0^(l) and W_1^(l) are the matrices to be learned for each layer
• A = adjacency matrix, D = diagonal degree matrix
• GCN rewritten in matrix form:

  H^(l+1) = σ( H^(l) W_0^(l) + D^{−1} A H^(l) W_1^(l) )
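The same layer in matrix form, as a short sketch (names are assumptions); D^{−1}A performs the neighbor averaging for all nodes at once:

```python
import torch

def gcn_layer_matrix(A, H, W0, W1):
    """H^(l+1) = sigma(H^(l) W0 + D^{-1} A H^(l) W1), computed for all nodes at once."""
    D_inv = torch.diag(1.0 / A.sum(dim=1).clamp(min=1))  # D^{-1}
    return torch.relu(H @ W0 + D_inv @ A @ H @ W1)
```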
Training the Model
• How do we train the model?
  – Need to define a loss function on the embeddings z_v
Model Parameters
• We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters
  – Once we have the weight matrices, we can calculate the node embeddings
• The trainable weight matrices (i.e., what we learn) are W_k and B_k:

  h_v^0 = x_v
  h_v^k = σ( W_k · Σ_{u ∈ N(v)} h_u^{k−1} / |N(v)| + B_k · h_v^{k−1} ),   ∀k ∈ {1, …, K}
  z_v = h_v^K
Unsupervised Training
• Training can be unsupervised or supervised
• Unsupervised training:
  – Use only the graph structure: “similar” nodes have similar embeddings
  – A common unsupervised loss function is based on edge existence
• The unsupervised loss function can be anything from the last section, e.g., a loss based on:
  – Node proximity in the graph
  – Random walks
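A sketch of the edge-existence loss (details assumed, not from the slides): decode an edge probability from the dot product z_u^T z_v and apply binary cross-entropy on positive (edge) and negative (non-edge) node pairs:

```python
import torch
import torch.nn.functional as F

def edge_loss(Z, pos_pairs, neg_pairs):
    """BCE loss: z_u^T z_v should be high for edges, low for sampled non-edges."""
    def scores(pairs):                      # pairs: LongTensor of shape (num_pairs, 2)
        u, v = pairs[:, 0], pairs[:, 1]
        return (Z[u] * Z[v]).sum(dim=1)     # dot products z_u^T z_v
    pos, neg = scores(pos_pairs), scores(neg_pairs)
    labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    return F.binary_cross_entropy_with_logits(torch.cat([pos, neg]), labels)
```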
Supervised Training
• Train the model for a supervised task (e.g., node classification: is a node normal or anomalous?)
• Two ways to define the total loss:
  – Total loss = supervised loss
  – Total loss = supervised loss + unsupervised loss
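A sketch of the supervised variant (the classifier head and names are assumptions): map each node embedding to class logits and apply cross-entropy on the labeled nodes:

```python
import torch
import torch.nn.functional as F

def supervised_loss(Z, labels, labeled_idx, classifier):
    """Cross-entropy node-classification loss on the labeled subset of nodes."""
    logits = classifier(Z[labeled_idx])     # e.g., a torch.nn.Linear(d, num_classes)
    return F.cross_entropy(logits, labels[labeled_idx])
```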
Model Design: Overview
(1) Define a neighborhood aggregation function
(2) Define a loss function on the embeddings z_v
Model Design: Overview
(3) Train on a set of nodes, i.e., a batch of computation graphs
Model Design: Overview
(4) Generate embeddings for nodes as needed, even for nodes we never trained on!
GCN: Inductive Capability
• The same aggregation parameters W_k and B_k are shared across all nodes:
  – The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes
[Figure: the computation graphs for nodes A and B in the input graph share the same parameters W_k and B_k]
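Because the trained W_k and B_k do not depend on any particular graph, the earlier encode sketch can be applied to a brand-new graph. A hypothetical usage example, reusing encode, Ws, and Bs from the sketch above:

```python
import torch

# Embed a previously unseen 3-node graph with the already-trained weights
A_new = torch.tensor([[0., 1., 0.],
                      [1., 0., 1.],
                      [0., 1., 0.]])
X_new = torch.randn(3, Ws[0].shape[0])  # features must match the trained input dimension
Z_new = encode(A_new, X_new, Ws, Bs)    # embeddings for nodes never seen in training
```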