  1. GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018). Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec. Presented by: Jesse Bettencourt and Harris Chan, March 9, 2018, University of Toronto, Vector Institute

  2. Introduction: Generative Model for Graphs. Modeling graphs is fundamental for studying networks, e.g. medical, chemical, social. Goal: model and efficiently sample complex distributions over graphs, learning the generative model from an observed set of graphs.

  3. Challenges in Graph Generation
  • Large and variable output spaces: a graph with n nodes requires up to n^2 values to fully specify its structure, and the number of nodes and edges varies between graphs.
  • Non-unique representations: we want distributions over graphs without assuming a fixed set of nodes, yet an n-node graph can be represented by up to n! equivalent adjacency matrices, where π ∈ Π is an arbitrary node ordering.
  • Complex, non-local dependencies: new edges depend on previously generated edges.
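
  As a toy illustration of the non-uniqueness point above (not from the paper), the sketch below permutes the adjacency matrix of a small path graph; the permuted matrix differs entry-wise even though it encodes the same graph.

  ```python
  # Minimal illustration: the same 4-node path graph under two node orderings
  # yields two different (but equivalent) adjacency matrices.
  import numpy as np

  # Path graph 1-2-3-4 under the identity ordering.
  A = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])

  # Apply an arbitrary node permutation pi: A_pi = P A P^T.
  pi = [2, 0, 3, 1]                  # one of up to n! orderings
  P = np.eye(4)[pi]                  # permutation matrix
  A_pi = P @ A @ P.T

  print(np.array_equal(A, A_pi))     # False: different matrix, same graph
  ```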

  4. Overview of GraphRNN
  Decompose graph generation into two RNNs:
  • Graph-level: generates the sequence of nodes
  • Edge-level: generates the sequence of edges for each new node

  5. Modeling Graphs as Sequences
  Graph G ∼ p(G) with n nodes under node ordering π. Define a mapping f_S from G to sequences:

  S^π = f_S(G, π) = (S^π_1, ..., S^π_n)   (1)

  Each sequence element S^π_i ∈ {0, 1}^{i−1}, i ∈ {1, ..., n}, is an adjacency vector for the edges between node π(v_i) and the previous nodes π(v_j), j ∈ {1, ..., i−1}.
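
  A minimal sketch of the mapping f_S (an assumed helper, not the authors' code): given an adjacency matrix A and a node ordering pi, build the sequence of adjacency vectors, where S^π_i lists edges from node π(v_i) to the previously placed nodes.

  ```python
  import numpy as np

  def graph_to_sequence(A, pi):
      """Return (S^pi_1, ..., S^pi_n) with S^pi_i in {0,1}^(i-1)."""
      A = np.asarray(A)[np.ix_(pi, pi)]       # relabel nodes according to pi
      return [A[i, :i].copy() for i in range(len(pi))]

  A = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]                          # 4-cycle
  S = graph_to_sequence(A, [0, 1, 2, 3])
  # S^pi_1 = [], S^pi_2 = [1], S^pi_3 = [1, 0], S^pi_4 = [0, 1, 1]
  ```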

  10. Distribution on Graphs → Distribution on Sequences
  Instead of learning p(G) directly, sample π ∼ Π to get observations of S^π, then learn p(S^π), modeled autoregressively:

  p(G) = Σ_{S^π} p(S^π) 𝟙[f_G(S^π) = G]   (3)

  Exploiting the sequential structure of S^π, decompose p(S^π) (the extra (n+1)-th element accounts for the end-of-sequence token EOS):

  p(S^π) = Π_{i=1}^{n+1} p(S^π_i | S^π_1, ..., S^π_{i−1}) = Π_{i=1}^{n+1} p(S^π_i | S^π_{<i})   (4)
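
  A hedged sketch of how the factorization in (4) turns into a training loss, assuming (as the variants below make explicit) that each conditional p(S^π_i | S^π_{<i}) is a product of Bernoulli terms with parameters θ_i predicted from the prefix:

  ```python
  import torch
  import torch.nn.functional as F

  def sequence_nll(theta, S):
      """Negative log p(S^pi) under eq. (4): one Bernoulli cross-entropy
      term per node, summed over the sequence."""
      nll = 0.0
      for theta_i, s_i in zip(theta, S):        # one term per node i
          nll = nll + F.binary_cross_entropy(theta_i, s_i, reduction="sum")
      return nll
  ```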

  11. Motivating GraphRNN
  Model p(G): distribution over graphs
  ↓
  Model p(S^π): distribution over sequences of edge connections
  ↓
  Model p(S^π_i | S^π_{<i}): distribution over the i-th node's edge connections, conditioned on the previous nodes' edge connections, parameterized with an expressive neural network

  12. GraphRNN Framework
  Idea: use an RNN that consists of a state-transition function and an output function:

  h_i = f_trans(h_{i−1}, S^π_{i−1})   (5)
  θ_i = f_out(h_i)   (6)

  • h_i ∈ R^d encodes the state of the graph generated so far
  • S^π_{i−1} encodes the adjacency for the most recently generated node i−1
  • θ_i specifies the distribution of the next node's adjacency vector, S^π_i ∼ P_{θ_i}
  • f_trans and f_out can be arbitrary neural networks
  • P_{θ_i} can be an arbitrary distribution over binary vectors

  13. GraphRNN Framework (Corrected)
  Idea: use an RNN that consists of a state-transition function and an output function:

  h_i = f_trans(h_{i−1}, S^π_i)   (5)
  θ_{i+1} = f_out(h_i)   (6)

  • h_i ∈ R^d encodes the state of the graph generated so far
  • S^π_i encodes the adjacency for the most recently generated node i
  • θ_{i+1} specifies the distribution of the next node's adjacency vector, S^π_{i+1} ∼ P_{θ_{i+1}}
  • f_trans and f_out can be arbitrary neural networks
  • P_{θ_i} can be an arbitrary distribution over binary vectors

  14. GraphRNN Framework (Corrected)

  h_i = f_trans(h_{i−1}, S^π_i)   (5)
  θ_{i+1} = f_out(h_i)   (6)
  S^π_{i+1} ∼ P_{θ_{i+1}}
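
  A minimal PyTorch-style sketch of this interface (module names and shapes are assumptions, not the authors' implementation): f_trans is a GRU cell over the current adjacency vector, and f_out maps the hidden state to the parameters θ of the next adjacency distribution. Here f_out is taken to be a linear layer with a sigmoid, matching the simpler variant described later; the variants below differ precisely in how f_out and P_θ are chosen.

  ```python
  import torch
  import torch.nn as nn

  class GraphLevelRNN(nn.Module):
      """h_i = f_trans(h_{i-1}, S^pi_i);  theta_{i+1} = f_out(h_i)."""
      def __init__(self, max_prev_nodes, hidden_dim):
          super().__init__()
          self.f_trans = nn.GRUCell(max_prev_nodes, hidden_dim)      # state update
          self.f_out = nn.Sequential(nn.Linear(hidden_dim, max_prev_nodes),
                                     nn.Sigmoid())                   # Bernoulli params

      def forward(self, s_i, h_prev):
          h_i = self.f_trans(s_i, h_prev)     # encode graph generated so far
          theta_next = self.f_out(h_i)        # distribution over next node's edges
          return theta_next, h_i

  # One step: M = 5 previous-node slots, batch of 1.
  model = GraphLevelRNN(max_prev_nodes=5, hidden_dim=64)
  s_i = torch.zeros(1, 5)                     # adjacency vector of current node
  theta, h = model(s_i, torch.zeros(1, 64))
  next_edges = torch.bernoulli(theta)         # sample S^pi_{i+1}
  ```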

  15. GraphRNN Inference Algorithm
  Algorithm 1: GraphRNN inference algorithm
  Input: RNN-based transition module f_trans, output module f_out, probability distribution P_{θ_i} parameterized by θ_i, start token SOS, end token EOS, empty graph state h′
  Output: Graph sequence S^π
    S^π_0 = SOS, h_0 = h′, i = 0
    repeat
      i = i + 1
      h_i = f_trans(h_{i−1}, S^π_{i−1})   {update graph state}
      θ_i = f_out(h_i)
      S^π_i ∼ P_{θ_i}   {sample node i's edge connections}
    until S^π_i is EOS
    Return S^π = (S^π_1, ..., S^π_i)

  16. GraphRNN Inference Algorithm (Corrected)
  Algorithm 1: GraphRNN inference algorithm, with the indexing corrections applied
  Input: RNN-based transition module f_trans, output module f_out, probability distribution P_{θ_i} parameterized by θ_i, start token SOS, end token EOS, empty graph state h′
  Output: Graph sequence S^π
    S^π_1 = SOS, h_0 = h′, i = 0
    repeat
      i = i + 1
      h_i = f_trans(h_{i−1}, S^π_i)   {update graph state}
      θ_{i+1} = f_out(h_i)
      S^π_{i+1} ∼ P_{θ_{i+1}}   {sample node i+1's edge connections}
    until S^π_{i+1} is EOS
    Return S^π = (S^π_1, ..., S^π_i)
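
  A sketch of the corrected loop in plain PyTorch (helper names are hypothetical; the SOS/EOS conventions below, an all-ones start vector and an all-zeros sample treated as EOS, are assumptions rather than something specified on the slide):

  ```python
  import torch

  def graphrnn_inference(f_trans, f_out, sample, hidden_dim, m, max_nodes=100):
      """Generate adjacency vectors following the corrected inference loop."""
      h = torch.zeros(1, hidden_dim)        # empty graph state h'
      s = torch.ones(1, m)                  # S^pi_1 = SOS (assumed all-ones token)
      S = []
      for _ in range(max_nodes):            # i = 1, 2, ...
          h = f_trans(s, h)                 # h_i = f_trans(h_{i-1}, S^pi_i)
          theta = f_out(h)                  # theta_{i+1} = f_out(h_i)
          s = sample(theta)                 # S^pi_{i+1} ~ P_{theta_{i+1}}
          if not s.any():                   # all-zeros sample treated as EOS
              break
          S.append(s)
      return S                              # sampled vectors, excluding SOS and EOS
  ```

  With the GRU transition and sigmoid output module sketched earlier, sample can simply be torch.bernoulli.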

  17. GraphRNN Variants
  Objective: maximize the likelihood Π p_model(S^π) over all observed graph sequences. Implement f_trans as a Gated Recurrent Unit (GRU), but make different assumptions about p(S^π_i | S^π_{<i}) for each variant:
  1. Multivariate Bernoulli (GraphRNN-S): f_out is an MLP with sigmoid activation that outputs θ_{i+1} ∈ R^i; θ_{i+1} parameterizes a multivariate Bernoulli, and the entries of S^π_{i+1} ∼ P_{θ_{i+1}} are sampled independently.
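
  Concretely, the GraphRNN-S output distribution can be sketched with torch.distributions (the θ values below are illustrative, not from the paper); because the entries are independent, the log-probability of the sampled vector is just the sum of per-entry Bernoulli terms.

  ```python
  import torch

  # theta_{i+1} as produced by the sigmoid MLP f_out (illustrative values).
  theta = torch.tensor([0.9, 0.1, 0.7, 0.05, 0.3])

  dist = torch.distributions.Bernoulli(probs=theta)   # multivariate Bernoulli
  s_next = dist.sample()                               # entries sampled independently
  log_p = dist.log_prob(s_next).sum()                  # log p factorizes over entries
  ```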

  18. GraphRNN Variants
  Objective: maximize the likelihood Π p_model(S^π) over all observed graph sequences. Implement f_trans as a Gated Recurrent Unit (GRU), but make different assumptions about p(S^π_i | S^π_{<i}) for each variant:
  2. Dependent Bernoulli sequence (GraphRNN):

  p(S^π_i | S^π_{<i}) = Π_{j=1}^{i−1} p(S^π_{i,j} | S^π_{i,<j}, S^π_{<i})   (7)

  • S^π_{i,j} ∈ {0, 1} indicates whether node π(v_i) is connected to node π(v_j)
  • f_out is an edge-level RNN that generates the edges of a given node
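
  A minimal sketch of such an edge-level RNN (module names, the zero start bit, and the initialization from the graph-level state are assumptions): each edge probability is conditioned on the edge bits already sampled for the current node.

  ```python
  import torch
  import torch.nn as nn

  class EdgeLevelRNN(nn.Module):
      """Generates S^pi_{i,1}, ..., S^pi_{i,M} sequentially for one node."""
      def __init__(self, hidden_dim, m):
          super().__init__()
          self.m = m
          self.cell = nn.GRUCell(1, hidden_dim)            # input: previous edge bit
          self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

      def forward(self, h_graph):
          h = h_graph                                      # init from graph-level state
          edge_bit = torch.zeros(h.size(0), 1)             # assumed SOS for edges
          edges = []
          for _ in range(self.m):
              h = self.cell(edge_bit, h)                   # condition on previous edges
              p = self.head(h)                             # p(S^pi_{i,j} | S^pi_{i,<j}, S^pi_{<i})
              edge_bit = torch.bernoulli(p)
              edges.append(edge_bit)
          return torch.cat(edges, dim=1)                   # S^pi_i in {0,1}^m

  edge_rnn = EdgeLevelRNN(hidden_dim=64, m=5)
  s_i = edge_rnn(torch.zeros(1, 64))                       # next node's adjacency vector
  ```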

  19. Tractability via Breadth-First Search (BFS)
  Idea: apply a BFS ordering to the graph G with node permutation π before generating the sequence S^π.
  Benefits:
  • Reduces the overall number of sequences to consider: only need to train on all possible BFS orderings, rather than all possible node permutations
  • Reduces the number of edge predictions: the edge-level RNN only predicts M edges, the maximum size of the BFS queue
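
  A sketch of the BFS reordering step using networkx (assumed tooling; the random start node stands in for the random node permutation applied before BFS, and tie-breaking among neighbours follows networkx's default adjacency order, which is a simplification):

  ```python
  import random
  import networkx as nx

  def bfs_ordering(G):
      """Return a node ordering pi given by BFS from a random start node."""
      nodes = list(G.nodes())
      random.shuffle(nodes)                      # random permutation first
      start = nodes[0]
      return [start] + [v for _, v in nx.bfs_edges(G, start)]

  G = nx.grid_2d_graph(3, 3)
  print(bfs_ordering(G))                         # BFS order over the 3x3 grid
  ```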

  20. BFS Order Leads to Fixed-Size S^π_i
  With BFS ordering, S^π_i becomes a fixed-length vector of size M, representing a “sliding window” over the nodes in the BFS queue. Zero-pad all S^π_i to be length-M vectors:

  S^π_i = (A^π_{max(1, i−M), i}, ..., A^π_{i−1, i})^T,   i ∈ {2, ..., n}   (9)
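
  A sketch of building the fixed-length vectors of equation (9), assuming the adjacency matrix has already been reordered by BFS (the helper is hypothetical; M would be estimated from the training graphs, and the left-padding convention is a choice, not taken from the paper):

  ```python
  import numpy as np

  def sequence_fixed_m(A_bfs, M):
      """S^pi_i = (A^pi_{max(1, i-M), i}, ..., A^pi_{i-1, i}), zero-padded to length M."""
      n = A_bfs.shape[0]
      S = []
      for i in range(1, n):                     # i = 2, ..., n in 1-based notation
          window = A_bfs[max(0, i - M):i, i]    # edges to the last <= M nodes
          padded = np.zeros(M, dtype=int)
          padded[-len(window):] = window        # zero-pad (here on the left)
          S.append(padded)
      return np.stack(S)

  A = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])                  # 4-cycle, already in BFS order
  print(sequence_fixed_m(A, M=2))               # [[0 1] [1 0] [1 1]]
  ```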

  21. Experiments

  22. Datasets
  3 synthetic and 2 real graph datasets:
  • Community (synthetic, 500 graphs, 60 ≤ |V| ≤ 160): 2-community Erdős–Rényi model (E-R)
  • Grid (synthetic, 100 graphs, 100 ≤ |V| ≤ 400): standard 2D grid
  • B-A (synthetic, 500 graphs, 100 ≤ |V| ≤ 200): Barabási–Albert model, each new node connected to 4 existing nodes
  • Protein (real, 918 graphs, 100 ≤ |V| ≤ 500): amino acid nodes, edge if ≤ 6 Angstroms apart
  • Ego (real, 757 graphs, 50 ≤ |V| ≤ 399): document nodes, citation-relationship edges, from Citeseer
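
  The synthetic families can be approximated with standard networkx generators (a sketch only; exact parameters such as the intra-community edge probability and the number of inter-community edges are assumptions, not taken from the paper):

  ```python
  import random
  import networkx as nx

  def community_graph():
      """Two Erdos-Renyi communities joined by a few random inter-community edges."""
      n = random.randint(30, 80)                       # |V| = 2n in [60, 160]
      G = nx.disjoint_union(nx.erdos_renyi_graph(n, 0.3),
                            nx.erdos_renyi_graph(n, 0.3))
      for _ in range(max(1, n // 20)):                 # sparse inter-community links
          G.add_edge(random.randrange(n), n + random.randrange(n))
      return G

  grid = nx.grid_2d_graph(12, 12)                      # 2D grid, 144 nodes
  ba = nx.barabasi_albert_graph(150, 4)                # each new node attaches to 4 nodes
  ```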

  23. Baseline Methods & Settings
  • Compared GraphRNN to traditional models and deep learning baselines:
    Traditional: Erdős–Rényi model (E-R) (Erdős & Rényi, 1959); Barabási–Albert model (B-A) (Albert & Barabási, 2002); Kronecker graph models (Leskovec et al., 2010); mixed-membership stochastic block models (MMSB) (Airoldi et al., 2008)
    Deep learning: GraphVAE (Simonovsky & Komodakis, 2018); DeepGMG (Li et al., 2018)
  • 80%-20% train-test split
  • All models trained with early stopping
  • Traditional methods learn from a single graph, so a separate model is trained for each training graph in order to compare with these methods
  • Deep learning baselines use smaller datasets: Community-small (12 ≤ |V| ≤ 20) and Ego-small (4 ≤ |V| ≤ 18)

  24. Evaluating Generated Graphs via an MMD Metric
  Existing approaches:
  • Visual inspection
  • Simple comparisons of average statistics between the two sets
  Proposed: a metric based on Maximum Mean Discrepancy (MMD) that compares all moments of the empirical distributions, using an exponential kernel with the Wasserstein distance.
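
  A sketch of an MMD estimate between two sets of graph statistics, e.g. degree sequences (one array of node degrees per graph), using an exponential kernel built on the first Wasserstein distance; the bandwidth sigma and the exact form of the exponent are assumptions, not the paper's precise formulation.

  ```python
  import numpy as np
  from scipy.stats import wasserstein_distance

  def kernel(p, q, sigma=1.0):
      """Exponential kernel over the Wasserstein distance between two samples."""
      return np.exp(-wasserstein_distance(p, q) / (2 * sigma ** 2))

  def mmd_squared(X, Y, sigma=1.0):
      """MMD^2 between two sets of empirical distributions (e.g., degree sequences)."""
      k_xx = np.mean([kernel(x, x2, sigma) for x in X for x2 in X])
      k_yy = np.mean([kernel(y, y2, sigma) for y in Y for y2 in Y])
      k_xy = np.mean([kernel(x, y, sigma) for x in X for y in Y])
      return k_xx + k_yy - 2 * k_xy

  # Usage: X and Y are lists of per-graph degree arrays for the generated
  # and test graphs; a small MMD^2 indicates similar degree distributions.
  ```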
