Scalable Deep Generative Modeling for Sparse Graphs
Hanjun Dai¹, Azade Nazi¹, Yujia Li², Bo Dai¹, Dale Schuurmans¹
¹Google Brain, ²DeepMind
Graph generative models
Given a set of graphs {G_1, G_2, …, G_N}, fit a probabilistic model p(G) over graphs, so that we can:
● sample from it to get new graphs: G ~ p(G)
● complete a graph given parts: G_rest ~ p(G_rest | G_part)
● obtain graph representations
Such a model can also be used for structured prediction p(G | z).
Types of deep graph generative models
Modeling the adjacency matrix directly, like an image:
● VAE (Kipf et al., 16; Simonovsky et al., 18)
● NetGAN (Bojchevski et al., 18)
● Autoregressive (You et al., 18; Liao et al., 19)
Leveraging the sparse structure of graphs:
● Junction-Tree VAE (Jin et al., 18)
● Deep GraphGen (Li et al., 18)
Autoregressive graph generative models
Time complexity per graph during inference (n nodes, m edges):
● Deep GraphGen (Li et al., 18): O((m + n)²), scales to ~100 nodes
● GraphRNN (You et al., 18): O(n²), ~2,000 nodes
● GRAN (Liao et al., 19): O(n²), ~5,000 nodes
● BiGG (Dai et al., 20; this work): O((m + n) log n), or O(n²) for a fully connected graph, ~100,000 nodes
Autoregressive graph generative models
Time/memory complexity per graph during training:
● Deep GraphGen (Li et al., 18): O(m) syncs, O(m(m + n)) memory
● GraphRNN (You et al., 18): O(n²) or O(n) syncs, O(n²) memory
● GRAN (Liao et al., 19): O(n) syncs, O(n(m + n)) memory
● BiGG (Dai et al., 20; this work): O(log n) syncs, O(√m log n) memory
Saving computation for sparse graphs
● GraphRNN: O(n²)
● GRAN, GraphRNN-S: O(n²)
● BiGG (this work): O((m + n) log n)
Autoregressive generation of the adjacency matrix
01 Generating one cell
02 Generating one row
03 Generating rows
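Behind this three-step outline is the row-wise autoregressive factorization of the (lower-triangular) adjacency matrix; paraphrasing the paper's setup, with A_u denoting the u-th row under a fixed node ordering:

```latex
p(G) \;=\; \prod_{u=1}^{n} p\left(A_u \mid A_1, \dots, A_{u-1}\right)
```

Each row A_u is itself generated cell by cell via the binary-tree procedure described next.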
01 Generating one cell
O(log n) procedure for generating one edge
Naive approach: given node u, choose a neighbor v directly, i.e., pick 1 out of n with a softmax: O(n).
Efficient approach: recursively divide the range [1, n] into two halves and choose one: at most O(log n) decisions.
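A minimal Python sketch of the bisection idea; `choose_left` stands in for BiGG's learned Bernoulli decision (here just a fair coin, for illustration only):

```python
import random

def sample_neighbor(lo, hi, choose_left=lambda lo, hi: random.random() < 0.5):
    """Sample one neighbor index in [lo, hi] with O(log(hi - lo)) decisions.

    In the actual model, choose_left would be a learned Bernoulli
    conditioned on the current state; here it is a uniform coin.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if choose_left(lo, hi):
            hi = mid         # descend into the left half [lo, mid]
        else:
            lo = mid + 1     # descend into the right half [mid + 1, hi]
    return lo

# One endpoint among n = 1,000,000 nodes takes ~20 binary decisions
# instead of one softmax over 1,000,000 classes.
print(sample_neighbor(1, 10**6))
```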
Binary tree generation
● Following a path from root to one neighbor: O(log n)
● Generating all neighbors separately, path by path: O(N_u log n), where N_u is the number of neighbors of node u
● Generating the tree via DFS: O(|T|), where |T| is the tree size and |T| ≤ min{N_u log n, 2n}
02 Generating one row
Autoregressive row-binary tree generation
def generate_tree(t):
    # should generate left child?
    if yes: create left child; generate_tree(lch(t))
    # should generate right child?
    if yes: create right child; generate_tree(rch(t))

For node t, we first decide whether to generate a left child, conditioning on h_top(t), which summarizes the existing tree (from the top down):
    Has-left ~ Bernoulli(· | h_top(t))
Yes? ⇒ recursively generate the left subtree.
We then decide whether to generate a right child, conditioning on h_top(t) and on h_bot(lch(t)), which summarizes the left subtree of t (from the bottom up):
    Has-right ~ Bernoulli(· | h_top(t), h_bot(lch(t)))
Yes? ⇒ recursively generate the right subtree.
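Filling in the slide's pseudocode as runnable Python; `has_child` is a stand-in for the two learned Bernoulli heads, and the interval bookkeeping is simplified (the real model only creates a node when its subtree contains at least one edge):

```python
import random

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # column interval this node covers
        self.left = self.right = None

def generate_tree(lo, hi, has_child=lambda: random.random() < 0.5):
    """Sketch of row-binary tree generation for one row.

    A real implementation replaces has_child with
      Has-left  ~ Bernoulli(. | h_top(t))
      Has-right ~ Bernoulli(. | h_top(t), h_bot(lch(t)))
    and guarantees every created internal node gets at least one child.
    """
    t = Node(lo, hi)
    if lo == hi:                         # leaf: emits one edge (u, lo)
        return t
    mid = (lo + hi) // 2
    if has_child():                      # should generate left child?
        t.left = generate_tree(lo, mid, has_child)
    if has_child():                      # should generate right child?
        t.right = generate_tree(mid + 1, hi, has_child)
    return t

row_tree = generate_tree(1, 8)           # one row over columns 1..8
```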
Realizing the top-down and bottom-up recursions
    h_bot(t) = TreeLSTMCell(h_bot(lch(t)), h_bot(rch(t)))
    h_top(lch(t)) = LSTMCell(h_top(t), x_left)
    ĥ_top(rch(t)) = TreeLSTMCell(h_top(t), h_bot(lch(t)))
    h_top(rch(t)) = LSTMCell(ĥ_top(rch(t)), x_right)
Here x_left and x_right are embeddings of the child position.
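A compact PyTorch sketch of how these two cell types compose; the binary TreeLSTM variant (in the spirit of Tai et al., 2015), the dimension d, and names like `pos_emb` are illustrative assumptions, not BiGG's released code:

```python
import torch
import torch.nn as nn

class BinaryTreeLSTMCell(nn.Module):
    """TreeLSTM cell specialized to two children (Tai et al., 2015 style)."""
    def __init__(self, d):
        super().__init__()
        self.iou = nn.Linear(2 * d, 3 * d)   # input / output / update gates
        self.fgt = nn.Linear(2 * d, 2 * d)   # one forget gate per child

    def forward(self, left, right):          # each argument is an (h, c) pair
        (h_l, c_l), (h_r, c_r) = left, right
        h_cat = torch.cat([h_l, h_r], dim=-1)
        i, o, u = self.iou(h_cat).chunk(3, dim=-1)
        f_l, f_r = torch.sigmoid(self.fgt(h_cat)).chunk(2, dim=-1)
        c = f_l * c_l + f_r * c_r + torch.sigmoid(i) * torch.tanh(u)
        return torch.sigmoid(o) * torch.tanh(c), c

d = 4
tree_cell = BinaryTreeLSTMCell(d)
lstm_cell = nn.LSTMCell(d, d)
pos_emb = nn.Embedding(2, d)                 # x_left = index 0, x_right = index 1

h_top_t = (torch.zeros(1, d), torch.zeros(1, d))     # h_top(t)
h_bot_lch = (torch.randn(1, d), torch.randn(1, d))   # h_bot(lch(t)), given

h_top_lch = lstm_cell(pos_emb(torch.tensor([0])), h_top_t)        # left child
hhat_top_rch = tree_cell(h_top_t, h_bot_lch)                      # merge states
h_top_rch = lstm_cell(pos_emb(torch.tensor([1])), hhat_top_rch)   # right child
```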
03 Generating rows
Autoregressive conditioning between rows
To generate the neighbors of node u (i.e., the u-th row), how do we summarize row 0 through row u-1?
Use an LSTM chain h_row(0) → h_row(1) → … → h_row(u-1)? Not efficient: O(n) dependency length.
Fenwick tree for prefix summarization
Fenwick tree: a data structure that supports prefix-sum queries and single-element updates in O(log n).
(Figure: Fenwick tree built over h_row(0), …, h_row(5).)
Fenwick tree for prefix summarization
Obtaining the "prefix sum" with a low-bit query. Examples:
● current row u = 3: context is h_row(2) plus the tree node summarizing rows 0-1
● current row u = 5: context is h_row(4) plus the tree node summarizing rows 0-3
● current row u = 6: context is the tree nodes summarizing rows 0-3 and rows 4-5 (the latter built from h_row(4) and h_row(5))
At most O(log n) dependencies per row.
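The low-bit query is the standard Fenwick-tree traversal; a small Python sketch (1-indexed, whereas the slide counts rows from 0; in BiGG the per-node "sums" are state summaries merged with LSTM cells rather than integer additions):

```python
def prefix_nodes(u):
    """Indices of Fenwick-tree nodes that together cover rows [1, u].

    Node i summarizes the half-open range (i - lowbit(i), i], where
    lowbit(i) = i & -i. Stripping the lowest set bit walks the O(log n)
    nodes whose summaries are needed before generating row u + 1.
    """
    nodes = []
    while u > 0:
        nodes.append(u)       # node u covers rows (u - (u & -u), u]
        u -= u & -u           # strip the lowest set bit
    return nodes

for u in (3, 5, 6):
    print(u, prefix_nodes(u))
# 3 -> [3, 2]   (row 3, plus the node for rows 1-2)
# 5 -> [5, 4]   (row 5, plus the node for rows 1-4)
# 6 -> [6, 4]   (node for rows 5-6, plus the node for rows 1-4)
```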
Optimizing BiGG
01 Training with O(log n) synchronizations
02 Model parallelism & sublinear memory cost
01 Training with O(log n) synchronizations
Training with O(log n) synchronizations
Stage 1: compute all the bottom-up summarizations for all rows, one tree level per synchronization (Sync 1, Sync 2, …): O(log n) steps.
Training with O(log n) synchronizations
Stage 2: construct the entire Fenwick tree, again level by level: O(log n) steps.
Training with O(log n) synchronizations
Stage 3: retrieve the prefix contexts h_row for all rows: O(log n) steps.
Training with O(log n) synchronizations
Stage 4: compute the cross-entropy losses for all top-down decisions: O(log n) steps.
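The common pattern in all four stages: tree nodes at the same depth are independent given the previous level, so one batched operation handles a whole level, and the number of sequential synchronizations equals the tree height. A schematic sketch (`merge` stands in for a batched TreeLSTM cell; a power-of-two leaf count is assumed for brevity):

```python
def bottom_up_levels(leaves, merge):
    """Combine leaf states with O(log n) sequential steps.

    Each loop iteration is one synchronization: all sibling pairs in
    the current level are merged "in parallel" (here, a list
    comprehension; on a GPU, one batched cell application).
    """
    level, syncs = list(leaves), 0
    while len(level) > 1:
        level = [merge(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        syncs += 1
    return level[0], syncs

root, syncs = bottom_up_levels(range(8), lambda a, b: a + b)
print(root, syncs)    # 28, after only 3 sequential steps for 8 leaves
```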
02 Model parallelism & sublinear memory cost
Model parallelism
Split the computation across GPUs (e.g., GPU 1 and GPU 2); only a GPU 1 → 2 summary message needs to cross the boundary.
(Figure: the computation graph partitioned between GPU 1 and GPU 2.)
Sublinear memory cost
Run 2× forward + 1× backward: pass-1 runs without storing intermediate states, and pass-2 recomputes them when the backward pass needs them.
Memory cost during training: O(√m log n).
(Figure: pass-1 to pass-2.)
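This 2× forward + 1× backward trade-off is the standard recompute-on-backward (gradient checkpointing) idea; a minimal PyTorch illustration of the mechanism, not BiGG's actual training loop:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in "deep" model, split into segments that are checkpointed.
net = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(8)])
x = torch.randn(32, 64, requires_grad=True)

h = x
for segment in net:
    # Pass 1: run the segment WITHOUT storing its activations.
    # During backward, the segment is re-run (pass 2) to rebuild them.
    h = checkpoint(segment, h, use_reentrant=False)

loss = h.sum()
loss.backward()      # one backward pass, with recomputation inside
```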
Experiments
Inference speed
Training memory
Training time
Main reason for the remaining gap: the number of available GPU cores is limited.
Sample quality on benchmark datasets
Sample quality as graph size grows
Summary
Advantages:
● Improves inference speed to O(min{(m + n) log n, n²})
● Enables parallelized training with sublinear memory cost
● Does not sacrifice sample quality
Limitations:
● Limited by the parallelism of existing hardware
● Good capacity, but limited extrapolation ability
Thank You
Hanjun Dai, Research Scientist
hadai@google.com