Deep Learning for Network Biology
Marinka Zitnik and Jure Leskovec
Stanford University
snap.stanford.edu/deepnetbio-ismb
ISMB 2018
This Tutorial
snap.stanford.edu/deepnetbio-ismb
ISMB 2018, July 6, 2018, 2:00 pm - 6:00 pm
This Tutorial
1) Node embeddings
§ Map nodes to low-dimensional embeddings
§ Applications: PPIs, disease pathways
2) Graph neural networks
§ Deep learning approaches for graphs
§ Applications: gene functions
3) Heterogeneous networks
§ Embedding heterogeneous networks
§ Applications: human tissues, drug side effects
Part 1: Node Embeddings
Some materials adapted from:
• Hamilton et al. 2018. Representation Learning on Networks. WWW.
Embedding Nodes
Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together
Setup
§ Assume we have a graph G:
§ V is the vertex set
§ A is the adjacency matrix (assume binary)
§ No node features or extra information is used!
Embedding Nodes
Goal: Map nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the network
(Figure: input network mapped to the d-dimensional embedding space)
Embedding Nodes
Goal: $\text{similarity}(u, v) \approx z_v^\top z_u$
(the similarity function on the left-hand side is what we need to define!)
Learning Node Embeddings
1. Define an encoder (a function ENC that maps node $v$ to embedding $z_v$)
2. Define a node similarity function (a measure of similarity in the input network)
3. Optimize the parameters of the encoder so that: $\text{similarity}(u, v) \approx z_v^\top z_u$
Two Key Components
1. Encoder maps a node in the input graph to a d-dimensional embedding vector: $\text{ENC}(v) = z_v$
2. Similarity function defines how relationships in the input network map to relationships in the embedding space: $\text{similarity}(u, v) \approx z_v^\top z_u$, i.e., the similarity of $u$ and $v$ in the network is approximated by the dot product between their node embeddings
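To make the two components concrete, here is a minimal Python/numpy sketch; the node count, dimension d, and random initialization are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Illustrative setup: 5 nodes, d = 3 dimensional embeddings.
num_nodes, d = 5, 3
Z = np.random.randn(num_nodes, d)  # one learnable row per node

def enc(v):
    """Encoder as a table lookup: node index -> its d-dimensional embedding."""
    return Z[v]

def embedding_similarity(u, v):
    """Dot product in embedding space, meant to approximate network similarity."""
    return float(enc(u) @ enc(v))
```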
Embedding Methods
§ Many methods use similar encoders:
§ node2vec, DeepWalk, LINE, struc2vec
§ These methods use different notions of node similarity. Two nodes have similar embeddings if:
§ they are connected?
§ they share many neighbors?
§ they have similar local network structure?
§ etc.
Outline of This Section
1. Adjacency-based similarity
2. Random walk approaches
3. Biomedical applications
Adjacency-based Similarity
Material based on:
• Ahmed et al. 2013. Distributed Large-Scale Natural Graph Factorization. WWW.
Adjacency-based Similarity
§ Similarity function is the edge weight between u and v in the network
§ Intuition: Dot products between node embeddings approximate edge existence

$\mathcal{L} = \sum_{(u,v) \in V \times V} \left\| z_u^\top z_v - A_{u,v} \right\|^2$

The loss $\mathcal{L}$ (what we want to minimize) sums, over all node pairs, the squared difference between the embedding similarity $z_u^\top z_v$ and the corresponding entry $A_{u,v}$ of the (weighted) adjacency matrix of the graph
Adjacency-based Similarity

$\mathcal{L} = \sum_{(u,v) \in V \times V} \left\| z_u^\top z_v - A_{u,v} \right\|^2$

§ Find the embedding matrix $Z \in \mathbb{R}^{d \times |V|}$ that minimizes the loss $\mathcal{L}$:
§ Option 1: Stochastic gradient descent (SGD)
§ Highly scalable, general approach
§ Option 2: Matrix decomposition solvers
§ e.g., SVD or QR decompositions
§ Need to derive specialized solvers
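As a sketch of Option 1, the loss and a gradient step can be written in a few lines of numpy; the learning rate, iteration count, toy graph, and use of full-batch gradients (true SGD would sample node pairs) are illustrative assumptions:

```python
import numpy as np

def adjacency_loss(Z, A):
    """L = sum over all node pairs (u, v) of (z_u . z_v - A_uv)^2,
    i.e., the squared Frobenius norm of Z Z^T - A (Z has one row per node)."""
    return np.sum((Z @ Z.T - A) ** 2)

def gradient_step(Z, A, lr=1e-3):
    """For symmetric A, the gradient of ||Z Z^T - A||_F^2 w.r.t. Z
    is 4 (Z Z^T - A) Z."""
    return Z - lr * 4 * (Z @ Z.T - A) @ Z

# Usage on an assumed toy 4-node path graph:
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = 0.1 * np.random.randn(4, 2)
for _ in range(1000):
    Z = gradient_step(Z, A)
```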
Adjacency-based Similarity
§ O(|V|^2) runtime
§ Must consider all node pairs
§ O(|E|) if summing only over non-zero edges (e.g., Natarajan et al., 2014)
§ O(|V|) parameters
§ One learned embedding per node
§ Only considers direct connections: in the figure, red nodes are clearly more similar to green nodes than to orange nodes, despite none being directly connected
Outline of This Section
1. Adjacency-based similarity
2. Random walk approaches
3. Biomedical applications
Random Walk Approaches
Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover and Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
• Ribeiro et al. 2017. struc2vec: Learning Node Representations from Structural Identity. KDD.
Multi-Hop Similarity
Idea: Define node similarity function based on higher-order neighborhoods
§ Red: target node
§ k=1: 1-hop neighbors (i.e., the adjacency matrix A)
§ k=2: 2-hop neighbors
§ k=3: 3-hop neighbors
How can we stochastically define these higher-order neighborhoods?
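Before turning to the stochastic definition, a small numpy sketch of how k-hop neighborhoods can be read off powers of the adjacency matrix (the function name and interface are ours):

```python
import numpy as np

def within_k_hops(A, k):
    """Boolean matrix whose (u, v) entry is True iff v is reachable from u
    in at most k hops: entry (u, v) of A^i counts length-i walks u -> v."""
    reach = np.zeros(A.shape)
    power = np.eye(A.shape[0])
    for _ in range(k):
        power = power @ A
        reach = reach + power
    return reach > 0
```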
Unsupervised Feature Learning
§ Intuition: Find an embedding of nodes in $d$ dimensions that preserves similarity
§ Idea: Learn node embeddings such that nearby nodes are close together
§ Given a node $u$, how do we define nearby nodes?
§ $N_R(u)$ ... neighborhood of $u$ obtained by some strategy $R$
Feature Learning as Optimization
§ Given $G = (V, E)$
§ Goal is to learn $f: u \rightarrow \mathbb{R}^d$
§ where $f$ is a table lookup
§ We directly "learn" the coordinates $z_u = f(u)$ of each node $u$
§ Given node $u$, we want to learn a feature representation $f(u)$ that is predictive of the nodes in $u$'s neighborhood $N_R(u)$:

$\max_f \sum_{u \in V} \log \Pr(N_R(u) \mid z_u)$
Unsupervised Feature Learning
Goal: Find embeddings $z_u$ that predict nearby nodes $N_R(u)$:

$\sum_{u \in V} \log P(N_R(u) \mid z_u)$

Assume the conditional likelihood factorizes over neighbors:

$P(N_R(u) \mid z_u) = \prod_{v \in N_R(u)} P(v \mid z_u)$
Random-Walk Embeddings
$z_u^\top z_v \approx$ probability that $u$ and $v$ co-occur on a random walk over the network
Why Random Walks?
1. Flexibility: Stochastic definition of node similarity:
§ Local and higher-order neighborhoods
2. Efficiency: Do not need to consider all node pairs when training:
§ Consider only node pairs that co-occur in random walks
Random Walk Optimization
1. Simulate many short random walks starting from each node using a strategy R
2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting at u
3. For each node u, learn its embedding by predicting which nodes are in N_R(u):

$\mathcal{L} = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v \mid z_u)$
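A minimal Python sketch of steps 1-2 for unbiased (DeepWalk-style) walks; the adjacency-list format, walk length, and walks-per-node values are illustrative assumptions:

```python
import random

def random_walk(adj_list, start, walk_length):
    """Simulate one fixed-length unbiased random walk from `start`.
    adj_list maps each node to the list of its neighbors."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj_list[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

def get_neighborhood(adj_list, u, num_walks=10, walk_length=5):
    """N_R(u): multiset of nodes visited on short random walks from u."""
    visited = []
    for _ in range(num_walks):
        visited.extend(random_walk(adj_list, u, walk_length)[1:])
    return visited
```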
Random Walk Optimization

$\mathcal{L} = \sum_{u \in V} \sum_{v \in N_R(u)} -\log \left( \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \right)$

The outer sum runs over all nodes $u$; the inner sum runs over nodes $v$ seen on random walks starting from $u$; the fraction is the predicted probability of $u$ and $v$ co-occurring on a random walk, i.e., a softmax parameterization of $P(v \mid z_u)$.
Random walk embeddings = the $z_u$ minimizing $\mathcal{L}$
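In code, this loss looks as follows (a numpy sketch over pre-collected co-occurrence pairs; note the sum over all nodes inside the log, which is what makes the naive version expensive):

```python
import numpy as np

def softmax_walk_loss(Z, pairs):
    """L = - sum over co-occurrence pairs (u, v) of
    log( exp(z_u . z_v) / sum_n exp(z_u . z_n) ).
    Each pair costs O(|V|) because of the normalization over all nodes n."""
    loss = 0.0
    for u, v in pairs:
        scores = Z @ Z[u]  # z_u . z_n for every node n
        # (a numerically safer version would subtract scores.max() first)
        loss -= Z[v] @ Z[u] - np.log(np.exp(scores).sum())
    return loss
```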
Random Walk Optimization
But doing this naively is too expensive!

$\mathcal{L} = \sum_{u \in V} \sum_{v \in N_R(u)} -\log \left( \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \right)$

The nested sum over nodes gives $O(|V|^2)$ complexity! The problem is the normalization term in the softmax function.
Solution: Negative Sampling
Solution: Negative sampling (Mikolov et al., 2013)

$\log \left( \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \right) \approx \log \sigma(z_u^\top z_v) - \sum_{i=1}^{k} \log \sigma(z_u^\top z_{n_i}), \quad n_i \sim P_V$

where $\sigma$ is the sigmoid function and $P_V$ is a random distribution over all nodes, i.e., instead of normalizing with respect to all nodes, just normalize against $k$ random negative samples
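A numpy sketch of the negative-sampling loss for one (u, v) pair. Two hedges: this uses the standard word2vec sign convention, with $\sigma(-z_u^\top z_{n_i})$ in the negative term so the loss stays bounded, and it samples negatives uniformly, the simplest choice of $P_V$ (word2vec uses a degree-based distribution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(Z, u, v, k=5, rng=None):
    """Approximate -log softmax for one (u, v) pair using k negatives:
    pull z_u . z_v together, push z_u . z_{n_i} apart for sampled n_i ~ P_V."""
    rng = rng or np.random.default_rng()
    negatives = rng.integers(0, Z.shape[0], size=k)  # P_V: uniform over nodes
    positive_term = np.log(sigmoid(Z[u] @ Z[v]))
    negative_term = np.sum(np.log(sigmoid(-(Z[negatives] @ Z[u]))))
    return -(positive_term + negative_term)
```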