Uncovering Functions Through Multi-Layer Tissue Networks Marinka Zitnik marinka@cs.stanford.edu Joint work with Jure Leskovec
Network biomedicine Networks are a general language for describing and modeling biological systems, their structure, functions and dynamics Marinka Zitnik, Stanford 2
Why Protein Functions?
§ Protein functions important for:
  § Understanding life at the molecular level
  § Biomedicine and pharmaceutical industry
§ Biotechnological limits & rapid growth of sequence data: most proteins can only be annotated computationally [Clark et al. 2013, Rost et al. 2016, Greene et al. 2016]
What Does My Protein Do?
Goal: Given a set of proteins and possible functions, we want to predict each protein’s association with each function:
antn: Proteins × Functions → [0,1]
antn: CDC3 × Cell cycle → 0.9
antn: RPT6 × Cell cycle → 0.05
Existing Research
“Guilt by association”: a protein’s function is determined based on who it interacts with
[Figure: an unannotated protein (“?”) linked to proteins annotated with Cell cycle and Cell proliferation]
§ Approaches:
  § Neighbor scoring
  § Indirect scoring
  § Random walks
[Zuberi et al. 2013, Radivojac et al. 2013, Kramer et al. 2014, Yu et al. 2015] and many others
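The neighbor-scoring idea can be sketched as follows. This is a minimal illustration, assuming the score is simply the fraction of a protein's interaction partners that already carry the function; the function name `neighbor_score` and the toy edges are hypothetical and do not come from any of the cited methods.

```python
def neighbor_score(adjacency, annotated, protein):
    """Fraction of a protein's interaction partners that carry the function."""
    neighbors = adjacency.get(protein, set())
    if not neighbors:
        return 0.0
    return sum(1 for n in neighbors if n in annotated) / len(neighbors)

# Hypothetical toy network: the edges are illustrative, not real interactions.
adjacency = {
    "UNK1": {"CDC3", "CLB4", "CDC16"},
    "UNK2": {"RPT1", "RPN3", "RPT6", "CDC16"},
}
# Proteins assumed (for this example only) to be annotated with "Cell cycle".
cell_cycle = {"CDC3", "CLB4", "CDC16"}

print(neighbor_score(adjacency, cell_cycle, "UNK1"))  # 1.0
print(neighbor_score(adjacency, cell_cycle, "UNK2"))  # 0.25
```

A score like this uses only direct partners, which is one reason indirect scoring and random walks are listed as alternatives.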
Existing Research
§ Protein functions are assumed constant across organs and tissues:
  § Functions in heart are the same as in skin
  § Functions in frontal lobe are the same as in whole brain
Lack of methods to predict functions in different biological contexts
Questions for Today
1. How can we describe and model multi-layer tissue networks?
2. Can we predict protein functions in a given context [e.g., tissue, organ, cell system]?
3. How do functions vary across contexts?
Biotechnological Challenges
§ Tissues have inherently multiscale, hierarchical organization
§ Tissues are related to each other:
  § Proteins in biologically similar tissues have similar functions [Greene et al. 2016, ENCODE 2016]
  § Proteins are missing in some tissues
§ Interaction networks are tissue-specific
§ Many tissues have no annotations
Computational Challenges
§ Multi-layer network theory is only emerging at present
§ Lack of formulations accounting for:
  § multiple interaction types
  § interactions that vary in space, time, and scale
  § interconnected networks of networks
§ Nodes have different roles across layers
§ Labels are extremely sparse
Part 1: The multi-layer nature of networks in biomedicine
Multi-Layer Networks
§ Collections of interdependent networks
§ Different layers have different meanings
[Figure: example multi-layer network with four layers, G1 to G4]
Many Network Layers
§ Many networks are inherently multi-layer but the layers are:
  § Modeled independently of each other
  § Collapsed into one aggregated network
§ The models must be:
  § Multi-scale: Layers at different levels of granularity
  § Scalable: Tens or hundreds of layers
Example: Tissue Networks
§ Separate protein-protein interaction network for each tissue
§ Biological similarities between tissues at multiple scales
[Figure: tissue network layers G1 to G4]
Example: Tissue Networks
§ Each PPI network is a layer $G_i = (V_i, E_i)$
§ Similarities between layers are given in hierarchy $\mathcal{M}$; map $\pi$ encodes parent-child relationships
[Figure: tissue network layers G1 to G4 arranged in the hierarchy]
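One way such a multi-layer network could be held in memory is sketched below, assuming each layer is stored as a node set plus an edge set and the hierarchy as a child-to-parent dictionary. The internal element names ("1", "2", "3"), the proteins placed in each layer, and the helper `path_to_root` are hypothetical and serve only to illustrate the notation.

```python
# Each layer G_i = (V_i, E_i): a set of nodes and a set of undirected edges.
# The proteins and edges below are illustrative placeholders, not real data.
layers = {
    "G1": ({"CDC3", "CLB4"}, {frozenset({"CDC3", "CLB4"})}),
    "G2": ({"CDC3", "CDC16"}, {frozenset({"CDC3", "CDC16"})}),
    "G3": ({"RPT1", "RPN3"}, {frozenset({"RPT1", "RPN3"})}),
    "G4": ({"RPT6", "RPN3"}, {frozenset({"RPT6", "RPN3"})}),
}

# Hierarchy as a parent map pi; the root ("1") has no entry.
pi = {"G1": "2", "G2": "2", "G3": "3", "G4": "3", "2": "1", "3": "1"}

def path_to_root(pi, element):
    """Return the elements from `element` up to the root of the hierarchy."""
    path = [element]
    while path[-1] in pi:
        path.append(pi[path[-1]])
    return path

print(path_to_root(pi, "G1"))  # ['G1', '2', '1']
```

Note that the same protein (here CDC3) may appear in several layers, while other proteins are absent from some layers.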
Part 2: Neural embeddings for multi-layer networks
Machine Learning in Networks
[Figure: a network of proteins (CDC3, CLB4, CDC16, RPT1, RPN3, RPT6, and unannotated UNK1, UNK2) passed through machine learning to assign the functions Cell cycle and Cell proliferation]
Function prediction: Multi-label node classification
Machine Learning Lifecycle
§ Machine Learning Lifecycle: This feature, that feature
§ Every single time!
[Figure: pipeline from raw networks to node and edge profiles, learning algorithm, prediction model, and downstream prediction of protein functions; the feature engineering step is replaced by automatically learning the features]
Feature Learning in Graphs
Efficient task-independent feature learning for machine learning in networks
Node $u$ to vector: $f: u \to \mathbb{R}^d$
Feature representation, embedding
Feature Learning in Multi-Layer Nets
[Figure: node $u$ mapped to several vectors, one per function $f_i$, each $f_i: u \to \mathbb{R}^d$]
Multi-layer, multi-scale embedding
Features in Multi-Layer Network
§ Given: Layers $\{G_i\}_i$, hierarchy $\mathcal{M}$
  § Layers $\{G_i\}_{i=1..T}$ are in leaves of $\mathcal{M}$
§ Goal: Learn functions: $f_i: V_i \to \mathbb{R}^d$
§ Multi-scale model:
  § $f_i$ are in leaves of $\mathcal{M}$
  § $f_j$ are internal elements of $\mathcal{M}$
Features in Multi-Layer Network
§ Approach has two components:
1. Single-layer objectives: nodes with similar neighborhoods in each layer are embedded close together
2. Hierarchical dependency objectives: nodes in nearby layers are encouraged to share similar features
Single-Layer Objectives
§ Intuition: For each layer, embed nodes to $d$ dimensions by preserving their similarity
§ Approach: Nodes $u$ and $v$ are similar if their network neighborhoods are similar
§ Given node $u$ in layer $i$ we define nearby nodes $N_i(u)$ based on random walks starting at node $u$
[Figure: neighborhood $N_i(u)$ of node $u$ in layer $i$] [Grover et al. 2016]
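A minimal sketch of building a neighborhood from random walks is given below. It assumes uniform walks for simplicity, whereas Grover et al. 2016 describe biased walks; the function names `random_walk` and `nearby_nodes`, the parameters `num_walks`, `length`, and `seed`, and the toy layer are all hypothetical.

```python
import random

def random_walk(adjacency, start, length, rng):
    """Uniform random walk visiting at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        # Sort so that results depend only on the seed, not on set ordering.
        neighbors = sorted(adjacency[walk[-1]])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def nearby_nodes(adjacency, u, num_walks=10, length=5, seed=0):
    """Approximate N_i(u): nodes visited by short walks from u, excluding u."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(num_walks):
        visited.update(random_walk(adjacency, u, length, rng))
    visited.discard(u)
    return visited

# Hypothetical single layer with two disconnected components.
layer = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y"}, "y": {"x"},
}
print(nearby_nodes(layer, "a"))
print(nearby_nodes(layer, "x"))
```

Walks never leave a connected component, so nodes in the other component cannot appear in the neighborhood.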
Single-Layer Objectives
§ Given node $u$ in layer $i$, learn $u$’s representation such that it predicts nearby nodes $N_i(u)$:
$\omega_i(u) = \log \Pr(N_i(u) \mid f_i(u))$
§ Given $T$ layers, maximize:
$\Omega_i = \sum_{u \in V_i} \omega_i(u)$, for $i = 1, 2, \dots, T$
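The single-layer objective can be evaluated numerically as in the sketch below. It assumes, following the node2vec formulation of Grover et al. 2016, that nearby nodes are conditionally independent given $f_i(u)$ and that each conditional probability is a softmax over dot products of embeddings; the names `omega` and `layer_objective` and the toy embeddings are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def omega(f, u, nearby):
    """log Pr(N(u) | f(u)) with a softmax over all nodes of the layer,
    assuming nearby nodes are conditionally independent given f(u)."""
    log_z = math.log(sum(math.exp(dot(f[v], f[u])) for v in f))
    return sum(dot(f[v], f[u]) - log_z for v in nearby)

def layer_objective(f, neighborhoods):
    """Omega_i: sum of omega over all nodes u of one layer."""
    return sum(omega(f, u, nearby) for u, nearby in neighborhoods.items())

# Toy embeddings (all zeros), so every softmax probability equals 1/4.
f = {"a": [0.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 0.0], "d": [0.0, 0.0]}
print(omega(f, "a", {"b", "c"}))  # equals 2 * log(1/4)
```

The exact normalizer sums over every node in the layer, which is why practical implementations typically approximate it, for example with negative sampling.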
Interdependent Layers
§ So far, we did not consider hierarchy $\mathcal{M}$
§ Node representations in different layers are learned independently of each other
How to model dependencies between layers when learning features?
Idea: Interdependent Layers
§ Encourage nodes in layers nearby in the hierarchy to be embedded close together
Relationships Between Layers
§ Hierarchy $\mathcal{M}$ is a tree, given by the parent-child relationships $\pi: \mathcal{M} \to \mathcal{M}$, where $\pi(i)$ is the parent of $i$ in $\mathcal{M}$
§ Example: “2” is parent of $G_i$, $G_j$
Interdependent Layers
§ Given node $u$, learn $u$’s representation in layer $i$ to be close to $u$’s representation in parent $\pi(i)$:
$c_i(u) = \frac{1}{2} \| f_i(u) - f_{\pi(i)}(u) \|_2^2$
§ Multi-scale: Repeat at every level of $\mathcal{M}$:
$C_i = \sum_{u \in L_i} c_i(u)$
$L_i$ contains the nodes of all layers appearing in the sub-hierarchy rooted at $i$
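The hierarchical penalty can be computed directly from its definition, as in this small sketch. The function names `c` and `C` mirror the symbols on the slide; the embeddings and protein names are made-up values for illustration only.

```python
def c(f_child, f_parent, u):
    """c_i(u) = 1/2 * ||f_i(u) - f_pi(i)(u)||_2^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(f_child[u], f_parent[u]))

def C(f_child, f_parent, nodes):
    """C_i: sum of c_i(u) over the given nodes (the set L_i)."""
    return sum(c(f_child, f_parent, u) for u in nodes)

# Hypothetical 2-dimensional embeddings for one element and its parent.
f_layer = {"CDC3": [1.0, 2.0], "RPT6": [0.0, 1.0]}
f_parent = {"CDC3": [0.0, 0.0], "RPT6": [0.0, 1.0]}

print(C(f_layer, f_parent, ["CDC3", "RPT6"]))  # 2.5
```

Here CDC3 contributes 0.5 * (1 + 4) = 2.5 and RPT6 contributes 0 because its two representations coincide.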
Final Model: OhmNet
Automatic feature learning in multi-layer networks
Solve maximum likelihood problem:
$\max_{f_1, f_2, \dots, f_{|\mathcal{M}|}} \sum_{i \in T} \Omega_i - \lambda \sum_{j \in \mathcal{M}} C_j$
The first sum collects the single-layer objectives; the second collects the hierarchical dependency objectives.
OhmNet Algorithm
1. For each layer, compute random walk probabilities
2. For each layer, sample fixed-length random walks starting from each node $u$
3. Optimize the OhmNet objective using stochastic gradient descent
Scalable: No pairwise comparison of nodes from different layers
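To give a feel for step 3, the sketch below applies one gradient step to the hierarchical term only, for a single node. It is an illustration under stated assumptions, not the authors' implementation: the gradient of $\frac{1}{2}\|f_i(u) - f_{\pi(i)}(u)\|_2^2$ with respect to $f_i(u)$ is $f_i(u) - f_{\pi(i)}(u)$, and with respect to $f_{\pi(i)}(u)$ it is the negative of that. The function name `hierarchy_step` and the parameters `lam` and `lr` are hypothetical, and the update coming from the single-layer objective is omitted.

```python
def hierarchy_step(f_child, f_parent, lam, lr):
    """One gradient step that decreases lam * 1/2 * ||f_child - f_parent||^2.

    Returns updated copies of the child and parent vectors; the contribution
    of the single-layer (random walk) objective is deliberately left out.
    """
    diff = [a - b for a, b in zip(f_child, f_parent)]
    new_child = [a - lr * lam * d for a, d in zip(f_child, diff)]
    new_parent = [b + lr * lam * d for b, d in zip(f_parent, diff)]
    return new_child, new_parent

child, parent = hierarchy_step([1.0, 2.0], [0.0, 0.0], lam=1.0, lr=0.25)
print(child, parent)  # [0.75, 1.5] [0.25, 0.5]
```

Each step pulls the two representations toward each other, and only vectors of the same node in a child and its parent are compared, never pairs of different nodes across layers.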
Part 3: Results: Protein function prediction across tissues
Tissue-Specific Function Prediction
1. Learn features of every node and at every scale based on:
  § Edges within each layer
  § Inter-layer relationships between nodes active on different layers
2. Predict tissue-specific protein functions using the learned node features
Protein Functions and Tissues