ActiveNetworks Cross-Condition Analysis of Functional Genomic Data

ActiveNetworks Cross-Condition Analysis of Functional Genomic Data. T. M. Murali, Deept Kumar, Greg Grothaus, Maulik Shukla, Graham Jack, Jamie Garst, Richard Helm, Malcolm Potts, and Naren Ramakrishnan


  1. ActiveNetworks Cross-Condition Analysis of Functional Genomic Data. T. M. Murali†, Deept Kumar†, Greg Grothaus†, Maulik Shukla†, Graham Jack∗, Jamie Garst∗, Richard Helm∗, Malcolm Potts∗, and Naren Ramakrishnan†. Departments of †Computer Science and ∗Biochemistry, Virginia Tech. murali@cs.vt.edu, http://people.cs.vt.edu/~murali. DIMACS Workshop on Detecting and Processing Regularities in High Throughput Biological Data, June 21, 2005

  3. Motivation: Manual Systems Biology ◮ Richard Helm and Malcolm Potts study desiccation tolerance in baker’s yeast and human cells. ◮ Measure gene expression and find genes whose expression levels change during desiccation and during rehydration. ◮ Trace genes by hand through databases of protein-protein interactions, gene regulatory networks, and metabolic pathways, and through PubMed searches, to build networks activated in response to these stresses.

  5. Motivation: Manual Systems Biology [Figure: Redescription R5. Panel A: redescription pairing a condition on "Heat shock, 30 min" at threshold -1 with "T2 vs. T1" conditions at thresholds -5 and -1 joined by AND NOT (comparison operators lost in extraction); axes are ticked from -7 to -1. Panel B: Redescription R5 gene list (0.71): ARO4, ASN1, CLN2, GAS3, HEM13, HIS1, IMD4, PHO3, RPL-7A, 7B, 13A, 17B, 27B, 40B, RPS-0B, 9B, 10A, 16B, 22B, 26B, SAH1, SAM1, SUN4, TEF4, TPO2, URA7, UTR2, YHB1, YBR238C, YER156C, YFR055W, YOR309C. Panel C: hand-traced network linking these genes to functions including thiamine transport, oxidative stress response, serine biosynthesis, polyamine transport, phospholipid synthesis, osmotic growth, lysine and glutamate metabolism, protein ubiquitination, ribosomal genes, and cell wall organization.] Can we automate this process?

  6. Requirements for Automation ◮ Wiring diagram of the cell: protein-protein interactions, metabolic pathways, transcriptional regulatory networks, ... ◮ Measurement of molecular profiles (gene expression, protein expression, metabolite levels) under different conditions or cell states. ◮ Algorithms for combining these types of information.

  7. High-throughput Biology Provides Wiring Diagram ◮ Large amounts of information on different types of cellular interactions are now available. ◮ Protein-protein interactions: genome-scale yeast two-hybrid experiments, in vivo pulldowns of protein complexes. ◮ Transcriptional regulatory networks: ChIP-on-chip experiments yield protein-DNA binding data. ◮ Metabolic networks: databases culled from the literature (KEGG). ◮ Techniques that extract interactions automatically from abstracts.

  8. S. cerevisiae Wiring Diagram ◮ Physical network ◮ 15,429 protein-protein interactions from the Database of Interacting Proteins (DIP). ◮ 5,869 protein-DNA interactions (Lee et al., Science, 2002). ◮ 6,306 metabolic interactions (proteins operate on at least one common metabolite) based on KEGG. ◮ Genetic network ◮ 4,125 synthetically lethal/sick interactions (Tong et al., Science, 2004). ◮ 687 synthetically lethal interactions (MIPS). ◮ Overall network has 32,416 (27,604 physical and 4,812 genetic) interactions between 5,601 proteins (Kelley and Ideker, Nature Biotech., 2005).

  9. Challenges in Utilising the Wiring Diagram ◮ Networks are large; they contain tens of thousands of interactions. ◮ High-throughput experiments contain many errors. ◮ Networks are incomplete; experiments are expensive and have biases. ◮ A biologist wants to explore and analyse the system of interest. ◮ How do we zoom into the appropriate parts of the wiring diagram?

  10. ActiveNetworks ActiveNetwork: the network of interactions activated in response to a stress or in a particular condition. 1. Overlay the molecular profile for a particular stress on the wiring diagram to obtain the ActiveNetwork for that stress. 2. Combine the computed ActiveNetworks for each stress to find 2.1 the ActiveNetwork common to multiple stresses; 2.2 the ActiveNetwork unique to a particular stress or group of stresses.
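
The slides do not say how ActiveNetworks are combined. As a minimal sketch of one simple reading, the common network can be taken as the intersection of the per-stress edge sets and the unique network as a set difference. The function names, the `frozenset` edge encoding, and the toy data below are all assumptions made for illustration.

```python
def normalise(edges):
    """Encode undirected edges as frozensets so (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}


def common_network(active_networks):
    """Edges that appear in every condition's ActiveNetwork (assumed reading)."""
    edge_sets = [normalise(edges) for edges in active_networks.values()]
    return set.intersection(*edge_sets) if edge_sets else set()


def unique_network(active_networks, condition):
    """Edges in one condition's ActiveNetwork that appear in no other condition."""
    others = set()
    for name, edges in active_networks.items():
        if name != condition:
            others |= normalise(edges)
    return normalise(active_networks[condition]) - others


# Hypothetical ActiveNetworks for two stresses.
networks = {
    "heat": [("A", "B"), ("B", "C")],
    "oxidative": [("B", "A"), ("C", "D")],
}
shared = common_network(networks)
heat_only = unique_network(networks, "heat")
```

Here `shared` holds only the A-B interaction and `heat_only` holds only B-C.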

  14. Overlaying Gene Expression Data ◮ The weight of an interaction is the Pearson correlation between the expression profiles of the interacting genes. ◮ Weight ≡ “activity” level of the interaction. ◮ Discard interactions whose weight falls below a threshold. ◮ Unsatisfactory, since each interaction is tested individually. ◮ Gene expression data: response to 14 environmental stresses (Gasch et al., Mol. Biol. Cell, 2000): heat shock, oxidative stresses, drug treatments.
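
The weighting and thresholding steps on this slide can be sketched as follows. This is an illustration, not the authors' code: the function names, the treatment of flat profiles, the cutoff value, and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile contributes no activity
    return cov / math.sqrt(var_x * var_y)


def weight_interactions(interactions, profiles):
    """Assign each interaction the correlation of its two genes' profiles."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in interactions
        if a in profiles and b in profiles  # skip genes with no measurements
    }


def threshold_filter(weights, cutoff):
    """Keep interactions whose weight is at least the cutoff (assumed rule)."""
    return {edge: w for edge, w in weights.items() if w >= cutoff}


# Hypothetical profiles over four time points.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
interactions = [("G1", "G2"), ("G1", "G3")]
weights = weight_interactions(interactions, profiles)
active = threshold_filter(weights, cutoff=0.5)
```

G1 and G2 rise together, so their interaction survives the cutoff. G1 and G3 move in opposite directions and are discarded. Each edge is judged in isolation, which is the weakness the slide points out.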

  18. Overlaying Heat Shock Gene Expression Data We find the most highly active subnetwork.

  19. Defining Highly-Active Subnetworks ◮ The density of a network with n nodes is the total weight of the edges divided by n. ◮ Problem: Compute the subnetwork with highest density. [Figure: example network with edge weights 0.3, 0.1, 0.8, 0.1, 0.4, 0.5, 0.5, 0.9, 0.5, 0.7]
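
As a small worked example of this definition, the sketch below computes density for a hypothetical network. The topology of the slide's figure is not recoverable, so the graph here is made up. It also assumes that n counts only nodes touched by at least one edge.

```python
def density(edge_weights):
    """Total edge weight divided by the number of nodes incident to an edge."""
    nodes = {node for edge in edge_weights for node in edge}
    if not nodes:
        return 0.0
    return sum(edge_weights.values()) / len(nodes)


# Hypothetical network: a strongly correlated triangle ...
triangle = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7}
# ... and the same triangle with one weakly attached extra node.
with_tail = dict(triangle)
with_tail[("c", "d")] = 0.1

triangle_density = density(triangle)    # 2.4 / 3, about 0.8
with_tail_density = density(with_tail)  # 2.5 / 4, about 0.625
```

Adding the weakly connected node lowers the density, so the densest subnetwork of `with_tail` is the triangle alone.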

  20. Computing Most Dense Subnetwork ◮ An O(n^3)-time network-flow-based approach gives the optimal result (Gallo, Grigoriadis, Tarjan, SIAM J. Comput., 1989). ◮ Can also be solved by linear programming.
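
The slides do not state which linear program is meant. One standard formulation for the densest-subgraph problem, written here for weighted edges under the assumption that all weights $w_{ij}$ are nonnegative, uses a variable $y_i$ per node and $x_{ij}$ per edge:

```latex
\begin{align*}
\text{maximize}\quad & \sum_{(i,j) \in E} w_{ij}\, x_{ij} \\
\text{subject to}\quad & x_{ij} \le y_i, \quad x_{ij} \le y_j && \text{for all } (i,j) \in E, \\
& \sum_{i \in V} y_i \le 1, \\
& x_{ij} \ge 0, \quad y_i \ge 0 && \text{for all } (i,j) \in E,\ i \in V.
\end{align*}
```

Setting $y_i = 1/|S|$ for the nodes of a subnetwork $S$, and $x_{ij} = 1/|S|$ for the edges inside it, gives an objective value equal to the density of $S$, so the optimum is at least the maximum density.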

  23. Computing Most Dense Subnetwork ◮ Greedy algorithm: ◮ Weight of a node ≡ total weight of incident edges. ◮ Repeatedly delete the node with the smallest weight. ◮ Keep track of the density of the remaining network. ◮ Return the most dense subnetwork seen. ◮ The computed subnetwork is at least half as dense as the most dense subnetwork (Charikar, Proc. APPROX, 2000). [Figure: example network with edge weights 0.3, 0.1, 0.8, 0.1, 0.4, 0.5, 0.5, 0.9, 0.5, 0.7]
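
The greedy procedure above can be sketched as below. This is a plain quadratic-time illustration rather than the authors' implementation. It assumes nonnegative weights, no self-loops, and each undirected edge listed once. The function name, the alphabetical tie-breaking, and the example graph are assumptions.

```python
def greedy_densest_subnetwork(edge_weights):
    """Peel off the lightest node repeatedly; return (nodes, density) of the
    densest intermediate network seen along the way."""
    adjacency = {}
    for (u, v), w in edge_weights.items():
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    if not adjacency:
        return set(), 0.0

    # Node weight = total weight of incident edges.
    node_weight = {u: sum(nbrs.values()) for u, nbrs in adjacency.items()}
    total = sum(edge_weights.values())
    remaining = set(adjacency)
    best_nodes, best_density = set(remaining), total / len(remaining)

    while len(remaining) > 1:
        # Delete the node with the smallest weight (ties broken by name).
        u = min(remaining, key=lambda node: (node_weight[node], str(node)))
        remaining.remove(u)
        for v, w in adjacency[u].items():
            if v in remaining:
                node_weight[v] -= w
                total -= w
        # Track the density of what is left.
        current = total / len(remaining)
        if current > best_density:
            best_nodes, best_density = set(remaining), current
    return best_nodes, best_density


# Hypothetical network: a strong triangle with one weakly attached node.
example = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7, ("c", "d"): 0.1}
nodes, dens = greedy_densest_subnetwork(example)
```

On this example the weakly attached node `d` is peeled first, which raises the density from about 0.625 to about 0.8, and the triangle is returned. A priority queue would speed up the minimum search on large networks.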

  24. Computing Multiple Dense Subnetworks ◮ Repeat 1. Apply greedy algorithm to compute most dense subnetwork. 2. Remove edges of computed subnetwork from the network. ◮ Until remaining network has density less than the original network. ◮ Output is a sequence of decreasingly dense subnetworks that can share nodes but not edges.
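
A sketch of this loop follows. The stopping rule on the slide can be read in more than one way. This sketch assumes it means "stop once the densest subnetwork found in what remains is less dense than the original network". That reading, the function names, and the toy graph are assumptions. Weights are assumed nonnegative.

```python
def densest_by_peeling(edges):
    """Greedy peeling on a non-empty edge dict; returns (nodes, density)."""
    adjacency = {}
    for (u, v), w in edges.items():
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    node_weight = {u: sum(nbrs.values()) for u, nbrs in adjacency.items()}
    total = sum(edges.values())
    remaining = set(adjacency)
    best_nodes, best_density = set(remaining), total / len(remaining)
    while len(remaining) > 1:
        u = min(remaining, key=lambda node: (node_weight[node], str(node)))
        remaining.remove(u)
        for v, w in adjacency[u].items():
            if v in remaining:
                node_weight[v] -= w
                total -= w
        if total / len(remaining) > best_density:
            best_nodes, best_density = set(remaining), total / len(remaining)
    return best_nodes, best_density


def dense_subnetworks(edge_weights):
    """Return a list of edge dicts, one per extracted dense subnetwork."""
    if not edge_weights:
        return []
    all_nodes = {node for edge in edge_weights for node in edge}
    original_density = sum(edge_weights.values()) / len(all_nodes)
    remaining = dict(edge_weights)
    found = []
    while remaining:
        nodes, dens = densest_by_peeling(remaining)
        if dens < original_density:  # assumed stopping rule
            break
        sub = {e: w for e, w in remaining.items()
               if e[0] in nodes and e[1] in nodes}
        if not sub:  # guard against looping without progress
            break
        found.append(sub)
        for edge in sub:  # remove edges, but leave nodes available for reuse
            del remaining[edge]
    return found


# Hypothetical network: two tight triangles and a weakly correlated chain.
graph = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 0.9, ("e", "f"): 0.9, ("d", "f"): 0.9,
    ("g", "h"): 0.1, ("h", "i"): 0.1, ("i", "j"): 0.1,
}
found = dense_subnetworks(graph)
```

On this toy graph the loop extracts the a-b-c triangle, then the d-e-f triangle, and stops at the weak chain, whose best density is below that of the whole network.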

  25. Advantages of Dense Subnetworks ◮ Uses no parameters. ◮ Avoids including interactions that appear active due to noise. ◮ Relatively weakly correlated interactions can reinforce each other.

  26. Further Analysis of an ActiveNetwork ◮ Visualise the network (Graphviz package) and the gene expression profiles. ◮ Measure functional enrichment. ◮ Use the hypergeometric distribution to calculate the significance of functions enriched in an ActiveNetwork. ◮ Use Bonferroni correction to adjust for testing multiple hypotheses.
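
The enrichment test can be illustrated with the standard library alone (`math.comb` needs Python 3.8 or later). The sketch computes the upper tail of the hypergeometric distribution and applies a Bonferroni correction. The function names, the function labels, and the counts are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the function."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}


# Hypothetical counts: 10 genes in total, 4 annotated with the function,
# 3 genes in the ActiveNetwork, 2 of them annotated.
p = hypergeom_pvalue(population=10, annotated=4, sample=3, hits=2)
adjusted = bonferroni({"function_1": p, "function_2": 0.8})
```

For these counts the tail probability is (6·6 + 4·1) / 120 = 1/3. After correcting for two tested functions it becomes 2/3, and the second p-value is capped at 1.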

  27. Heat Shock ActiveNetwork
