Reconstructing Signaling Pathways with Probabilistic Boolean Threshold Networks
Lars Kaderali
ViroQuant Research Group "Modeling", University of Heidelberg
ViroQuant – Systems Biology of Virus-Host Interactions
• Viruses rely on many host factors for cell entry, replication within the host cell, and spread.
• RNAi knockdowns of host genes can help identify these factors:
  – knock down genes in infected cells
  – observe whether the virus can still replicate
[Figure: Human T-cell lymphotropic virus with host cell. Encyclopaedia Britannica Online, 2007]
A Pipeline for the Analysis of RNAi Screens
[Figure: microscopy images (HCV and nuclei channels) for the negative control and for CD81]
Rieber, Knapp, Eils, Kaderali (2009): RNAither, an automated pipeline for the statistical analysis of high-throughput RNAi screens. Bioinformatics 25, 678–679.
Börner et al. (2010): From experimental setup to bioinformatics: An RNAi screening platform to identify host factors and potential cellular networks involved in HIV-1 replication. Biotechnology Journal 5(1), 39–49.
Network Inference from RNAi Data
• RNAi knockdowns are well suited to identifying genes that are important for specific phenotypic traits of interest.
• Placing these genes in time and space within signal transduction pathways remains a major challenge.
• Network inference is the process of reconstructing such pathways from experimental data.
[Figure: candidate pathway over genes 1 to 4 leading to readout R, with an unresolved edge marked "?"]
Gene knockdown | Observed phenotypic effect
Gene 1 | Strong effect
Gene 2 | No effect
Gene 3 | Weak effect
Gene 4 | Strong effect
Network Inference from RNAi Data
• Experimental data differ in the readouts available.
• We want a general method that runs with missing observations but improves when more data are available.
Rows: knocked-down gene; columns: observed state of each gene at time point 1.
Knockdown | Gene 1 | Gene 2 | Gene 3 | Gene 4
Gene 1 | Active | Active | Inactive | Inactive
Gene 2 | Inactive | Inactive | Inactive | Active
Gene 3 | Inactive | Active | Active | Active
Gene 4 | Active | Inactive | Active | Active
Network Inference from RNAi Data (continued)
In addition to the time point 1 observations, the same knockdowns observed at time point 2:
Knockdown | Gene 1 | Gene 2 | Gene 3 | Gene 4
Gene 1 | Active | Inactive | Active | Inactive
Gene 2 | Active | Inactive | Inactive | Inactive
Gene 3 | Active | Active | Inactive | Active
Gene 4 | Active | Inactive | Active | Inactive
Complexity of Network Inference
• For n genes, there are n² possible directed edges (one per ordered pair of genes, including self-loops).
• In a given network, each of these n² edges is either present or absent.
• This yields a total of 2^(n·n) different network topologies.
• How much data is required to decide which is the true topology?
[Plot: number of topologies against network size n]
n | # Topologies
2 | 16
3 | 512
4 | 65,536
5 | 33,554,432
10 | 1,267,650,600,228,229,401,496,703,205,376
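The table follows directly from the counting argument: n·n independent yes/no choices give 2^(n·n) topologies. A short Python check (the function name `num_topologies` is illustrative) reproduces every row, including the 31-digit value for n = 10, using Python's arbitrary-precision integers.

```python
def num_topologies(n: int) -> int:
    """Count directed graphs on n labelled genes, self-loops allowed.

    Each of the n * n ordered pairs (i, j) is either an edge or not,
    so there are 2 ** (n * n) distinct topologies.
    """
    return 2 ** (n * n)


for n in (2, 3, 4, 5, 10):
    print(f"{n:>2} | {num_topologies(n):,}")
```

The count already excludes edge weights and signs, which the threshold model adds on top, so the real search space is larger still.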
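Iterative Network Reconstruction
[Figure: cycle Experiment → Candidate Models → Experiment Planning → Experiment. Three candidate topologies over genes 1 to 4 and readout R carry probabilities p = 0.3, 0.6 and 0.1; annotation: "Regularization!"]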
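Mathematical Model
• Bayesian network model
  » Each node is either "active" (1) or "inactive" (0).
  » The state of a node at time t depends stochastically on the states of its "parents" at time t-1.
[Figure: example network over genes 1 to 4 and readout R; a node is active with probability p(x=1) and inactive with probability p(x=0)]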
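The slides do not spell out how a node's activation probability depends on its parents. Given the "threshold network" in the title and the weights w and biases w0 among the model parameters, one plausible form is a logistic function of the weighted parent input. The sketch below assumes exactly that; the link function, the name `p_active` and the steepness `beta` are hypothetical, not taken from the talk.

```python
import math


def p_active(weights, bias, parent_states, beta=5.0):
    """P(node is active at time t | parent states at time t-1).

    ASSUMPTION: a logistic link; `bias` plays the role of a (negative)
    threshold and `beta` is a hypothetical steepness parameter.
    Positive weights activate, negative weights inhibit.
    """
    s = sum(w * x for w, x in zip(weights, parent_states)) + bias
    return 1.0 / (1.0 + math.exp(-beta * s))


# Node with an activating parent (w = +1) and an inhibiting parent (w = -1).
print(p_active([1.0, -1.0], -0.5, [1, 0]))  # activator on, inhibitor off
print(p_active([1.0, -1.0], -0.5, [1, 1]))  # both parents on
```

p(x=0) is then simply 1 - p_active(...). As `beta` grows, the node approaches a deterministic Boolean threshold unit.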
State Transition Matrix
• For a system with n nodes, there are 2^n possible states.
• If the system is in state i at time t, we can compute the probability of being in state j at time t+1.
• Hence, we can calculate the state transition matrix M, with entries $M_{ij} = P(x(t+1) = j \mid x(t) = i)$ for $i, j = 1, \dots, 2^n$.
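For a handful of nodes, M can be built by enumeration. A minimal sketch, assuming that nodes update independently given the previous state (so each entry is a product of per-node probabilities) and the same hypothetical logistic activation with steepness `beta`; states are encoded as integers whose bit k is the state of node k. The names `state_bits` and `transition_matrix` are illustrative.

```python
import numpy as np


def state_bits(i, n):
    """Bit k of integer i is the state (0/1) of node k."""
    return np.array([(i >> k) & 1 for k in range(n)])


def transition_matrix(W, w0, beta=5.0):
    """M[i, j] = P(x(t+1) = state j | x(t) = state i) for a 2**n-state system.

    W[k, l] is the weight of edge l -> k, w0[k] the bias of node k.
    ASSUMPTIONS: given x(t), nodes update independently, each with a
    logistic activation probability of hypothetical steepness `beta`.
    """
    n = len(w0)
    N = 2 ** n
    M = np.empty((N, N))
    for i in range(N):
        x = state_bits(i, n)
        p_on = 1.0 / (1.0 + np.exp(-beta * (W @ x + w0)))  # P(node active)
        for j in range(N):
            y = state_bits(j, n)
            M[i, j] = np.prod(np.where(y == 1, p_on, 1.0 - p_on))
    return M


# Two nodes: node 0 is constitutively active and activates node 1.
W = np.array([[0.0, 0.0],
              [1.0, 0.0]])
w0 = np.array([1.0, -0.5])
M = transition_matrix(W, w0)
print(M.shape, M.sum(axis=1))  # (4, 4); every row sums to 1
```

The double loop touches all 2^n × 2^n entries, which is exactly the exponential cost that limits exact computation to small networks.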
State Transition Probabilities
• If p is a 2^n-dimensional row vector giving the probability distribution over the initial states, then $pM$ is the row vector giving the distribution after one time step.
• Similarly, $pM^{\tau}$ gives the distribution after τ time steps.
[Figure: two example networks over genes 1 to 4 and readout R, annotated p=0.4 and p=0.3]
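Propagating a distribution is a single vector-matrix product with a matrix power. The sketch below uses a hypothetical one-node system (two states) so the numbers can be checked by hand: after two steps from "inactive", the distribution is 0.9·(0.9, 0.1) + 0.1·(0.3, 0.7) = (0.84, 0.16). The name `propagate` is illustrative.

```python
import numpy as np


def propagate(p0, M, tau):
    """Distribution over states after tau time steps: p0 @ M^tau (row vector)."""
    return np.asarray(p0) @ np.linalg.matrix_power(M, tau)


# Hypothetical one-node system with states (inactive, active).
M = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p0 = np.array([1.0, 0.0])  # start inactive with certainty
for tau in (1, 2, 10):
    print(tau, propagate(p0, M, tau))
```

Because p is a row vector, it multiplies M from the left; each row of M sums to 1, so the result stays a probability distribution.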
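Integration of Knockdowns
[Figure: network over genes 1 to 4 and readout R, before and after knocking out gene 2 (marked X)]
• Knockouts can be taken into account simply by "taking out" the corresponding gene from the model.
• In terms of M, this amounts to removing the rows for states in which the knocked-out gene is active, and adding each column in which that gene is active to the corresponding column in which it is inactive.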
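The row-removal and column-summing operation can be written in a few lines of NumPy. A minimal sketch, assuming states are integers whose bit k is the state of node k; the function name `knockout` is illustrative. Every state is either a kept state or its "gene active" twin, so the reduced rows still sum to 1.

```python
import numpy as np


def knockout(M, gene, n):
    """Transition matrix with `gene` knocked out (held inactive).

    States are integers whose bit k is the state of node k. Rows of states
    in which `gene` is active are dropped; for each remaining next state,
    the probability of landing in its `gene`-active twin is added to it.
    Returns the reduced matrix and the list of remaining states.
    """
    keep = [s for s in range(2 ** n) if not (s >> gene) & 1]
    twin = [s | (1 << gene) for s in keep]
    K = M[np.ix_(keep, keep)] + M[np.ix_(keep, twin)]
    return K, keep


# Random row-stochastic matrix for a hypothetical 3-node system.
rng = np.random.default_rng(0)
M = rng.random((8, 8))
M /= M.sum(axis=1, keepdims=True)
K, states = knockout(M, gene=1, n=3)
print(states)          # [0, 1, 4, 5]: states with node 1 inactive
print(K.sum(axis=1))   # rows still sum to 1
```

The state space halves with each knocked-out gene, which is why knockdowns are cheap to incorporate into the exact model.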
Stochastic Model: Likelihood
• Assume we have an initial state distribution $p_0$.
• Given model parameters θ = (w, w0, T), the likelihood of seeing a particular set of experimental outcomes D after knockdown experiments is denoted p(D | θ).
[Figure: network over genes 1 to 4 and readout R]
Gene knockdown | Observed phenotypic effect
Gene 1 | Strong effect
Gene 2 | No effect
Gene 3 | Weak effect
Gene 4 | Strong effect
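For small networks, p(D | θ) can be computed exactly by combining the knockdown operation with matrix propagation. A minimal sketch under stated assumptions: knockdown experiments are independent, each yields one Boolean observation of a single readout node after `tau` model steps, and the likelihood is the product over experiments of the probability mass on states matching the observation. These assumptions, the data format and the names `knockout` and `exact_likelihood` are hypothetical and not necessarily the formula used in the talk.

```python
import numpy as np


def knockout(M, p0, gene, n):
    """Hold `gene` inactive: drop states where it is on, fold their mass over."""
    keep = [s for s in range(2 ** n) if not (s >> gene) & 1]
    twin = [s | (1 << gene) for s in keep]
    K = M[np.ix_(keep, keep)] + M[np.ix_(keep, twin)]
    return K, p0[keep] + p0[twin], keep


def exact_likelihood(M, p0, data, readout, n, tau):
    """p(D | theta) for a small network, by exact propagation.

    data: list of (knocked_down_gene, observed_state_of_readout) pairs.
    ASSUMPTIONS: experiments are independent and the readout node is
    observed once, tau model steps after the start.
    """
    L = 1.0
    for gene, obs in data:
        K, pk, states = knockout(M, p0, gene, n)
        p_tau = pk @ np.linalg.matrix_power(K, tau)
        # marginalize over unobserved nodes: sum all states matching obs
        L *= sum(p for p, s in zip(p_tau, states) if ((s >> readout) & 1) == obs)
    return L


rng = np.random.default_rng(1)
M = rng.random((8, 8))
M /= M.sum(axis=1, keepdims=True)  # hypothetical 3-node transition matrix
p0 = np.eye(8)[0]                  # all nodes inactive at t = 0
D = [(0, 0), (1, 1)]               # hypothetical outcomes for readout node 2
print(exact_likelihood(M, p0, D, readout=2, n=3, tau=3))
```

A graded phenotype such as "weak effect" would need a richer observation model than the single Boolean readout assumed here.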
Likelihood Approximation
• We cannot compute the exact likelihood p(D | θ) for larger networks, because M grows exponentially with the number of nodes (it has 2^n × 2^n entries).
• But we can use the stochastic model to simulate data and compare the simulated data with the measured data.
• We then approximate the likelihood by the fraction of simulation runs that reproduce the observed data:
  $p(D \mid \theta) \approx \frac{\#\{\text{simulation runs reproducing } D\}}{\#\{\text{simulation runs}\}}$
• This is particularly useful because it automatically marginalizes over unobserved nodes.
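The simulation estimate needs only the current state vector, so memory grows linearly in n rather than as 2^n. A minimal sketch, assuming the same hypothetical logistic activation (steepness `beta`), independent knockdown experiments and one Boolean readout observation per experiment; `simulate` and `approx_likelihood` are illustrative names. Unobserved nodes are simply ignored when checking a run, which is the automatic marginalization mentioned above.

```python
import numpy as np


def simulate(W, w0, x0, knocked_out, tau, rng, beta=5.0):
    """One stochastic run of tau steps with one gene held inactive.

    ASSUMPTION: logistic activation with hypothetical steepness `beta`.
    Memory is O(n), not O(2**n) as for the transition matrix.
    """
    x = np.array(x0, dtype=float)
    x[knocked_out] = 0.0
    for _ in range(tau):
        p_on = 1.0 / (1.0 + np.exp(-beta * (W @ x + w0)))
        x = (rng.random(len(w0)) < p_on).astype(float)
        x[knocked_out] = 0.0
    return x


def approx_likelihood(W, w0, x0, data, readout, tau, n_sim=1000, seed=0):
    """Fraction of simulated runs that reproduce each observation, multiplied
    over independent knockdown experiments (gene, observed readout state)."""
    rng = np.random.default_rng(seed)
    L = 1.0
    for gene, obs in data:
        hits = sum(simulate(W, w0, x0, gene, tau, rng)[readout] == obs
                   for _ in range(n_sim))
        L *= hits / n_sim
    return L


# Hypothetical chain 0 -> 1 -> 2 (readout); node 0 is constitutively active.
W = np.zeros((3, 3))
W[1, 0] = W[2, 1] = 1.0
w0 = np.array([1.0, -0.5, -0.5])
print(approx_likelihood(W, w0, [0, 0, 0], [(0, 0)], readout=2, tau=3))
```

The estimate is noisy; its standard error shrinks roughly as 1/sqrt(n_sim).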
Prior Distribution
• Parameters w in the model correspond to the strength of interaction between two genes/proteins.
• We expect the network to be sparse, i.e. most pathway components should have NO interaction between them.
$$p(w) = \frac{1}{N} \exp\left[-\frac{|w|^q}{q\,s^q}\right]$$
Ritter et al., submitted
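Why does this prior favor sparse networks? A quick numerical check, assuming the weights are independent a priori (so the log prior is a sum over edges) and using hypothetical values for q and s: with q = 0.5, one strong edge scores better than four moderate edges with the same sum of squares, while with q = 2 (a Gaussian prior) the two are tied. The function name `log_prior` is illustrative.

```python
import numpy as np


def log_prior(w, q=0.5, s=1.0):
    """Unnormalized log prior: -sum_ij |w_ij|**q / (q * s**q).

    ASSUMPTION: weights are independent a priori; the values of q and s
    used here are hypothetical. Small q puts a sharp peak at w = 0.
    """
    w = np.asarray(w, dtype=float)
    return -np.sum(np.abs(w) ** q) / (q * s ** q)


sparse = [2.0, 0.0, 0.0, 0.0]  # one strong edge
dense = [1.0, 1.0, 1.0, 1.0]   # four moderate edges, same sum of squares
print(log_prior(sparse), log_prior(dense))                # q = 0.5 favors sparse
print(log_prior(sparse, q=2.0), log_prior(dense, q=2.0))  # Gaussian case: tie
```

The normalization constant N does not depend on w, so it cancels in Metropolis-Hastings acceptance ratios and can be dropped.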
Sampling from the Posterior
• Combines the Metropolis-Hastings algorithm with the simulation-based approximation of the likelihood (Marjoram et al., PNAS 2003).
• We furthermore integrated mode-hopping steps (Senderowitz, 1995).
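A generic sketch of the combination: a random-walk Metropolis-Hastings sampler in which the likelihood in the acceptance ratio is replaced by a simulation estimate, keeping the estimate for the current state rather than recomputing it (pseudo-marginal style). It omits the mode-hopping steps and is not necessarily the exact variant used in the talk; the function names and the toy problem (8 successes in 10 trials) are hypothetical.

```python
import numpy as np


def mh_with_simulated_likelihood(theta0, log_prior, sim_lik, n_iter=2000,
                                 step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a simulation-based likelihood.

    sim_lik(theta, rng) returns an estimate of p(D | theta), e.g. the
    fraction of simulated runs that reproduce D. The estimate for the
    current state is reused rather than recomputed (pseudo-marginal style).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lik = sim_lik(theta, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)  # symmetric proposal
        lik_prop = sim_lik(prop, rng)
        num = np.exp(log_prior(prop)) * lik_prop
        den = np.exp(log_prior(theta)) * lik
        if rng.random() * den < num:  # accept with probability min(1, num / den)
            theta, lik = prop, lik_prop
        chain.append(theta.copy())
    return np.array(chain)


# Toy problem: 8 successes in 10 trials with success probability sigmoid(theta).
def toy_sim_lik(theta, rng, n_sim=200):
    p = 1.0 / (1.0 + np.exp(-theta[0]))
    return np.mean(rng.binomial(10, p, size=n_sim) == 8)


chain = mh_with_simulated_likelihood([0.0], lambda t: -0.5 * np.sum(t ** 2),
                                     toy_sim_lik)
print(chain[500:].mean())  # posterior mean of theta after burn-in
```

Reusing the current estimate keeps the sampler targeting the correct posterior when the estimate is unbiased, at the price of occasional "sticky" stretches when it happens to be large.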
Alternative: Distributed Evolutionary Monte Carlo (DGMC)
Combines genetic algorithms and Markov chain Monte Carlo:
➲ A population of N = mk Markov chains is divided equally into k subpopulations.
➲ Genetic operators (mutation, crossover, migration) are used to generate the next generation of each chain in each subpopulation.
Extension: Multiple Time Points
[Figure: the network over genes 1 to 4 and readout R at three successive measurement times, separated by intervals Delta_T_1 and Delta_T_2 along a time axis]
• Experimental measurements are taken at different time points, but "real time" is continuous.
• The model requires discrete time steps.
• How many "model time" steps lie between two experimental measurements?
→ Sample the additional parameter Delta_T!
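With Delta_T as a parameter, the model's prediction at each measurement is obtained by advancing the distribution Delta_T model steps past the previous measurement. A minimal sketch with a hypothetical one-node transition matrix, plus one possible MCMC move for the integer Delta_T values (shift one of them by ±1, never below 1). The names and the move are assumptions, not the talk's implementation.

```python
import numpy as np


def distributions_at_measurements(p0, M, delta_ts):
    """State distribution at each measurement time.

    delta_ts[k] is the number of model time steps between measurement k-1
    (or the initial state, for k = 0) and measurement k, i.e. Delta_T_{k+1}.
    """
    p = np.asarray(p0, dtype=float)
    out = []
    for dt in delta_ts:
        p = p @ np.linalg.matrix_power(M, dt)
        out.append(p)
    return out


def propose_delta_ts(delta_ts, rng):
    """Hypothetical MCMC move: shift one Delta_T by +/-1, never below 1.

    A move that would go below 1 leaves the value unchanged, so the
    probability of proposing a -> b (a != b) equals that of b -> a.
    """
    new = list(delta_ts)
    k = int(rng.integers(len(new)))
    new[k] = max(1, new[k] + int(rng.choice([-1, 1])))
    return new


M = np.array([[0.9, 0.1], [0.3, 0.7]])  # hypothetical one-node system
p0 = np.array([1.0, 0.0])
d1, d2 = distributions_at_measurements(p0, M, [1, 2])
rng = np.random.default_rng(0)
print(d1, d2, propose_delta_ts([1, 2], rng))
```

Because the move is symmetric, the Metropolis-Hastings acceptance ratio for Delta_T involves only prior and likelihood terms.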
Application: Jak-Stat Signaling
• Experimental data: Eva Dazert (Dept. of Virology)
• Huh-7 cell lines
• Knockdown of all genes in the pathway, stimulation with IFN-α and IFN-γ
• Signal: HCV replication
Jak/Stat Signaling
Kaderali et al., Bioinformatics, 2009
Summary
• A method to reconstruct signal transduction networks from RNAi phenotypes, based on Bayesian networks
• Approximation of the likelihood using stochastic simulation
• Regularization towards sparse networks using a prior distribution
• Sampling from the posterior allows computation of distributions over alternative topologies and parameters.
  – Important application in experiment design
  – Cost-efficient method to reconstruct networks from data
• Application to Jak/Stat data shows that the core topology can be reconstructed even from single downstream readouts.
• Multiple readouts, time series data, etc. can easily be integrated.
Acknowledgements
Molecular Virology: Viroquant Modeling: Dept. of Virology: Eva Dazert Narsis Kiani Johannes Hermle Ulf Zeuge Bettina Knapp Kathleen Boerner Michael Frese Matthias Boeck Maik Lehmann Ilka Wörz Nora Rieber Oliver Keppler Anil Kumar Johanna Mazur Silvia Geuenich Andreas Merz Daniel Ritter Hans-Georg Kräusslich Marion Pönisch Nurgazy Sulaimanov Viroquant NWG Marco Binder Gajendra Suryavanshi "Screening": Alessia Rugieri Samta Malhotra Vytaute Starkuviene Wolfgang Fischl Sandeep Amberkar Oliver Wicht Cindy Nürnberger TBI Bioinformatics, DKFZ: Ralf Bartenschlager Natalia Drost Petr Matula Thorsten Stumpf Karl Rohr EMBL / Bioquant: Roland Eils Nigel Brown Viroquant Screening Unit: Reinhard Schneider Holger Erfle Tokyo Medical University: Soichi Ogishima Inst. f. Scientific Computing: Christoph Sommer Fred Hamprecht Julian Kunkel Gerhard Reinelt
Thank you for your attention!
Lars Kaderali
Viroquant Research Group Modeling
Bioquant, University of Heidelberg
lars.kaderali@bioquant.uni-heidelberg.de
Identifiability • If only downstream readouts at steady state are available, some topological features cannot be reconstructed!
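A toy example makes the claim concrete. Topology A is the chain gene 1 → gene 2 → R; topology B adds a direct edge gene 1 → R but requires both inputs for R to switch on. Under every single knockdown, both give the same steady-state readout, so readout-only data cannot tell them apart. The sketch assumes a deterministic limit of the threshold model (node on iff weighted input plus bias is positive); the weights and the function name are hypothetical.

```python
import numpy as np


def steady_readout(W, w0, knocked_out=None, readout=2, steps=10):
    """Readout state after `steps` deterministic threshold updates from all-off.

    ASSUMPTION: deterministic limit of the threshold model, node k is on
    at t+1 iff sum_l W[k, l] x_l(t) + w0[k] > 0.
    """
    x = np.zeros(len(w0))
    for _ in range(steps):
        x = (W @ x + w0 > 0).astype(float)
        if knocked_out is not None:
            x[knocked_out] = 0.0
    return int(x[readout])


# Nodes: 0 = gene 1 (constitutively active), 1 = gene 2, 2 = readout R.
W_a = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)  # gene 1 -> gene 2 -> R
W_b = W_a.copy()
W_b[2, 0] = 1.0                            # extra edge gene 1 -> R
w0_a = np.array([1.0, -0.5, -0.5])
w0_b = np.array([1.0, -0.5, -1.5])         # in B, R needs both inputs

for name, W, w0 in (("A", W_a, w0_a), ("B", W_b, w0_b)):
    print(name, [steady_readout(W, w0, kd) for kd in (None, 0, 1)])
```

Both topologies print [1, 0, 0] for (no knockdown, gene 1 knocked down, gene 2 knocked down). Observing gene 2 directly, or R at intermediate time points, would be needed to distinguish them.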
Identifiability
Identifiability
• The situation improves considerably when
  – observations of several genes are available
  – several time points are available
  – double or multiple knockdowns are available
  – different stimulations/conditions are available
• The method should be adaptable to these cases!
A Pipeline for the Analysis of RNAi Screens
• Experiment: siRNA spotting → seeding of Huh7.5 cells → 36 h → HCV infection (Jc1GFP-K1402Q) → 36 h → fixation and IF
• Microscopy
• Image recognition
• Quality control
• Statistical analysis
• Bioinformatics
• Modeling:
$$\frac{dT_c}{dt} = k_1 R_{ibo} R_p^{cyt} - k_2 T_c - \mu_T T_c$$
$$\frac{dP_c}{dt} = k_2 T_c - k_{cyt} P_c$$
$$\frac{dE_{cyt}}{dt} = k_{cyt} P_c - k_{Ein} E_{cyt} - \mu_E E_{cyt}$$