Nested Effects Models at Work
Prof. Dr. Holger Fröhlich
Algorithmic Bioinformatics, Bonn-Aachen International Center for Information Technology (B-IT)
30/09/2010
Basic Idea of Nested Effects Models
Distinguish between:
• Perturbed genes S1, S2, S3, S4 (S-genes; hidden variables), connected by the signaling graph Φ
• Observed effects E (E-genes), attached to the S-genes via θ
Measure the downstream effects of each knock-down. Network reconstruction is based on the observed effects under different perturbations.
[Figure: perturbed genes S1 to S4 linked by Φ, with observed effects E attached via θ]
Markowetz et al., 2005
Nested Effects Models (NEMs) are transitively closed causal networks explaining the nested structure of downstream effects.
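To make the nesting concrete, here is a minimal base R sketch (it does not use the nem package). The chain graph, the E-gene attachments and the helper name `transitive_closure` are illustrative assumptions, not taken from the slides.

```r
# Hypothetical signaling graph: chain S1 -> S2 -> S3 plus an isolated S4
S <- paste0("S", 1:4)
Phi <- matrix(0, 4, 4, dimnames = list(S, S))
Phi["S1", "S2"] <- 1
Phi["S2", "S3"] <- 1

# Transitive closure by repeated boolean squaring. The diagonal is set to 1
# because a knock-down also affects the E-genes attached to the gene itself.
transitive_closure <- function(A) {
  A <- ((A + diag(nrow(A))) > 0) * 1
  repeat {
    B <- ((A %*% A) > 0) * 1
    if (all(B == A)) return(A)
    A <- B
  }
}
Phi_closed <- transitive_closure(Phi)

# Hypothetical attachment of E-genes to S-genes
theta <- c(E1 = "S1", E2 = "S1", E3 = "S2", E4 = "S3", E5 = "S3", E6 = "S4")

# Expected effect pattern: rows = knock-downs, columns = E-genes
M <- Phi_closed[, theta]
colnames(M) <- names(theta)

# E-genes expected to respond to each knock-down
effects <- lapply(S, function(s) colnames(M)[M[s, ] == 1])
names(effects) <- S
effects
```

In this toy example the effects of the S3 knock-down form a subset of those of S2, which in turn form a subset of those of S1: the nested structure the model is named after.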
Likelihood of the Signaling Graph (Φ)
Two different approaches:
1. Bayesian: integrate over effects linkage graphs Θ, assuming $P(\Theta \mid \Phi) = P(\Theta)$:
   $P(D \mid \Phi) = \int_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)\, d\Theta$
   Markowetz et al., 2005; Fröhlich et al., 2007, 2008
2. Take the MAP/ML estimator for Θ:
   $\hat{\Theta} = \arg\max_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)$
   $P(\Phi \mid D, \hat{\Theta}) = \dfrac{P(D \mid \Phi, \hat{\Theta})\, P(\Phi)}{P(D)}$
   Tresch & Markowetz, 2008
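Calculation of Effect Likelihoods
Factorization of the likelihood under the i.i.d. assumption:
$P(D \mid \Phi) = \int_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)\, d\Theta$
$\quad = \prod_{k \in \varepsilon} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid \Phi, \Theta_{sk} = 1)\, P(\Theta_{sk} = 1)$
$\quad \propto \prod_{k \in \varepsilon} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid m_{tk} = \Phi_{ts})$
1. Model for binary data D with fixed error probabilities α and β:
$P(D_{tk} \mid m_{tk}) = \begin{cases} \alpha & \text{if } D_{tk} = 1,\ m_{tk} = 0 \\ 1 - \alpha & \text{if } D_{tk} = 0,\ m_{tk} = 0 \\ 1 - \beta & \text{if } D_{tk} = 1,\ m_{tk} = 1 \\ \beta & \text{if } D_{tk} = 0,\ m_{tk} = 1 \end{cases}$
Markowetz et al., 2005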
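The factorized likelihood for binary data can be written down in a few lines of base R. This is a sketch under stated assumptions, not the nem implementation: the function name `binary_loglik`, the error rates α = 0.1 and β = 0.2, the uniform prior over E-gene positions and the toy data are all chosen for illustration.

```r
# Log-likelihood of a signaling graph Phi (transitively closed, diagonal = 1)
# for binary data D (rows = knocked-down S-genes, columns = E-genes).
binary_loglik <- function(D, Phi, alpha = 0.1, beta = 0.2) {
  n <- nrow(Phi)
  total <- 0
  for (k in seq_len(ncol(D))) {
    pos_lik <- numeric(n)
    for (s in seq_len(n)) {
      m <- Phi[, s]                      # m_tk = Phi_ts if E-gene k is attached to s
      p <- ifelse(m == 1,
                  ifelse(D[, k] == 1, 1 - beta, beta),
                  ifelse(D[, k] == 1, alpha, 1 - alpha))
      pos_lik[s] <- prod(p)              # product over knock-downs t
    }
    total <- total + log(mean(pos_lik))  # sum over s with P(Theta_sk = 1) = 1/n
  }
  total
}

# Toy check: chain S1 -> S2 -> S3 (closed) against the reversed chain
Phi_true <- matrix(c(1, 0, 0,  1, 1, 0,  1, 1, 1), 3, 3)
Phi_rev  <- t(Phi_true)
D <- Phi_true[, c(1, 1, 2, 2, 3, 3)]     # noise-free data, two E-genes per S-gene

binary_loglik(D, Phi_true)
binary_loglik(D, Phi_rev)
```

On this noise-free data the graph that generated the data scores higher than its reversal.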
Modeling Continuous Data
2. Data D are computed as p-values for significant change when comparing interventions to non-interventions.
• Under the null hypothesis (i.e. expecting no effect), p-values are uniformly distributed.
• Under the alternative hypothesis (i.e. expecting an effect), the density is high for small p-values and decreases strongly for increasing p-values [Pounds et al., 2003].
Mixture model, fit via the EM algorithm:
$f(D_{tk}) = \pi_{1k} + \pi_{2k}\, \mathrm{Beta}(D_{tk}, \alpha_t, 1) + \pi_{3k}\, \mathrm{Beta}(D_{tk}, 1, \beta_t)$
Effect likelihood:
$P(D_{tk} \mid m_{tk}) = \begin{cases} \dfrac{f(D_{tk}) - f(1)}{1 - f(1)} & \text{if } m_{tk} = 1 \\ 1 & \text{if } m_{tk} = 0 \end{cases}$
Fröhlich et al., 2008
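The mixture density and the resulting effect likelihood can be evaluated with base R's `dbeta`. The parameter values below are made up for illustration (in practice they come from the EM fit), and the names `f` and `effect_density` are assumptions of this sketch.

```r
# Hypothetical mixture parameters for one knock-down (not fitted values)
pi_mix <- c(0.6, 0.3, 0.1)  # weights: uniform, Beta(alpha, 1), Beta(1, beta)
alpha  <- 0.2               # alpha < 1 puts mass on small p-values
beta   <- 5                 # beta > 1 gives a density decreasing in p

# Mixture density f(p) = pi1 + pi2 * Beta(p; alpha, 1) + pi3 * Beta(p; 1, beta)
f <- function(p) {
  pi_mix[1] + pi_mix[2] * dbeta(p, alpha, 1) + pi_mix[3] * dbeta(p, 1, beta)
}

# Density under an expected effect (m_tk = 1); under m_tk = 0 it is 1
effect_density <- function(p) (f(p) - f(1)) / (1 - f(1))

effect_density(c(0.001, 0.05, 0.5, 1))
```

Small p-values receive a density well above 1 and therefore support an effect, while the density falls to 0 at p = 1.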
Bioconductor Package "nem"
    library(nem)
    load("raw_pvaluesBoutros2002.rda")
    D = getDensityMatrix(pvalues)
How to Infer the Network Structure?
Complete enumeration of all topologies:
• Choose a candidate graph
• Calculate its score with the likelihood model, e.g. using Bayesian statistics (average over E-gene positions)
• Propose a different topology
Combinatorial explosion:
• n = 4: 355 possible networks
• n = 10: ~10^27 possible networks
[Figure: candidate graph over S1 to S4 with attached E-genes]
Markowetz et al., 2005
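The count for n = 4 can be reproduced by brute force. The sketch below enumerates every directed graph on n nodes (self-loops implicit) and keeps those that are transitively closed; the function name `count_transitive` is an assumption of this example, and the approach is only feasible for very small n.

```r
# Count digraphs on n nodes that are transitively closed (diagonal fixed to 1).
count_transitive <- function(n) {
  off <- which(row(diag(n)) != col(diag(n)))  # indices of off-diagonal cells
  m <- length(off)
  count <- 0
  for (code in 0:(2^m - 1)) {
    A <- diag(n)
    A[off] <- (code %/% 2^(0:(m - 1))) %% 2   # decode the edge set from the bits
    # closed if every two-step path is already a direct edge
    if (all((A %*% A > 0) <= (A > 0))) count <- count + 1
  }
  count
}

count_transitive(3)  # 29
count_transitive(4)  # 355
```

For n = 4 only 4096 graphs need to be checked; the number of candidates doubles with every additional possible edge, which is why complete enumeration stops being practical almost immediately.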
Heuristics for Large Networks (> 4 S-Genes)
• Sampling-based (MCMC, simulated annealing); SA: Fröhlich et al., BMC Bioinformatics, 2007
  - time consuming
  - neighborhood relation in transitively closed graphs is difficult
• Greedy hill climbing: Fröhlich et al., Bioinformatics, 2008
• Module networks: Fröhlich et al., BMC Bioinformatics, 2007; Fröhlich et al., Bioinformatics, 2008
• Triplets inference: Markowetz et al., Bioinformatics, 2007
• Alternating MAP optimization over Φ and θ: Tresch and Markowetz, Stat. Appl. Mol. Biol., 2008
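As an illustration of the greedy idea, the following base R sketch starts from the empty graph, tries every single edge insertion followed by transitive closure, and keeps the best-scoring candidate until no insertion improves the score. It is a simplified stand-in and not the algorithm as published or as implemented in nem; the binary likelihood score, the error rates and all function names are assumptions.

```r
# Score: binary-data log-likelihood, averaged over E-gene positions
score <- function(D, Phi, alpha = 0.1, beta = 0.2) {
  sum(apply(D, 2, function(d) {
    lik <- apply(Phi, 2, function(m) {
      prod(ifelse(m == 1, ifelse(d == 1, 1 - beta, beta),
                          ifelse(d == 1, alpha, 1 - alpha)))
    })
    log(mean(lik))
  }))
}

# Transitive closure of a graph whose diagonal is already 1
close_graph <- function(A) {
  repeat {
    B <- ((A %*% A) > 0) * 1
    if (all(B == A)) return(A)
    A <- B
  }
}

greedy_nem <- function(D) {
  n <- nrow(D)
  Phi <- diag(n)                           # start without any edges
  best <- score(D, Phi)
  repeat {
    cand_best <- best
    cand_Phi <- Phi
    for (i in seq_len(n)) for (j in seq_len(n)) {
      if (Phi[i, j] == 1) next
      cand <- Phi
      cand[i, j] <- 1                      # propose edge i -> j
      cand <- close_graph(cand)
      s <- score(D, cand)
      if (s > cand_best) { cand_best <- s; cand_Phi <- cand }
    }
    if (cand_best <= best) break           # no single insertion helps
    best <- cand_best
    Phi <- cand_Phi
  }
  list(graph = Phi, loglik = best)
}

# Toy data generated without noise from the closed chain S1 -> S2 -> S3
Phi_true <- matrix(c(1, 0, 0,  1, 1, 0,  1, 1, 1), 3, 3)
D <- Phi_true[, c(1, 1, 2, 2, 3, 3)]
fit <- greedy_nem(D)
fit$graph
```

On this toy input the search should end at the generating chain. With noisy data a greedy search can stop in a local optimum, which is one reason the slide lists several alternative heuristics.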
Large Scale Networks: Module Networks
• Problem: complete enumeration of all network hypotheses is only possible for small networks (< 5 S-genes)
• Solution: divide and conquer
  1. Find the highest-scoring subnetworks for modules of S-genes
  2. Estimate the connections between modules
Large Scale Networks: Module Networks
[Figure: module network over S-genes S1 to S10 with attached E-genes; plot of log-likelihood per network]
Fröhlich et al., 2007, 2008
Network Inference with the nem-Package
    control = set.default.parameters(unique(colnames(D)), type="CONTmLLBayes")
    mynem = nem(D, inference="ModuleNetwork", control=control, verbose=FALSE)
    plot.nem(mynem, SCC=FALSE, D=D, draw.lines=TRUE)
Automated Selection of Relevant E-Genes (Feature Selection)
Motivation: irrelevant E-genes can degrade network estimation accuracy.
1. Select only those E-genes that contribute positively to the model's log-likelihood.
2. Re-estimate the network with the new set of E-genes.
3. Iterate the process until convergence.
Fröhlich et al., 2008
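Step 1 can be sketched in base R for the continuous model, where an E-gene contributes log-density terms only for knock-downs with an expected effect (the density is 1, i.e. log-density 0, otherwise). The graph, the log-density values and the uniform average over attachment positions are illustrative assumptions; this is not the selection code of the nem package, and the re-estimation loop of steps 2 and 3 is omitted.

```r
# Hypothetical closed graph S1 -> S2 (rows = upstream, columns = downstream)
Phi <- matrix(c(1, 0, 1, 1), 2, 2,
              dimnames = list(c("S1", "S2"), c("S1", "S2")))

# Hypothetical log-densities (rows = knock-downs, columns = E-genes)
Dlog <- matrix(c( 2,  1.5,
                 -1, -2,
                  1, -3),
               nrow = 2,
               dimnames = list(c("S1", "S2"), c("E1", "E2", "E3")))

# Per-E-gene contribution: log of the likelihood averaged over positions s,
# where position s collects the log-densities of all t with Phi[t, s] = 1
contribution <- apply(Dlog, 2, function(d) log(mean(exp(colSums(d * Phi)))))

selected <- names(contribution)[contribution > 0]
selected
```

Here E2 is dropped because its log-densities are negative under every attachment, while E1 and E3 are kept.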
Network Inference with the nem-Package
    D2 = BoutrosRNAiDiscrete[,9:16]
    control = set.default.parameters(unique(colnames(D2)), selEGenes=TRUE)
    mynem2 = nem(D2, inference="triples", control=control, verbose=FALSE)
    plot.nem(mynem2, D=D2, draw.lines=TRUE)
Incorporation of Prior Knowledge
• Bias the scoring such that known interactions are considered
• Bayesian prior on the network structure:
  $P(\Phi) = \prod_{i,j} P(\Phi_{ij})$
  Φ = signaling graph, Φ' = prior belief, ν = hyperparameter (scale) of the Laplace distribution
  $P(\Phi_{ij} \mid \nu) = \dfrac{1}{2\nu} \exp\left( -\dfrac{|\Phi_{ij} - \Phi'_{ij}|}{\nu} \right)$
• The scale ν of the prior interpolates between complete trust in the prior (small ν) and complete trust in the data (large ν)
• Integrating out ν with $\nu \sim \mathrm{InvGamma}(1, 0.5)$:
  $P(\Phi_{ij}) = \int_0^{\infty} P(\Phi_{ij} \mid \nu)\, P(\nu)\, d\nu = \dfrac{1}{\left(1 + 2\,|\Phi_{ij} - \Phi'_{ij}|\right)^2}$
Fröhlich et al., 2008
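The marginal prior can be checked numerically with base R's `integrate`. The sketch assumes that InvGamma(1, 0.5) is parameterized by shape 1 and scale 0.5, so that its density is 0.5 ν^(-2) exp(-0.5/ν); under that assumption the integral has the closed form 1 / (1 + 2x)^2 with x = |Φ_ij - Φ'_ij|. The function names are made up for this example.

```r
# Numerical marginal: integral over nu of Laplace(x | nu) * InvGamma(nu; 1, 0.5)
prior_numeric <- function(x) {
  # integrand = 1/(2 nu) * exp(-x/nu) * 0.5 * nu^-2 * exp(-0.5/nu),
  # written on the log scale to avoid 0 * Inf for very small nu
  integrand <- function(nu) exp(log(0.25) - 3 * log(nu) - (x + 0.5) / nu)
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Closed form under the same parameterization
prior_closed <- function(x) 1 / (1 + 2 * x)^2

x <- c(0, 0.5, 1)
cbind(x, numeric = sapply(x, prior_numeric), closed = prior_closed(x))
```

The prior is largest when an entry agrees with the prior belief (x = 0) and decays polynomially rather than exponentially, so strong data can still override the prior.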
Using Prior Knowledge with the nem-Package
    control = set.default.parameters(unique(colnames(D)), selEGenes=TRUE, type="CONTmLLMAP", Pm=diag(4))
    mynem3 = nem(D, control=control, verbose=FALSE)
    plot.nem(mynem3, SCC=FALSE, D=D, draw.lines=TRUE)
Statistical Stability and Significance
• How stable is the inferred network? Do small changes of the E-genes lead to different network hypotheses?
  Use the non-parametric bootstrap: repeatedly sample n E-genes with replacement.
• Is the inferred network better than random?
  Randomly permute the node labels and check whether the random network has a higher likelihood.
[Figure: network over the nodes P, Q, R, S with edge labels 0.9, 0.8, 0.7; the node labels are permuted]
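The label permutation idea can be sketched in base R as follows. The sketch covers only the second question (better than random) and is not the procedure used by `nem.calcSignificance`: the binary likelihood score, the toy data, the number of permutations and the p-value definition (fraction of permuted networks scoring at least as well as the inferred one) are all assumptions.

```r
set.seed(1)

# Binary-data log-likelihood, averaged over E-gene positions
score <- function(D, Phi, alpha = 0.1, beta = 0.2) {
  sum(apply(D, 2, function(d) {
    lik <- apply(Phi, 2, function(m) {
      prod(ifelse(m == 1, ifelse(d == 1, 1 - beta, beta),
                          ifelse(d == 1, alpha, 1 - alpha)))
    })
    log(mean(lik))
  }))
}

n <- 4
# "Inferred" network: closed chain S1 -> S2 -> S3 -> S4
Phi_hat <- (row(diag(n)) <= col(diag(n))) * 1
# Noise-free toy data with three E-genes per S-gene
D <- Phi_hat[, rep(1:n, each = 3)]

observed <- score(D, Phi_hat)

N <- 500
perm_scores <- replicate(N, {
  p <- sample(n)              # random relabeling of the nodes
  score(D, Phi_hat[p, p])     # same topology, permuted labels
})

p_value <- mean(perm_scores >= observed)
p_value
```

Only the identity relabeling reaches the observed score in this toy setting, so the p-value is close to 1/24, the chance of drawing the identity among the 24 possible labelings.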
Bootstrapping and Significance Calculation with the nem-Package
    control = set.default.parameters(unique(colnames(D)), type="CONTmLLBayes", Pm=diag(4))
    mynem.boot = nem.bootstrap(D, nboot=100, control=control)
    plot.nem(mynem.boot, SCC=FALSE, plot.probs=TRUE)
    nem.calcSignificance(D, N=1000, mynem.boot)
Result: p = 0.037 (label permutation test)
Summary: nem-package
• Inference of features of signaling pathways from high-dimensional, targeted perturbation effects
• Different likelihood models
  - Discretized data
  - P-value log-densities
• Algorithms for inference of large networks
  - Module networks
  - Triplets
  - Greedy hill climbing
  - ...
• Possibility to integrate prior knowledge
• Automatic selection of relevant E-genes
• Various plotting and analysis methods
  - Non-parametric bootstrap
  - Label permutation p-values
Fast and Efficient Learning of Dynamic Nested Effects Models
Please come to our poster (G28, Tue)!