Nested Effects Models at Work



  1. Nested Effects Models at Work (30/09/2010). Prof. Dr. Holger Fröhlich, Algorithmic Bioinformatics, Bonn-Aachen International Center for Information Technology (B-IT)

  2. Principal Idea of Nested Effects Models
     • Distinguish between perturbed genes (S-genes S1, S2, S3, S4; hidden variables, connected by the signaling graph $\Phi$) and observed effects (E-genes, linked to the S-genes by $\Theta$).
     • Measure the downstream effects of each knock-down.
     • Network reconstruction is based on the observed effects under different perturbations.
     [Figure: perturbed genes S1 to S4 with their attached observed effects E.]
     Markowetz et al., 2005

  3. Nested Effects Models (NEMs) are transitively closed causal networks explaining the nested structure of downstream effects.

  4. Likelihood of the Signaling Graph ($\Phi$)
     Two different approaches:
     • Bayesian: integrate over effects linkage graphs $\Theta$, assuming $P(\Theta \mid \Phi) = P(\Theta)$:
       $$P(D \mid \Phi) = \int P(D \mid \Phi, \Theta)\, P(\Theta)\, d\Theta$$
       Markowetz et al., 2005; Fröhlich et al., 2007, 2008
     • Take the MAP/ML estimator for $\Theta$:
       $$\hat{\Theta} = \arg\max_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)$$
       $$P(\Phi \mid D, \hat{\Theta}) = \frac{P(D \mid \Phi, \hat{\Theta})\, P(\Phi)}{P(D)}$$
       Tresch & Markowetz, 2008

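  5. Calculation of Effect Likelihoods
     • Factorization of the likelihood under the i.i.d. assumption:
       $$P(D \mid \Phi) = \int P(D \mid \Phi, \Theta)\, P(\Theta)\, d\Theta = \prod_{k \in \mathcal{E}} \sum_{s \in S} \Big[ \prod_{t \in S} P(D_{tk} \mid \Phi, \Theta_{sk} = 1) \Big] P(\Theta_{sk} = 1)$$
       $$P(D \mid \Phi) \propto \prod_{k \in \mathcal{E}} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid m_{tk} = \Phi_{ts})$$
       Here $m_{tk}$ indicates whether knock-down $t$ is predicted to affect E-gene $k$.
     1. Model for binary data D with fixed error probabilities $\alpha$ and $\beta$:
       $$P(D_{tk} \mid m_{tk}) = \left\{ \begin{array}{ccl} D_{tk} = 1 & D_{tk} = 0 & \\ \alpha & 1 - \alpha & \text{if } m_{tk} = 0 \\ 1 - \beta & \beta & \text{if } m_{tk} = 1 \end{array} \right.$$
     Markowetz et al., 2005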
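
The factorized likelihood for binary data can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the implementation in the nem package: the function name `nem_loglik_binary`, the default values of `alpha` and `beta`, and the uniform attachment prior $P(\Theta_{sk} = 1) = 1/n$ are choices made here. It takes $\alpha$ as the false-positive and $\beta$ as the false-negative probability, and stores E-genes in rows and knock-downs in columns.

```r
# Log marginal likelihood of a signaling graph for binary effect data.
# Assumptions of this sketch (not taken from the slides):
#   D:   E-genes x S-genes 0/1 matrix, D[k, t] = 1 if E-gene k responds to knock-down t
#   Phi: S-genes x S-genes 0/1 matrix, Phi[t, s] = 1 if t is upstream of s (diagonal = 1)
#   uniform prior over the S-gene each E-gene is attached to
nem_loglik_binary <- function(D, Phi, alpha = 0.1, beta = 0.2) {
  n <- ncol(D)
  # L[k, s] = sum over knock-downs t of log P(D_tk | m_tk = Phi_ts)
  L <- D %*% (Phi * log(1 - beta) + (1 - Phi) * log(alpha)) +
       (1 - D) %*% (Phi * log(beta) + (1 - Phi) * log(1 - alpha))
  # sum over attachments s (weight 1/n), then product over E-genes k in log space
  sum(log(rowSums(exp(L)) / n))
}

# Toy data: two S-genes A and B, three E-genes
D <- matrix(c(1, 0,
              1, 1,
              1, 1), ncol = 2, byrow = TRUE,
            dimnames = list(paste0("E", 1:3), c("A", "B")))
Phi_AB <- matrix(c(1, 1,
                   0, 1), ncol = 2, byrow = TRUE)  # A upstream of B
Phi_BA <- t(Phi_AB)                                # B upstream of A

nem_loglik_binary(D, Phi_AB)
nem_loglik_binary(D, Phi_BA)
```

On this toy data the graph with A upstream of B scores higher, because E1 responds to the knock-down of A only, which the reversed graph cannot explain without assuming a measurement error.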

  6. Modeling Continuous Data
     2. Data D are computed as p-values for significant change when comparing interventions to non-interventions.
     • Under the null hypothesis (no effect expected), p-values are uniformly distributed.
     • Under the alternative hypothesis (an effect expected), the density is high for small p-values and decreases strongly for increasing p-values [Pounds et al., 2003].
     • Mixture model, fit via the EM algorithm:
       $$f(D_{tk}) = \pi_{1k} + \pi_{2k}\, \mathrm{Beta}(D_{tk}, \alpha_t, 1) + \pi_{3k}\, \mathrm{Beta}(D_{tk}, 1, \beta_t)$$
     • Resulting effect likelihood:
       $$P(D_{tk} \mid m_{tk}) = \begin{cases} \dfrac{f(D_{tk}) - f(1)}{1 - f(1)} & \text{if } m_{tk} = 1 \\ 1 & \text{if } m_{tk} = 0 \end{cases}$$
     Fröhlich et al., 2008
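
A minimal base R sketch of the mixture density and the resulting effect likelihood follows. It assumes that Beta(·, a, b) on the slide denotes the Beta density, and it uses fixed, made-up mixture parameters instead of an EM fit. The function names `bum_density` and `effect_density` are hypothetical and are not part of the nem package.

```r
# Three-component mixture for p-values: uniform + Beta(a, 1) + Beta(1, b).
# pi_mix, a and b would normally be fitted by EM; the defaults are illustrative only.
bum_density <- function(p, pi_mix = c(0.6, 0.3, 0.1), a = 0.5, b = 5) {
  pi_mix[1] + pi_mix[2] * dbeta(p, a, 1) + pi_mix[3] * dbeta(p, 1, b)
}

# Density of a p-value when an effect is expected (m = 1) or not expected (m = 0)
effect_density <- function(p, m, ...) {
  f1 <- bum_density(1, ...)                     # height of the mixture at p = 1
  alt <- (bum_density(p, ...) - f1) / (1 - f1)  # rescaled non-uniform part
  ifelse(rep_len(m, length(p)) == 1, alt, 1)    # uniform density if no effect expected
}

effect_density(c(0.001, 0.05, 0.5), m = 1)  # decreasing in p
effect_density(c(0.001, 0.05, 0.5), m = 0)  # constant 1
```

Subtracting f(1) removes the flat part of the mixture, and dividing by 1 - f(1) rescales the remainder so that it integrates to one over [0, 1].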

  7. Bioconductor Package "nem"

         library(nem)
         load("raw_pvaluesBoutros2002.rda")
         D = getDensityMatrix(pvalues)

  8. How to Infer the Network Structure?
     • Choose a candidate graph.
     • Calculate its score with the likelihood model, e.g. using Bayesian statistics (average over E-gene positions).
     • Propose a different topology.
     Complete enumeration of all topologies leads to a combinatorial explosion:
     • n = 4: 355 possible networks
     • n = 10: ~$10^{27}$ possible networks
     [Figure: candidate graph over S-genes S1 to S4 with attached E-genes.]
     Markowetz et al., 2005
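
The figure of 355 networks for n = 4 can be reproduced by brute force. The sketch below assumes that a "possible network" is a transitively closed directed graph on n labeled S-genes in which every node reaches itself; the function name `count_transitive` is made up for this illustration.

```r
# Count transitively closed graphs on n labeled nodes by complete enumeration.
# Every subset of the n * (n - 1) off-diagonal edges is tried, so this is only
# feasible for very small n (n = 4 means 4096 candidate graphs).
count_transitive <- function(n) {
  off <- which(!diag(n))               # positions of the off-diagonal entries
  count <- 0
  for (code in 0:(2^length(off) - 1)) {
    A <- diag(n)                       # self-loops: every node reaches itself
    A[off] <- as.integer(intToBits(code))[seq_along(off)]
    # transitively closed: every two-step path is already a direct edge
    if (all((A %*% A > 0) == (A > 0))) count <- count + 1
  }
  count
}

count_transitive(3)  # 29
count_transitive(4)  # 355
```

Already at n = 5 the loop would have to visit 2^20 candidate graphs, which illustrates why complete enumeration stops being practical beyond a handful of S-genes.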

  9. Heuristics for Large Networks (> 4 S-Genes)
     • Sampling based (MCMC, simulated annealing)
       - SA: Fröhlich et al., BMC Bioinformatics, 2007
       - time consuming
       - neighborhood relation in transitively closed graphs is difficult
     • Greedy hill climbing: Fröhlich et al., Bioinformatics, 2008
     • Module networks: Fröhlich et al., BMC Bioinformatics, 2007; Fröhlich et al., Bioinformatics, 2008
     • Triplets inference: Markowetz et al., Bioinformatics, 2007
     • Alternating MAP optimization over $\Phi$ and $\Theta$: Tresch and Markowetz, Stat. Appl. Mol. Biol., 2008

  10. Large Scale Networks: Module Networks
      • Problem: complete enumeration of all network hypotheses is only possible for small networks (< 5 S-genes).
      • Solution: divide and conquer
        1. Find the highest-scoring subnetworks for modules of S-genes.
        2. Estimate the connections between modules.

  11. Large Scale Networks: Module Networks
      [Figure: module network over S-genes S1 to S10 with attached E-genes; plot of log-likelihood by network.]
      Fröhlich et al., 2007, 2008

  12. Network Inference with the nem-Package

          control = set.default.parameters(unique(colnames(D)),
                                           type="CONTmLLBayes")
          mynem = nem(D, inference="ModuleNetwork", control=control,
                      verbose=FALSE)
          plot.nem(mynem, SCC=FALSE, D=D, draw.lines=TRUE)

  13. Automated Selection of Relevant E-Genes (Feature Selection)
      • Motivation: irrelevant E-genes can degrade network estimation accuracy.
        1. Select only E-genes with a positive contribution to the model's log-likelihood.
        2. Re-estimate the network with the new set of E-genes.
        3. Iterate the process until convergence.
      Fröhlich et al., 2008

  14. Network Inference with the nem-Package

          D2 = BoutrosRNAiDiscrete[,9:16]
          control = set.default.parameters(unique(colnames(D2)),
                                           selEGenes=TRUE)
          mynem2 = nem(D2, inference="triples", control=control,
                       verbose=FALSE)
          plot.nem(mynem2, D=D2, draw.lines=TRUE)

  15. Incorporation of Prior Knowledge
      • Bias the scoring such that known interactions are considered.
      • Bayesian prior on the network structure, with $\Phi$ the signaling graph and $\Phi'$ the prior belief:
        $$P(\Phi) = \prod_{i,j} P(\Phi_{ij})$$
        $$P(\Phi_{ij} \mid \nu) = \frac{1}{2\nu} \exp\left( -\frac{|\Phi_{ij} - \Phi'_{ij}|}{\nu} \right)$$
      • $\nu$ is the hyperparameter (scale) of the Laplace distribution. It ranges from complete trust in the prior (small $\nu$) to complete trust in the data (large $\nu$).
      • Marginalizing over $\nu \sim \mathrm{InvGamma}(1, 0.5)$:
        $$P(\Phi_{ij}) = \int_0^{\infty} P(\Phi_{ij} \mid \nu)\, P(\nu)\, d\nu = \frac{1}{\left( 1 + 2\,|\Phi_{ij} - \Phi'_{ij}| \right)^2}$$
      Fröhlich et al., 2008
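
The closed form of the marginal prior follows from the Laplace and inverse gamma densities. A short derivation, assuming the shape/scale parameterization of the inverse gamma density (the slide only writes InvGamma(1, 0.5)) and abbreviating $d = |\Phi_{ij} - \Phi'_{ij}|$:

```latex
\begin{align*}
P(\nu) &= \tfrac{1}{2}\,\nu^{-2} \exp\!\left(-\tfrac{1}{2\nu}\right), \qquad
P(\Phi_{ij} \mid \nu) = \tfrac{1}{2\nu} \exp\!\left(-\tfrac{d}{\nu}\right) \\
P(\Phi_{ij}) &= \int_0^\infty \tfrac{1}{4}\,\nu^{-3}
    \exp\!\left(-\tfrac{d + 1/2}{\nu}\right) d\nu
  = \tfrac{1}{4} \int_0^\infty u\, e^{-(d + 1/2)\,u}\, du
    \qquad (u = 1/\nu) \\
 &= \frac{1}{4\,(d + 1/2)^2} = \frac{1}{(1 + 2d)^2}
\end{align*}
```

For a binary prior matrix this gives a factor of 1 for an edge that agrees with the prior (d = 0) and 1/9 for an edge that contradicts it (d = 1).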

  16. Using Prior Knowledge with the nem-Package

          control = set.default.parameters(unique(colnames(D)),
                                           selEGenes=TRUE,
                                           type="CONTmLLMAP", Pm=diag(4))
          mynem3 = nem(D, control=control, verbose=FALSE)
          plot.nem(mynem3, SCC=FALSE, D=D, draw.lines=TRUE)

  17. Statistical Stability and Significance
      • How stable is the inferred network? Do small changes of the E-genes lead to different network hypotheses?
        - Use the non-parametric bootstrap: repeatedly sample n E-genes with replacement.
      • Is the inferred network better than random?
        - Randomly permute the node labels and check whether the random network has a higher likelihood.
      [Figure: example network over the nodes P, Q, R, S, annotated with the values 0.9, 0.8 and 0.7.]
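
Both procedures can be sketched generically in base R. The sketch is an illustration under stated assumptions, not the code behind `nem.bootstrap` or `nem.calcSignificance`: `infer` and `score` are hypothetical user-supplied functions (data to 0/1 adjacency matrix, and data plus adjacency matrix to log-likelihood), and E-genes are assumed to be the rows of `D`.

```r
# Non-parametric bootstrap over E-genes (rows of D).
# infer: hypothetical function mapping a data matrix to a 0/1 adjacency matrix
bootstrap_edges <- function(D, infer, nboot = 100) {
  freq <- 0
  for (b in seq_len(nboot)) {
    rows <- sample(nrow(D), replace = TRUE)  # draw n E-genes with replacement
    freq <- freq + infer(D[rows, , drop = FALSE])
  }
  freq / nboot                               # fraction of replicates containing each edge
}

# Label permutation test: how often does a relabeled network score at least as well?
# score: hypothetical function (data, adjacency matrix) -> log-likelihood
permutation_pvalue <- function(D, Phi, score, N = 1000) {
  observed <- score(D, Phi)
  permuted <- replicate(N, {
    perm <- sample(ncol(Phi))                # random relabeling of the nodes
    score(D, Phi[perm, perm, drop = FALSE])
  })
  mean(permuted >= observed)
}

# Toy usage with stand-in functions
set.seed(1)
D_toy <- matrix(rbinom(40, 1, 0.5), nrow = 10)
Phi_toy <- diag(3); Phi_toy[1, 2] <- 1
bootstrap_edges(D_toy, infer = function(d) Phi_toy, nboot = 20)
permutation_pvalue(D_toy, Phi_toy, score = function(d, phi) phi[1, 2], N = 1000)
```

Edge frequencies close to 1 indicate edges that survive resampling of the E-genes. A small permutation p-value indicates that the node labeling matters, i.e. the inferred network fits better than relabeled versions of itself.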


  20. Bootstrapping and Significance Calculation with the nem-Package

          control = set.default.parameters(unique(colnames(D)),
                                           type="CONTmLLBayes", Pm=diag(4))
          mynem.boot = nem.bootstrap(D, nboot=100, control=control)
          plot.nem(mynem.boot, SCC=FALSE, plot.probs=TRUE)
          nem.calcSignificance(D, N=1000, mynem.boot)

      Result shown on the slide: p = 0.037 (label permutation test).

  21. Summary: nem-package
      • Inference of features of signaling pathways from high-dimensional, targeted perturbation effects
      • Different likelihood models
        - Discretized data
        - P-value log-densities
      • Algorithms for inference of large networks
        - Module networks
        - Triplets
        - Greedy hill climbing
        - ...
      • Possibility to integrate prior knowledge
      • Automatic selection of relevant E-genes
      • Various plotting and analysis methods
        - Non-parametric bootstrap
        - Label permutation p-values

  22. Fast and Efficient Learning of Dynamic Nested Effects Models. Please come to our poster (G28, Tue)!
