

  1. A statistical Bayesian framework for the identification of biological networks from perturbation experiments. ECCB 2010, Ghent, Belgium. Nicole Radde, Institute for Systems Theory and Automatic Control, University of Stuttgart. September 26, 2010.

  2. Parameter estimation as an inverse problem. Optimization problem: a model m is characterized by a parameter vector θ. Given: a dataset y. Wanted: the estimate θ̂ that optimizes an objective function F(θ, y):

     θ̂ = arg min_θ F(θ, y)

  3. Content: (1) Statistical approaches for parameter estimation, (2) Bayesian regularization, (3) Application results, (4) Conclusions.

  4. Statistical approaches. Treat y as random variables with sampling distribution p(y | θ). The standard objective function is the negative log-likelihood −log L_y(θ) = −log p(y | θ). This formulation can directly include noise and can handle latent variables via marginalization:

     p(y | θ) = ∫_X p(y, x | θ) dx

     with y the observables and x the latent variables.

  5. Identifiability. Practical non-identifiability: caused by sparse data (low time resolution, hidden states); the likelihood is flat, i.e. the Fisher information I(θ_ML) has small eigenvalues, so the normal approximation N(θ_ML, I⁻¹(θ_ML)) has large covariance; the situation improves with increasing dataset size. Structural non-identifiability: independent of dataset size; caused by correlations between parameters.

  6. Structural non-identifiability. Example: reversible reaction A ⇌ B with forward rate k_1 and backward rate k_{-1}:

     d[A]/dt = −k_1 [A] + k_{-1} [B],   d[B]/dt = k_1 [A] − k_{-1} [B],   [A] + [B] = N

     Measure the steady state y = [Ā], at which k_1 [Ā] = k_{-1} [B̄]. Parameters: θ = (k_1, k_{-1}). Likelihood function:

     p(k_1, k_{-1}) ∝ exp(−(1/2) (k_{-1}/k_1 − 1)²)

     The likelihood depends only on the ratio k_{-1}/k_1, so the individual rates cannot be identified from steady-state data, no matter how large the dataset.

  7. Regularization. Problem: the data does not contain enough information to identify parameter values, leading to a large variance of ML/MSE estimates across different experiments. Idea: add an additional data-independent regularization term to the objective function, e.g. Tikhonov regularization:

     θ̂_TR = arg min_θ Σ_i ‖y_i − x_i(θ)‖² + α ‖θ‖²   (1)

     where Σ_i ‖y_i − x_i(θ)‖² is the data term and α‖θ‖² the regularization term.
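
A minimal sketch of objective (1) in Python, with a hypothetical exponential-decay model standing in for x_i(θ) and synthetic data; none of these names come from the slides:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical forward model x(theta): exponential decay evaluated at fixed
# time points, standing in for the model prediction x_i(theta).
t = np.linspace(0.0, 5.0, 10)
def model(theta):
    return theta[0] * np.exp(-theta[1] * t)

rng = np.random.default_rng(0)
y = model(np.array([2.0, 0.5])) + 0.05 * rng.standard_normal(t.size)  # synthetic data

alpha = 0.1  # regularization weight

def tikhonov_objective(theta):
    data_term = np.sum((y - model(theta)) ** 2)   # sum_i ||y_i - x_i(theta)||^2
    reg_term = alpha * np.sum(theta ** 2)         # alpha * ||theta||^2
    return data_term + reg_term

theta_tr = minimize(tikhonov_objective, x0=np.ones(2)).x
```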

  8. Bayesian regularization. Both θ and y are random variables with joint distribution p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y). The objective function is the posterior distribution:

     p(θ | y) = p(y | θ) p(θ) / p(y)

     with p(y | θ) the likelihood function, p(θ) the prior distribution, and p(y) the evidence.

     [Figure: prior and posterior densities over the parameter plane (θ_1, θ_2).]

  9. Bayesian regularization.

     −log p(θ | y) = −log p(y | θ) − log p(θ) + log p(y)

     where −log p(y | θ) is the data term, −log p(θ) the regularization term, and log p(y) is independent of θ. The posterior distribution does not only provide point estimates, but also contains information about confidence intervals and identifiability. Information-theoretic concepts can be used as measures for the information content of the posterior distribution.
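
This decomposition is what an implementation optimizes or samples. A minimal sketch with illustrative names: a Gaussian likelihood and a zero-mean Gaussian prior recover a Tikhonov-style objective:

```python
import numpy as np

sigma = 0.05   # assumed measurement noise standard deviation
alpha = 0.1    # prior precision, playing the role of the regularization weight

def neg_log_posterior(theta, y, model):
    # data term: -log p(y | theta) for i.i.d. Gaussian noise (up to a constant)
    data_term = np.sum((y - model(theta)) ** 2) / (2.0 * sigma ** 2)
    # regularization term: -log p(theta) for a zero-mean Gaussian prior
    reg_term = alpha * np.sum(theta ** 2)
    # log p(y) is independent of theta and can be dropped for optimization and MCMC
    return data_term + reg_term
```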

  10. Stochastic embedding of ODEs: measurement noise models. Deterministic system: ẋ = f(x, θ_1) with solution x(t, x_0, θ_1). Stochastic observations:

     y_i^t = x_i(t, x_0, θ_1) + ε(θ_2),   ε: noise

     Given the trajectory, the observations are conditionally independent (independence graph), so the sampling distribution factorizes:

     p(y | x, θ) = Π_{i=1..n} Π_{t=1..T} p(y_i^t | x_i(t, x_0, θ_1), θ_2)
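
A sketch of this construction for a Gaussian noise model, using a small linear ODE as a stand-in for f; the dynamics and all names here are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative dynamics f(x, theta1): a linear two-species cascade.
def f(t, x, theta1):
    return [-theta1[0] * x[0], theta1[0] * x[0] - theta1[1] * x[1]]

def log_likelihood(theta1, sigma, y, t_obs, x0):
    # Solve the deterministic ODE, then score the observations under additive
    # Gaussian noise eps(theta2), here parametrized by its standard deviation sigma.
    sol = solve_ivp(f, (0.0, t_obs[-1]), x0, t_eval=t_obs, args=(theta1,))
    x = sol.y.T   # shape (T, n): the solution x_i(t, x0, theta1)
    # factorized sampling distribution: product over components i and time points t
    return (-0.5 * np.sum((y - x) ** 2) / sigma ** 2
            - y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))
```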

  11. Sampling schemes: rejection sampling. Scheme: (1) sample θ_t from the prior p(θ); (2) accept θ_t with probability

     p = p(θ_t | y) / (M · p(θ_t))

     where M is an envelope constant chosen such that p(θ | y) ≤ M · p(θ) for all θ.

     [Figure: posterior, prior, and envelope M · prior over θ.]

     This yields uncorrelated samples, but a low (1/M) acceptance rate!
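
A minimal 1-D sketch; the prior, unnormalized posterior, and envelope constant below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D example: standard normal prior, unnormalized posterior
# proportional to a normal density with mean 2 and standard deviation 0.5.
def prior_pdf(th):
    return np.exp(-0.5 * th ** 2) / np.sqrt(2.0 * np.pi)

def posterior_unnorm(th):
    return np.exp(-0.5 * ((th - 2.0) / 0.5) ** 2)

# Envelope constant M with posterior <= M * prior, found here on a crude grid.
grid = np.linspace(-10.0, 10.0, 10001)
M = 1.01 * np.max(posterior_unnorm(grid) / prior_pdf(grid))

samples = []
while len(samples) < 1000:
    th = rng.normal(0.0, 1.0)                                # 1. sample from the prior
    if rng.uniform() < posterior_unnorm(th) / (M * prior_pdf(th)):
        samples.append(th)                                   # 2. accept with probability p
```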

  12. Markov chain Monte Carlo (MCMC) sampling. Can be used if the acceptance rate of rejection or importance sampling is low. Produces correlated samples, but has a higher acceptance rate. Computationally expensive if mixing is slow (the Markov chain takes a long time to converge to its equilibrium distribution). Sampling scheme (Metropolis-Hastings):

     1. Sample θ_{t+1} from a Markov chain p(θ_{t+1} | θ_t).
     2. Accept θ_{t+1} with probability

        p = min(1, [p(θ_{t+1} | y) p(θ_t | θ_{t+1})] / [p(θ_t | y) p(θ_{t+1} | θ_t)])
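
A random-walk Metropolis sketch: the proposal is a symmetric Gaussian, so the proposal ratio in the acceptance probability cancels; log_post is any function returning an unnormalized log-posterior (e.g. the neg_log_posterior above with the sign flipped):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis(log_post, theta0, n_steps, step=0.5):
    """Random-walk Metropolis: the acceptance rule above, evaluated in log space."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)  # proposal theta_{t+1}
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # min(1, ratio); symmetric q cancels
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)
```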

  13. Hamiltonian Monte Carlo (HMC).

     1. Write the target density as p(θ) ∝ exp(−V(θ)).
     2. Extend the sampling space by an auxiliary momentum vector η:

        p(θ, η) ∝ exp(−(1/2) ηᵀη − V(θ)) = exp(−H(θ, η))

     3. Start with a random momentum drawn from a Gaussian distribution.
     4. Create a trajectory in θ-space according to the Hamiltonian dynamics θ̇ = η, η̇ = −∇V(θ).
     5. Accept the new θ with P_A = min(1, exp(−ΔH(θ, η))).

     Faster mixing by producing less correlated samples (larger steps), but harder to tune and implement.
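
A sketch of one HMC transition with the usual leapfrog discretization of the dynamics in step 4; the step size and trajectory length are illustrative tuning choices, neg_log_post plays the role of V and grad of ∇V:

```python
import numpy as np

rng = np.random.default_rng(2)

def hmc_step(theta, neg_log_post, grad, eps=0.05, n_leapfrog=20):
    """One HMC transition: Gaussian momentum, leapfrog trajectory, accept on Delta H."""
    eta = rng.standard_normal(theta.shape)        # momentum ~ N(0, I)
    H0 = 0.5 * eta @ eta + neg_log_post(theta)    # H = eta^T eta / 2 + V(theta)
    th, e = theta.copy(), eta.copy()
    e -= 0.5 * eps * grad(th)                     # initial half step in momentum
    for _ in range(n_leapfrog):
        th += eps * e                             # full step: theta_dot = eta
        e -= eps * grad(th)                       # full step: eta_dot = -grad V
    e += 0.5 * eps * grad(th)                     # trim the last update to a half step
    H1 = 0.5 * e @ e + neg_log_post(th)
    if np.log(rng.uniform()) < H0 - H1:           # P_A = min(1, exp(-Delta H))
        return th
    return theta
```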

  14. Posterior summaries. Posterior samples can be used for posterior density estimation and for estimating posterior summaries: the entropy (posterior information content about θ), KLD(prior ‖ posterior) (information content of the data y about θ), the mode (maximum a-posteriori point estimator), and the mean (point estimator). They also support experimental design: choose the experiment that maximizes the expected information content of the posterior distribution.
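
A sketch of how such summaries could be estimated from an (N, d) array of posterior samples using simple plug-in kernel density estimates; this is illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde

def posterior_summaries(samples):
    """Point estimates and an entropy estimate from posterior samples of shape (N, d)."""
    mean = samples.mean(axis=0)            # posterior mean point estimate
    kde = gaussian_kde(samples.T)          # kernel density estimate of the posterior
    dens = kde(samples.T)                  # estimated density at each sample
    mode = samples[np.argmax(dens)]        # approximate MAP: highest-density sample
    entropy = -np.mean(np.log(dens))       # Monte Carlo estimate of -E[log p(theta | y)]
    return mean, mode, entropy
```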

  15. Secretory pathway control. Regulation of secretion at the trans-Golgi network (TGN) via protein kinase D (PKD) and the ceramide transfer protein CERT. Cooperation with the Institute of Immunology and Cell Biology: Angelika Hausser, Monilola Olayioye.

  16. Modeling framework: model for secretory pathway control.

     [Network diagram: nodes 1-6 correspond to PKD, PI(4)KIIIβ, PI(4)P, CERT, ceramide, and DAG.]

     ẋ_1 = θ_1 x_6 − x_1                               (PKD)
     ẋ_2 = θ_2 x_1 − x_2                               (PI(4)KIIIβ)
     ẋ_3 = θ_3 x_2 − x_3                               (PI(4)P)
     ẋ_4 = θ_4 x_3 − x_4 − θ_8 x_1 x_4/(1 + x_4)       (CERT)
     ẋ_5 = θ_5 x_4 − θ_6 x_5 − θ_9 x_5/(1 + x_5)       (ceramide)
     ẋ_6 = −θ_7 x_6 + θ_9 x_5/(1 + x_5)                (DAG)

     In the following: estimation of θ = (θ_2, θ_8).
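
A direct transcription of this right-hand side for simulation with scipy; since the equations were reconstructed from a garbled slide, the exact terms should be checked against the original:

```python
import numpy as np
from scipy.integrate import solve_ivp

def secretory_rhs(t, x, theta):
    """Six-component model above; theta holds theta_1..theta_9 (1-based via padding)."""
    th = np.concatenate(([np.nan], theta))  # th[1] = theta_1, ..., th[9] = theta_9
    x1, x2, x3, x4, x5, x6 = x
    return [
        th[1] * x6 - x1,                                    # PKD
        th[2] * x1 - x2,                                    # PI(4)KIII-beta
        th[3] * x2 - x3,                                    # PI(4)P
        th[4] * x3 - x4 - th[8] * x1 * x4 / (1.0 + x4),     # CERT
        th[5] * x4 - th[6] * x5 - th[9] * x5 / (1.0 + x5),  # ceramide
        -th[7] * x6 + th[9] * x5 / (1.0 + x5),              # DAG
    ]

# Example: simulate toward steady state for illustrative parameter values.
theta = np.ones(9)
sol = solve_ivp(secretory_rhs, (0.0, 100.0), np.ones(6), args=(theta,))
x_ss = sol.y[:, -1]  # approximate steady state, as used for the perturbation data
```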

  17. Perturbation experiments.

     [Diagrams: two copies of the six-node network with different perturbed nodes marked by ⋆.]

     Measurements: relative steady states of two components under different network perturbations:

     y_i^P = x̄_i^P / x̄_i^U + ε,   ε ~ N(0, σ²)

     where x̄^P and x̄^U denote the perturbed and unperturbed steady states. Prior: Gamma distributions.

  18. Posterior.

     [Figure: sampled posterior density over the parameter plane (θ_2, θ_8).]

  19. Simple ODE example. System:

     ẋ_1 = a_1 − u θ_1 x_1 − θ_2 x_2 − b_1 x_1
     ẋ_2 = a_2 + u θ_1 x_1 − h(c_1, x_2) − b_2 x_2
     ẋ_3 = a_3 − θ_2 x_1 + h(c_1, x_2) − b_3 x_3

     with h(c, x) = c x²/(1 + x²) and input u.

     Measurements: y_i = x̄_i(θ, u)/x̄_i(θ, û) + ε, ε ~ N(0, (0.01 y_i)²), for i ∈ I = {1, 3}.

     Parameters: a = (1/2)(2, 3, 1)ᵀ, b = (1/2)(1, 4, 2)ᵀ, c = 0.7, α = 0.1, θ⋆ = (1, 0.1)ᵀ.

     Posterior: p(θ | y) ∝ exp(−‖y⋆ − y(θ)‖²/(2σ²) − α ‖θ‖_0.5)
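
A sketch of this posterior under the reconstruction above; the Hill exponent and some signs were read from a badly extracted slide, and the constant noise level is a simplification of the 1% relative noise, so treat the right-hand side as illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

a = 0.5 * np.array([2.0, 3.0, 1.0])
b = 0.5 * np.array([1.0, 4.0, 2.0])
c, alpha, sigma = 0.7, 0.1, 0.01   # sigma: constant stand-in for the 1% relative noise

def h(c_, x):
    return c_ * x ** 2 / (1.0 + x ** 2)   # Hill-type term as reconstructed

def rhs(t, x, theta, u):
    th1, th2 = theta
    return [a[0] - u * th1 * x[0] - th2 * x[1] - b[0] * x[0],
            a[1] + u * th1 * x[0] - h(c, x[1]) - b[1] * x[1],
            a[2] - th2 * x[0] + h(c, x[1]) - b[2] * x[2]]

def steady_state(theta, u):
    return solve_ivp(rhs, (0.0, 500.0), np.ones(3), args=(theta, u)).y[:, -1]

def log_posterior(theta, y_obs, u, u_hat):
    I = [0, 2]   # observed components i in {1, 3}, zero-based
    y_model = steady_state(theta, u)[I] / steady_state(theta, u_hat)[I]
    return (-0.5 * np.sum((y_obs - y_model) ** 2) / sigma ** 2
            - alpha * np.sum(np.abs(theta) ** 0.5))   # ||theta||_0.5 penalty
```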

  20. Sampling tests: MCMC vs. HMC.

     [Figures: prior samples, MCMC posterior samples, and HMC posterior samples.]

     The parameters are identifiable. HMC performs better in this example at the same effective sample size, but is also computationally more expensive. We expect HMC to outperform MCMC in higher dimensions.

  21. Sampling summary. Correlation time τ_int,A: the average number of Markov chain steps after which we obtain a new independent point. It varies with the observable A and reduces the effective sample size:

     N_eff,A = N / (2 τ_int,A)

     Example (N = 1000):

     method                    t in s    τ_int,θ1   τ_int,θ2   efficiency
     Hybrid Monte Carlo        1479.8    23         10         0.020
     Metropolis Monte Carlo    315.83    103        84         0.017

     The efficiency is the number of independent points per second; t is the computation time (duration of the whole sampling procedure).
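
A sketch of estimating τ_int and the effective sample size from a chain of observable values, using a standard self-consistent windowing estimator; conventions for the factor of 2 vary across the literature, so the slide's N/(2 τ_int) is used here:

```python
import numpy as np

def integrated_autocorr_time(chain, c=5.0):
    """Estimate tau_int for a 1-D chain of an observable A with a self-consistent
    window: stop summing autocorrelations once the lag exceeds c * tau."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]                        # normalized autocorrelation function
    tau = 1.0
    for m in range(1, x.size):
        tau += 2.0 * acf[m]              # tau = 1 + 2 * sum_m rho(m)
        if m >= c * tau:                 # windowing cuts off the noisy tail
            break
    return tau

def effective_sample_size(chain):
    # Convention from the slide: N_eff = N / (2 * tau_int)
    return len(chain) / (2.0 * integrated_autocorr_time(chain))
```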
