Formal approaches to the synthesis of biological networks Nicola Paoletti Department of Computer Science, University of Oxford Metable Workshop Computer Laboratory, University of Cambridge, 26 Mar 2015
MOTIVATION
Model checking: verification of given biological properties on a model
• IN: Model M + Property φ
• OUT: M ⊨ φ ?
• In Systems Biology, uncertain models are inevitable
  • Limited knowledge
  • Synthetic applications
• Verification assumes a fully specified model
MOTIVATION
FROM Model checking: verification of given biological properties on a model
• IN: Model M + Property φ
• OUT: M ⊨ φ ?
TO Synthesis (CS): automatic derivation of a model that meets given biological properties
• IN: “Uncertain” model M(·) + φ
• OUT: f s.t. M(f) ⊨ φ
PLAN OF THE TALK Part 1 Part 2
PLAN OF THE TALK Part 1
Part 1 SEA URCHIN DEVELOPMENTAL GRN
(one of) THE MOST COMPLETE MODELS
• Boolean GRN model (on/off + logical connections)
• 45 genes
• Time delays (due to chemical kinetics)
• [0-30] hours post-fertilization (hpf). Step: 1 hpf
• 4 spatial domains (leading to specific kinds of cells and organs)
• 2 spatial relations between domains (direct and indirect contiguity) describing the evolution of embryonic geometry
Part 1 LIMITATIONS
• Can’t fully explain experimental data (discrepancies on 26/45 genes)
• Manual modifications to force observations (3/45 genes)
Davidson et al., PNAS 109(41) 16434-16442, 2012
Part 1 SYNTHESIS OF GRNS How to obtain a GRN model that fully explains experimental data? SMT solving
Part 1 SATISFIABILITY MODULO THEORIES (SMT)
SAT
• IN: Boolean formula, e.g. φ = (x₁ ∨ ¬x₂) ∧ x₃
• OUT: is φ SAT? + Interpretation of variables
SMT (= SAT + theories)
• IN: FOL formula over one or more theories (bit-vectors, (non-)linear integer/real arithmetic, …), e.g. φ = ∀x₀. (x₀ < x₁ ⇒ f(x₀, x₂) ≤ x₃)
• OUT: is φ SAT? + Interpretation of (free) variables and functions
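To make the SAT input and output concrete, the sketch below brute-forces the slide's example formula φ = (x₁ ∨ ¬x₂) ∧ x₃ in plain Python. It is only an illustration: real SAT/SMT solvers such as Z3 do not enumerate assignments, and the function names here are invented for the example.

```python
from itertools import product

def phi(x1: bool, x2: bool, x3: bool) -> bool:
    """The slide's example formula: (x1 OR NOT x2) AND x3."""
    return (x1 or not x2) and x3

def satisfying_assignments():
    """Enumerate all 2^3 interpretations and keep those that satisfy phi."""
    return [xs for xs in product([False, True], repeat=3) if phi(*xs)]

models = satisfying_assignments()
# A non-empty list means SAT; each entry is one interpretation (x1, x2, x3).
print("SAT" if models else "UNSAT", models)
```

An SMT solver answers the same question for richer formulas, where the interpretation may also assign values to functions such as f.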
Part 1 FORMAL MODEL (GRN + DELAYS + DOMAINS)
GRN MODEL
• genes G
• spatial domains D
• discrete bounded time domain T
• spatial relations SR, with r : D × D × T → B for each r ∈ SR
• update functions (aka Vector Equations) F, with f_g : Π × T × D → B (g ∈ G). History and domain dependent
TRANSITION SYSTEM DYNAMICS
• States Q = B^(|G|×|D|) (Boolean expression of each gene in each domain)
• Finite paths Π
• Synchronous dynamics
• Transition relation δ : Π → B, where δ(π) ⇔ ⋀_{i∈T, g∈G, d∈D} π[i](g, d) = f_g(π, i, d)
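The transition relation can be read as a check that every entry of a path agrees with its update function. The following is a minimal sketch under invented assumptions: two hypothetical genes `a` and `b`, a single domain `d1`, five time steps, and toy update functions that are not taken from the sea urchin model.

```python
from itertools import product

GENES = ["a", "b"]      # hypothetical genes
DOMAINS = ["d1"]        # a single hypothetical spatial domain
T = range(5)            # discrete bounded time: steps 0..4

def f(gene, path, i, d):
    """Toy update functions f_g(pi, i, d); they may look back along the path."""
    if gene == "a":                      # time-driven input: on from step 1
        return i >= 1
    if gene == "b":                      # b copies a with a delay of 2 steps
        return i >= 2 and path[i - 2][("a", d)]
    raise ValueError(gene)

def delta(path):
    """delta(pi): every path entry equals the value of its update function."""
    return all(path[i][(g, d)] == f(g, path, i, d)
               for i in T for g in GENES for d in DOMAINS)

def simulate():
    """Build the path satisfying delta by filling in states step by step."""
    path = []
    for i in T:
        path.append({})
        for g, d in product(GENES, DOMAINS):
            path[i][(g, d)] = f(g, path, i, d)
    return path

path = simulate()
print([state[("b", "d1")] for state in path], delta(path))
```

Because the update functions are history dependent, `delta` is a predicate on whole paths rather than on pairs of consecutive states.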
Part 1 FORMAL MODEL (GRN + DELAYS + DOMAINS)
OBSERVATIONS O
• Wildtype expression → predicates on paths, e.g. ¬π[3](g₁, d₁) (gene g₁ is off at time 3 in domain d₁)
• Perturbation experiments → modified vector equations + predicates comparing wildtype and perturbed paths, e.g. π^p₁[5, 10](g₁, d₁) > π[5, 10](g₁, d₁) (g₁ is over-expressed in d₁, time interval [5,10] and perturbation p₁)
Part 1 PROBLEM FORMULATION
Input:
• GRN N = (G, D, SR, T, F) with partial knowledge of F
• Observations O
Synthesize functions in F s.t. the dynamics of N admits paths that meet all observations
Model encoded as constraints in the theory of bit-vectors (SMT QF_UFBV)
Part 1 FORMALIZATION OF VECTOR EQUATION LANGUAGE
E ::= g | ¬E | E ∧ E | E ∨ E | > t | < t | In d̄
    | In d̄ E      (evaluation in domain d̄)
    | In r d̄ E    (evaluation in a domain in sp. relation r with d̄)
    | At-n E      (delay of n steps)
    | After-n E   (delayed permanent activation)
    | Perm-n E    (delayed permanent repression)
[Figure: timelines of E, At-2 E and After-1 E]
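The two delay operators shown in the figure can be sketched on a single Boolean signal. The semantics below is an assumption read off the slide's glosses, not the talk's formal definition: `At-n E` is taken to be the value of E n steps earlier, and `After-n E` is taken to switch on n steps after E first held and stay on. `Perm-n` is left out because the slide does not pin down its exact semantics.

```python
def at(n, signal):
    """At-n E (assumed semantics): value of E n steps earlier, False before step n."""
    return [i >= n and signal[i - n] for i in range(len(signal))]

def after(n, signal):
    """After-n E (assumed semantics): True once E held at some step <= i - n."""
    return [i >= n and any(signal[: i - n + 1]) for i in range(len(signal))]

# Hypothetical expression of E over 5 time steps: a single pulse at step 1.
E = [False, True, False, False, False]
print(at(2, E))      # the pulse shifted by 2 steps
print(after(1, E))   # permanently on, starting 1 step after the pulse
```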
Part 1 FUNCTION SYNTHESIS
Basic interactions (BI) are templates for the synthesis of regulatory terms
f = op t r d b g
• op: temporal operators
• t: delays
• r: spatial relations
• d: domains
• b g: input genes and their expression
Examples
• f = At-[1, 3] ¬g₁
• f = {After-, Perm-}? In {d₁, d₂}? {g₁, g₂}
✓ Clear biological interpretation
✓ Incorporates uncertainty
Part 1 FUNCTION SYNTHESIS
We use Uninterpreted Boolean Functions (UBFs) uf : Bⁿ → B to synthesize logical combinations of regulatory inputs (BIs or further UBFs).
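What it means to synthesize an uninterpreted Boolean function can be illustrated by explicit enumeration: for two inputs there are only 16 candidate truth tables, and the ones that survive are those consistent with the observations. This is a sketch only; an SMT solver searches this space symbolically, and the observations used here are hypothetical.

```python
from itertools import product

INPUTS = list(product([False, True], repeat=2))   # the 4 rows of a 2-input truth table

def candidates():
    """All 16 functions B^2 -> B, each represented as a truth table (dict)."""
    for outputs in product([False, True], repeat=len(INPUTS)):
        yield dict(zip(INPUTS, outputs))

def synthesize(observations):
    """Keep the truth tables that agree with every observed (f1, f2, target) triple."""
    return [uf for uf in candidates()
            if all(uf[(a, b)] == out for a, b, out in observations)]

# Hypothetical observations of the inputs f1, f2 and the regulated gene.
obs = [(True, True, True), (True, False, False), (False, False, False)]
solutions = synthesize(obs)
print(len(solutions), "consistent functions")
```

The input combination (False, True) is never observed here, so more than one function remains; further observations would narrow the choice.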
Part 1 RESULTS
Software implementation of VE language and SMT-based synthesis methods with Z3 as solving engine
Original function:
  hfn1 := AT-2 bra AND AT-2 eve
UF of synthesis templates:
  f1 := {AT-,AFTER-}? IN ?? bra
  f2 := {AT-,AFTER-}? IN ?? eve
  hfn1 := uf(f1,f2)
User-guided refinement / UNSAT core-guided relaxation:
  f1 := AT-[0,5] bra
  f2 := AT-[0,5] eve
  hfn1 := uf(f1,f2)
Synthesized function:
  hfn1 := AT-1 bra AND eve
FULLY EXPLAINS EXPERIMENTAL DATA • Solved discrepancies • No need for manual modifications
Part 1 SUMMARY
• SMT-based method for synthesizing models of GRNs
• Applied to state-of-the-art model of sea urchin development
  • 45 genes x 4 spatial domains x 2 spatial relations x [0,30] hpf
  • Wildtype expression + 3 perturbation experiments
• Formal encoding of biological DSL
• Synthesized model fully explains observations with no major changes
• Effective (performance depends on size of search space for functions)
  f1 := {AT-,AFTER-}[0,6] IN ?? bra
  f2 := {AT-,AFTER-}[0,6] IN ?? eve
  hfn1 := uf(f1,f2)
  352800 possible functions!
PLAN OF THE TALK Part 2
Part 2 STOCHASTIC BIOCHEMICAL REACTION NETWORKS
• Formalism for biosystems like
  • signalling pathways, gene regulation, epidemic models, …
  • molecular devices, DNA logic gates, DNA walker circuits, …
• low molecular counts → stochastic dynamics → Continuous Time Markov Chains (CTMC)
• Uncertain kinetic parameters
Part 2 STOCHASTIC BIOCHEMICAL REACTION NETWORKS
AIM: Precise parameter synthesis
Synthesising parameters so that a given property is guaranteed to hold, or so that the probability of satisfying it is maximised/minimised
Part 2 STOCHASTIC BIOCHEMICAL REACTION NETWORKS
Parametric CTMC (pCTMC) semantics
• STATES: vector of populations/species counts
• (parametric) TRANSITION RATES: kinetic rate functions (e.g. mass action law)
• PARAMETER SPACE (continuous): intervals of kinetic parameters
[Figure: reactions and rate functions, the resulting CTMC (states 0, 1, 2, 3, 4, …, 40), a property in Continuous Stochastic Logic, and the parameters]
SATISFACTION FUNCTION
[Figure: satisfaction function with probability bounds; vertical axis from 0.0 to 0.5, horizontal axis from 0.10 to 0.30]
Part 2 SYNTHESIS METHOD (SKETCH)
1) Method to compute SAFE APPROXIMATIONS to min and max probabilities over a fixed parameter region
2) Parameter space decomposition → improves accuracy of approximation
3) Synthesis algorithms iterate steps 1) and 2) until required precision is reached
[Figure: upper and lower bounds, safe approximations]
Threshold (≥ r)
• True if lower bound above threshold r
• False if upper bound below r
• Undecided otherwise (to refine)
Max
• False if upper bound below under-approximation M of maximum probability
• True otherwise (to refine)
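The threshold case of the three steps above can be sketched as a bisection loop. Everything concrete here is an assumption made for illustration: the satisfaction function is a toy stand-in, P(k) = 1 − exp(−k·t), whose monotonicity makes the bounds on a region trivial to compute, and the "required precision" is modelled as a maximum width `tol` for undecided regions. The talk's actual method computes safe bounds on a pCTMC, which this sketch does not attempt.

```python
import math

def safe_bounds(lo, hi, t=1.0):
    """Lower and upper bound on the satisfaction probability over [lo, hi].

    Toy stand-in: P(k) = 1 - exp(-k*t) is increasing in k, so its minimum and
    maximum over the region are attained at the endpoints.
    """
    return 1 - math.exp(-lo * t), 1 - math.exp(-hi * t)

def threshold_synthesis(lo, hi, r, tol=1e-3):
    """Split [lo, hi] into True / False / Undecided regions for P(k) >= r."""
    true, false, undecided = [], [], []
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        p_min, p_max = safe_bounds(a, b)       # step 1: safe approximation
        if p_min >= r:
            true.append((a, b))                # lower bound above the threshold
        elif p_max < r:
            false.append((a, b))               # upper bound below the threshold
        elif b - a <= tol:
            undecided.append((a, b))           # precision reached, stop refining
        else:
            mid = (a + b) / 2                  # step 2: decompose the region
            stack += [(a, mid), (mid, b)]
    return true, false, undecided

true, false, undecided = threshold_synthesis(0.0, 2.0, r=0.5)
print(len(true), len(false), undecided)
```

Only regions that straddle the threshold are refined further, so the undecided set shrinks towards the boundary of the satisfying region.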
Part 2 APPLICATIONS: SIR EPIDEMIC MODEL
• Susceptible → Infected: S + I → I + I, with rate constant k_i
• Infected → Recovered: I → R, with rate constant k_r
• k_i, k_r: uncertain parameters
• Property: φ = (I > 0) U^[100,120] (I = 0) (infection lasts at least 100 time units and ends within 120 time units)
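The CTMC behind the SIR model can be sketched by listing, for each population vector (S, I, R), its outgoing transitions and their mass-action rates. The initial populations and parameter values below are hypothetical, chosen only to keep the state space small; the slide does not give them.

```python
def sir_transitions(state, k_i, k_r):
    """Outgoing CTMC transitions of an SIR state under mass-action kinetics."""
    s, i, r = state
    out = []
    if s > 0 and i > 0:
        out.append(((s - 1, i + 1, r), k_i * s * i))   # S + I -> I + I
    if i > 0:
        out.append(((s, i - 1, r + 1), k_r * i))       # I -> R
    return out

def reachable_states(initial, k_i, k_r):
    """Explore the finite state space reachable from the initial populations."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for nxt, _rate in sir_transitions(state, k_i, k_r):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Hypothetical initial populations and parameter values.
states = reachable_states((3, 1, 0), k_i=0.2, k_r=0.1)
print(len(states), "reachable states")
```

In the parametric setting k_i and k_r range over intervals, so each rate becomes a function of the parameters rather than a number.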