Toward whole-cell (WC) models for predicting cellular phenotypes
Jonathan Karr, karr@mssm.edu
July 17, 2019
Join us at Mount Sinai! karr@mssm.edu KarrLab.org
Acknowledgements: Yassmine Chebaro, Yin Hoon Chew, Arthur Goldberg, Zhouyang Lian, Yosef Roth, John Sekar, Bilal Shaikh, Balazs Szigeti
Outline
• Genotype to cellular phenotype
– What is a WC model?
– Why do we need WC models?
– Challenges & feasibility
– Foundational principles and state of the art
– Progress toward comprehensive models
• Tips for modeling complex systems
What is a WC model?
Goals of WC modeling: species-specific, mechanistic, whole cell, dynamic, stochastic, whole genome, whole cell cycle
Motivation
Synthetic biology requires WC models: biosensors, biofactories, tissue engineering
Example: drug biosynthesis
Precision medicine requires WC models
Challenges
Challenge: explain diverse chemistry
• Transcriptional regulation: logical models
• Metabolism: FBA
• Signaling: ODE, SSA
Challenge: explain multiple scales (figure: growth, replication, transcription, and metabolism arranged by length and time scale)
Challenge: capture chemical complexity
Challenge: heterogeneous data
• Protein expression: mass-spec, Western blot
• Transcription: RNA-seq
• Single-cell variation: microscopy
Challenge: incomplete data
Feasibility
Feasibility: Extensive data
Feasibility: Rule-based modeling
Feasibility: Multi-algorithm simulation
• Uptake: FBA
• Metabolism: FBA
• Transcription: stochastic events
• Translation: stochastic events
• Replication: chemical kinetics
WC modeling is becoming feasible
• Genomic and biochemical data
• Pathway submodels
• Rule-based modeling
• Multi-algorithmic simulation
Workflow
1. Focus on simple cells
          E. coli          M. genitalium
Genome    4700 kb          580 kb
Genes     4461             525
Size      2 μm × 0.5 μm    0.2-0.3 μm
2. Aggregate and integrate data
3. Model each process: transcriptional regulation, metabolism, signaling
3. Model each process (figure: formalisms arranged by scope and detail)
• Boolean: Bolouri, 2000s
• FBA: Palsson, 1990s
• ODE: Shuler, 1970s
• PDE, Gillespie: Luthey-Schulten, 2011
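To make the stochastic (Gillespie) formalism named on this slide concrete, here is a minimal sketch of the direct-method SSA for a hypothetical birth-death process of a single RNA species. The function name, the rate constants, and the two-reaction system are assumptions chosen for illustration; they are not taken from the talk.

```python
import random


def gillespie_birth_death(k_syn, k_deg, n0, t_end, seed=0):
    """Direct-method SSA for: 0 -> RNA (rate k_syn), RNA -> 0 (rate k_deg * n)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        # Propensity of each reaction in the current state
        a_syn = k_syn
        a_deg = k_deg * n
        a_total = a_syn + a_deg
        if a_total == 0:
            break  # nothing can fire any more
        # Waiting time to the next event is exponentially distributed
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: 1 RNA/s synthesis, 0.1 /s degradation
    traj = gillespie_birth_death(k_syn=1.0, k_deg=0.1, n0=0, t_end=50.0)
    print(f"{len(traj) - 1} events, final copy number {traj[-1][1]}")
```

Each run produces one possible trajectory; averaging many runs with different seeds would approximate the mean behavior that an ODE model describes.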
3. Model each process: metabolism
Species and reactions: ADP + Pi + 4 H+[p] → ATP + H2O + 3 H+[c] (catalyzed by ATPase)
Catalysis: ATPase = 3 × AtpA, 1 × AtpB, 1 × AtpC, 3 × AtpD, 10 × AtpE, 2 × AtpF, 1 × AtpG, 1 × AtpH
Kinetics: $v = k_\text{cat} [\text{ATPase}] \frac{[\text{ADP}]}{K_\text{m} + [\text{ADP}]}$
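The catalysis and kinetics entries on this slide can be sketched in a few lines. The sketch below assumes that the number of assembled ATPase complexes is limited by the scarcest subunit (complete assembly, no partial complexes), which is a simplification introduced here, and it evaluates the Michaelis-Menten rate law from the slide. The subunit counts and parameter values are hypothetical.

```python
# Subunit stoichiometry of the ATPase complex, as listed on the slide
ATPASE_STOICHIOMETRY = {
    "AtpA": 3, "AtpB": 1, "AtpC": 1, "AtpD": 3,
    "AtpE": 10, "AtpF": 2, "AtpG": 1, "AtpH": 1,
}


def complex_copy_number(subunit_counts, stoichiometry):
    """Number of complete complexes, assuming the scarcest subunit is limiting."""
    return min(subunit_counts[name] // n for name, n in stoichiometry.items())


def michaelis_menten_rate(k_cat, enzyme, substrate, k_m):
    """v = k_cat * [enzyme] * [substrate] / (K_m + [substrate])."""
    return k_cat * enzyme * substrate / (k_m + substrate)


if __name__ == "__main__":
    # Hypothetical subunit copy numbers; AtpE (25 copies, 10 per complex) is limiting
    counts = {"AtpA": 9, "AtpB": 4, "AtpC": 3, "AtpD": 9,
              "AtpE": 25, "AtpF": 6, "AtpG": 3, "AtpH": 3}
    n_atpase = complex_copy_number(counts, ATPASE_STOICHIOMETRY)
    # Hypothetical kinetic parameters, in arbitrary consistent units
    v = michaelis_menten_rate(k_cat=10.0, enzyme=n_atpase, substrate=0.5, k_m=0.5)
    print(n_atpase, v)
```

When the substrate concentration equals K_m, the rate is half of its maximum, k_cat × [enzyme] / 2, which is a quick sanity check for any implementation of this rate law.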
4. Merge models into a single model
States: mass, shape; metabolite, RNA, protein counts; transcript, polypeptide sequences; DNA polymerization, proteins, modifications; FtsZ ring; DNA sequence; mammalian host
Submodels: uptake (FBA); metabolism (FBA); transcription (stochastic events); translation (stochastic events); replication (chemical kinetics)
Group labels in the figure: composition, gene expression
5. Co-simulate models (figure: at each 1 s time step, the uptake, metabolism, transcription, translation, and replication submodels exchange data with the shared cell states)
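The co-simulation step can be illustrated with a toy loop: every submodel reads the same snapshot of the shared cell state, proposes changes for one time step, and the changes are merged before the next step. This is a minimal sketch under stated assumptions, not the simulator described in the talk. The three submodels, their rates, and the species names are hypothetical, and the sketch avoids resource conflicts only because each pool here has a single consumer.

```python
def uptake(state, dt):
    """Hypothetical submodel: import nutrient at a fixed rate."""
    return {"nutrient": 5 * dt}


def metabolism(state, dt):
    """Hypothetical submodel: convert nutrient to NTP, limited by availability."""
    used = min(state["nutrient"], 3 * dt)
    return {"nutrient": -used, "ntp": used}


def transcription(state, dt):
    """Hypothetical submodel: polymerize NTP into RNA, limited by availability."""
    used = min(state["ntp"], 2 * dt)
    return {"ntp": -used, "rna": used}


def co_simulate(initial_state, submodels, t_end, dt=1):
    """Advance all submodels in lockstep, merging their changes every dt."""
    state = dict(initial_state)
    t = 0
    while t < t_end:
        snapshot = dict(state)  # every submodel sees the same state
        deltas = [submodel(snapshot, dt) for submodel in submodels]
        for delta in deltas:
            for species, change in delta.items():
                state[species] += change
        t += dt
    return state


if __name__ == "__main__":
    start = {"nutrient": 0, "ntp": 0, "rna": 0}
    print(co_simulate(start, [uptake, metabolism, transcription], t_end=3))
```

Because submodels only see the snapshot from the start of the step, products made in one step become available to downstream submodels in the next step. A production simulator would also need to partition shared pools among competing submodels.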
6. Verify model
• Matches training data: cell mass, volume; biomass composition; RNA, protein expression, half-lives; superhelicity
• Matches theory: mass conservation; central dogma; cell theory; evolution
• Matches published data: metabolite concentrations; DNA-bound protein density; gene essentiality
• No obvious errors: plot model predictions; manually inspect data; compare to known biology
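One of the theory checks listed on this slide, mass conservation, lends itself to automation. The sketch below checks that a reaction is balanced in every element and in charge. The formulas and charges are the charged forms commonly used in metabolic reconstructions and are included here as illustrative assumptions; the function and variable names are hypothetical.

```python
from collections import Counter

# Assumed formulas and charges (charged forms at physiological pH)
SPECIES = {
    "ATP": {"elements": {"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}, "charge": -4},
    "ADP": {"elements": {"C": 10, "H": 12, "N": 5, "O": 10, "P": 2}, "charge": -3},
    "Pi": {"elements": {"H": 1, "O": 4, "P": 1}, "charge": -2},
    "H2O": {"elements": {"H": 2, "O": 1}, "charge": 0},
    "H": {"elements": {"H": 1}, "charge": 1},
}


def imbalance(reaction):
    """Return the net change in each element and in charge (empty if balanced).

    `reaction` maps species to coefficients: negative for reactants,
    positive for products.
    """
    totals = Counter()
    for species, coefficient in reaction.items():
        for element, count in SPECIES[species]["elements"].items():
            totals[element] += coefficient * count
        totals["charge"] += coefficient * SPECIES[species]["charge"]
    return {key: value for key, value in totals.items() if value != 0}


if __name__ == "__main__":
    atp_hydrolysis = {"ATP": -1, "H2O": -1, "ADP": 1, "Pi": 1, "H": 1}
    print(imbalance(atp_hydrolysis))  # empty dict means balanced
    missing_proton = {"ATP": -1, "H2O": -1, "ADP": 1, "Pi": 1}
    print(imbalance(missing_proton))  # reports the missing H and charge
```

Running a check like this over every reaction in a model is a cheap way to catch curation errors before simulation.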
State of the art
WC models provide novel insights
WC models help design cells (figure label: lacI)
WC models help repurpose drugs
Limitations of the Mycoplasma model
• Represents one of the smallest bacteria
• Ignores several processes
• Mispredicts several phenotypes
• Methods were ad hoc
• Hard to understand, reuse, and expand
• Time-consuming to build
Toward more comprehensive and more accurate models
Goal: design precise therapy
Challenge: H1 hESCs
• Karyotypically normal
• Autonomous
• Well-characterized
Bottlenecks
• Data aggregation: Hard to find relevant data
– Data is incomplete, scattered, and insufficiently annotated
• Model design: Hard to capture multiple scales and describe models modularly
– Insufficient abstraction and metadata
• Simulation: Hard to simulate multiple scales
– Simulators only support individual formalisms and are slow
• Verification: Little formalism or standardization
• Collaboration: Difficult to describe the data, assumptions, and decisions that underlie modeling
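Data needed for WC modeling
$v = k_\text{cat} [\text{enzyme}] \frac{[\text{substrate}]}{[\text{substrate}] + K_\text{m}}$
• Metabolite concentrations ([substrate])
• Enzyme concentrations ([enzyme])
• Reaction kinetics ($k_\text{cat}$, $K_\text{m}$)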
Datanator: data integration & discovery
Steps: find, aggregate, reduce, review
(figure labels: species, environment; example value 7.3 × 10^-4 mM)
Datanator: data aggregation
• Metabolites: ChEBI, ECMDB, YMDB, PubChem
• Protein: COMPARTMENTS, CORUM, Human Protein Ref. DB, Pax-DB, PDB, PSORTdb, RESID, UniProt
• Interactions: BioCyc, DBTBS, DrugBank, JASPAR, KEGG, SuperTarget
• Pathways: KEGG, Pathway Commons, Reactome, WikiPathways
• DNA: GenBank
• RNA: Array Express, MODOMICS, RNALocate, RNA MOD
• Rates: BRENDA, SABIO-RK
• Taxonomy: NCBI
Datanator: actionable metadata
• Measured entity/property
• Measured value, uncertainty, units
• Genotype
– Taxon
– Genetic variant
– Cell, tissue type
• Environment
– Temperature
– pH
– Growth media
• Data generation process
– Experimental design
– Measurement method
• Data analysis process
– Software
– Version
• Metadata
– Authors
– Curator
– Date
– Citation
Datanator: Finding relevant data
• Chemical similarity
– Tanimoto index
– Sequence similarity
• Genetic similarity
– Whole-genome similarity
– Taxonomic distance
• Environmental similarity
– Temperature
– pH
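The Tanimoto index named under chemical similarity is the size of the intersection of two feature sets divided by the size of their union. The sketch below applies it to hypothetical structural feature sets and ranks candidate compounds against a query. The feature identifiers and compound names are invented for illustration, and this is not Datanator's implementation.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) index of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def rank_by_similarity(query, candidates):
    """Sort candidate names from most to least similar to the query."""
    return sorted(candidates,
                  key=lambda name: tanimoto(query, candidates[name]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer stands for a structural feature
    query = {1, 2, 3, 4}
    candidates = {
        "compound_x": {1, 2, 3, 5},
        "compound_y": {1, 2, 3, 4, 6},
        "compound_z": {7, 8},
    }
    print(rank_by_similarity(query, candidates))
```

A measurement for the closest compound can then stand in when no measurement exists for the query compound itself, with the similarity score recorded alongside it.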
WC-Lang: scalable model descriptions
• Concretely describe composite multi-algorithmic models
• Concrete descriptions of every model element
• Capture data and assumptions underlying models
• Explicit descriptions of mixed granularity / lumping
• Structured description of initial conditions
• User interfaces suited to large models
WC-Lang: scalable model descriptions
RNA(i, 0) + NTP(i, 1) → RNA(i, 1) + PPi
RNA(i, 1) + NTP(i, 2) → RNA(i, 2) + PPi
RNA(i, 2) + NTP(i, 3) → RNA(i, 3) + PPi
RNA(i, 3) + NTP(i, 4) → RNA(i, 4) + PPi
…
WC-Lang: scalable model descriptions
RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi
RNA(i, l) + H2O → RNA(i, l-1) + NMP(i, l)
Protein(i, l) + AA(i, l+1) → Protein(i, l+1) + H2O
Protein(i, l) + H2O → Protein(i, l-1) + AA(i, l)
WC-Lang: scalable model descriptions
         Initiation         Elongation            Termination
SBML     1 per RNA (335)    1 per base (~500k)    1 per RNA (335)
Rules    1                  1                     1
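The contrast between one rule and one reaction per base can be seen by expanding an elongation rule by hand. The sketch below enumerates the explicit reactions implied by the rule RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi for a hypothetical set of RNA lengths. The function name and the lengths are assumptions for illustration, and this is not WC-Lang syntax.

```python
def expand_elongation_rule(rna_lengths):
    """Enumerate one explicit elongation reaction per base of each RNA."""
    reactions = []
    for rna, length in rna_lengths.items():
        for l in range(length):
            reactions.append(
                f"RNA({rna}, {l}) + NTP({rna}, {l + 1}) -> RNA({rna}, {l + 1}) + PPi"
            )
    return reactions


if __name__ == "__main__":
    # Hypothetical RNA lengths in bases; a real genome has hundreds of RNAs
    lengths = {"rna_1": 3, "rna_2": 2}
    for reaction in expand_elongation_rule(lengths):
        print(reaction)
```

The number of explicit reactions grows with the total number of bases, while the rule-based description stays at a single rule regardless of how many RNAs there are or how long they are.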
WC-Sim: scalable co-simulation
H1-hESC model
Models shown: Recon 2.2, composable model, H1 model
• Kinetic data (SABIO-RK)
• Protein abundance (Phanstiel et al., 2011; PaxDB)
• H1 transcriptomics data (ENCODE)
• Cell composition
• Media composition
Summary
Availability
• Code: code.karrlab.org (GitHub, PyPI)
• Data: data.karrlab.org (Quilt)
• Images: DockerHub
• Primer and docs: docs.karrlab.org
• Tutorials: sandbox.karrlab.org
Summary
• Bioengineering and medicine need WC models
• WC modeling is becoming feasible
• New technologies will enable WC modeling
• Pilot models will show the feasibility of bacterial and human models
Tips & tricks
Challenges to g2p2pop
• Build models from imperfect data
• Capture complexity within and between scales
• Systematically link scales
• Scalably simulate multiple scales
• Collaborate
Stretch goals inspire innovation
Integration enables great scope and depth
• Data aggregation
• Model composition
• Multi-algorithmic co-simulation
• Modular methods and software
• Interdisciplinary collaboration
Frameworks enable scalable integration
Common languages enable frameworks
Agent-based modeling can capture complexity
Collaboration enables solutions
Modularity enables collaboration
Sharing promotes collaboration
• Quilt: data
• GitHub: code
• PyPI: packaged code
• Docker: computing environments
• Google Docs, Overleaf: written documents
• Google Drive: other files
• GitHub issues: tasks
Common practices ease collaboration
• Interfaces between modules
• Coarse-graining
• Package organization
• Coding, documentation styles
• Software libraries