
Toward whole-cell (WC) models for predicting cellular phenotypes. Jonathan Karr (karr@mssm.edu), July 17, 2019. PowerPoint presentation.



  1. Toward WC models for predicting cellular phenotypes Jonathan Karr karr@mssm.edu July 17, 2019

  2. Join us at Mount Sinai! karr@mssm.edu KarrLab.org

  3. Acknowledgements: Yassmine Chebaro, Yin Hoon Chew, Arthur Goldberg, Zhouyang Lian, Yosef Roth, John Sekar, Bilal Shaikh, Balazs Szigeti

  4. Outline • Genotype to cellular phenotype: what is a WC model?, why do we need WC models?, challenges & feasibility, foundational principles and state of the art, progress toward comprehensive models • Tips for modeling complex systems

  5. What is a WC model?

  6. (Figure-only slide)

  7. Goals of WC modeling: species-specific, mechanistic, whole-cell, dynamic, stochastic, whole-genome, and whole-cell-cycle models

  8. Motivation

  9. Synthetic biology requires WC models: biosensors, biofactories, tissue engineering

  10. Example: drug biosynthesis

  11. Example: drug biosynthesis

  12. Example: drug biosynthesis

  13. Example: drug biosynthesis

  14. Example: drug biosynthesis

  15. Example: drug biosynthesis

  16. Example: drug biosynthesis

  17. Precision medicine requires WC models

  18. Challenges

  19. Challenge: explain diverse chemistry, such as transcriptional regulation (logical models), metabolism (FBA), and signaling (ODE, SSA)

  20. Challenge: explain multiple scales (figure: growth, replication, transcription, and metabolism plotted by length and time scale)

  21. Challenge: capture chemical complexity

  22. Challenge: heterogeneous data: protein expression (mass spec, Western blot), transcription (RNA-seq), single-cell variation (microscopy)

  23. Challenge: incomplete data

  24. Feasibility

  25. Feasibility: extensive data

  26. Feasibility: rule-based modeling

  27. Feasibility: multi-algorithm simulation: uptake (FBA), metabolism (FBA), transcription (stochastic events), translation (stochastic events), replication (chemical kinetics)

  28. WC modeling is becoming feasible: genomic and biochemical data, pathway submodels, rule-based modeling, multi-algorithmic simulation

  29. Workflow

  30. Workflow

  31. Step 1: Focus on simple cells. E. coli vs. M. genitalium: genome 4,700 kb vs. 580 kb; genes 4,461 vs. 525; size 2 μm × 0.5 μm vs. 0.2–0.3 μm

  32. Step 2: Aggregate and integrate data

  33. Step 3: Model each process: transcriptional regulation, metabolism, signaling

  34. Step 3: Model each process with a suitable formalism: Boolean (Bolouri, 2000s), FBA (Palsson, 1990s), ODE (Shuler, 1970s), PDE and Gillespie (Luthey-Schulten, 2011); figure axes: scope vs. detail
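The Gillespie entry above (and the SSA entry on slide 19) refers to exact stochastic simulation of individual reaction events. As a minimal sketch, assuming a toy birth-death process for a single mRNA species (the function name and rate constants are hypothetical, not from the talk), Gillespie's direct method draws an exponential waiting time from the total propensity and then picks which reaction fires in proportion to its propensity:

```python
import random
from typing import List, Tuple


def gillespie_birth_death(
    k_syn: float, k_deg: float, n0: int, t_end: float, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Gillespie direct method for 0 -> X (propensity k_syn) and X -> 0 (propensity k_deg * n).

    Hypothetical toy system used only to illustrate the algorithm.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_syn = k_syn          # zeroth-order synthesis propensity
        a_deg = k_deg * n      # first-order degradation propensity
        a_total = a_syn + a_deg
        if a_total <= 0.0:
            break              # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts
```

For example, `gillespie_birth_death(2.0, 0.1, 0, 500.0)` fluctuates around a mean of k_syn / k_deg = 20 copies once it reaches steady state; each run with a different seed gives a different trajectory, which is the single-cell variation that ODEs average away.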

  35. Step 3: Model each process (metabolism example). Species and reactions: ADP + Pi + 4 H+[p] → ATP + H2O + 3 H+[c], catalyzed by ATPase. Catalysis: ATPase = 3 AtpA + 1 AtpB + 1 AtpC + 3 AtpD + 10 AtpE + 2 AtpF + 1 AtpG + 1 AtpH. Kinetics: $v = k_{\text{cat}}\,[\text{ATPase}]\,\frac{[\text{ADP}]}{K_m + [\text{ADP}]}$
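The catalysis line describes the ATPase as a multi-subunit complex. One way a model might use that stoichiometry, sketched here under the assumption that complexes assemble only from whole sets of subunits, is to find how many complete complexes the available subunit copies can form; the limiting subunit sets the enzyme count that enters the rate law. The function names and the subunit counts in the example are hypothetical.

```python
from typing import Dict

# Subunit stoichiometry of the ATPase complex, as listed on the slide.
ATPASE_STOICHIOMETRY: Dict[str, int] = {
    "AtpA": 3, "AtpB": 1, "AtpC": 1, "AtpD": 3,
    "AtpE": 10, "AtpF": 2, "AtpG": 1, "AtpH": 1,
}


def max_complexes(subunit_counts: Dict[str, int], stoichiometry: Dict[str, int]) -> int:
    """Return how many complete complexes the available subunit copies can form.

    The scarcest subunit, relative to its stoichiometric coefficient, is limiting.
    Subunits missing from subunit_counts are treated as having zero copies.
    """
    return min(subunit_counts.get(name, 0) // coeff for name, coeff in stoichiometry.items())


def subunits_consumed(n_complexes: int, stoichiometry: Dict[str, int]) -> Dict[str, int]:
    """Copies of each subunit used to assemble n_complexes complexes."""
    return {name: coeff * n_complexes for name, coeff in stoichiometry.items()}
```

With hypothetical counts `{"AtpA": 300, "AtpB": 120, "AtpC": 100, "AtpD": 310, "AtpE": 900, "AtpF": 250, "AtpG": 95, "AtpH": 110}`, `max_complexes` returns 90, because the 10 copies of AtpE needed per complex make AtpE limiting.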

  36. Step 4: Merge models into a single model. Submodels (uptake and metabolism by FBA; transcription and translation by stochastic events; replication by chemical kinetics) are linked through shared cell states: mass and shape; metabolite, RNA, and protein counts; transcript and polypeptide sequences; DNA polymerization, proteins, and modifications; FtsZ ring; DNA sequence; mammalian host

  37. Step 5: Co-simulate models. At each 1 s time step, the uptake, metabolism, transcription, translation, and replication submodels each update the shared cell states
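The co-simulation step can be sketched as a fixed-time-step loop in which every submodel reads and writes one shared state. This is a minimal illustration, not the Mycoplasma model's code: the three submodels, the state keys, and their rates are hypothetical, and randomizing the submodel order each step is one simple choice to avoid always giving the same submodel first access to shared molecules (it ignores more careful resource partitioning).

```python
import random
from typing import Callable, Dict, List

State = Dict[str, float]
Submodel = Callable[[State, float], None]


def uptake(state: State, dt: float) -> None:
    # Hypothetical rate: import 10 nutrient molecules per second.
    state["nutrient"] += 10.0 * dt


def metabolism(state: State, dt: float) -> None:
    # Hypothetical: convert all available nutrient into NTPs.
    state["ntp"] += state["nutrient"]
    state["nutrient"] = 0.0


def transcription(state: State, dt: float) -> None:
    # Hypothetical: polymerize at most 5 NTPs per second into RNA.
    used = min(state["ntp"], 5.0 * dt)
    state["ntp"] -= used
    state["rna_nt"] += used


def cosimulate(state: State, submodels: List[Submodel], n_steps: int,
               dt: float = 1.0, seed: int = 0) -> State:
    """Advance all submodels over the shared state in fixed time steps of dt seconds.

    Each step runs every submodel once, in a freshly shuffled order.
    """
    rng = random.Random(seed)
    for _ in range(n_steps):
        order = list(submodels)
        rng.shuffle(order)
        for submodel in order:
            submodel(state, dt)
    return state
```

Calling `cosimulate({"nutrient": 0.0, "ntp": 0.0, "rna_nt": 0.0}, [uptake, metabolism, transcription], n_steps=10)` runs ten 1 s steps; whatever the order, the 100 imported molecules are conserved across the three pools, which is the kind of invariant the verification step checks.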

  38. Step 6: Verify the model • Matches training data: cell mass, volume; biomass composition; RNA and protein expression, half-lives • Matches theory: mass conservation, central dogma, cell theory, evolution, superhelicity • No obvious errors: plot model predictions, manually inspect data, compare to known biology • Matches published data: metabolite concentrations, DNA-bound protein density, gene essentiality
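"Mass conservation" can be checked mechanically for every reaction. A minimal sketch, assuming species are given as simple molecular formulas without parentheses or charges (a real model would also need charge balance and compartment-aware protons), counts each element on both sides. The ATP synthesis example uses neutral formulas for ADP (C10H15N5O10P2), phosphate (H3PO4), and ATP (C10H16N5O13P3); the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

Side = List[Tuple[int, str]]  # (stoichiometric coefficient, molecular formula)


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C10H16N5O13P3' (no parentheses or charges)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_totals(side: Side) -> Counter:
    """Total atoms of each element on one side of a reaction."""
    totals: Counter = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
    return totals


def is_element_balanced(reactants: Side, products: Side) -> bool:
    """True if every element appears equally often on both sides."""
    return element_totals(reactants) == element_totals(products)
```

For instance, ADP + H3PO4 → ATP + H2O passes (C10H18N5O14P3 on each side), while dropping the water product fails; running such a check over every reaction catches many typos in large models.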

  39. State of the art

  40. (Figure-only slide)

  41. WC models provide novel insights

  42. WC models help design cells (figure example: lacI)

  43. WC models help repurpose drugs

  44. Limitations of the Mycoplasma model • Represents one of the smallest bacteria • Ignores several processes • Mispredicts several phenotypes • Methods were ad hoc • Hard to understand, reuse, and expand • Time-consuming to build

  45. Toward more comprehensive and more accurate models

  46. Goal: design precise therapy

  47. Challenge: H1 hESCs • Karyotypically normal • Autonomous • Well-characterized

  48. Bottlenecks

  49. Bottlenecks • Data aggregation: hard to find relevant data; data is incomplete, scattered, and insufficiently annotated • Model design: hard to capture multiple scales and to describe models modularly; insufficient abstraction and metadata • Simulation: hard to simulate multiple scales; simulators only support individual formalisms and are slow • Verification: little formalism or standardization • Collaboration: difficult to describe the data, assumptions, and decisions that underlie modeling

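  50. Data needed for WC modeling: metabolite concentrations ([substrate]), enzyme concentrations ([enzyme]), and reaction kinetics ($k_{\text{cat}}$, $K_m$) in $v = k_{\text{cat}}\,[\text{enzyme}]\,\frac{[\text{substrate}]}{K_m + [\text{substrate}]}$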
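The rate law on this slide shows why each data type is needed: a single rate requires a metabolite concentration, an enzyme concentration, and two kinetic constants. A minimal sketch of the Michaelis–Menten law follows; the function name and the example values are hypothetical and only illustrate the arithmetic.

```python
def michaelis_menten_rate(k_cat: float, enzyme: float, substrate: float, k_m: float) -> float:
    """Rate v = k_cat * [enzyme] * [substrate] / (K_m + [substrate]).

    enzyme, substrate, and k_m must share one concentration unit (e.g. mM);
    k_cat is per second, so v is in that concentration unit per second.
    """
    if k_cat < 0 or enzyme < 0 or substrate < 0 or k_m <= 0:
        raise ValueError("expected non-negative k_cat, [enzyme], [substrate] and positive K_m")
    return k_cat * enzyme * substrate / (k_m + substrate)
```

With hypothetical values k_cat = 100 s⁻¹, [enzyme] = 0.001 mM, and [substrate] = K_m = 0.5 mM, the rate is half of its maximum k_cat·[enzyme] = 0.1 mM/s, i.e. 0.05 mM/s. A missing or poorly matched value for any one input changes the prediction directly, which is why data aggregation is a bottleneck.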

  51. Data needed for WC modeling

  52. Datanator: data integration & discovery: find, aggregate, reduce, and review data, filtered by species and environment (figure example: 7.3 × 10⁻⁴ mM)
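The find-aggregate-reduce steps can be illustrated with a small in-memory filter: keep measurements from the same taxon under similar conditions, then reduce them to one consensus value. This is a minimal sketch, not Datanator's API; the `Observation` fields, the tolerances, and the use of the median as the reduction are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Optional


@dataclass(frozen=True)
class Observation:
    value_mm: float        # measured concentration in mM
    taxon: str
    temperature_c: float
    ph: float


def consensus(observations: Iterable[Observation], taxon: str, temperature_c: float,
              ph: float, temp_tol: float = 5.0, ph_tol: float = 0.5) -> Optional[float]:
    """Median of measurements from the same taxon under similar conditions, or None."""
    matches = [
        o.value_mm
        for o in observations
        if o.taxon == taxon
        and abs(o.temperature_c - temperature_c) <= temp_tol
        and abs(o.ph - ph) <= ph_tol
    ]
    return median(matches) if matches else None
```

The median is used here because it is robust to a single outlying measurement; a fuller pipeline would also weight observations by similarity and keep the provenance of each value for review.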

  53. Datanator: data aggregation of metabolite, protein, interaction, pathway, DNA, RNA, rate, and taxonomy data from sources including ChEBI, ECMDB, YMDB, PubChem, DrugBank, COMPARTMENTS, Human Protein Ref. DB, Pax-DB, PDB, PSORTdb, RESID, UniProt, CORUM, DBTBS, JASPAR, SuperTarget, BioCyc, KEGG, Pathway Commons, Reactome, WikiPathways, GenBank, BRENDA, SABIO-RK, Array Express, MODOMICS, RNALocate, RNA MOD, and NCBI

  54. Datanator: actionable metadata • Measured entity/property • Measured value, uncertainty, units • Genotype: taxon, genetic variant, cell or tissue type • Environment: temperature, pH, growth media • Data generation process: experimental design, measurement method • Data analysis process: software, version • Metadata: authors, curator, date, citation

  55. Datanator: finding relevant data • Chemical similarity: Tanimoto index, sequence similarity • Genetic similarity: whole-genome similarity, taxonomic distance • Environmental similarity: temperature, pH
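The Tanimoto index compares two molecular fingerprints as the size of their intersection divided by the size of their union. A minimal sketch, assuming fingerprints are already available as sets of "on" bit indices (computing them from structures would need a cheminformatics library), scores and ranks candidate molecules against a query; the function names and bit sets are hypothetical.

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(query: AbstractSet[int], candidates: Dict[str, AbstractSet[int]]) -> List[str]:
    """Candidate names ordered from most to least similar to the query."""
    return sorted(candidates, key=lambda name: tanimoto(query, candidates[name]), reverse=True)
```

For example, `{1, 2, 3}` and `{2, 3, 4}` share 2 of 4 distinct bits, so their Tanimoto index is 0.5. Ranking by this score lets measurements for chemically similar compounds stand in when none exist for the compound of interest.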

  56. WC-Lang: scalable model descriptions • Concretely describe composite multi-algorithmic models • Concrete descriptions of every model element • Capture data and assumptions underlying models • Explicit descriptions of mixed granularity / lumping • Structured description of initial conditions • User interfaces suited to large models

  57. WC-Lang: scalable model descriptions: RNA(i, 0) + NTP(i, 1) → RNA(i, 1) + PPi; RNA(i, 1) + NTP(i, 2) → RNA(i, 2) + PPi; RNA(i, 2) + NTP(i, 3) → RNA(i, 3) + PPi; RNA(i, 3) + NTP(i, 4) → RNA(i, 4) + PPi; …

  58. WC-Lang: scalable model descriptions (rules): RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi; RNA(i, l) + H2O → RNA(i, l-1) + NMP(i, l); Protein(i, l) + AA(i, l+1) → Protein(i, l+1) + H2O; Protein(i, l) + H2O → Protein(i, l-1) + AA(i, l)
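The elongation rule RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi stands in for one explicit reaction per base. A minimal sketch of that expansion, assuming RNA i is given by its own A/C/G/U sequence so that the NTP added at position l+1 is the one matching that base, shows how a single rule unrolls into a reaction list; the function name and the string format of the reactions are hypothetical, not WC-Lang syntax.

```python
from typing import List


def expand_elongation(rna_id: str, sequence: str) -> List[str]:
    """Unroll RNA(i, l) + NTP(i, l+1) -> RNA(i, l+1) + PPi into one reaction per base.

    RNA(i, l) is written as f"{rna_id}({l})"; the NTP incorporated at position
    l+1 matches the RNA's own base at that position.
    """
    reactions = []
    for l, base in enumerate(sequence.upper()):
        if base not in "ACGU":
            raise ValueError(f"unexpected base {base!r} at position {l + 1}")
        reactions.append(f"{rna_id}({l}) + {base}TP -> {rna_id}({l + 1}) + PPi")
    return reactions
```

For a 4-nt transcript, `expand_elongation("rna_1", "ACGU")` yields 4 reactions; summed over every transcript in a genome, the explicit form grows with total sequence length, while the rule stays a single line.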

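  59. WC-Lang: scalable model descriptions. Reactions needed to describe transcription: in SBML, 1 per RNA for initiation (335), 1 per base for elongation (~500k), and 1 per RNA for termination (335); as rules, 1 each for initiation, elongation, and termination

  60. WC-Sim: scalable co-simulation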
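One common way to co-simulate submodels that advance on different time scales is discrete-event scheduling: each submodel schedules its next update, and the simulator always executes the earliest pending event. The sketch below illustrates that general technique with a priority queue; it is not WC-Sim's API, and the class, the periodic handlers, and their periods are hypothetical.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

Handler = Callable[["EventQueue"], None]


class EventQueue:
    """Minimal discrete-event scheduler: always runs the earliest pending event."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Handler]] = []
        self._counter = itertools.count()  # tie-breaker: equal times run in scheduling order
        self.now = 0.0

    def schedule(self, time: float, handler: Handler) -> None:
        if time < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._heap, (time, next(self._counter), handler))

    def run(self, until: float) -> None:
        while self._heap and self._heap[0][0] <= until:
            time, _, handler = heapq.heappop(self._heap)
            self.now = time
            handler(self)


def periodic(name: str, period: float, log: List[Tuple[float, str]]) -> Handler:
    """Hypothetical submodel that records each update and reschedules itself."""
    def handler(queue: EventQueue) -> None:
        log.append((queue.now, name))
        queue.schedule(queue.now + period, handler)
    return handler
```

Scheduling a "metabolism" handler every 1.0 s and a "transcription" handler every 2.5 s from time 0, then calling `run(until=5.0)`, interleaves 6 metabolism updates with 3 transcription updates in time order, without forcing both onto the faster submodel's time step.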

  61. H1-hESC model: a composable Recon 2.2 model with kinetic data (SABIO-RK) and protein abundance (Phanstiel et al., 2011; PaxDB), combined with H1 data: transcriptomics (ENCODE), cell composition, and media composition

  62. Summary

  63. Availability • Code: code.karrlab.org (GitHub, PyPI) • Data: data.karrlab.org (Quilt) • Images: DockerHub • Primer and docs: docs.karrlab.org • Tutorials: sandbox.karrlab.org

  64. Summary • Bioengineering and medicine need WC models • WC modeling is becoming feasible • New technologies will enable WC modeling • Pilot models will show the feasibility of bacterial and human models

  65. Tips & tricks

  66. Challenges to g2p2pop • Build models from imperfect data • Capture complexity within and between scales • Systematically link scales • Scalably simulate multiple scales • Collaborate

  67. Stretch goals inspire innovation

  68. Integration enables great scope and depth • Data aggregation • Model composition • Multi-algorithmic co-simulation • Modular methods and software • Interdisciplinary collaboration

  69. Frameworks enable scalable integration

  70. Common languages enable frameworks

  71. Agent-based modeling can capture complexity

  72. Collaboration enables solutions

  73. Modularity enables collaboration

  74. Sharing promotes collaboration • Quilt: data • GitHub: code • PyPI: packaged code • Docker: computing environments • Google Docs, Overleaf: written documents • Google Drive: other files • GitHub issues: tasks

  75. Common practices ease collaboration • Interfaces between modules • Coarse-graining • Package organization • Coding, documentation styles • Software libraries
