NCI-DOE Cancer Initiative: Ras Biology in Membranes, Molecular-level Deep Learning


  1. NCI-DOE Cancer Initiative: Ras Biology in Membranes
     Molecular-level Deep Learning (Towards Predictive Biology Through HPC)
     GTC 2017
     Brian Van Essen, Computer Scientist
     May 9, 2017
     LLNL-PRES-730749
     This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

  2. Cancer Moonshot Pilot 2
     [Diagram labels:]
     § RAS activation experiments (FNLCR)
     § Adaptive sampling molecular dynamics simulation codes
     § Predictive simulation and analysis of RAS
     § Adaptive time stepping
     § Adaptive spatial resolution
     § Experiments on nanodisc
     § Phase field model of lipid membrane
     § Coarse-grain MD
     § Classical MD
     § High-fidelity subgrid modeling
     § Granular RAS membrane interaction simulations
     § Machine learning guided dynamic validation
     § CryoEM imaging
     § X-ray/neutron scattering
     § Atomic resolution RAS-RAF interaction
     § Multi-modal experimental data, image reconstruction, analytics
     § Mechanistic network models
     § Unsupervised deep feature learning
     § Protein structure databases
     § RAS Activation
     § Uncertainty quantification

  3. Molecular-level Deep Learning Goals
     Identify characteristics of:
     § Individual molecules
       - Hand-engineered vs. learned features
     § Collection of molecules (simulation frame)
       - Instantaneous state of the system
     § Progression of system over time
       - Identify / predict behavior
     Adapt simulation to explore state space:
     § Observe / analyze rare events
     Can machine-learned features identify and highlight biologically interesting correlations?

  4. Molecular-level Deep Learning Techniques
     Use unsupervised learning to make the most of scarce labeled data
     § Convolutional autoencoders extract molecular-level features
     § Fully-connected autoencoders characterize state of simulation frame
     § Recurrent autoencoder predicts:
       - future events: queue in-depth (expensive) analysis
       - state transitions: progress simulation
     Data set characteristics:
     § Input dimensions: ~1.26e6 per time step (6000 lipids x 30 beads per lipid x (position + velocity + type))
     § Sample size: O(10^6) for simulation requiring O(10^9) time steps
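The per-time-step input size quoted on slide 4 can be checked with a few lines of arithmetic. The sketch below assumes three position components, three velocity components, and one type value per bead. That 3 + 3 + 1 layout is an assumption that reproduces the quoted ~1.26e6; the slide itself does not spell it out.

```python
# Check the per-time-step input size quoted on slide 4.
# Assumption (not stated on the slide): each bead carries 3 position
# components, 3 velocity components, and 1 type value.
N_LIPIDS = 6000
BEADS_PER_LIPID = 30
VALUES_PER_BEAD = 3 + 3 + 1  # position + velocity + type (assumed layout)


def input_dimension(n_lipids: int, beads_per_lipid: int, values_per_bead: int) -> int:
    """Number of scalar inputs describing one simulation frame."""
    return n_lipids * beads_per_lipid * values_per_bead


dim = input_dimension(N_LIPIDS, BEADS_PER_LIPID, VALUES_PER_BEAD)
print(dim)  # 1260000
```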

  5. RAS Monomer Simulations
     § Most MD studies of RAS have been in solution with no membrane
     § RAS only has biological activity when embedded in a membrane
     § NMR experiments have shown that RAS dynamics in membranes are complicated and are affected by the membrane composition and binding partners
     [Figure panels: Inactive K-Ras binding GDP; Active K-Ras binding GNP]

  6. Overview: Molecular Dynamics (MD)
     § Represent every atom in a system
     § Describe the forces on all atoms: $F = -\nabla U(r) = m a = m \ddot{r}$
     § Integrate: F = ma (millions of times)
     § Result: position of every atom as a function of time
     § Compare with experiments: structures/dynamics
     Current limitations
     § 100,000s of atoms
     § 10,000s of water molecules
     § 1,000s of lipids
     § < 1 µs
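The "Integrate: F = ma" step on slide 6 can be illustrated with a toy integrator. This is a minimal sketch, assuming a single particle in a one-dimensional harmonic potential U(r) = 0.5 k r^2 and the velocity Verlet scheme. The potential, the parameter values, and the choice of integrator are illustrative assumptions; the slides do not say which integrator or force field the project codes use.

```python
def force(r: float, k: float = 1.0) -> float:
    """F = -dU/dr for the toy potential U(r) = 0.5 * k * r**2 (assumed)."""
    return -k * r


def total_energy(r: float, v: float, m: float = 1.0, k: float = 1.0) -> float:
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * m * v * v + 0.5 * k * r * r


def velocity_verlet(r, v, dt, n_steps, m=1.0, k=1.0):
    """Advance position r and velocity v by n_steps steps of size dt."""
    a = force(r, k) / m
    trajectory = [r]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(r, k) / m              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append(r)
    return r, v, trajectory


# Illustrative values: unit mass, unit spring constant, start at rest at r = 1.
r_end, v_end, traj = velocity_verlet(r=1.0, v=0.0, dt=0.01, n_steps=10_000)
e0 = total_energy(1.0, 0.0)
e_end = total_energy(r_end, v_end)
print(f"energy drift after {len(traj) - 1} steps: {e_end - e0:.2e}")
```

A real MD code applies the same update to every atom with a many-body force field; the energy drift printed here is a common sanity check on the time step.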

  7. Coarse Grained Molecular Dynamics (CGMD)
     § Merge several heavy atoms into a single "bead"
     § Describe bead-bead interactions with averaged force field
       - Sacrifice atomistic structural and dynamic information
       - Much less computer and time intensive
       - Same computational scaling properties
     § 6 orders of magnitude increase in sampling!
       - 100s of μs* (+3 orders of magnitude)
       - 100,000s of lipids (+2 orders of magnitude)
     *Actual "physiological" timescale is even longer, as there is also about a 10-fold increase in dynamics
     [Figure: all-atom vs. CG representations of a DPPC lipid and a protein α-helix]
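Merging several heavy atoms into a single bead can be sketched as a mass-weighted average of atom positions. The example below is a minimal illustration: the four-atom fragment, the masses, and the bead names `B1` and `B2` are hypothetical, and the center-of-mass mapping is one common choice, not necessarily the mapping used in the simulations described here.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def center_of_mass(positions: Sequence[Vec3], masses: Sequence[float]) -> Vec3:
    """Mass-weighted mean position of a group of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for p, m in zip(positions, masses)) / total
        for axis in range(3)
    )


def coarse_grain(
    positions: Sequence[Vec3],
    masses: Sequence[float],
    mapping: Dict[str, List[int]],
) -> Dict[str, Vec3]:
    """Replace each group of atom indices in `mapping` by one bead position."""
    return {
        bead: center_of_mass([positions[i] for i in idx], [masses[i] for i in idx])
        for bead, idx in mapping.items()
    }


# Toy fragment: four equal-mass heavy atoms merged into two beads (hypothetical mapping).
atom_positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 2.0, 0.0)]
atom_masses = [12.0, 12.0, 12.0, 12.0]
bead_mapping = {"B1": [0, 1], "B2": [2, 3]}

beads = coarse_grain(atom_positions, atom_masses, bead_mapping)
print(beads)  # {'B1': (1.0, 0.0, 0.0), 'B2': (4.0, 1.0, 0.0)}
```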

  8. Adaptive resolution MD/CGMD coupled with phase field
     § Model complex (many lipid) bilayer with phase field to capture structure and topology
     § Model Ras on membrane using full atomistic resolution
     § Use CGMD as "glue" to connect different models
     Connecting MD and CGMD with continuum-scale phase field models will access biologically relevant time and length scales
     [Figure labels: Phase Field, Atomistic (MD), Coarse Grained (CGMD)]

  9. Simulation of full system will incorporate a large number of smaller simulations
     § 10-100 µm lipid patches
     § Dynamic membrane
     § Thousands of Ras proteins
       - Mutant and wild-type
       - Many conformations
       - Many environments
     Investigate diffusion and aggregation of Ras in context of specific membrane properties
     O(10^5) 100,000-atom simulations

  10. Simulations of KRAS have started in more biologically relevant lipid environments
      Completed coarse-grained (CG) simulations of average mammalian plasma membrane with 63 distinct lipid types
      § Working on improving CG parameters for specific lipid types to be consistent with all-atom (AA) simulations of lipids (LANL and LLNL)
      § Investigating "simple" average plasma membrane [only 18 lipid types]
      § Looking into tissue-specific lipid compositions
      Initial CGMD of KRAS proteins in complex human average plasma membrane
      § 64 Kras4b in 70 nm x 70 nm membrane
      § HVR in alpha helix conformation
      § Inserted in inner plasma membrane leaflet
      [Figure: distribution of lipids in average plasma membrane; labels: Headgroups, Tail unsaturation, Outer leaflet, Inner leaflet]
      Ingólfsson H.I., M.N. Melo, F. van Eerden, C. Arnarez, C.A. Lopez, T.A. Wassenaar, X. Periole, A.H. de Vries, D.P. Tieleman and S.J. Marrink. 2014. Lipid organization of the plasma membrane. J Am Chem Soc, 136:14554-14559

  11. KRAS4b in mammalian plasma membrane
      § 20,000 lipids (70x70 nm)
      § 40 µs pre-equilibration
      § 64 Ras proteins cluster readily
      § Associates with and aggregates charged lipids in the membrane
      Helgi Ingólfsson, LLNL

  12. Automated hypothesis generation and dynamic validation
      [Diagram labels:]
      § High dimensional model parameters
      § High-fidelity simulation
      § Ensembles of simulation [parameter|output] sets
      § Machine learning to train a reduced-order predictive model
      § Hypothesis generation: use the ML model to predict the parameters for experimental data
      CORAL computing architectures power the dynamic validation loop

  13. Project will build understanding on computational advances
      [Figure: capability versus time]

  14. Applying Deep Learning to molecular-level simulations
      Challenges: Train neural networks on simulation data (not image slices)
      § Minimal prior art on deep neural networks trained on molecular dynamics
      § Labeling data is time consuming and requires domain experts
      Approach
      § Developing learned features that complement standard molecular-level features
      § Create an encoded representation that characterizes simulation state
      § Create model that can predict future simulation state
      Questions
      § Are these features useful for existing needs such as cluster detection?
      § Can these encodings be used to cue domain scientists?
      § ML provides data reduction and representation: how does this interface with traditional physics?

  15. Cluster Detection
      [Figure: cholesterol density; labels: Avg., Outer, Inner, Brain]

  16. Cluster Detection
      Domain size(s) and dynamics?
      § Neighbor counting and clustering
      § Density maps - time and space correlation
      § Structure factor analysis
      § Lipid feature selection for fancy clustering
        1) x,y,z coordinates
        2) Lipid type
        3) Lipid area
        4) Local bilayer height
        5) Lipid order
        6) Lipid tilt
        7) Lipid movement
        8) Local density
        ...
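Neighbor counting and clustering, the first approach listed on slide 16, can be sketched with a distance cutoff and a union-find structure. This is a minimal illustration under stated assumptions: the 2-D coordinates, the cutoff of 1.0, and the all-pairs loop are illustrative choices, and periodic boundary conditions (which a membrane simulation would normally need) are ignored.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def neighbor_counts(points: Sequence[Point], cutoff: float) -> List[int]:
    """For each point, count the other points within `cutoff`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= cutoff)
        for i, p in enumerate(points)
    ]


def cluster_labels(points: Sequence[Point], cutoff: float) -> List[int]:
    """Label points so that any two points within `cutoff` share a cluster."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters
    return [find(i) for i in range(len(points))]


# Hypothetical in-plane lipid positions (arbitrary units).
lipids = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.4, 5.0), (9.0, 0.0)]
counts = neighbor_counts(lipids, cutoff=1.0)
labels = cluster_labels(lipids, cutoff=1.0)
sizes = sorted(Counter(labels).values(), reverse=True)
print(counts)  # [2, 2, 2, 1, 1, 0]
print(sizes)   # [3, 2, 1]
```

The all-pairs loop scales quadratically with the number of lipids; at the system sizes quoted in this deck a cell list or spatial tree would be the usual replacement.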

  17. Cluster Detection
      Challenging cases: cluster boundaries are not well defined

  18. Cluster Detection
      Learn features for cluster detection and characterizing state
      § Use a multi-layer perceptron stacked auto-encoder to generate features that describe the state of a simulation frame
      § Generate automatically extracted features representing molecular simulation data
      § Establish framework for building future tools using learned features
      Expected outcome:
      § Improvement in the understanding of protein formation and easing of the handling of large-scale molecular dynamics output
      [Diagram labels: Molecular CNN, Molecular Features, X, Encoder, Code z (State of Frame), Decoder, X']
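The encoder / code z / decoder structure on slide 18 can be illustrated at toy scale. The slide describes a multi-layer perceptron stacked autoencoder; the sketch below is deliberately much smaller, with one tanh encoder layer and one linear decoder layer trained by plain stochastic gradient descent. The synthetic 4-value "frames", the layer sizes, the learning rate, and the epoch count are all assumptions for illustration and do not come from the presentation.

```python
import math
import random

random.seed(0)

N_IN, N_CODE = 4, 2  # input size and code size (illustrative)


def make_frame():
    """Synthetic 4-value frame that really has only 2 degrees of freedom."""
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    return [a, b, 0.5 * (a + b), 0.5 * (a - b)]


frames = [make_frame() for _ in range(200)]

W_enc = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_CODE)]
b_enc = [0.0] * N_CODE
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(N_CODE)] for _ in range(N_IN)]
b_dec = [0.0] * N_IN


def encode(x):
    """Code z = tanh(W_enc x + b_enc)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def decode(z):
    """Reconstruction X' = W_dec z + b_dec."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(W_dec, b_dec)]


def mean_loss(data):
    """Mean squared reconstruction error over a data set."""
    total = 0.0
    for x in data:
        x_rec = decode(encode(x))
        total += sum((r - xi) ** 2 for r, xi in zip(x_rec, x)) / N_IN
    return total / len(data)


def train_epoch(data, lr=0.05):
    """One pass of per-sample gradient descent on the reconstruction error."""
    for x in data:
        z = encode(x)
        x_rec = decode(z)
        # Gradient of the mean squared error with respect to the reconstruction.
        g_out = [2.0 * (r - xi) / N_IN for r, xi in zip(x_rec, x)]
        # Backpropagate through the decoder and the tanh (d tanh/du = 1 - z^2).
        g_code = [
            sum(g_out[i] * W_dec[i][j] for i in range(N_IN)) * (1.0 - z[j] ** 2)
            for j in range(N_CODE)
        ]
        for i in range(N_IN):
            for j in range(N_CODE):
                W_dec[i][j] -= lr * g_out[i] * z[j]
            b_dec[i] -= lr * g_out[i]
        for j in range(N_CODE):
            for k in range(N_IN):
                W_enc[j][k] -= lr * g_code[j] * x[k]
            b_enc[j] -= lr * g_code[j]


loss_before = mean_loss(frames)
for _ in range(100):
    train_epoch(frames)
loss_after = mean_loss(frames)
print(f"reconstruction loss: {loss_before:.4f} -> {loss_after:.4f}")
```

After training, `encode(frame)` gives the compressed code that plays the role of the "state of frame" features; a stacked version would chain several such encoder layers.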

  19. Can we leverage deep learning for static state?
      § Do learned features outperform hand-selected features for cluster detection?
      § Do we have enough labeled data to learn complex representations?
      § Does the compressed frame representation provide a good basis for representing MD simulation state?
      § Can we develop state descriptions that are meaningful to domain experts?

  20. State Transition: Coupled Phase-Field Particle Model
      § Bilayer and water mapped to sheets with concentration and height fields
      § RAS mapped to particles as "point" particles
      [Figure labels: Water, RAS, Bilayer]

  21. State Transition: Statistical Multi-scale Coupling
      § Phase Field
      § Statistical atoms: density, composition, and curvature consistent with the phase field
      § Full dynamical atoms: accelerate particles with parallel replica dynamics

  22. State Transition: Parallel Replica Dynamics
      A.F. Voter, Phys. Rev. B 57, R13985 (1998)
      Parallelizes time evolution
      Assumptions:
      - infrequent events
      - exponential distribution of first-escape times: $p(t) = k e^{-kt}$
      [Plot: p(t) versus t]
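The exponential first-escape assumption on slide 22 is what makes replicas useful: the minimum of M independent exponential waiting times with rate k is itself exponential with rate Mk, so the expected wait for the first escape drops from 1/k to 1/(Mk). The sketch below checks that property by sampling. It only draws random waiting times and is not an implementation of parallel replica dynamics; the rate, replica count, and trial count are illustrative values.

```python
import random

random.seed(1)


def first_escape_time(rate: float, n_replicas: int) -> float:
    """Time until the first of n_replicas independent replicas escapes."""
    return min(random.expovariate(rate) for _ in range(n_replicas))


def mean_first_escape(rate: float, n_replicas: int, n_trials: int = 20_000) -> float:
    """Monte Carlo estimate of the mean first-escape time."""
    total = sum(first_escape_time(rate, n_replicas) for _ in range(n_trials))
    return total / n_trials


K = 0.5  # escape rate k in p(t) = k exp(-k t), arbitrary units (illustrative)
M = 8    # number of replicas (illustrative)

mean_single = mean_first_escape(K, 1)    # theory: 1 / K = 2.0
mean_parallel = mean_first_escape(K, M)  # theory: 1 / (M * K) = 0.25
speedup = mean_single / mean_parallel    # theory: M

print(f"single replica: {mean_single:.3f}")
print(f"{M} replicas:     {mean_parallel:.3f}")
print(f"speedup:        {speedup:.2f}")
```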

  23. State Transition: Ensemble Multi-scale (Statistical Coupling)
      § Phase Field: phase field parameters determined via atomistic MD
      § Many 100k-atom MD simulations
