Deep Learning for Molecular Docking David Koes @david_koes GPU Technology Conference San Jose, CA March 26, 2018
University of Pittsburgh Computational and Systems Biology
The Biopharmaceutical Research and Development Process
Stages: Basic Research → Drug Discovery → Pre-Clinical → Clinical Trials (Phase I, Phase II, Phase III) → FDA Review → Post-Approval Research & Monitoring (Phase IV)
Milestones: IND submitted, NDA/BLA submitted, FDA approval
Number of volunteers across Phases I to III: tens, hundreds, thousands
Potential new medicines → 1 FDA-approved medicine ($2.6 billion)
"If you stop failing so often you massively reduce the cost of drug development." (Sir Andrew Witty, CEO, GlaxoSmithKline)
Source: Pharmaceutical Research and Manufacturers of America (http://phrma.org)
1. Does the compound do what you want it to?
2. Does the compound not do what you don’t want it to?
3. Is what you want it to do the right thing?
Protein Structures
sequence → structure → function
Structure Based Drug Design
• Unlike ligand-based approaches, generalizes to new targets
• Requires a molecular target with known structure and binding site
Structure Based Drug Design
Virtual Screening, Lead Optimization
Pose Prediction, Binding Discrimination, Affinity Prediction
Protein-Ligand Scoring
AutoDock Vina [Figure labels: d, r1, r2]
O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 31 (2010) 455-461
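The figure labels d, r1, and r2 suggest Vina's surface distance, which is the interatomic distance minus the two atomic radii. As a minimal sketch, assuming the term shapes and weights published in the cited Trott and Olson paper, the steric part of the score for one atom pair could look like the following. The function names are illustrative, and the hydrophobic and hydrogen-bond terms are omitted.

```python
import math

# Weights for the three steric terms as reported in Trott & Olson (2010).
W_GAUSS1, W_GAUSS2, W_REPULSION = -0.0356, -0.00516, 0.840

def surface_distance(r, r1, r2):
    """Interatomic distance r minus the two atomic radii (all in angstroms)."""
    return r - r1 - r2

def steric_score(r, r1, r2):
    """Weighted sum of the gauss1, gauss2 and repulsion terms for one atom pair."""
    d = surface_distance(r, r1, r2)
    gauss1 = math.exp(-(d / 0.5) ** 2)
    gauss2 = math.exp(-((d - 3.0) / 2.0) ** 2)
    repulsion = d * d if d < 0 else 0.0   # only overlapping atoms are penalized
    return W_GAUSS1 * gauss1 + W_GAUSS2 * gauss2 + W_REPULSION * repulsion
```

Atoms in contact (d near 0) receive a small favorable score, while overlapping atoms (d below 0) are penalized quadratically.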
Can we do better?
Accurate pose prediction, binding discrimination, and affinity prediction without sacrificing performance?
Key Idea: Leverage “big data”
• 231,655,275 bioactivities in PubChem
• 125,526 structures in the PDB
• 16,179 annotated complexes in PDBbind
Deep Learning
Convolutional Neural Networks
[Figure source: https://devblogs.nvidia.com]
CNNs for Protein-Ligand Scoring
CNN → Pose Prediction, Binding Discrimination, Affinity Prediction
Protein-Ligand Representation
(R,G,B) pixel → (Carbon, Nitrogen, Oxygen,…) voxel
The only parameters for this representation are the choice of grid resolution, atom density, and atom types.
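A minimal sketch of this representation, assuming NumPy, a Gaussian atom density, and hypothetical parameter names (`dim`, `resolution`, `sigma`). The slide does not specify the density function, so the Gaussian is an assumption. Each atom type gets its own channel, just as each color gets its own channel in an image. The defaults are chosen so that 24 / 0.5 = 48 voxels per side, which matches the 48x48x48 input on the Model slide, but the actual box size and resolution are also assumptions.

```python
import numpy as np

def voxelize(coords, types, n_types, center, dim=24.0, resolution=0.5, sigma=1.0):
    """Return an (n_types, n, n, n) grid of summed Gaussian atom densities.

    coords: (N, 3) atom positions in angstroms; types: N channel indices.
    dim is the box edge length and resolution the voxel spacing (assumed values).
    """
    n = int(round(dim / resolution))
    # Voxel centers along one axis, relative to the box center.
    axis = (np.arange(n) + 0.5) * resolution - dim / 2.0
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.zeros((n_types, n, n, n), dtype=np.float32)
    for (x, y, z), t in zip(np.asarray(coords, dtype=float) - center, types):
        dist2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        grid[t] += np.exp(-dist2 / (2.0 * sigma ** 2))
    return grid

# Toy example: one carbon (channel 0) and one oxygen (channel 2) in an 8 angstrom box.
coords = np.array([[0.25, 0.25, 0.25], [3.25, 0.25, 0.25]])
grid = voxelize(coords, types=[0, 2], n_types=3, center=np.zeros(3), dim=8.0)
```

Here the nitrogen channel (index 1) stays empty, and each of the other two channels peaks at the voxel containing its atom.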
Training Data
4056 protein-ligand complexes
• diverse targets
• wide range of affinities
• generate poses with AutoDock Vina
• include minimized crystal pose
Pose Prediction
• 8,688 low RMSD poses
Affinity Prediction
• assign known affinity
• regression problem
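Labeling a pose as low RMSD requires comparing each docked pose with the crystal pose. A minimal sketch, assuming both poses list the same atoms in the same order and using a hypothetical 2 Å cutoff (the slide does not state the threshold):

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pose) != len(reference):
        raise ValueError("poses must have the same number of atoms")
    total = sum((a - b) ** 2 for p, q in zip(pose, reference) for a, b in zip(p, q))
    return math.sqrt(total / len(pose))

def label_pose(pose, crystal, cutoff=2.0):
    """Return 1 for a low RMSD pose, 0 otherwise; cutoff is an assumed value."""
    return 1 if rmsd(pose, crystal) < cutoff else 0

crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0)]   # shifted by 0.3 angstroms
far = [(4.0, 0.0, 0.0), (5.5, 0.0, 0.0)]    # shifted by 4 angstroms
```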
Data Augmentation
[Figure: ≠]
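A voxel grid is not invariant to rotation, so a rotated complex looks like a different input to the network. One common remedy, assumed here because the slide gives no details, is to apply a random rotation and a small random translation to the coordinates before gridding. The sketch below uses NumPy, and `max_shift` is a hypothetical parameter.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the draw is uniform
    if np.linalg.det(q) < 0:         # flip one axis to exclude reflections
        q[:, 0] = -q[:, 0]
    return q

def augment(coords, center, rng, max_shift=2.0):
    """Rotate coords about center, then translate by up to max_shift per axis."""
    rot = random_rotation(rng)
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return (coords - center) @ rot.T + center + shift

rng = np.random.default_rng(0)
coords = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
augmented = augment(coords, center=coords.mean(axis=0), rng=rng)
```

Because the transform is rigid, all interatomic distances are preserved; only the orientation and position seen by the grid change.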
Model
Input: 48x48x48x35
2x2 Max Pooling → 24x24x24x35
3x3x3 Convolution, Rectified Linear Unit → 24x24x24x32
2x2 Max Pooling → 12x12x12x32
3x3x3 Convolution, Rectified Linear Unit → 12x12x12x64
2x2 Max Pooling → 6x6x6x64
3x3x3 Convolution, Rectified Linear Unit → 6x6x6x128
Fully Connected → Pose Score (Softmax+Logistic Loss)
Fully Connected → Affinity (Pseudo-Huber Loss)
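The layer sizes on this slide can be checked with a little arithmetic. The sketch below traces the tensor shape through the network, assuming that each convolution pads its input so the spatial size is unchanged and that each pooling layer halves all three spatial dimensions; both are inferred from the listed sizes rather than stated on the slide. It also writes out the pseudo-Huber loss in its standard form, with `delta` as a free parameter whose value the slide does not give.

```python
import math

def trace_shapes(side=48, channels=35, conv_channels=(32, 64, 128)):
    """Return the (side, side, side, channels) shape after every layer."""
    shapes = [(side, side, side, channels)]
    for out_channels in conv_channels:
        side //= 2                                   # max pooling halves each side
        shapes.append((side, side, side, channels))
        channels = out_channels                      # padded 3x3x3 convolution
        shapes.append((side, side, side, channels))
    return shapes

def pseudo_huber(error, delta=1.0):
    """delta^2 * (sqrt(1 + (error/delta)^2) - 1): quadratic near 0, linear far away."""
    return delta ** 2 * (math.sqrt(1.0 + (error / delta) ** 2) - 1.0)

shapes = trace_shapes()
flattened = math.prod(shapes[-1])   # number of inputs to each fully connected layer
```

Under these assumptions each fully connected layer sees 6 x 6 x 6 x 128 = 27,648 features. The pseudo-Huber loss behaves like squared error for small affinity errors and like absolute error for large ones, which makes the regression less sensitive to outliers.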
Results
Clustered Cross-Validation: RMSE = 1.69, R = 0.57, AUC = 0.90
Trained on PDBbind refined; tested on CSAR
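For reference, the three reported metrics can be computed as follows. This is a plain-Python sketch with invented toy numbers, not the evaluation code behind the slide. It takes R to be the Pearson correlation and presumes that RMSE and R describe the affinity regression while AUC describes pose classification; the slide does not spell out either point.

```python
import math

def rmse(pred, true):
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy affinities and pose scores, invented for illustration.
true_affinity = [4.0, 6.0, 8.0]
pred_affinity = [5.0, 6.0, 7.0]
pose_scores = [0.9, 0.8, 0.3, 0.1]
pose_labels = [1, 0, 1, 0]
```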
Visualization
• masking
• gradients
• layer-wise relevance
[Figure: 1UGX, Score: 0.62]
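Of the three approaches, masking is the simplest to sketch: remove one atom at a time, rescore, and attribute the change in score to that atom. The scoring function below is a toy stand-in with hypothetical weights, since the real scorer would be the trained CNN.

```python
def masking_attribution(atoms, score_fn):
    """Return, per atom, the drop in score when that atom is removed."""
    full = score_fn(atoms)
    return [full - score_fn(atoms[:i] + atoms[i + 1:]) for i in range(len(atoms))]

def toy_score(atoms):
    """Hypothetical scorer: oxygens add 0.3, carbons add 0.1, nitrogens subtract 0.2."""
    weights = {"O": 0.3, "C": 0.1, "N": -0.2}
    return sum(weights[a] for a in atoms)

contributions = masking_attribution(["C", "O", "N"], toy_score)
```

A positive contribution means the score falls when the atom is removed, so the atom helps the pose; a negative contribution means the atom hurts it.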
Visualizing Empty Space
Beyond Scoring
Deep Dreams: https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
Beyond Scoring
[Figure: 2Q89, annotated "Less Oxygen Here" and "More Oxygen Here"]
Optimizing Low RMSD Poses
[Figure legend: better, worse]
Iterative Refinement
[Figure legend: better, worse]
Docking (vina/smina/gnina)
Sampling → Refinement → Rescoring
• Sampling: MCMC chains (MCMC, MCMC, …) yield the best poses
• Refinement: Vina or CNN
• Rescoring: CNN (pose, affinity)
N (50) independent Monte Carlo chains
Scored with grid-accelerated Vina
Best identified pose retained
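The sampling stage can be sketched as independent Metropolis Monte Carlo chains that each retain their best pose, followed by rescoring with a second function. The sketch uses a one-dimensional toy "pose" and toy energy functions in place of grid-accelerated Vina and the CNN; the step size, temperature, step count, and function names are all hypothetical.

```python
import math
import random

def run_chain(energy, start, steps, rng, step_size=0.5, temperature=1.0):
    """Metropolis sampling of a scalar pose; returns the lowest-energy pose seen."""
    pose, e = start, energy(start)
    best_pose, best_e = pose, e
    for _ in range(steps):
        candidate = pose + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp((e - e_new) / temperature):
            pose, e = candidate, e_new
            if e < best_e:
                best_pose, best_e = pose, e
    return best_pose

def dock(sample_energy, rescore, n_chains=50, steps=200, seed=0):
    """Run independent chains, keep each chain's best pose, rank them by rescore."""
    rng = random.Random(seed)
    best_poses = [run_chain(sample_energy, rng.uniform(-5, 5), steps, rng)
                  for _ in range(n_chains)]
    return sorted(best_poses, key=rescore)

def toy_vina(x):
    """Toy sampling energy with two equally deep wells at x = -2 and x = +2."""
    return (x * x - 4.0) ** 2 / 8.0

def toy_cnn(x):
    """Toy rescoring function that prefers the well near x = +2."""
    return (x - 2.0) ** 2

ranked = dock(toy_vina, toy_cnn)
```

The toy sampling energy cannot tell its two wells apart, so chains end up in both. The rescoring function breaks the tie, which mirrors the role of the CNN in ranking poses that the sampling score considers similar.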
Full CNN Docking
GPU Performance
[Figure: Average Time (ms), axis from 0 to 500, broken down into Molecular Grid, CNN Forward, CNN Backward, and Atom Gradients, for Xeon 4110 2.1GHz, i9-7920X 2.9GHz, GTX 1070 Ti, and V100]
Prospective Evaluation: D3R
Grand Challenge 3
Spearman Correlation
target      cnn_docked_affinity  cnn_rescore_affinity  cnn_docked_scoring  cnn_rescore_scoring  vina
cat         0.0701               0.154                 -0.0351             0.178                0.179
p38a        -0.0784              -0.116                -0.329              -0.305               -0.0631
vegfr2      0.366                0.484                 0.434               0.448                0.414
jak2        0.428                0.338                 0.39                0.27                 0.106
jak2_sub3   0.68                 0.369                 -0.372              0.159                -0.633
tie2        0.648                0.835                 0.136               -0.078               0.561
abl1        0.634                0.745                 0.005               0.182                0.713
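Spearman correlation compares the rank order of predicted and experimental values, so it rewards a method for ordering compounds correctly even when the predicted magnitudes are off. A plain-Python sketch that uses average ranks for ties follows; the numbers are invented for illustration and are not challenge data.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm

predicted = [7.1, 5.0, 6.2, 8.3]      # hypothetical predicted affinities
experimental = [6.5, 5.5, 7.0, 9.0]   # hypothetical measured affinities
rho = spearman(predicted, experimental)
```

A value near 1 means the predicted ordering matches experiment, a value near 0 means no relationship, and a negative value (as for several entries in the table) means the ordering is partly inverted.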
Grand Challenge 3: The Good