
GPU-Accelerated Convolutional Neural Networks For Protein-Ligand Scoring

David Koes (@david_koes), University of Pittsburgh, Computational and Systems Biology. GPU Technology Conference, May 8, 2017.


  1. GPU-Accelerated Convolutional Neural Networks For Protein-Ligand Scoring. David Koes (@david_koes). GPU Technology Conference, May 8, 2017.

  3. The Biopharmaceutical Research and Development Process. [Figure: pipeline from Basic Research and Drug Discovery through Pre-Clinical, Clinical Trials (Phase I, Phase II, Phase III), and FDA Review to Post-Approval Research & Monitoring (Phase IV); milestones: IND submitted, NDA/BLA submitted, FDA approval; number of volunteers: tens, hundreds, thousands; potential new medicines narrow to 1 FDA-approved medicine; $2.6 billion.] "If you stop failing so often you massively reduce the cost of drug development." (Sir Andrew Witty, CEO, GlaxoSmithKline). Source: Pharmaceutical Research and Manufacturers of America (http://phrma.org)

  5. (1) Does the compound do what you want it to? (2) Does the compound not do what you don’t want it to? (3) Is what you want it to do the right thing?

  6. Protein Structures: sequence → structure → function

  8. Structure-Based Drug Design. Unlike ligand-based approaches, generalizes to new targets. Requires a molecular target with known structure and binding site.

  11. Structure-Based Drug Design: Virtual Screening; Lead Optimization; Pose Prediction; Binding Discrimination; Affinity Prediction

  13. Protein-Ligand Scoring: AutoDock Vina. [Figure: diagram labeled d, r1, r2.] Reference: O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 31 (2010) 455-461

  15. Can we do better? Accurate pose prediction, binding discrimination, and affinity prediction without sacrificing performance?
      Key Idea: Leverage “big data”
      • 231,655,275 bioactivities in PubChem
      • 125,526 structures in the PDB
      • 16,179 annotated complexes in PDBbind

  16. Deep Learning

  18. Image Recognition: Convolutional Neural Networks (image source: https://devblogs.nvidia.com)

  19. Convolutional Neural Networks. [Figure: input image → Convolution → Feature Maps → Convolution → Feature Maps → Fully Connected (Traditional NN) → class scores, e.g. Dog: 0.99, Cat: 0.02.]

  22. CNNs for Protein-Ligand Scoring. [Figure: a CNN applied to Pose Prediction, Binding Discrimination, and Affinity Prediction.]
      • Input representation
      • Training
      • Model optimization
      • Visualization and evaluation

  24. Protein-Ligand Representation: (R,G,B) pixel → (Carbon, Nitrogen, Oxygen,…) voxel. The only parameters for this representation are the choice of grid resolution, atom density, and atom types.

  25. Atom Density: Gaussian
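
As a rough illustration of the voxel representation, the sketch below grids a few atoms into one channel per atom type with a Gaussian density. Everything specific here is an assumption for illustration, not the talk's implementation: the `voxelize` helper, the density formula exp(-2d²/r²), the cutoff at 1.5 radii, and the 0.5 Å default resolution. The default grid dimension of 48 is chosen only to match the 48^3 input shown on the architecture slide.

```python
import math


def voxelize(atoms, n_types, dim=48, resolution=0.5, center=(0.0, 0.0, 0.0)):
    """Grid atoms into a [n_types][dim][dim][dim] nested list of densities.

    atoms: list of (x, y, z, radius, type_index), coordinates in Angstroms.
    Density model (an assumption): exp(-2 d^2 / r^2), truncated at d > 1.5 r.
    """
    half = (dim - 1) * resolution / 2.0
    origin = [c - half for c in center]
    grid = [[[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_types)]
    for x, y, z, radius, t in atoms:
        cutoff = 1.5 * radius
        pos = (x, y, z)
        # Restrict the loops to the grid points inside the atom's bounding box.
        lo = [max(0, math.ceil((p - cutoff - o) / resolution))
              for p, o in zip(pos, origin)]
        hi = [min(dim - 1, math.floor((p + cutoff - o) / resolution))
              for p, o in zip(pos, origin)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    d2 = ((origin[0] + i * resolution - x) ** 2
                          + (origin[1] + j * resolution - y) ** 2
                          + (origin[2] + k * resolution - z) ** 2)
                    if d2 <= cutoff * cutoff:
                        grid[t][i][j][k] += math.exp(-2.0 * d2 / (radius * radius))
    return grid


# Two hypothetical atoms: type 0 (e.g. a carbon channel), type 1 (e.g. oxygen).
example = voxelize([(0.0, 0.0, 0.0, 1.9, 0), (1.2, 0.0, 0.0, 1.7, 1)],
                   n_types=2, dim=9)
```

Each atom type gets its own channel, in the same way that red, green, and blue each get a channel in an image.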

  26. Atom Types
      Ligand: AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Bromine, Chlorine, Fluorine, Iodine, Nitrogen, NitrogenXSAcceptor, NitrogenXSDonor, NitrogenXSDonorAcceptor, Oxygen, OxygenXSAcceptor, OxygenXSDonorAcceptor, Phosphorus, Sulfur, SulfurAcceptor
      Receptor: AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Calcium, Iron, Magnesium, Nitrogen, NitrogenXSAcceptor, NitrogenXSDonor, NitrogenXSDonorAcceptor, OxygenXSAcceptor, OxygenXSDonorAcceptor, Phosphorus, Sulfur, Zinc

  27. Training Data: Pose Prediction
      337 protein-ligand complexes
      • curated for electron density
      • diverse targets
      • <10µM affinity
      • generate poses with Vina
      • include minimized crystal pose
        - 745 <2Å RMSD (actives)
        - 3251 >4Å RMSD (decoys)
      12,484 protein-ligand complexes
      • diverse targets
      • wide range of affinities
      • generate poses with AutoDock Vina
        - 24,727 <2Å RMSD (actives)
        - 244,192 >4Å RMSD (decoys)
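
The active/decoy thresholds above can be sketched as a small labeling function. This is only an illustration under stated assumptions: the `rmsd` helper assumes both poses list the same atoms in the same order (no symmetry correction), and dropping poses between 2 Å and 4 Å is inferred from the two thresholds rather than stated on the slide. Both function names are hypothetical.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(pose, reference)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(pose))


def label_pose(rmsd_value, active_below=2.0, decoy_above=4.0):
    """Return 1 for an active (<2 A), 0 for a decoy (>4 A), None otherwise."""
    if rmsd_value < active_below:
        return 1
    if rmsd_value > decoy_above:
        return 0
    return None  # ambiguous pose, assumed to be left out of training


crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
label = label_pose(rmsd(docked, crystal))
```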

  28. Model Evaluation
      • CSAR: >90% similar targets kept in same fold
      • PDBbind: >80% similar targets kept in same fold
      • Metric: AUC
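
A minimal sketch of the two ideas on this slide, assuming targets have already been clustered by similarity. `grouped_folds` and `roc_auc` are hypothetical helpers, the round-robin fold assignment and the choice of three folds are assumptions, and the quadratic pair-counting AUC is written for clarity rather than speed.

```python
def grouped_folds(cluster_of_target, n_folds=3):
    """Assign folds per cluster so similar targets never straddle two folds."""
    clusters = sorted(set(cluster_of_target.values()))
    fold_of_cluster = {c: i % n_folds for i, c in enumerate(clusters)}
    return {t: fold_of_cluster[c] for t, c in cluster_of_target.items()}


def roc_auc(scores, labels):
    """Fraction of (active, decoy) pairs where the active scores higher.

    Ties count as half a win; labels are 1 for actives and 0 for decoys.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


folds = grouped_folds({"t1": "A", "t2": "A", "t3": "B", "t4": "C"})
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active being scored above every decoy.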

  29. Model Training: custom MolGridDataLayer in Caffe
      • Parallelize over atoms to obtain a mask of atoms that overlap each grid region
      • Use exclusive scan to obtain a list of atom indices from the mask
      • Parallelize over grid points, using the reduced atom list to avoid an O(N_atoms) check
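
The mask, scan, and compaction steps can be sketched serially in Python. This is not the MolGridDataLayer code: the function names are hypothetical, the sphere-versus-box overlap test is an assumption about what "overlap" means, and each loop iteration stands in for what one GPU thread would do.

```python
from itertools import accumulate


def overlap_mask(atoms, region_min, region_max):
    """Return 1 per atom whose sphere (x, y, z, r) touches the box, else 0."""
    mask = []
    for x, y, z, r in atoms:  # on a GPU: one thread per atom
        d2 = 0.0
        for c, lo, hi in zip((x, y, z), region_min, region_max):
            nearest = min(max(c, lo), hi)  # closest point of the box on this axis
            d2 += (c - nearest) ** 2
        mask.append(1 if d2 <= r * r else 0)
    return mask


def exclusive_scan(values):
    """out[i] is the sum of values[:i], so out[0] is always 0."""
    return list(accumulate(values, initial=0))[:-1]


def compact_indices(mask):
    """Turn a 0/1 mask into the list of indices where the mask is 1."""
    if not mask:
        return []
    offsets = exclusive_scan(mask)
    out = [0] * (offsets[-1] + mask[-1])
    for i, flag in enumerate(mask):  # on a GPU: one thread per atom
        if flag:
            out[offsets[i]] = i  # the scan gives each kept atom a unique slot
    return out


atoms = [(0.0, 0.0, 0.0, 1.0), (10.0, 10.0, 10.0, 1.0),
         (2.5, 0.0, 0.0, 1.0), (5.0, 5.0, 5.0, 1.0)]
mask = overlap_mask(atoms, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
region_atoms = compact_indices(mask)
```

Grid points in the region then loop over `region_atoms` only, instead of testing every atom in the complex.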

  30. Data Augmentation

  32. Model Optimization: parameters explored
      • Atom Types: Vina (34), element-only (18), ligand-protein (2)
      • Atom Density Type: Boolean, Gaussian
      • Radius Multiple
      • Resolution
      • Pooling: max
      • Depth
      • Width
      • Fully Connected Layers

  34. Model Optimization. [Figure: network diagram. data (48^3) and label → unit1_pool → unit1_conv1 (32 x 24^3) → unit2_pool → unit2_conv1 (64 x 12^3) → unit3_pool → unit3_conv1 (128 x 6^3) → output_fc (2) → output, loss]
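
The layer sizes in the diagram can be checked with a short shape trace. The sketch assumes that each pooling layer halves every spatial dimension, that each convolution preserves the spatial size, that the kernel is 3x3x3, and that the input has 34 channels (the Vina atom typing). None of these details are stated on the slide, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_channels=34, input_dim=48, widths=(32, 64, 128),
                 kernel=3, n_outputs=2):
    """Return ([(layer, channels, cubic_dim), ...], learnable_parameter_count)."""
    shapes = [("data", input_channels, input_dim)]
    params = 0
    channels, dim = input_channels, input_dim
    for unit, width in enumerate(widths, start=1):
        dim //= 2  # assumed pooling with stride 2 along each axis
        shapes.append((f"unit{unit}_pool", channels, dim))
        # One kernel^3 filter per (input channel, output channel), plus biases.
        params += width * (channels * kernel ** 3 + 1)
        channels = width
        shapes.append((f"unit{unit}_conv1", channels, dim))
    fc_inputs = channels * dim ** 3  # flattened feature maps
    params += n_outputs * (fc_inputs + 1)
    shapes.append(("output_fc", n_outputs, 1))  # dim 1 marks a flat vector
    return shapes, params


shapes, n_params = trace_shapes()
for name, channels, dim in shapes:
    print(f"{name:12s} {channels:4d} x {dim}^3")
```

Under these assumptions the trace reproduces the 32 x 24^3, 64 x 12^3, and 128 x 6^3 feature-map sizes from the diagram, ending in a two-way output.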

  35. Cross-Validation Evaluation

  37. Pose Prediction (CSAR). [Figure panels: inter-target ranking; intra-target ranking]

  39. Pose Prediction (PDBbind). [Figure panels: inter-target ranking; intra-target ranking]

  40. Visualization
