GPU-Accelerated Convolutional Neural Networks For Protein-Ligand Scoring David Koes @david_koes GPU Technology Conference May 8, 2017
University of Pittsburgh Computational and Systems Biology
The Biopharmaceutical Research and Development Process
[Figure: pipeline stages: basic research, drug discovery, pre-clinical, clinical trials (Phase I, II, III; number of volunteers: tens, hundreds, thousands), FDA review, post-approval research and monitoring (Phase IV). Milestones: IND submitted, NDA/BLA submitted, FDA approval. Labels: potential new medicines, 1 FDA-approved medicine, $2.6 billion. Source: Pharmaceutical Research and Manufacturers of America (http://phrma.org)]
"If you stop failing so often you massively reduce the cost of drug development." Sir Andrew Witty, CEO, GlaxoSmithKline
1. Does the compound do what you want it to?
2. Does the compound not do what you don’t want it to?
3. Is what you want it to do the right thing?
Protein Structures
sequence → structure → function
Structure Based Drug Design
• Unlike ligand based approaches, generalizes to new targets
• Requires molecular target with known structure and binding site
Structure Based Drug Design
• Virtual Screening
• Lead Optimization
• Pose Prediction
• Binding Discrimination
• Affinity Prediction
Protein-Ligand Scoring
AutoDock Vina
[Figure: scoring-function diagram with labels d, r1, r2]
O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 31 (2010) 455-461
Can we do better?
Accurate pose prediction, binding discrimination, and affinity prediction without sacrificing performance?
Key Idea: Leverage “big data”
• 231,655,275 bioactivities in PubChem
• 125,526 structures in the PDB
• 16,179 annotated complexes in PDBbind
Deep Learning
Image Recognition
Convolutional Neural Networks
https://devblogs.nvidia.com
Convolutional Neural Networks
[Figure: image → Convolution (Feature Maps) → Convolution (Feature Maps) → Fully Connected (Traditional NN) → Dog: 0.99, Cat: 0.02]
CNNs for Protein-Ligand Scoring
[Figure: a CNN used for Pose Prediction, Binding Discrimination, Affinity Prediction]
• Input representation
• Training
• Model optimization
• Visualization and Evaluation
Protein-Ligand Representation
(R,G,B) pixel → (Carbon, Nitrogen, Oxygen,…) voxel
The only parameters for this representation are the choice of grid resolution, atom density, and atom types.
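The pixel-to-voxel analogy can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the function name `voxelize`, the 0.5 Å resolution, and the sparse-dictionary storage are assumptions; the 48 points per side match the 48^3 input shown on the model slide. Each atom marks only the voxel that contains its center, one channel per atom type.

```python
import math

def voxelize(atoms, atom_types, center, resolution=0.5, points=48):
    """Map atoms to a sparse grid keyed by (channel, i, j, k).

    atoms: iterable of (type_name, x, y, z) in Angstroms.
    atom_types: ordered list of type names; list position = channel index.
    resolution (Angstroms per voxel) is an assumed value.
    """
    channel_of = {name: c for c, name in enumerate(atom_types)}
    half = resolution * points / 2.0  # half the cube edge length
    grid = {}
    for name, x, y, z in atoms:
        if name not in channel_of:
            continue  # atoms without a channel are ignored
        idx = tuple(
            int(math.floor((p - c + half) / resolution))
            for p, c in zip((x, y, z), center)
        )
        if all(0 <= i < points for i in idx):
            grid[(channel_of[name],) + idx] = 1.0  # Boolean occupancy
    return grid

example_atoms = [
    ("Carbon", 0.0, 0.0, 0.0),
    ("Oxygen", 1.0, 0.0, 0.0),
    ("Zinc", 0.0, 0.0, 0.0),      # not in the type list below
    ("Carbon", 100.0, 0.0, 0.0),  # outside the box
]
example_grid = voxelize(example_atoms, ["Carbon", "Nitrogen", "Oxygen"], (0.0, 0.0, 0.0))
```

Changing the type list changes the number of channels, and changing `resolution` or `points` changes the box size, which is the sense in which these are the representation's only parameters.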
Atom Density
Gaussian
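The slide names a Gaussian density but does not give its functional form, so the following is only a sketch under stated assumptions: the width is tied to the atomic radius (`sigma = radius / 2`, a hypothetical choice), and the density is truncated at `radius_multiple` times the radius so each atom contributes only to nearby grid points. A Boolean density is included for comparison.

```python
import math

def gaussian_density(distance, radius, radius_multiple=1.5):
    """Density contributed by one atom at `distance` (Angstroms) from its center.

    sigma and radius_multiple are illustrative assumptions, not values
    taken from the slides.
    """
    if distance >= radius * radius_multiple:
        return 0.0  # truncated beyond the cutoff
    sigma = radius / 2.0
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

def boolean_density(distance, radius):
    """1 inside the atomic radius, 0 outside."""
    return 1.0 if distance <= radius else 0.0
```

The Gaussian form gives the network a smooth signal that varies with small shifts in atom position, whereas the Boolean form changes only when an atom crosses a voxel boundary.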
Atom Types
Ligand:
• AliphaticCarbonXSHydrophobe
• AliphaticCarbonXSNonHydrophobe
• AromaticCarbonXSHydrophobe
• AromaticCarbonXSNonHydrophobe
• Bromine
• Chlorine
• Fluorine
• Iodine
• Nitrogen
• NitrogenXSAcceptor
• NitrogenXSDonor
• NitrogenXSDonorAcceptor
• Oxygen
• OxygenXSAcceptor
• OxygenXSDonorAcceptor
• Phosphorus
• Sulfur
• SulfurAcceptor
Receptor:
• AliphaticCarbonXSHydrophobe
• AliphaticCarbonXSNonHydrophobe
• AromaticCarbonXSHydrophobe
• AromaticCarbonXSNonHydrophobe
• Calcium
• Iron
• Magnesium
• Nitrogen
• NitrogenXSAcceptor
• NitrogenXSDonor
• NitrogenXSDonorAcceptor
• OxygenXSAcceptor
• OxygenXSDonorAcceptor
• Phosphorus
• Sulfur
• Zinc
Training Data
Pose Prediction
337 protein-ligand complexes
• curated for electron density
• diverse targets
• <10µM affinity
• generate poses with Vina
  - 745 <2Å RMSD (actives)
  - 3,251 >4Å RMSD (decoys)
12,484 protein-ligand complexes
• diverse targets
• wide range of affinities
• generate poses with AutoDock Vina
• include minimized crystal pose
  - 24,727 <2Å RMSD (actives)
  - 244,192 >4Å RMSD (decoys)
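The RMSD thresholds above translate into a simple labeling rule. The sketch below assumes that poses between 2 Å and 4 Å are left out of training, which the slide implies but does not state; the function names are hypothetical.

```python
def label_pose(rmsd, active_below=2.0, decoy_above=4.0):
    """Return 1 for an active pose, 0 for a decoy, None for an ambiguous one."""
    if rmsd < active_below:
        return 1
    if rmsd > decoy_above:
        return 0
    return None  # assumption: intermediate poses are not used

def build_training_set(poses):
    """poses: iterable of (pose_id, rmsd). Keeps only labeled poses."""
    labeled = []
    for pose_id, rmsd in poses:
        label = label_pose(rmsd)
        if label is not None:
            labeled.append((pose_id, label))
    return labeled

example_set = build_training_set([("a", 0.8), ("b", 3.1), ("c", 7.5)])
```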
Model Evaluation
• CSAR: targets that are >90% similar are kept in the same fold
• PDBbind: targets that are >80% similar are kept in the same fold
• Metric: AUC
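Both ideas on this slide can be sketched briefly. The code assumes targets have already been clustered by similarity (the clustering itself and the number of folds are not given on the slide, so `n_folds` is a hypothetical parameter). `assign_folds` keeps every cluster inside a single fold, and `roc_auc` computes the area under the ROC curve through the rank-sum identity, with tied scores sharing an average rank.

```python
def assign_folds(target_clusters, n_folds=3):
    """target_clusters: dict target -> cluster id. Returns dict target -> fold."""
    members = {}
    for target, cluster in target_clusters.items():
        members.setdefault(cluster, []).append(target)
    sizes = [0] * n_folds
    folds = {}
    # Place the largest clusters first, each into the currently smallest fold.
    ordered = sorted(members.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _, targets in ordered:
        fold = sizes.index(min(sizes))
        sizes[fold] += len(targets)
        for target in targets:
            folds[target] = fold
    return folds

def roc_auc(labels, scores):
    """AUC = probability that a random active outscores a random decoy."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based rank shared by ties
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one active and one decoy")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Keeping similar targets together prevents a model from being tested on a near-copy of a protein it saw in training, which would inflate the reported AUC.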
Model Training
Custom MolGridDataLayer (Caffe)
1. Parallelize over atoms to obtain a mask of atoms that overlap each grid region
2. Use exclusive scan to obtain a list of atom indices from the mask
3. Parallelize over grid points, using reduced atom list to avoid O(N_atoms) check
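The three steps above follow the standard stream-compaction pattern. The sketch below is a serial Python stand-in for what would be GPU threads, written to show the data flow only; it is not the MolGridDataLayer code, and all function names are hypothetical. Atoms are `(x, y, z, radius)` tuples and a grid region is an axis-aligned box.

```python
def sphere_overlaps_box(atom, box_min, box_max):
    """True if the atom's sphere intersects the axis-aligned box."""
    x, y, z, radius = atom
    dist_sq = 0.0
    for p, lo, hi in zip((x, y, z), box_min, box_max):
        nearest = min(max(p, lo), hi)  # closest point of the box on this axis
        dist_sq += (p - nearest) ** 2
    return dist_sq <= radius ** 2

def exclusive_scan(values):
    """out[i] = sum(values[:i]); also returns the grand total."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out, total

def overlapping_atoms(atoms, box_min, box_max):
    # Step 1: one independent test per atom (one thread per atom on a GPU).
    mask = [1 if sphere_overlaps_box(a, box_min, box_max) else 0 for a in atoms]
    # Step 2: the scan gives each flagged atom its slot in the compact list.
    offsets, count = exclusive_scan(mask)
    compact = [0] * count
    for i, flagged in enumerate(mask):
        if flagged:
            compact[offsets[i]] = i
    return compact

def atoms_covering(point, atoms, atom_indices):
    # Step 3: each grid point checks only the reduced list, not every atom.
    return [
        i for i in atom_indices
        if sum((p - a) ** 2 for p, a in zip(point, atoms[i][:3])) <= atoms[i][3] ** 2
    ]

example_atoms = [(0.0, 0.0, 0.0, 1.0), (10.0, 10.0, 10.0, 1.0), (2.5, 0.0, 0.0, 1.0)]
example_compact = overlapping_atoms(example_atoms, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
```

The scan is what lets every flagged atom write to its own output slot without coordination, which is why it suits a parallel implementation.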
Data Augmentation
Model Optimization
• Atom Types: Vina (34), element-only (18), ligand-protein (2)
• Atom Density Type: Boolean, Gaussian
• Radius Multiple
• Resolution
• Pooling: max
• Depth
• Width
• Fully Connected Layers
Model Optimization
data (48^3) → unit1_pool → unit1_conv1 (32 x 24^3) → unit2_pool → unit2_conv1 (64 x 12^3) → unit3_pool → unit3_conv1 (128 x 6^3) → output_fc (2) → output
label, output → loss
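The layer sizes on this slide can be checked with a little arithmetic. The sketch assumes each pooling layer halves the grid side and each convolution preserves it (for example through padding); both are inferred from the 48 → 24 → 12 → 6 progression rather than stated on the slide, and the function names are hypothetical.

```python
def network_shapes(input_dim=48, widths=(32, 64, 128), pool=2):
    """Return (layer name, channels, grid side) after each pool + conv unit."""
    shapes = []
    dim = input_dim
    for i, channels in enumerate(widths, start=1):
        dim //= pool  # assumed: pooling halves each spatial dimension
        shapes.append((f"unit{i}_conv1", channels, dim))
    return shapes

def flattened_features(shapes):
    """Number of values fed into the fully connected output layer."""
    _, channels, dim = shapes[-1]
    return channels * dim ** 3

example_shapes = network_shapes()
example_features = flattened_features(example_shapes)
```

Under these assumptions, output_fc maps 128 x 6^3 = 27,648 features to the 2 output values.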
Cross-Validation Evaluation
Pose Prediction (CSAR)
[Figure: inter-target ranking and intra-target ranking results]
Pose Prediction (PDBbind)
[Figure: inter-target ranking and intra-target ranking results]
Visualization