Accelerating Virtual High-Throughput Ligand Docking: Screening One Million Compounds Using a Petascale Supercomputer. Sally R. Ellingson, PhD Candidate, Department of Genome Science and Technology, University of Tennessee; Center for Molecular Biophysics, UT/ORNL. Advisor: Dr. Jerome Baudry. 2012 Emerging Computational Methods for the Life Sciences Workshop (in conjunction with HPDC12, Delft, Netherlands)
Outline • What is virtual molecular docking? • What is the importance of a virtual high-throughput screening? • Autodock4 and Autodock4.lga.MPI ▫ Implementation details ▫ Case study: million compound screen • What is the importance of multi-protein docking? ▫ Limitations with current screening software ▫ Future opportunities using Autodock Vina
What is virtual molecular docking? • Predicts conformation of a protein-ligand complex • Predicts binding affinity of the ligand to the protein Diller, D. J. and Merz, K. M. (2001), High throughput docking for library design and library prioritization. Proteins, 43: 113 – 124. (+) Reproduce correct bound conformation (+) Assign better scores to high-affinity ligands than to decoys (enrichment) (-) Generate scores that correlate with measured binding affinities
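Enrichment, as used on the slide above, can be quantified in several ways. The following is a minimal sketch of one common measure, the enrichment factor, assuming that lower (more negative) scores indicate better predicted binding. The function name `enrichment_factor`, the scores, and the active/decoy labels are all hypothetical and are not taken from the talk.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Active rate in the top-scoring fraction divided by the active rate overall."""
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and of equal length")
    # Assumption: lower (more negative) docking scores mean better predicted binding.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, int(len(ranked) * top_fraction))
    actives_top = sum(1 for _, active in ranked[:n_top] if active)
    actives_all = sum(1 for active in is_active if active)
    if actives_all == 0:
        raise ValueError("need at least one known active")
    return (actives_top / n_top) / (actives_all / len(ranked))


# Hypothetical scores for 10 compounds, 2 of which are known actives.
scores = [-9.1, -8.7, -6.0, -5.5, -5.2, -5.0, -4.8, -4.1, -3.9, -3.0]
is_active = [True, True, False, False, False, False, False, False, False, False]
ef = enrichment_factor(scores, is_active, top_fraction=0.2)
print(ef)
```

A value above 1 means the top of the ranked list contains proportionally more actives than the library as a whole.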
Why is virtual docking important in novel drug discovery? • Many medications act by binding and inhibiting a specific target • Early-stage drug discovery consists of identifying ligands that bind to specific proteins with a high affinity and retain favorable pharmacological properties. http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
What is the importance of a virtual high-throughput screening? [Figure panels (A) and (B)] (A) Sally R. Ellingson and Jerome Baudry. High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud. In Proceedings of the second international workshop on Emerging computational methods for the life sciences (ECMLS '11). ACM, New York, NY, USA, 33-38. DOI=10.1145/1996023.1996028 http://doi.acm.org/10.1145/1996023.1996028. (B) Sally R. Ellingson, Sivanesan Dakshanamurthy, Milton Brown, Jeremy C. Smith, and Jerome Baudry. Accelerating Virtual High-Throughput Ligand Docking: Screening One Million Compounds Using a Petascale Supercomputer. Proceedings of the third international workshop on Emerging computational methods for the life sciences (ECMLS '12) (accepted)
Why is high-throughput virtual screening important in drug discovery? Virtual screening: • Is faster and more cost-efficient • Allows a larger search space of chemical compounds • Creates a wider, shorter funnel http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
Autodock4 http://autodock.scripps.edu/ Free, open source docking software developed at The Scripps Research Institute Conformational Search using Lamarckian Genetic Algorithm Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R., Hart, W. E., Belew, R. K. and Olson, A. J. (1998), Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem., 19: 1639 – 1662.
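The defining feature of a Lamarckian genetic algorithm is that the result of each individual's local search is written back into its genotype and inherited by the next generation. The sketch below illustrates that idea on a toy objective. It is not AutoDock4's implementation: the objective `energy`, the random-perturbation local search, and all parameter values are assumptions chosen for illustration.

```python
import random


def energy(x):
    """Toy objective (sum of squares); stands in for a docking energy."""
    return sum(v * v for v in x)


def local_search(x, rng, tries=10, step=0.1):
    """Keep random perturbations that lower the energy (a simplified local search)."""
    best = list(x)
    for _ in range(tries):
        trial = [v + rng.uniform(-step, step) for v in best]
        if energy(trial) < energy(best):
            best = trial
    return best


def lamarckian_ga(dim=3, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    initial_best = min(energy(p) for p in pop)
    for _ in range(generations):
        children = [min(pop, key=energy)]  # elitism: the best individual survives
        while len(children) < pop_size:
            a = min(rng.sample(pop, 2), key=energy)  # tournament selection
            b = min(rng.sample(pop, 2), key=energy)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [v + rng.gauss(0, 0.3) if rng.random() < 0.1 else v for v in child]
            # Lamarckian step: the locally optimized coordinates replace the genotype.
            children.append(local_search(child, rng))
        pop = children
    return initial_best, min(energy(p) for p in pop)


initial_best, final_best = lamarckian_ga()
print(initial_best, final_best)
```

Because the best individual is always carried over unchanged, the best energy in the population can never get worse from one generation to the next.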
Autodock4: Scoring of generated conformations. Huey, R., Morris, G. M., Olson, A. J. and Goodsell, D. S. (2007), A semiempirical free energy force field with charge-based desolvation. J. Comput. Chem., 28: 1145 – 1152.
Autodock4: Virtual Docking Process. [Diagram: inputs to AutoDock are the precalculated affinity grids, the receptor PDBQT, the ligand PDBQT, and the docking parameter file; the output is the docking log file.] This process must be done for every ligand in a high-throughput screening
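Since the docking process has to be repeated for every ligand, a screening driver essentially generates one AutoDock4 invocation per ligand. The sketch below only builds the command lines and does not run them. The `-p` (docking parameter file) and `-l` (docking log file) options are AutoDock4's standard ones; the directory layout, the ligand names, and the helper `docking_commands` are hypothetical.

```python
from pathlib import PurePosixPath


def docking_commands(ligand_names, work_dir="screen"):
    """Return one AutoDock4 command per ligand (file layout is hypothetical)."""
    commands = []
    for name in ligand_names:
        dpf = PurePosixPath(work_dir) / f"{name}.dpf"  # docking parameter file (input)
        dlg = PurePosixPath(work_dir) / f"{name}.dlg"  # docking log file (output)
        commands.append(["autodock4", "-p", str(dpf), "-l", str(dlg)])
    return commands


cmds = docking_commands(["lig0001", "lig0002"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each docking parameter file is assumed to name the ligand PDBQT and the precalculated grid maps it needs, so the command line itself stays the same shape for every ligand.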
Autodock4.lga.MPI: A high-throughput virtual screening tool. Goal: • Develop a virtual screening tool that runs on high-performance supercomputers (MPI) Main improvements for virtual screening: • Separation of parameters associated with the screening and individual ligands • Concatenated binary grid files (HDF5) • Reduced output size. Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers. B. Collignon, R. Schulz, J.C. Smith and J. Baudry. J. Comput. Chem. (2011) 32 (6): 1202 – 1209
Autodock4.lga.MPI: A high-throughput virtual screening tool. maps.h5: 19MB-53MB → 9.8MB-28MB using 196 CPUs (Collignon et al., 2011)
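In a task-parallel screening each ligand docking is an independent task, so the main scheduling question is how to spread tasks of unequal cost across workers. The following serial sketch simulates one simple policy (hand the longest remaining task to the least-loaded worker). It is an illustration only and is not the scheduler used by Autodock4.lga.MPI; the cost values and the function `assign_tasks` are hypothetical.

```python
import heapq


def assign_tasks(task_costs, n_workers):
    """Greedy longest-task-first assignment; returns per-worker task lists and loads."""
    # Heap of (current load, worker id): the least-loaded worker is always on top.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignments = {w: [] for w in range(n_workers)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignments[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))
    loads = {w: sum(task_costs[t] for t in tasks) for w, tasks in assignments.items()}
    return assignments, loads


# Hypothetical cost model: docking time grows with the ligand's rotatable bonds.
costs = {"ligA": 12, "ligB": 3, "ligC": 7, "ligD": 5, "ligE": 9}
assignments, loads = assign_tasks(costs, n_workers=2)
print(assignments, loads)
```

With real MPI the same idea is usually expressed as a master process handing out the next task whenever a worker reports that it is idle.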
Million Compound Screening on a petascale supercomputer. Workflow controlled by Python scripts: • Predocking (file preparation) • Runs on Lens (analysis cluster - Jaguar) • Postdocking (analysis) TUTORIAL: http://www.bio.utk.edu/baudrylab/autodockmpi.htm (Ellingson et al., ECMLS '12)
Million Compound Screening on a petascale supercomputer. [Figure: Million Compound Library; histogram of # of compounds (0 to 180,000) versus Rotatable Bonds (Degrees of Freedom, 1 to 33); annotation: 65k processors] (Ellingson et al., ECMLS '12)
What is the importance of multi-protein docking? [Figure: a drug candidate docked against many proteins of important function (multi-protein docking)] Multi-protein docking: • Determine toxicity and side effects • Predict failures earlier in the process • Increase overall success rate. Also for many conformations of the same protein, to model receptor flexibility. http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
Multi-protein docking and limitations with current screening software. [Figure: a drug candidate docked against many proteins of important function (multi-protein docking)] Autodock4.lga.MPI: • Separate MPI jobs for each receptor • Binary grid files for each receptor. What is needed? A tool that allows an increase in the number of receptors used in a screening with a minimal increase in the amount of I/O per docking task. [Figure: multi-protein screening docks all combinations of ligand PDBs and receptor PDBs]
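The "all combinations" requirement means that the task list for a multi-protein screening is the Cartesian product of the ligand set and the receptor set. A minimal sketch of that enumeration follows; the names `screening_tasks`, `ligands`, and `receptors` are hypothetical.

```python
from itertools import product


def screening_tasks(ligands, receptors):
    """Enumerate every (receptor, ligand) docking task, grouped by receptor."""
    return [(rec, lig) for rec, lig in product(receptors, ligands)]


ligands = ["lig1", "lig2", "lig3"]
receptors = ["recA", "recB"]
tasks = screening_tasks(ligands, receptors)
for rec, lig in tasks:
    print(rec, lig)
```

Iterating over receptors in the outer position keeps consecutive tasks on the same receptor, which is one possible way to limit how often receptor data has to be reloaded. Whether that matters in practice depends on the docking engine.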
Autodock Vina: Potential as docking engine for multi-protein screening • Scoring function: machine-learning approach • Conformational search: iterated local search global optimizer; each step consists of a mutation, a local optimization, and a Metropolis acceptance criterion. [Figure: average time in minutes per complex, Autodock4 vs. Autodock Vina, on 2 quad-core processors] Trott, O. and Olson, A. J. (2010), AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31: 455 – 461. doi: 10.1002/jcc.21334
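The mutation, local optimization, and Metropolis acceptance loop can be sketched on a toy one-dimensional function. This is an illustration of the general iterated local search pattern and not Vina's code: the `score` function, the crude step-wise descent used as the local optimizer, and the temperature and step sizes are all assumptions.

```python
import math
import random


def score(x):
    """Toy 1-D 'energy' with several local minima (stand-in for a docking score)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_optimize(x, step=0.01, iters=200):
    """Crude descent: move to a neighbouring point while that lowers the score."""
    for _ in range(iters):
        best = min((x - step, x, x + step), key=score)
        if best == x:
            break
        x = best
    return x


def iterated_local_search(x0, n_steps=200, temperature=0.5, seed=0):
    rng = random.Random(seed)
    current = local_optimize(x0)
    best = current
    for _ in range(n_steps):
        # Mutation followed by local optimization.
        candidate = local_optimize(current + rng.gauss(0.0, 1.0))
        delta = score(candidate) - score(current)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if score(current) < score(best):
            best = current
    return best


start = 4.0
best = iterated_local_search(start)
print(best, score(best))
```

Accepting an occasional worse candidate lets the search leave a local minimum, while tracking `best` separately ensures the reported result never degrades.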
Autodock Vina: Potential as docking engine for multi-protein screening • Calculates grid maps efficiently during docking and does not store them on disk • Result clustering and ranking details hidden (reduced output) • Limitations removed (e.g. maximum # of rotatable bonds) • Already multi-threaded (each docking potentially more efficient)
Summary • High-throughput molecular docking is an important tool to increase the cost and time efficiency of drug discovery • Current screening tool, Autodock4.lga.MPI, allows for a million compounds to be screened in less than 24 hours • Future development will focus on using multiple receptors
Acknowledgements • Genome Science and Technology, UT • Center for Molecular Biophysics, UT/ORNL ▫ Jeremy C. Smith • SCALE-IT, NSF/IGERT Scalable Computing and Leading Edge Innovative Technologies • National Center for Computational Sciences • Georgetown University • NIH-CTSA • ECMLS12 workshop organizers