Cheminformatics platform for drug discovery application
Hsi-Kai Wang, Academia Sinica Grid Computing
EGI User Forum, 13 April 2011
www.egi.eu EGI-InSPIRE RI-261323
• Introduction to drug discovery
• Computing requirements of high-throughput virtual screening
• Cheminformatics case study
Drug discovery development
Computational chemistry / molecular modeling is useful across the pipeline, but uses very different techniques.
Aim for success, but if not: fail early, fail cheap.
Ref: Makus R. and Ralph W., Nature Rev. Drug Discov. (2003), 2, 123-131
Strategy in drug discovery
Receptor (3D structure) unknown, ligand unknown: Combichem, HTS, virtual screening
Receptor (3D structure) unknown, ligand known: pharmacophore, similarity, QSAR
Receptor (3D structure) known, ligand unknown: receptor-based searching, de novo design, docking
Receptor (3D structure) known, ligand known: structure-based drug design, receptor-ligand interaction
Drug discovery on Grid (1/2)
• What is a grid?
  – Many definitions exist in the literature
  – Foster and Kesselman, 1998: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
• A grid can provide
  – Large-scale, on-demand resources
  – Computing resources (computing grids)
  – Storage resources (data grids)
Drug discovery on Grid (2/2)
Problem
– Millions of compounds and drug molecules are presently available for screening
– But developing efficient laboratory assays for such work is time-consuming and very expensive
[Figure: chemical compounds and receptor structures → molecular docking (takes years; as a data challenge on Grid, can be done in weeks) → hits sorting and refining → in vitro screening of best ~50 hits]
Solution
– Grids offer high-speed computing and large-scale data management capability
– Possible variant targets can be studied quickly with present modelling applications
– This will help medicinal chemists respond to major immediate threats
GVSS: GAP Virtual Screening Service
GAP Service Architecture
DIANE: DIstributed ANalysis Environment
[Figure labels: User Application Interface, GRID environments]
• A lightweight framework for parallel scientific applications in the master-worker model
• The framework takes care of all synchronization, communication, and workflow management details on behalf of the application
The profile of a DIANE job
• Each horizontal line segment = one task = one docking
• Unhealthy workers are removed from the worker list
• Failed tasks are rescheduled to healthy workers
[Figure annotations: good load balance; the “bad” worker removed]
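The two behaviours named on this slide (dropping unhealthy workers and rescheduling failed tasks) can be illustrated with a toy, sequential master loop. This is only a sketch under stated assumptions: `run_master`, `flaky_dock`, the `max_failures` threshold, and the worker and ligand names are all hypothetical, and none of it is DIANE's actual API or scheduling policy.

```python
from collections import deque


def run_master(tasks, workers, run_task, max_failures=2):
    """Toy master loop (hypothetical, not DIANE's API).

    Tasks are handed to workers round-robin. A failed task goes back to the
    queue, and a worker is dropped after `max_failures` failed tasks.
    """
    pending = deque(tasks)
    healthy = list(workers)
    failures = {w: 0 for w in workers}
    results = {}
    turn = 0
    while pending:
        if not healthy:
            raise RuntimeError("no healthy workers left")
        worker = healthy[turn % len(healthy)]
        task = pending.popleft()
        try:
            results[task] = run_task(worker, task)
            turn += 1
        except Exception:
            pending.append(task)            # reschedule the failed task
            failures[worker] += 1
            if failures[worker] >= max_failures:
                healthy.remove(worker)      # unhealthy worker leaves the list
            else:
                turn += 1
    return results, healthy


def flaky_dock(worker, task):
    """Stand-in for one docking run; worker 'w2' always fails."""
    if worker == "w2":
        raise RuntimeError("worker node misconfigured")
    return f"{task} docked on {worker}"


ligands = [f"ligand_{i}" for i in range(6)]
results, healthy = run_master(ligands, ["w1", "w2", "w3"], flaky_dock)
print(len(results), healthy)
```

A real master-worker framework runs workers concurrently; the loop here is sequential so that the rescheduling logic stays easy to follow.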
Efficiency and throughput of DIANE
• 280 DIANE worker agents were submitted as LCG jobs
• 200 jobs (~71%) were healthy
  – ~16% of failures were related to middleware errors
  – ~12% of failures were related to application errors
• DIANE utilizes ~95% of the healthy resources
[Figure annotation: stable throughput]
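The percentages on this slide can be checked with a few lines of arithmetic. The "effective workers" and "overall" figures below are derived quantities that do not appear on the slide; they are shown only as one possible way to combine the reported numbers.

```python
submitted = 280          # DIANE worker agents submitted as LCG jobs
healthy = 200            # agents reported healthy
utilization = 0.95       # share of healthy resources DIANE keeps busy

healthy_fraction = healthy / submitted        # matches the slide's ~71%
effective_workers = utilization * healthy     # derived, not stated on the slide
overall = effective_workers / submitted       # derived, not stated on the slide

print(f"healthy: {healthy_fraction:.1%}")
print(f"effective workers: {effective_workers:.0f}")
print(f"overall efficiency vs. submitted agents: {overall:.1%}")
```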
GVSS application: dengue virus
Ref: Hsin-Yen C. et al., J Grid Computing (2010), 8, 529-541
Worldwide dengue distribution
[Figure legend: areas infested with Aedes aegypti; areas with Ae. aegypti and dengue epidemics]
Ref: Clark G.G., "Dengue: An emerging arboviral disease", 2006
Dengue virus
Ref: http://en.wikipedia.org/wiki/Aedes
Kuhn, R.J. et al. Cell 108, 717-725; 2002
Dengue NS3 protease
[Residues labeled in the figure: H51, D75, S135]
Ref: PDB: 2vbc (2008) J. Virol. 82: 173
Dengue Fever Data Challenge: resources & 1st result
Total number of completed docking jobs: 300,000
Estimated needed computing power: 4,167 CPU*days
Duration of the experiment: 60 days
Cumulative computing results: 42.5 GB
Total computing resources in EUAsia VO: 268 cores
Number of used Computing Elements: 6
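The table's figures imply an average cost per docking job and an average number of busy cores. The short calculation below derives both. It assumes the 4,167 CPU*days estimate covers exactly the 300,000 jobs and the 60-day duration, which the slide does not state explicitly.

```python
jobs = 300_000          # completed docking jobs
cpu_days = 4_167        # estimated needed computing power (CPU*days)
duration_days = 60      # duration of the experiment
cores = 268             # total cores in the EUAsia VO

minutes_per_job = cpu_days * 24 * 60 / jobs     # average CPU minutes per docking
avg_busy_cores = cpu_days / duration_days       # cores busy on average
avg_utilization = avg_busy_cores / cores        # share of the VO's cores in use

print(f"{minutes_per_job:.1f} CPU-minutes per docking")
print(f"{avg_busy_cores:.2f} cores busy on average ({avg_utilization:.0%} of {cores})")
```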
Joint Computing Resources & Users
• Accumulated computing resources in EUAsia VO: 268 CPU cores (100 – ASGC (TW), 2 – TH, 4 – VN, 18 – MIMOS (MY), 80 – UPM (MY), 64 – CESNET (CZ))
• lcg-infosites --vo euasia ce
• Registered VQS accounts:
  – 6 users (TW)
  – 17 users (PH, 15 in AdMU, 2 in ASTI)
  – 2 users (TH, 1 in NECTEC, 1 in HAII)
  – 1 user (MY, UPM)
  – 1 user (ID, ITB)
  – 2 users (VN, IAMI)
  – 1 user (FR, HealthGrid)
Integration of service grids (SG) & desktop grids (DG) by EDGeS
Scenario 1 – DG to SG via bridge
Scenario 2 – SG to DG via bridge
Scenario 3 – SG/DG resources but not through EDGeS bridges [Figure labels: Job Manager, Task Manager]
Web UI Service Architecture
Prototype Web UI Screenshot
Simulation of drug discovery workflow
[Figure: ligand and protein → preparing ligand & protein → docking (generating conformation) → scoring (analyzing & ranking data)]
Protein Database
Ref: PDB, http://www.rcsb.org/pdb/home/home.do
PDBbind, http://sw16.im.med.umich.edu/databases/pdbbind/index.jsp
General classes of docking algorithms
• Genetic algorithm
  – A search heuristic that mimics the process of natural evolution. It generates solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
  – AutoDock, GOLD …
• Molecular dynamics
  – Finds poses using force fields. Conformation generation usually involves simulated annealing to locate the global optimum in a large search space.
  – AMBER, CHARMM …
• Shape complementarity
  – Describes the molecules by features including solvent-accessible surface area, geometric constraints, H-bonds, and hydrophobic/hydrophilic interactions between all atoms in the complex.
  – DOCK, FRED …
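To make the genetic-algorithm bullet concrete, here is a minimal sketch of selection, crossover, and mutation on a three-number "pose". The energy function `toy_energy`, its made-up optimum, and all parameter values are assumptions for illustration; this is not the algorithm AutoDock or GOLD actually implements.

```python
import random


def toy_energy(pose):
    """Stand-in for a docking score: squared distance from a made-up optimum."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def genetic_search(energy, n_genes=3, pop_size=30, generations=40, seed=0):
    """Minimise `energy` with a plain generational genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=energy)                    # selection: lowest energy first
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]                  # elitism: keep the best pose
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:              # mutation: perturb one gene
                i = rng.randrange(n_genes)
                child[i] += rng.gauss(0, 0.5)
            children.append(child)
        pop = children
    return min(pop, key=energy), history


best, history = genetic_search(toy_energy)
print(best, toy_energy(best))
```

Because the best pose is always copied into the next generation, the best energy recorded in `history` never gets worse from one generation to the next.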
General classes of scoring functions
• Force field
  – Affinities are estimated from intermolecular van der Waals, electrostatic, and other interactions between all atoms of the two molecules in the complex.
  – AMBER …
• Empirical
  – Counts the interactions of each type and assigns a score based on the number of occurrences, for example H-bond, ionic, and hydrophobic/hydrophilic interactions.
  – LUDI, X-Score …
• Knowledge-based
  – Observes known protein/ligand structures and favors interactions and geometries that are seen often.
  – DrugScore, PMF …
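The empirical class can be sketched as a weighted count of interactions. The weights and interaction counts below are invented for illustration and are not the parameters of LUDI, X-Score, or any published scoring function.

```python
# Hypothetical weights per interaction (more negative = more favourable).
WEIGHTS = {"h_bond": -1.2, "ionic": -2.0, "hydrophobic": -0.4}


def empirical_score(interactions, weights=WEIGHTS):
    """Sum of (weight x number of occurrences) over interaction types."""
    return sum(weights[kind] * count for kind, count in interactions.items())


# Made-up interaction counts for two candidate poses.
pose_a = {"h_bond": 3, "ionic": 1, "hydrophobic": 5}
pose_b = {"h_bond": 1, "ionic": 0, "hydrophobic": 8}

ranked = sorted([("pose_a", pose_a), ("pose_b", pose_b)],
                key=lambda item: empirical_score(item[1]))
for name, counts in ranked:
    print(name, round(empirical_score(counts), 2))
```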
Tools of docking and scoring
Ref: AutoDock, http://autodock.scripps.edu/
X-SCORE, http://sw16.im.med.umich.edu/software/xtool/
Simulated conditions
• Ligand and protein
  – PDBbind database v2010 (3429 complexes)
• Docking
  – software: AutoDock
  – computing time: 30 ~ 50 min per docking
• Rescoring
  – software: X-Score
  – computing time: 1 ~ 2 min per scoring
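From these per-task times, a rough total CPU budget follows directly. The estimate below assumes one docking and one rescoring per PDBbind complex, which the slide does not state, so treat the result as an order-of-magnitude figure only.

```python
N_COMPLEXES = 3429          # PDBbind v2010 complexes
DOCK_MIN = (30, 50)         # minutes per docking (low, high)
RESCORE_MIN = (1, 2)        # minutes per rescoring (low, high)


def cpu_hours(n_tasks, minutes_range):
    """Return (low, high) CPU-hours for n_tasks at the given per-task minutes."""
    low, high = minutes_range
    return n_tasks * low / 60, n_tasks * high / 60


dock_h = cpu_hours(N_COMPLEXES, DOCK_MIN)
rescore_h = cpu_hours(N_COMPLEXES, RESCORE_MIN)
total_days = tuple((d + r) / 24 for d, r in zip(dock_h, rescore_h))

print(f"docking:   {dock_h[0]:.1f} to {dock_h[1]:.1f} CPU-hours")
print(f"rescoring: {rescore_h[0]:.1f} to {rescore_h[1]:.1f} CPU-hours")
print(f"total:     {total_days[0]:.1f} to {total_days[1]:.1f} CPU-days")
```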
Free energy in AutoDock, X-Score
Free energy R² by ligand molecular weight
Free energy R² by protein enzyme type
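A per-group R² between experimental and predicted free energies can be computed as sketched below. Two assumptions apply: R² is taken here as the squared Pearson correlation (the slides do not say which definition was used), and every record, including the energies and group labels, is made up for illustration.

```python
from collections import defaultdict


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)


def r_squared_by_group(records):
    """records: iterable of (group, experimental, predicted) tuples."""
    groups = defaultdict(list)
    for group, experimental, predicted in records:
        groups[group].append((experimental, predicted))
    return {g: r_squared(*zip(*pairs)) for g, pairs in groups.items()}


# Made-up free energies (kcal/mol); the group could equally be a weight bin.
records = [
    ("hydrolase", -5.0, -5.5), ("hydrolase", -6.0, -6.1), ("hydrolase", -4.0, -4.9),
    ("transferase", -8.0, -7.0), ("transferase", -9.0, -9.5), ("transferase", -7.5, -8.0),
]
for group, value in r_squared_by_group(records).items():
    print(group, round(value, 3))
```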
RMSD in AutoDock, X-Score
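For reference, the RMSD between a docked pose and a reference pose can be computed as below. The sketch assumes both poses list the same atoms in the same order and share one coordinate frame, so no superposition is performed. The coordinates are invented, and this is not necessarily how the values on the slide were obtained.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two lists of (x, y, z) coordinates."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(pose_a, pose_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(pose_a))


# Made-up two-atom example: the docked pose is shifted 2 units along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(crystal, docked))
```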
Future work
• Finish implementing the Web-based Virtual Screening Service on the EDGeS infrastructure.
• Complete the 691 proteins x 691 ligands docking tasks and the data analysis.
• Classify the other proteins by enzyme code.
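The 691 x 691 cross-docking run amounts to enumerating every protein-ligand pair and handing the pairs out in batches. The sketch below shows one way to do that lazily; the identifiers and the batch size of 1000 are hypothetical and do not reflect how GVSS actually partitions its tasks.

```python
from itertools import islice, product


def cross_docking_tasks(proteins, ligands):
    """Lazily yield every (protein, ligand) pair."""
    return product(proteins, ligands)


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Hypothetical identifiers; real runs would use PDB codes or file paths.
proteins = [f"protein_{i:03d}" for i in range(691)]
ligands = [f"ligand_{i:03d}" for i in range(691)]

batch_sizes = [len(b) for b in batches(cross_docking_tasks(proteins, ligands), 1000)]
print(sum(batch_sizes), "tasks in", len(batch_sizes), "batches")
```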
Thank you for your attention