Big Data in Drug Discovery



  1. Big Data in Drug Discovery. David J. Wild, Assistant Professor & Director, Cheminformatics Program, Indiana University School of Informatics and Computing. djwild@indiana.edu - http://djwild.info

  2. Epochs in drug discovery
     • Empirical (up until the 1960s)
         • 754: first pharmacy opened in Baghdad
         • Late 1800s: major pharmaceutical companies, mass production
         • 1900-1960: major discoveries (insulin, penicillin, the pill, …)
     • Rational (1960s to 1990s)
         • Designing molecules to target protein active sites (“lock and key”)
         • Computational drug discovery
         • Biggest success: HIV (RT, protease inhibitors)
     • Big Experiment (1990s to 2000s)
         • High-throughput screening
         • Microarray assays
         • Gene sequencing and the Human Genome Project
     • Big Data (2010s onwards)
         • Informatics-driven drug discovery
         • Accepting that the body is amazingly complex and that we don’t understand it well
         • Everything is connected
     Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

  3. The metabolic pathways of a single cell David Wild, December 2009. http://djwild.info

  4. The inner life of the cell http://video.google.com/videoplay?docid=-2351549868099343381&hl=en#

  5. Big Data in the public domain
     • There is now a rich resource of public information relating compounds, targets, genes, pathways, and diseases. Just for starters, the public domain holds information on:
         • 69 million compounds and 449,392 bioassays (PubChem)
         • 4,763 drugs (DrugBank)
         • 9 million protein sequences (SwissProt) and 58,000 3D structures (PDB)
         • 14 million human nucleotide sequences (EMBL)
         • 19 million life science publications, with 800,000 new each year (PubMed)
         • A multitude of other sets (drugs, toxicogenomics, chemogenomics, SAR, …)
     • Even more important are the relationships between these entities. For example, a chemical compound can be linked to a gene or a protein target in a multitude of ways:
         • Biological assay with percent inhibition, IC50, etc.
         • Crystal structure of a ligand/protein complex
         • Co-occurrence in a paper abstract
         • Computational experiment (docking, predictive model)
         • Statistical relationship
         • System association (e.g. involved in the same pathways or cellular processes)
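The point that a single compound-target pair can be linked through several kinds of evidence can be sketched as a small data structure. This is only an illustration: the `Link` class, the evidence labels, and the placeholder identifiers (`compound_A`, `target_X`) are assumptions made for the example and are not taken from any of the databases named on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One piece of evidence connecting a compound to a target (hypothetical schema)."""
    compound: str
    target: str
    evidence: str  # e.g. "bioassay", "crystal_structure", "co_occurrence", "docking"
    detail: str


def group_by_pair(links):
    """Collect the set of evidence types recorded for each (compound, target) pair."""
    grouped = defaultdict(set)
    for link in links:
        grouped[(link.compound, link.target)].add(link.evidence)
    return dict(grouped)


# Placeholder records for illustration only; these are not real measurements.
LINKS = [
    Link("compound_A", "target_X", "bioassay", "IC50 reported"),
    Link("compound_A", "target_X", "co_occurrence", "same abstract"),
    Link("compound_B", "target_X", "docking", "predicted pose"),
]

EVIDENCE_BY_PAIR = group_by_pair(LINKS)
```

Grouping by pair makes it easy to ask which links are supported by more than one independent kind of evidence.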

  6. PubChem growth since 2005
     [Chart: PubChem Substance Size 2005-2010, monthly from 2005-01 to 2010-07, vertical axis 0 to 80,000,000. Labelled values: 2,824,265; 35,379,748; 56,774,950; 69,088,100. Annotation: addition of ChemSpider.]
     [Chart: PubChem Bioassays 2005-2010, quarterly from 2005-01 to 2010-07, logarithmic vertical axis 1 to 1,000,000. Labelled value: 434,635. Annotation: addition of ChEMBL.]

  7. Large amount of data and links for each compound

  8. Proteins & Genes http://www.genome.jp/en/db_growth.html

  9. Chem2Bio2RDF: The Facebook of Drug Discovery

  10. You are a big pile of data too!

  11. Large-scale predictive modeling adds even more data. Figure: range of ROCV values from three different classes of BioAssay data set for original models and models built with additional inactive compounds (“improved”). Chen, B. and Wild, D.J. PubChem BioAssays as a data source for predictive models. Journal of Molecular Graphics and Modelling, 2010; 28, 420-426.

  12. Informatics-based drug discovery. Predicting new molecular targets for known drugs. Nature 462, 175-181 (12 November 2009).

  13. “Systems chemical biology” and chemogenomics

  14. Recent enabling technologies for SCB / Chemogenomics
     • Analysis: visualization, projection, data mining, hypothesis generation, network tools
     • Integration: RDF, XML, triple stores, ontologies, SPARQL, graph algorithms
     • Access: Web Services, RPC, information extraction (QP)
     • Integrative cheminformatics & bioinformatics connects compounds, targets, genes, pathways, diseases and side effects
     • Cloud computing allows processing and data mining on a vast scale
     • Semantic technologies and complex systems tools allow seamless integration and human-scale data mining
     • Health informatics (PHRs and EHRs) allows integration of the molecular and patient models

  15. ChemBioGrid.org: Web service infrastructure for cheminformatics. Dong, X., Gilbert, K.E., Guha, R., Heiland, R., Kim, J., Pierce, M.E., Fox, G.C. and Wild, D.J. Web service infrastructure for chemoinformatics. J. Chem. Inf. Model., 2007; 47(4), pp 1303-1307.

  16. The Semantic Web – meaning & relationships

  17. Chem2Bio2RDF – RDF integration & SPARQL querying. Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., Wild, D.J. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 2010, 11, 255.

  18. Chem2Bio2RDF context

  19. Chem2Bio2RDF Relationships

  20. Linked Open Data Cloud (linkeddata.org)

  21. Converting data into RDF
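As a rough illustration of what converting tabular data into RDF involves, the sketch below turns each row of a CSV table into N-Triples statements: the row's identifier becomes the subject, each remaining column becomes a predicate, and the cell value becomes a string literal. It is not the Chem2Bio2RDF pipeline; the `http://example.org/` namespace, the column names, and the sample row are placeholders, and it assumes identifiers and column names contain only characters that are legal in an IRI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def rows_to_ntriples(csv_text, subject_col, base=BASE):
    """Emit one N-Triples line per non-empty cell, keyed on the subject column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        subject = f"<{base}{subject_col}/{row[subject_col]}>"
        for column, value in row.items():
            if column == subject_col or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{base}property/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Placeholder table; the identifiers are made up.
SAMPLE = "compound_id,name,target\nC1,example compound,T9\n"
TRIPLES = rows_to_ntriples(SAMPLE, "compound_id")
```

In a real integration the target column would more likely become an IRI pointing at another resource rather than a string literal, which is what lets separate data sets link to each other.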

  22. Finding multi-target inhibitors of MAPK pathway with a SPARQL query

  23. Finding compounds with similar polypharmacology using SPARQL
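One simple way to make "similar polypharmacology" concrete is to compare the sets of targets that two compounds are linked to. The sketch below ranks compounds by the Jaccard overlap of their target sets. This is a stand-in technique written in plain Python, not the SPARQL query shown on the slide, and the compound and target names are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # define the similarity of two empty profiles as zero
    return len(a & b) / len(a | b)


def rank_by_target_overlap(query, profiles):
    """Rank every other compound by target-set similarity to the query compound."""
    scores = [
        (name, jaccard(profiles[query], targets))
        for name, targets in profiles.items()
        if name != query
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Placeholder target profiles; not real activity data.
PROFILES = {
    "compound_A": {"T1", "T2", "T3"},
    "compound_B": {"T2", "T3"},
    "compound_C": {"T4"},
}

RANKED = rank_by_target_overlap("compound_A", PROFILES)
```

Here `compound_B` ranks first because it shares two of the three targets in the union with `compound_A`, while `compound_C` shares none.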

  24. Projecting queries into chemical space
     • Plotting and embedding unknown compounds with SCB property labels
     • GTM / MDS projection and embedding of all PubChem using clouds
     • Dynamic querying and projection into chemical space

  25. Projecting queries into chemical space. Choi, J.Y., Bae, S.H., Qiu, J., Fox, G., Chen, B., Wild, D.J. Browsing Large Scale Cheminformatics Data with Dimension Reduction. Emerging Computational Methods for the Life Sciences Workshop, ACM Symposium for High Performance Distributed Computing, Jun 21-25, 2010, Chicago IL.

  26. “Doppler Radar Plot” – Kinase Specificity. Choi, J.Y., Bae, S.H., Qiu, J., Fox, G., Chen, B., Wild, D.J. Browsing Large Scale Cheminformatics Data with Dimension Reduction. Emerging Computational Methods for the Life Sciences Workshop, ACM Symposium for High Performance Distributed Computing, Jun 21-25, 2010, Chicago IL.

  27. “Doppler Radar Plot” – Kinase Specificity

  28. Chem2Bio2RDF Dashboard: finding paths

  29. Pathfinder. Nodes shown: NFKB1, Glucocorticoid Receptor, Triamcinolone, Dexamethasone. http://ella.slis.indiana.edu/~yuysun/flex/pathfinder.html
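Finding a path between two entities in a linked data set is, at its simplest, a graph search. The sketch below uses breadth-first search over an adjacency list to return one shortest chain of links between two nodes. It is a generic illustration rather than the Pathfinder tool's actual algorithm, and the graph uses placeholder node names so that no real drug-target relationships are implied.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Placeholder graph: two drugs sharing a target, plus an unconnected drug.
GRAPH = {
    "drug_1": ["target_1"],
    "target_1": ["drug_1", "drug_2", "gene_1"],
    "drug_2": ["target_1"],
    "gene_1": ["target_1"],
    "drug_3": [],
}

PATH = shortest_path(GRAPH, "drug_1", "drug_2")
```

On a graph with typed edges, the same search could also record the relationship type on each hop, which is what makes a returned path interpretable.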

  30. Dynamic exploration with clouds and Cytoscape
     • Virtuoso runs Chem2Bio2RDF queries on the cloud
     • Cytoscape plugins give access to Chem2Bio2RDF, LPG and chemical structure visualization
     • Dynamic exploration in Cytoscape

  31. Hydrocortisone – Dexamethasone links. Fig. Use Case 1. Network diagram of the paths obtained between Hydrocortisone and Dexamethasone using ChemBioScape. The DrugBank interaction data contains information about every drug’s target. In this case, DB00741 and DB01234 share common targets through several different DrugBank interaction IDs.
