
Deep Computing in Biology: Challenges and Progress, by Ajay K. Royyuru


  1. Deep Computing in Biology: Challenges and Progress
     Ajay K. Royyuru, Computational Biology Center, Thomas J. Watson Research Center
     ajayr@us.ibm.com

  2. IBM Computational Biology Center
     Outline
     - Biology has become an Information Science
     - Data explosion: how to take advantage
       - High Throughput technologies
       - Genomics
       - Genographic
       - Proteomics
       - Medical Imaging
       - Data Integration, Mining and Analysis
     - Scale of Computing is rapidly advancing: think big
     - Tackle Complexity in Biology: think multiscale

  3. microRNAs in a nutshell

  4. rna22's Predictions

     Genome            Currently known   rna22 predicted     rna22 predicted   rna22 predicted
                       precursors        precursors (keys)   3' UTR targets    affected
                       (June 2005)                           (locks)           transcripts
     C. elegans        114               623                 > 60,000          9,752
     D. melanogaster   78                1,117               > 150,000         13,104
     M. musculus       245               44,358              > 400,000         18,597
     H. sapiens        321               50,139              > 550,000         23,616

     Rigoutsos et al., Cell (2006)

  5. The Genographic Project
     www.nationalgeographic.com/genographic

  6. The Genographic Project: Map of Human Migration

  7. Public Participation
     - Over 217,000 participants to date
     - www.nationalgeographic.com/genographic
     - www.ibm.com/dna
     Behar et al., PLoS Genetics, 3:e104, 1083-1095 (2007)

  8. mtDNA Report: HVS1 Sequence
     - Haplogroup: M*
     - Variants: 16223T, 16519C
     - Sequence (the two bases set apart on the slide are shown in brackets):
       ATTCTAATTTAAACTATTCTCTGTTCTTTCATGGGGAAGCAGATTTGGGTA
       CCACCCAAGTATTGACTCACCCATCAACAACCGCTATGTATTTCGTACATT
       ACTGCCAGCCACCATGAATATTGTACGGTACCATAAATACTTGACCACCTG
       TAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAAG
       CAAGTACAGCAATCAACC[T]TCAACTATCACACATCAACTGCAACTCCAAAG
       CCACCCCTCACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTAC
       ATAGTACATAAAGCCATTTACCGTACATAGCACATTACAGTCAAATCCCTT
       CTCGTCCCCATGGATGACCCCCCTCAGATAGGGGTCCCTTGACCACCATCC
       TCCGTGAAATCAATATCCCGCACAAGAGTGCTACTCTCCTCGCTCCGGGCC
       CATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACATCTGGTTCCTA
       CTTCAGGG[C]CATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACA
       TCACGATG
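
Variant labels such as 16223T pair a mitochondrial position with the base observed there when it differs from a reference sequence. The Python sketch below illustrates that notation only. The function name `call_variants`, the toy fragments, and the start coordinate are assumptions made for the example, not the Genographic Project's pipeline, and insertions or deletions are not handled.

```python
def call_variants(reference, sample, start):
    """Return substitutions as '<position><observed base>' strings.

    `start` is the coordinate of the first base of both strings.
    Both strings must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if ref_base != obs_base:
            variants.append(f"{start + offset}{obs_base}")
    return variants


# Toy 10-base fragments (hypothetical, not real mtDNA).
toy_reference = "ACCTCAACTA"
toy_sample = "ACTTCAACTA"
print(call_variants(toy_reference, toy_sample, start=16221))  # ['16223T']
```

A haplogroup assignment would then compare the resulting variant list against known haplogroup-defining positions, a step this sketch leaves out.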

  9. Why is Proteomics Important?
     - Proteins, not genes, do the real work in cells
     - Many diseases involve post-translational modifications of proteins (hence, not encoded in genes)
     - Looking for protein-based biomarkers to track disease state or progression
     Figure labels: Mature Protein (folded, modified, translocated); Cellular Machinery

  10. Diagnostics with Proteomics
      Process of identification of protein fragments in the blood of an individual.
      Marker discovery (healthy individuals vs. individuals with a medical condition):
      - Extract blood from subject
      - Process serum in mass spec
      - Extract raw data
      - Identify peaks via novel 2D analysis (IBM)
      - Identification of markers characteristic of disease
      The near future: Proteomics with Biomarkers of disease
      - Take serum from patient
      - Analyze serum peaks
      - Are these biomarkers of disease present?
        - YES: patient has condition
        - NO: patient does not have condition
      Our algorithms allow for early diagnostics of some conditions from blood samples.
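
The diagnostic step above (find peaks in a serum spectrum, then ask whether the disease markers are present) can be sketched in a few lines of Python. This is a deliberately simple one-dimensional local-maximum detector, not the 2D analysis named on the slide. The function names, the spectrum, the marker m/z values, and the tolerance are all hypothetical.

```python
def find_peaks(mz, intensity, min_height):
    """Return (m/z, intensity) pairs for strict local maxima above min_height."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] >= min_height
                and intensity[i] > intensity[i - 1]
                and intensity[i] > intensity[i + 1]):
            peaks.append((mz[i], intensity[i]))
    return peaks


def has_markers(peaks, marker_mz, tolerance):
    """True if every marker m/z has a detected peak within the tolerance."""
    return all(
        any(abs(peak_mz - target) <= tolerance for peak_mz, _ in peaks)
        for target in marker_mz
    )


# Hypothetical spectrum and marker list, for illustration only.
mz = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0, 1005.0, 1006.0]
intensity = [1.0, 9.0, 2.0, 1.0, 7.0, 1.5, 1.0]

peaks = find_peaks(mz, intensity, min_height=5.0)
print(peaks)  # [(1001.0, 9.0), (1004.0, 7.0)]
print(has_markers(peaks, marker_mz=[1001.2, 1004.1], tolerance=0.5))  # True
```

Real spectra would need baseline correction, noise filtering, and a statistically validated marker set before a yes/no answer of this kind could mean anything clinically.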

  11. Medical Imaging: fMRI
      [Figure: fMRI of a subject listening to music. Panels: Activity analysis; Hubs analysis]

  12. Directional links explain the difference
      [Figure: two panels, Neutral Links and Directed Links, each comparing Visual and Auditory]
      Presented at the Human Brain Mapping Conference (2006)

  13. Network Analysis
      - Graphs determined by the structure of pairwise correlations between voxels display very robust topological statistical regularities, including power-law connectivity scaling and small-worldness*
      - However, the computations quickly become intractable as one moves up from two-point correlations
      - We developed a novel approach that extends our previous findings to include directional links and, on that basis, analyzes the presence and significance of higher-order correlation patterns
      - We implemented a series of algorithms on distributed platforms that make our approach feasible
      * "Scale-Free Brain Functional Networks", V.M. Eguiluz, D.R. Chialvo, G.A. Cecchi, M. Baliki & V.A. Apkarian, Physical Review Letters 94:018102 (2005)
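
The first bullet describes graphs built from pairwise correlations between voxel time series. A minimal Python sketch of that two-point construction is given here, assuming a Pearson correlation and a fixed threshold. The names `pearson` and `correlation_graph`, the threshold, and the toy data are illustrative choices; the directional and higher-order analysis from the later bullets is not attempted.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series with nonzero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph(series, threshold):
    """Link voxels whose |r| exceeds threshold; return edges and node degrees."""
    degree = {name: 0 for name in series}
    edges = []
    for a, b in combinations(series, 2):
        if abs(pearson(series[a], series[b])) > threshold:
            edges.append((a, b))
            degree[a] += 1
            degree[b] += 1
    return edges, degree


# Toy time series for three hypothetical voxels.
voxels = {
    "v1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "v2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "v3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges, degree = correlation_graph(voxels, threshold=0.8)
print(edges)   # [('v1', 'v2')]
print(degree)  # {'v1': 1, 'v2': 1, 'v3': 0}
```

The loop visits every voxel pair, so the cost grows quadratically with the number of voxels. Moving to triplets or larger patterns grows faster still, which is the intractability the second bullet refers to.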

  14. Outline
      - Biology has become an Information Science
      - Data explosion: how to take advantage
        - High Throughput technologies
        - Genomics
        - Genographic
        - Proteomics
        - Medical Imaging
        - Data Integration, Mining and Analysis
      - Scale of Computing is rapidly advancing: think big
        - Molecular Simulations
        - Docking, Virtual Screening
        - Medical Imaging
      - Tackle Complexity in Biology: think multiscale

  15. Top Supercomputers in the World, June 2007

      #    Vendor          Rmax (TFlops)   Installation
      1    IBM             280.6           DOE/NNSA/LLNL (64 racks BlueGene/L)
      2    Cray            101.7           Oak Ridge NL (XT3 Opteron)
      3    Cray            101.4           Sandia - Red Storm (XT3 Opteron)
      4    IBM             91.29           BlueGene at Watson (20 racks BlueGene/L)
      5    IBM             82.16           BlueGene at Stony Brook / BNL (18 racks BlueGene/L)
      6    IBM             75.76           ASC Purple LLNL (1526 nodes p5 575)
      7    IBM             73.03           BlueGene at RPI (16 racks BlueGene/L)
      8    Dell            62.68           NCSA (QC Xeon/Infiniband)
      9    IBM             62.63           BSC MareNostrum (2560 JS21 Blades)
      10   SGI             56.52           Altix4700 at LRZ (DC Itanium 2/Infiniband)
      11   Dell            53.00           Sandia NL (Xeon/Infiniband)
      12   Bull            52.84           CEA/DAM Tera10 (Itanium2)
      13   SGI             51.87           NASA/Columbia (Itanium2)
      14   NEC/Sun         48.88           Tsubame Galaxy TiTech (Opteron/Clearspeed/IB)
      15   Dell            46.73           TACC - Lonestar (Xeon/Infiniband)
      16   Dell            42.39           Maui HPCC - Jaws (Xeon/Infiniband)
      17   Linux Networx   40.61           ARL (DC Xeon 51xx/Infiniband)
      18   IBM             37.33           FZJ - Juelich (8 racks BlueGene/L)
      19   Appro           36.58           Japan Earth Simulator (DC Opteron/IB)
      20   NEC             35.86           Japan Earth Simulator (NEC)

      Several entries are marked New or Upgrade on the slide.
      Source: www.top500.org

  16. Time Scales: Biopolymers and Membranes
      [Figure: logarithmic time axis in seconds, marked at 10^-15, 10^-12, 10^-9, 10^-6, 10^-3, 1, 10^3, 10^6 and 10^9, with the ranges reachable by Simulation and by Experiment. Processes placed on the axis: Bond Vibration, DNA Twisting, Hinge Motion, Helix-Coil Transition, Protein Folding, Lipid exchange via diffusion, Ligand-Protein Binding, Torsional correlation in lipid headgroups, Electron Transfer]
      Adapted from "The Protein Folding Problem", Chan and Dill, Physics Today, Feb. 1993

  17. Blue Matter strong scaling performance
      Computation rates as a function of atoms per node
      [Figure: Computation Rate (time-steps/second, 0 to 1400) versus Atoms/Node (100 down to 0.1). Series: Hairpin SPI 64^3 (V5), SOPE SPI 64^3 (V5), Hairpin SPI 64^3 (V4), SOPE SPI (V5), Rhodopsin SPI (V5), SOPE SPI (V4), ApoA1 SPI (V5), Rhodopsin SPI (V4), ApoA1 SPI (V4), SOPE MPI (V4), Rhodopsin MPI (V4), ApoA1 MPI (V4), ApoA1 NAMD Msging Layer, ApoA1 NAMD MPI]
      www.research.ibm.com/bluegene
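
A computation rate in time-steps per second translates directly into the wall-clock time needed to reach a given simulated time. The Python sketch below does that arithmetic. The 2 fs integration step and the 1000 steps/second rate are assumed round values (the rate lies within the plot's axis range but is not a reading from it), and `wall_clock_days` is a name chosen for the example.

```python
SECONDS_PER_DAY = 86400.0


def wall_clock_days(simulated_seconds, timestep_seconds, steps_per_second):
    """Days of wall-clock time needed to integrate `simulated_seconds`."""
    steps = simulated_seconds / timestep_seconds
    return steps / steps_per_second / SECONDS_PER_DAY


# Assumed values: 1 microsecond of simulated time, 2 fs time step,
# 1000 time-steps per second.
days = wall_clock_days(1e-6, 2e-15, 1000.0)
print(f"{days:.1f} days")  # roughly 5.8 days
```

Under these assumptions a microsecond costs about six days of machine time, so a millisecond-scale event would take on the order of a thousand times longer at the same rate.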

  18. Lysozyme System
      - Trp62Ala mutation of Lysozyme
        - Dramatically reduces stability in 8M urea solution
        - Responsible for amyloid formation
      - Lysozyme structure consists of:
        - α-domain with
          - 4 alpha helices (A-D)
          - 1 3₁₀ helix
        - β-domain with
          - Antiparallel beta sheet
          - Loop of the beta-domain
      C. Dobson and coworkers, Nature 424, 783 (2003)

  19. R Zhou, M Eleftheriou, AK Royyuru, BJ Berne, "Destruction of long-range interactions by a single mutation in lysozyme", Proc. Natl. Acad. Sci., 104:5824-5829 (2007)

  20. GPCR-based drugs among the 200 best-selling prescriptions, and their GPCR targets

      Drug        Disease                    Company                2000 sales (US $m)   GPCR target
      Zantac      Ulcers                     AstraZeneca            870                  Histamine receptors
      Pepcid      Ulcers                     Merck                  850                  Histamine receptors
      Claritin    Allergies                  Schering-Plough        2,200                Histamine receptors
      Allegra     Allergies                  Aventis                1,100                Histamine receptors
      Risperdal   Psychosis                  Johnson & Johnson      1,600                5-HT receptors
      Imitrex     Migraine                   GlaxoSmithKline        1,100                5-HT receptors
      BuSpar      Anxiety                    Bristol-Myers Squibb   714                  5-HT receptors
      Zyprexa     Schizophrenia              Eli Lilly              2,400                5-HT receptors
      Cozaar      Hypertension               Merck                  1,700                Angiotensin receptors
      Toprol-XL   Hypertension               AstraZeneca            580                  Adrenoceptors
      Coreg       Congestive heart failure   GlaxoSmithKline        250                  Adrenoceptors
      Serevent    Asthma                     GlaxoSmithKline        940                  Adrenoceptors
      Atrovent    COPD                       Boehringer Ingelheim   600                  Muscarinic acetylcholine receptors
      Zoladex     Cancer                     AstraZeneca            740                  GnRH receptors
      Requip      Parkinson's disease        AstraZeneca            90                   Dopamine receptors
      Cytotec     Ulcers                     Pharmacia              100                  Prostaglandin (PGE1) receptors
      Plavix      Stroke                     Bristol-Myers Squibb   900                  ADP receptors

      Source: http://www.predixpharm.com/market_table.htm

  21. Toward Active Rhodopsin
      [Figure panels: 3:1 SDPC/CHOL; SOPE; Rhodopsin - Dark Ensemble; Light-adapted Rhodopsin; Rhodopsin in 2:2:1 SDPC/SDPE/CHOL]
      References shown on the slide:
      - Pitman, M. C., Suits, F., Mackerell, A. D., Jr. & Feller, S. E. Biochemistry 43, 15318-28 (2004).
      - Pitman, M. C., Suits, F., Gawrisch, K. & Feller, S. E. J Chem Phys 122, 244715 (2005).
      - Suits, F., Pitman, M. C. & Feller, S. E. J Chem Phys 122, 244714 (2005).
      - Pitman, M. C., Grossfield, A., Suits, F. & Feller, S. E. J. Am. Chem. Soc. 127, 4576-4577 (2005).
      - Pitman et al., PNAS (2005)
