
Computational Challenges in Genomics and Molecular Biology

Gene Myers, VP, Informatics Research, Celera Genomics / Applied Biosystems


  1. Computational Challenges in Genomics and Molecular Biology

     Gene Myers, VP, Informatics Research, Celera Genomics / Applied Biosystems

  2. The Elements of Molecular Biology

     A principal goal is to understand cells and organisms as molecular systems / machines. The basic classes of molecules are:

     • DNA in the chromosomes of the genome contains all the information to develop an organism and operate all its cell types.

     • RNA serves both short-term informational roles and structural roles.

     • Proteins execute the functions of a cell and provide its structural integrity.

     • Small metabolites (fats, sugars, etc.) provide energy and raw materials, and serve some limited structural roles.

  3. Cells As Molecular Machines

     [Diagram. Labels: Cell, Nucleus, Genome, Polymerase, Gene, Transcription, TBF, Splicing, mRNA, Metabolics (Synthesis, Transport, Degradation), Signal, Energy, Ribosome, Cascade, Translation, Protein, Activation, Receptor, Secretion]

  4. Understanding Cells at the Molecular Level

     • Sequencing: Determining the DNA sequences of the chromosomes of a species.

     • Annotation: An accurate parts list of all the proteins and RNAs in the cell.

     • Pathways: A graph of all the interactions taking place between these agents.

     • Function: What is happening during each interaction.

     • Subcellular Localization: Where each interaction is taking place.
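
The pathway, function, and localization goals on this slide can be pictured as one annotated graph. The following is only a minimal sketch under stated assumptions: the class `InteractionGraph`, its methods, and every molecule name and location in it are hypothetical placeholders, not data or a design from the talk.

```python
from collections import defaultdict


class InteractionGraph:
    """Directed graph of interactions, each tagged with a function and a location."""

    def __init__(self):
        # Maps a source agent to the list of interactions it takes part in.
        self._edges = defaultdict(list)

    def add_interaction(self, source, target, function, location):
        """Record that `source` acts on `target`, what happens, and where."""
        self._edges[source].append(
            {"target": target, "function": function, "location": location}
        )

    def targets(self, source):
        """Return the agents that `source` acts on (empty list if unknown)."""
        return [edge["target"] for edge in self._edges.get(source, [])]

    def interactions_in(self, location):
        """Return (source, target, function) triples taking place in `location`."""
        return [
            (source, edge["target"], edge["function"])
            for source, edges in self._edges.items()
            for edge in edges
            if edge["location"] == location
        ]


# Placeholder entries for illustration only (hypothetical names).
graph = InteractionGraph()
graph.add_interaction("polymerase", "gene_X", "transcription", "nucleus")
graph.add_interaction("ribosome", "mRNA_X", "translation", "cytoplasm")
graph.add_interaction("protein_X", "receptor_Y", "activation", "membrane")

print(graph.targets("polymerase"))
print(graph.interactions_in("nucleus"))
```

Storing function and location on the edge rather than the node reflects the slide's framing, where both are properties of an interaction rather than of a molecule.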

  5. Current State

     • We can sequence the euchromatic portions of genomes.

     • We can recognize 75% of the genes but not accurately unless they have been experimentally verified. We don’t know much about alternate splicing.

     • We can crudely observe expression of mRNAs and with even greater difficulty observe the more abundant proteins.

     • Most accurate molecular biological information is still being verified one hypothesis at a time.

     • We must either coordinate efforts or reduce experimental costs to the point where each investigator is greatly empowered.

  6. Current Technologies

     • Sequencing: Randomly sample and sequence 600bp stretches from the ends of segments of a given length and assemble, followed by a directed finishing phase.

     • Expression Assays: High density arrays where each spot is a set of 18-50bp DNAs complementary to the RNA sequence to be measured, or geometric amplification from a pair of DNA probes complementary to the RNA sequence (quantitative PCR).

     • Proteomics: Mass spectrometers can measure the amount and atomic weight of ionized protein pieces (peptides), allowing complex mixtures to be analyzed.

     • Light Microscopy: With confocal microscopes and antibody, RNA, or organo-metallic staining, phenomena involving but a few particles are being observed.

     • All of these technologies involve interesting problems in the interpretation of the data.

     Data Analysis vs. Data Mining
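
The sampling step described under Sequencing can be sketched as follows. This is a toy illustration, not Celera's pipeline: the 600bp read length comes from the slide, while the random test genome, the 2,000bp segment length, the number of pairs, and the function names are all assumptions made for the example. Assembly and finishing are not shown.

```python
import random


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def sample_mate_pairs(genome, insert_len, read_len, n_pairs, rng):
    """Pick random segments of length insert_len and read both of their ends.

    Returns (start, forward_read, reverse_read) tuples. The reverse read is
    taken from the far end of the segment on the opposite strand.
    """
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(genome) - insert_len + 1)
        insert = genome[start:start + insert_len]
        forward = insert[:read_len]
        reverse = reverse_complement(insert[-read_len:])
        pairs.append((start, forward, reverse))
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(20000))  # assumed toy genome
pairs = sample_mate_pairs(genome, insert_len=2000, read_len=600, n_pairs=50, rng=rng)

# Average depth of coverage = total sequenced bases / genome length.
coverage = len(pairs) * 2 * 600 / len(genome)
print(f"{len(pairs)} mate pairs, about {coverage:.1f}x coverage")
```

Because both reads of a pair come from one segment of known length, their approximate distance and relative orientation are known, which is the constraint an assembler can exploit.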

  7. The Role of Informatics

     • We need to make computers easier to program, i.e. we need to put scientific computing in the hands of the scientists.

     • Our information management technologies are inadequate (huge data sets, semi-structured, data contains errors, not integrated); we need to model these and develop flexible data mining capabilities over them.

     • There will be a continued need for new algorithms and tools as driven by new technologies and protocols.

     • Physical simulation systems of various types will be needed: docking, ligand binding, stochastic differential equations.

     • Experimental design, driven by analysis and simulation, should be a part of our discipline and is an area where we can but are not contributing.
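
As one illustration of the stochastic differential equations mentioned above, here is a minimal Euler-Maruyama sketch. The slide names no particular model or method, so everything specific here is an assumption: the equation dX = -kX dt + sigma dW (a noisy first-order decay), the parameter values, and the function name `euler_maruyama`.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, dt, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW with fixed step dt.

    Returns the list of n_steps + 1 states, starting at x0.
    """
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        # A Brownian increment over dt is normal with mean 0 and std sqrt(dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        xs.append(x)
    return xs


# Assumed toy model: decay at rate k with constant noise amplitude sigma.
k, sigma = 0.1, 0.5
rng = random.Random(1)
path = euler_maruyama(
    x0=100.0,
    drift=lambda x: -k * x,
    diffusion=lambda x: sigma,
    dt=0.01,
    n_steps=1000,
    rng=rng,
)
print(f"start {path[0]:.1f}, end {path[-1]:.2f} after {len(path) - 1} steps")
```

Setting the diffusion term to zero reduces the scheme to the ordinary forward Euler method, which gives a simple sanity check on the implementation.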

  8. A View of the Future

     • Data generation is outpacing Moore’s law by a large margin, but most computations are trivially parallelizable.

     • What will you do when a human genome can be sequenced in a couple of hours for $5,000?

     • What can you do when protein structures can be routinely determined at modest cost?

     • What will you do when nanotech methods exist for probing the cell at the single molecule level?

     • The future will be shaped by technology development
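
"Trivially parallelizable" here can be read as: the same independent computation applied to many sequences, with no communication between tasks. A minimal sketch of that pattern follows. The per-sequence task (GC content), the function names, and the input strings are assumptions for illustration. A thread pool is used only to show the structure; for CPU-bound work in CPython one would typically swap in a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def analyze_all(seqs, max_workers=4):
    """Apply gc_content to every sequence independently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(gc_content, seqs))


sequences = ["ACGT", "GGCC", "ATAT", ""]  # toy inputs, not real data
results = analyze_all(sequences)
print(results)
```

Since no task depends on another's result, the work can be split across as many workers as are available without changing the code that defines the task.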
