21-Mar-15

Introduction to Bioinformatics
Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, March 19th 2015

Info and documentation
• http://theory.bio.uu.nl/BDA/2015
• http://www.google.com
  – … but only for guidance and hints: never take the internet for granted
• Campbell Biology, 9th or 10th edition, Pearson
• Reader
  – Printed in black and white
  – Download full color PDF at: http://theory.bio.uu.nl/BDA/2015/BioInf2015.pdf
  – Errata: http://theory.bio.uu.nl/BDA/2015/errata.html

Evaluation
• Final mark course
  – 2/3 mark of Mathematics/Theoretical Biology
  – 1/3 mark of Bioinformatic Data Analysis
• Bioinformatics: mark of written exam only
  – NOTE: this is different from info in the study guide (studiegids)!
  – Date: April 9th 2015 at 17:00-20:00 in Educatorium Gamma
• Bonus point
  – NOTE: this is different from info in the study guide (studiegids)!
  – Make all practicals and have them signed by the assistant
    • In case of emergencies you can be late by one class maximum
  – Hand in your mini-article on time (deadline: April 7th 2015) through http://theory.bio.uu.nl/sb/rooster.html
  – The bonus point will only be added to the mark of the written exam if this mark is >4 before addition
  – The maximum mark is a 10

How would you figure out the function of a protein?
• Activity assay
• X-ray structure
• Knock-out mouse
• BLAST search

How about for all proteins in a genome?

Genome sizes
• Chaos chaos (1.4 Tb, Friz 1968)
• Tb: Tera base pairs (10^12)
• Gb: Giga base pairs (10^9)
• Mb: Mega base pairs (10^6)
• Kb: Kilo base pairs (10^3)
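The base-pair units listed under "Genome sizes" can be made concrete with a small conversion helper. This is only a sketch: the function names `format_genome_size` and `parse_genome_size` are hypothetical and not part of the course material, and the lowercase "kb" spelling is an assumption (the slides use both "Kb" and "kb").

```python
# Units from the "Genome sizes" slide, largest first.
UNITS = [("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("kb", 10**3)]


def format_genome_size(bp: int) -> str:
    """Express a base-pair count in the largest unit that fits."""
    for name, factor in UNITS:
        if bp >= factor:
            return f"{bp / factor:g} {name}"
    return f"{bp} bp"


def parse_genome_size(text: str) -> int:
    """Convert a string such as '1.4 Tb' back to a base-pair count."""
    value, unit = text.split()
    factors = {name.lower(): factor for name, factor in UNITS}
    factors["bp"] = 1
    return round(float(value) * factors[unit.lower()])


if __name__ == "__main__":
    print(format_genome_size(3_000_000_000))  # human genome
    print(parse_genome_size("1.4 Tb"))        # Chaos chaos, as quoted on the slide
```

Keeping the units in one ordered list means the formatter and the parser cannot drift apart.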
Gene density and non-coding DNA
• Mammals (including humans) have the lowest gene density
  – Number of genes in a given length of DNA
• Introns within genes
• Noncoding DNA between genes

Components of the human genome
• 20,000 – 25,000 protein-coding genes (1.5%)
• Introns (25.9%)
• Transposable elements (44.7%)
  – DNA transposons
  – Long terminal repeat (LTR) retrotransposons
  – Short interspersed nuclear elements (SINEs)
  – Long interspersed nuclear elements (LINEs)
  – Endogenous retroviruses
  – Miniature inverted repeat transposable elements (MITEs)

Smallest genomes
• Eukaryota
  – Free: Ostreococcus tauri (12.6 Mb)
  – Endosymb: Encephalitozoon intestinalis (2.3 Mb)
• Bacteria and Archaea
  – Free: Mycoplasma genitalium (580 kb)
  – Endosymb: Cand. Carsonella ruddii (160 kb)
• Viruses
  – Circoviridae (1.8 kb – only two proteins!)

Largest genomes
• Largest sequenced genome: Loblolly pine (Pinus taeda) 20,000,000,000 bp (20 Gb)
• Kinugasasō (Paris japonica) 149,000,000,000 bp (149 Gb)

Genetic diversity
• Phylogenetic Tree of Life
  – Figure labels: Eukaryotes, Archaea, Bacteria, Prokaryotes

Human genome
• 3,000,000,000 bp (3 Gb)
• Human Genome Project (HGP)
  – 1990-2003
  – Draft genome sequence complete in 2000
• Reference genome
  – Source: blood (female) and sperm (male)
  – Samples taken from many donors, but only a few were used to protect donor identities
  – Sequence is not from one individual
    • >70% from one male donor
• Cost HGP: $ 3,000,000,000
  – Target: $ 1,000 genome
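The percentages under "Components of the human genome" and the definition of gene density can be turned into base-pair counts with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted on this page (3 Gb genome, 1.5% / 25.9% / 44.7%, 20,000 to 25,000 genes). The names `component_sizes` and `genes_per_mb` are hypothetical, and the "other" bucket is simply the remainder, which the slides do not break down.

```python
GENOME_BP = 3_000_000_000  # human genome size from the "Human genome" slide

# Fractions quoted on the "Components of the human genome" slide.
FRACTIONS = {
    "protein-coding genes": 0.015,
    "introns": 0.259,
    "transposable elements": 0.447,
}


def component_sizes(genome_bp: int, fractions: dict[str, float]) -> dict[str, int]:
    """Convert genome fractions to base-pair counts.

    The "other" entry is whatever is left over; it is an assumption of this
    sketch, not a category named on the slide.
    """
    sizes = {name: round(genome_bp * frac) for name, frac in fractions.items()}
    sizes["other"] = genome_bp - sum(sizes.values())
    return sizes


def genes_per_mb(gene_count: int, genome_bp: int = GENOME_BP) -> float:
    """Gene density: number of genes per megabase of DNA."""
    return gene_count / (genome_bp / 10**6)


if __name__ == "__main__":
    for name, bp in component_sizes(GENOME_BP, FRACTIONS).items():
        print(f"{name}: {bp:,} bp")
    print(f"{genes_per_mb(20_000):.1f} to {genes_per_mb(25_000):.1f} genes per Mb")
```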
Genome sequencing
• Whole Genome Shotgun (WGS) approach
• Figure labels: Cloned genomes; Segments known order; Fragment and sequence; Assemble sequences; Consensus genome

Personal genome sequences
Your personal genome sequence
• Figure labels: Craig Venter; James Watson; Reference Genome
• ~2,000,000 differences
• ~5,000,000 differences

So we have a $200 personal genome…
• …now the million dollar question is: What can I learn from my 3,000,000,000 A's, C's, G's, and T's?

Personalized medicine
• Sergey Brin (figure labels: Co-founder; Co-investor)
• LRRK2 polymorphism on chromosome 12
  – 28% risk of Parkinson's at age 59
  – 51% at age 69
  – 74% at age 79
• From reactive to proactive medicine
  – Identify high risk alleles
  – Adapt lifestyle (e.g. risk of high blood pressure)
  – Preventive screening or treatment (e.g. risk of cancer)
• Pharmacogenomics:
  – Impact of genetic variation on response to medication
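The Whole Genome Shotgun steps named on this page (fragment and sequence, assemble sequences, consensus genome) can be illustrated with a toy greedy overlap assembler. This is a teaching sketch under strong assumptions, not the method used by any real assembler: reads are error-free, come from one strand, and no read is contained inside another. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Returns the remaining contigs once no overlap of at least `min_len`
    is left. Assumes error-free reads (a simplification of this sketch).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping fragments of the made-up sequence ATGGCGTACGTTAGC.
    reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
    print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can misassemble repetitive sequence, which matters for genomes where transposable elements make up a large fraction.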
Biology is Big Data science
• Figure: # sequenced genomes
• Moore's Law: computer power doubles every ~2 years.

Omics sciences
• The suffix -ome refers to a totality of some sort
• Figure labels: DNA, RNA, Protein
• Gene (genetics) • Genome • Genomics
• Transcript (RNA) • Transcriptome • Transcriptomics
• Protein • Proteome • Proteomics
• Metabolite • Metabolome • Metabolomics
• Lipid • Lipidome • Lipidomics
• Microbe • Microbiome • Microbiomics (?!)

Genomics
• Identify differences in gene content between genomes
• Analyze genome evolution
• Predict gene functions
• Figure label: Chordata ↔ Echinodermata

Metagenomics
• Figure labels: Sample; Filter; Microbes or viruses
• Discover new species: "Biological Dark Matter"

Human microbiome and virome
• In your body:
  – ~10^13 human cells
  – ~10^14 bacteria
  – ~10^15 viruses
• Image: Lisa Brown

Bioinformatics
• Bioinformatics: study of informatic processes in biotic systems
  – Paulien Hogeweg and Ben Hesper (Utrecht University, 1970)
• Bioinformatic Data Analysis: using computational methods to analyze biological data
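The Moore's Law statement on this page (computer power doubles every ~2 years) corresponds to simple exponential growth. The sketch below assumes an exact 2-year doubling time, which the slide gives only approximately; the names `fold_increase` and `years_to_reach` are hypothetical.

```python
import math


def fold_increase(years: float, doubling_time: float = 2.0) -> float:
    """Growth factor after `years` when a quantity doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


def years_to_reach(fold: float, doubling_time: float = 2.0) -> float:
    """Years needed for a `fold`-fold increase at the given doubling time."""
    return doubling_time * math.log2(fold)


if __name__ == "__main__":
    print(fold_increase(10))     # growth over one decade
    print(years_to_reach(1024))  # time for a roughly thousand-fold increase
```

The same two functions can be reused with a shorter doubling time to compare another growth curve, such as the number of sequenced genomes, against the Moore's Law line.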
Bioinformatics in Utrecht today