
Bioinformatics Databases

Introduction to Bioinformatics, Dortmund, 16.-20.07.2007. Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst.


  1. Bioinformatics Databases
     Introduction to Bioinformatics
     Dortmund, 16.-20.07.2007
     Lectures: Sven Rahmann
     Exercises: Udo Feldkamp, Michael Wurst

  2. Overview
     ● Databases at NCBI (via Entrez)
     ● DNA
       – GenBank, EMBL, DDBJ
       – Data Format Issues
     ● UCSC Genome Browser
     ● Protein
       – SwissProt, PIR, PDB
     ● Sequence Retrieval System at EBI

  3. Fundamentals
     ● Accession number :=
       – unique identifier for each entry (“record”) in a DB
       – Example: PubMed ID [PMID]
       – If you know the accession number, you obtain the record without searching
       – Different databases can be linked via accession numbers
       – Data integration: Hide the details (accession numbers) behind a convenient interface
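To illustrate "obtaining the record without searching", here is a minimal sketch that builds a direct retrieval URL from an accession number. It assumes NCBI's E-utilities `efetch` endpoint and its `db`, `id`, `rettype` and `retmode` parameters, none of which appear in the slides; the helper name `efetch_url` is hypothetical. The accession K03160 is the one used in the GenBank example of these slides. The sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint (not mentioned in the slides,
# which only show the Entrez web interface).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build a URL that addresses one record directly by its accession number."""
    params = {
        "db": db,            # target database (nuccore = nucleotide records)
        "id": accession,     # the accession number identifies the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain text instead of XML
    }
    return EFETCH_URL + "?" + urlencode(params)


url = efetch_url("K03160")
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would be one way to download the record; that step is left out so the sketch runs without network access.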

  4. Databases at NCBI (2007)
     http://www.ncbi.nlm.nih.gov/

  5. Different Databases
     ● DNA
       – nucleotide sequence
       – gene
       – transcript / gene expression
       – genome
     ● Protein
       – sequence and annotation
       – structure
     ● ...

  6. Different Databases
     ● Repositories of primary sequence data
       – Everything related to a topic goes in here
       – GenBank (NCBI Nucleotide): all nucleotide sequences
     ● Machine-curated annotation data
       – automatically generated from primary data
       – quality depends on primary data and method
     ● Manually curated annotation data
       – reviewed by experts (SwissProt – Amos Bairoch)
       – high quality, slow to grow

  7. Integration
     ● “Meta Search Engines”
       – Entrez at NCBI (U.S.)
       – SRS at EBI (Europe)
     ● Value comes from linking databases
     ● Accession numbers provide unique identifiers

  8. Security
     ● Assume that everything you send over the internet can be intercepted.
     ● Don't send confidential data, patent data, etc.
     ● None of the public databases currently (as of this 2007 lecture) supports encryption

  9. Searching Entrez

  10. Nucleotide Results

  11. Core Nucleotide DB

  12. DNA / Nucleotide DBs
     ● International Nucleotide Sequence Database Collaboration (INSDC): the member databases (GenBank, EMBL, DDBJ) hold the same content
     ● GenBank = NCBI Nucleotide

  13. File Formats: GenBank

     LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
     DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
     ACCESSION   K03160
     VERSION     K03160.1  GI:173593
     KEYWORDS    5S ribosomal RNA; ribosomal RNA.
     SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
       ORGANISM  Auricularia auricula-judae
                 Eukaryota; Fungi; Eumycota; Basidiomycotina;
                 Phragmobasidiomycetes; Heterobasidiomycetidae;
                 Auriculariales; Auriculariaceae.
     REFERENCE   1  (bases 1 to 118)
       AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
       TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
                 their use in studying the phylogenetic position of
                 basidiomycetes among the eukaryotes
       JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
     FEATURES             Location/Qualifiers
          rRNA            1..118
                          /note="5S ribosomal RNA"
     BASE COUNT       27 a     34 c     34 g     23 t
     ORIGIN      5' end of mature rRNA.
             1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
            61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
     //
     LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990
     ...
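The structure of a GenBank flat file (keywords starting in the first column, indented continuation lines, sequence after ORIGIN, record terminated by `//`) can be sketched with a small reader. This is a simplified illustration, not a full GenBank parser: the function name `parse_genbank` is hypothetical, only single-word top-level keywords are captured, and the sample is an abbreviated copy of the K03160 record shown on the slide. For real work an established library such as Biopython would be the more robust choice.

```python
# Abbreviated copy of the K03160 record from the slide.
RECORD = """\
LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1  GI:173593
ORIGIN      5' end of mature rRNA.
        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
//
"""


def parse_genbank(text):
    """Read the first record: top-level fields plus the sequence."""
    fields = {}
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-record marker
            break
        if in_origin:
            # Sequence line: a position number followed by blocks of bases.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isalpha():       # keyword starts in the first column
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        # Indented continuation lines are ignored in this sketch.
    fields["sequence"] = "".join(seq_parts)
    return fields


record = parse_genbank(RECORD)
print(record["ACCESSION"], len(record["sequence"]))
```

The sequence length recovered this way can be compared with the length stated in the LOCUS line (118 bp) as a quick sanity check.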

  14. File Formats: FASTA

     >gi|173593|gb|K03160.1|AAURRA Auricularia auricula-judae 5S ribosomal RNA
     ATCCACGGCCATAGGACTCTGAAAGCACTGCATCCCGTCCGATCTGCAA
     AGTTAACCAGAGTACCGCCCAGTTAGTACCACGGTGGGGGACCACGCG
     GGAATCCTGGGTGCTGTGGTT
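FASTA is much simpler than the GenBank format: a header line starting with `>` followed by one or more sequence lines. A minimal reader might look like the sketch below. The function name `parse_fasta` is hypothetical, and splitting the header on `|` assumes the NCBI-style identifier layout shown on the slide (gi number, database tag, accession.version, locus name), which other FASTA sources do not necessarily follow.

```python
# The FASTA record from the slide.
FASTA = """\
>gi|173593|gb|K03160.1|AAURRA Auricularia auricula-judae 5S ribosomal RNA
ATCCACGGCCATAGGACTCTGAAAGCACTGCATCCCGTCCGATCTGCAA
AGTTAACCAGAGTACCGCCCAGTTAGTACCACGGTGGGGGACCACGCG
GGAATCCTGGGTGCTGTGGTT
"""


def parse_fasta(text):
    """Return a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)        # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


header, sequence = parse_fasta(FASTA)[0]
ids, description = header.split(" ", 1)
id_fields = ids.split("|")             # ['gi', '173593', 'gb', 'K03160.1', 'AAURRA']
print(id_fields[3], len(sequence))
```

Joining the wrapped lines yields the same 118 bases as the GenBank record, only in upper case and without position numbers.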

  15. Sequence Retrieval System (SRS)
     ● URL: http://srs.ebi.ac.uk/

  16. Selecting Libraries (DBs) to Search

  17. Standard Query Form

  18. UCSC Genome Browser
     ● Portal to ENCODE (Encyclopedia of DNA Elements): functional annotation of the human genome

  19. Protein: UniProt / SwissProt
     ● URL: http://expasy.org/sprot/
       – SwissProt: manually curated
       – TrEMBL: annotated automatically

  20. Protein Structure: (WW)PDB
     ● http://www.wwpdb.org/
