
Types of Databases, Entrez Nucleotides: NCBI Field Guide



  1. The National Center for Biotechnology Information
     NCBI Field Guide: NCBI Molecular Biology Resources, Bethesda, MD, March 2007
     Web Access: www.ncbi.nlm.nih.gov

     NCBI Databases
     • Created in 1988 as part of the National Library of Medicine at NIH to:
       – Establish public databases
       – Conduct research in computational biology
       – Develop software tools for sequence analysis
       – Disseminate biomedical information

     NCBI Databases and Services
     • GenBank: the largest sequence database
     • Free public access to the biomedical literature
       – PubMed: free MEDLINE
       – PubMed Central: full-text online access
     • Entrez: integrated molecular and literature databases
     • BLAST: the highest-volume sequence search service
     • VAST: structure similarity searches
     • Software and databases

  2. Types of Databases
     • Primary databases
       – Original submissions by experimentalists
       – Content controlled by the submitter
       – Examples: GenBank, SNP, GEO
     • Derivative databases
       – Built from primary data
       – Content controlled by a third party (NCBI)
       – Examples: RefSeq, TPA, RefSNP, UniGene, NCBI Protein, Structure, Conserved Domain

     Entrez Nucleotides: records by source
       Primary      GenBank / EMBL / DDBJ      86,766,287
       Derivative   RefSeq                      1,715,255
                    Third Party Annotation          5,312
                    PDB                             7,334
       Total                                   88,494,392

     What is GenBank? NCBI's Primary Sequence Database
     • Nucleotide-only sequence database
     • Archival in nature
       – Historical
       – Reflective of the submitter's point of view (subjective)
       – Redundant
     • GenBank data
       – Direct submissions (traditional records)
       – Batch submissions (EST, GSS, STS)
       – FTP accounts (genome data)
     • Three collaborating databases
       – GenBank
       – DNA Data Bank of Japan (DDBJ)
       – European Molecular Biology Laboratory (EMBL) Database

     [Figure: International Sequence Database Collaboration. GenBank (NCBI, NIH), EMBL (EBI) and DDBJ (NIG/CIB) each accept submissions and exchange updates with one another; records are retrieved through Entrez, SRS and getentry, respectively.]

  3. GenBank: NCBI's Primary Sequence Database
     The Growth of GenBank (Release 158, February 2007)
       Records:          86,639,920
       Total bases:      157,335,689,977
         WGS:            86.0 billion bases
         Non-WGS:        71.3 billion bases
       Size (non-WGS):   263 gigabytes in 1,115 files
       Doubling time:    12-14 months
     [Figure: growth of GenBank in billions of bases, Aug-97 to Aug-06.]
     • Full release every two months
     • Incremental updates daily
     • Available only via FTP: ftp://ftp.ncbi.nih.gov/genbank/

     Organization of GenBank: Traditional and Bulk Divisions
     Records are divided into 18 divisions: 12 traditional and 6 bulk.

     Traditional divisions:
       PRI  Primate
       PLN  Plant and Fungal
       BCT  Bacterial and Archaeal
       INV  Invertebrate
       ROD  Rodent
       VRL  Viral
       VRT  Other Vertebrate
       MAM  Mammalian
       PAT  Patent
       PHG  Phage
       SYN  Synthetic (cloning vectors)
       UNA  Unannotated
     • Direct submissions (Sequin and BankIt)
     • Accurate
     • Well characterized

     Bulk divisions:
       EST  Expressed Sequence Tag
       GSS  Genome Survey Sequence
       HTG  High Throughput Genomic
       STS  Sequence Tagged Site
       HTC  High Throughput cDNA
       ENV  Environmental Samples
     • Batch submission (email and FTP)
     • Inaccurate
     • Poorly characterized

     Entrez query: gbdiv_xxx[Properties]
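The division filter above can also be used programmatically. The sketch below builds, but does not fetch, a query URL for the NCBI E-utilities `esearch` service using its `db`, `term` and `retmax` parameters. It assumes that `xxx` in `gbdiv_xxx[Properties]` stands for the lowercase three-letter division code (for example `gbdiv_pln`); the helper names `division_query` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (this URL is built, never fetched, here).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The 18 GenBank divisions listed on the slide.
TRADITIONAL = {"PRI", "PLN", "BCT", "INV", "ROD", "VRL",
               "VRT", "MAM", "PAT", "PHG", "SYN", "UNA"}
BULK = {"EST", "GSS", "HTG", "STS", "HTC", "ENV"}


def division_query(division: str, extra: str = "") -> str:
    """Hypothetical helper: Entrez term restricting results to one division.

    Assumes the 'xxx' placeholder is the lowercase division code.
    """
    code = division.upper()
    if code not in TRADITIONAL | BULK:
        raise ValueError(f"unknown GenBank division: {division}")
    term = f"gbdiv_{code.lower()}[Properties]"
    if extra:
        # Combine with any other Entrez search term.
        term = f"({extra}) AND {term}"
    return term


def esearch_url(term: str, db: str = "nucleotide", retmax: int = 20) -> str:
    """Hypothetical helper: URL-encode the term into an esearch request."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    # Example: apple records in the Plant and Fungal division.
    print(esearch_url(division_query("PLN", "Malus[Organism]")))
```

Keeping the division codes in two sets mirrors the slide's traditional/bulk split and lets the helper reject codes that are not among the 18 divisions before any request is made.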

  4. A Traditional GenBank Record: The Flatfile Format

       LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
       DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
                   complete cds.
       ACCESSION   AY182241
       VERSION     AY182241.2  GI:32265057
       KEYWORDS    .
       SOURCE      Malus x domestica (cultivated apple)
         ORGANISM  Malus x domestica
                   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
                   Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
                   eurosids I; Rosales; Rosaceae; Maloideae; Malus.
       REFERENCE   1  (bases 1 to 1931)
         AUTHORS   Pechous,S.W. and Whitaker,B.D.
         TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
                   synthase cDNA from peel tissue of apple fruit
         JOURNAL   Planta 219, 84-94 (2004)
       REFERENCE   2  (bases 1 to 1931)
         AUTHORS   Pechous,S.W. and Whitaker,B.D.
         TITLE     Direct Submission
         JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
                   USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
                   20705, USA
       REFERENCE   3  (bases 1 to 1931)
         AUTHORS   Pechous,S.W. and Whitaker,B.D.
         TITLE     Direct Submission
         JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
                   USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
                   20705, USA
         REMARK    Sequence update by submitter
       COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
       FEATURES             Location/Qualifiers
            source          1..1931
                            /organism="Malus x domestica"
                            /mol_type="mRNA"
                            /cultivar="'Law Rome'"
                            /db_xref="taxon:3750"
                            /tissue_type="peel"
            gene            1..1931
                            /gene="AFS1"
            CDS             54..1784
                            /gene="AFS1"
                            /note="terpene synthase"
                            /codon_start=1
                            /product="(E,E)-alpha-farnesene synthase"
                            /protein_id="AAO22848.2"
                            /db_xref="GI:32265058"
                            /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                            NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                            EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                            DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                            GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                            LSLLFQPLVN"
       ORIGIN
               1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
              61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
             121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
             181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
             241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
             ...
       //

     • Header: LOCUS through COMMENT
     • Accession (e.g., ACCESSION U07418): stable, reportable, universal
     • Version and GI number (e.g., VERSION U07418.1 GI:466461): the version tracks changes in the sequence; the GI number is for NCBI internal use
     • Feature table: well annotated
     • Sequence: the sequence is the data

     Bulk Divisions
     GenBank Bulk Sequence
     • Batch submission and HTG (email and FTP)
     • Inaccurate
     • Poorly characterized
     • Expressed Sequence Tag (EST): first-pass, single-read cDNA
     • Genome Survey Sequence (GSS): first-pass, single-read genomic DNA
     • High Throughput Genomic (HTG): incomplete sequences of genomic clones
     • Sequence Tagged Site (STS): PCR-based mapping reagents
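To illustrate how the parts of the record relate, the sketch below splits the VERSION line into accession, version and GI number, strips the position numbers from the ORIGIN lines, and translates from position 54, the start of the CDS location 54..1784. Because the slide shows only the first 300 bases, it recovers only the first 82 residues, which should match the start of /translation. The helpers `parse_version`, `origin_to_seq` and `translate_from` are hypothetical, not a full flatfile parser; Biopython's `Bio.SeqIO.parse(handle, "genbank")` is an existing tool for that job.

```python
import re

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 300 bases of AY182241 as shown on the slide (an excerpt of 1931 bp).
ORIGIN_EXCERPT = """
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
"""


def parse_version(line: str) -> dict:
    """Hypothetical helper: split 'VERSION AY182241.2 GI:32265057'."""
    m = re.match(r"VERSION\s+(\S+?)\.(\d+)(?:\s+GI:(\d+))?", line)
    if m is None:
        raise ValueError("unrecognized VERSION line")
    gi = int(m.group(3)) if m.group(3) else None
    return {"accession": m.group(1), "version": int(m.group(2)), "gi": gi}


def origin_to_seq(origin: str) -> str:
    """Hypothetical helper: drop position numbers and spaces from ORIGIN lines."""
    chunks = []
    for line in origin.strip().splitlines():
        parts = line.split()
        if parts and parts[0].isdigit():  # skip lines without a position number
            chunks.extend(parts[1:])
    return "".join(chunks).upper()


def translate_from(seq: str, start: int) -> str:
    """Translate complete codons from 1-based `start`; stop at a stop codon."""
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = origin_to_seq(ORIGIN_EXCERPT)
protein = translate_from(seq, 54)  # CDS 54..1784, /codon_start=1

if __name__ == "__main__":
    print(parse_version("VERSION     AY182241.2  GI:32265057"))
    print(len(seq), protein)
```

Comparing `protein` with the first residues of /translation is a quick way to confirm that the CDS location and the sequence agree.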

  5. ESTs in Entrez
     Total: 41 million records
       Human                 7.9 million
       Mouse                 4.7 million
       Cow                   1.3 million
       Rice                  1.2 million
       Zebrafish             1.2 million
       Maize                 1.2 million
       Xenopus tropicalis    1.0 million
       Rat                   0.9 million
       Wheat                 0.9 million
       Chicken               0.6 million
       Barley                0.4 million

     Derivative Databases

     Entrez Protein: Derivative Database
     GenPept: GenBank CDS translations

       FEATURES             Location/Qualifiers
            source          1..2484
                            /organism="Homo sapiens"
                            /mol_type="mRNA"
                            /db_xref="taxon:9606"
                            /chromosome="3"
                            /map="3p22-p23"
            gene            1..2484
                            /gene="MLH1"
            CDS             22..2292
                            /gene="MLH1"
                            /note="homolog of S. cerevisiae PMS1 (Swiss-Prot
                            Accession Number P14242), S. cerevisiae MLH1 (GenBank
                            Accession Number U07187), E. coli MUTL (Swiss-Prot
                            Accession Number P23367), Salmonella typhimurium MUTL
                            (Swiss-Prot Accession Number P14161) and Streptococcus
                            pneumoniae (Swiss-Prot Accession Number P14160)"
                            /codon_start=1
                            /product="DNA mismatch repair protein homolog"
                            /protein_id="AAC50285.1"
                            /db_xref="GI:463989"
                            /translation="MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS
                            TSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGE
                            ALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
                            TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRS
                            ...

       >gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
       MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
       EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

     Entrez Protein sequences by data source
       GenPept                                6,937,176
       RefSeq                                 3,359,561
       Third Party Annotation                     5,136
       Swiss-Prot                               255,159
       PIR                                       29,996
       PRF                                       12,079
       PDB                                       91,116
       Total                                 10,690,223
       PAT Division                             669,035
       BLAST nr total (no patents or env)     4,545,310
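The GenPept FASTA line above packs the record's identifiers into a pipe-separated string, gi|number|database|accession.version|, followed by a title. A minimal sketch of splitting that legacy layout follows; the `Defline` class and `parse_defline` function are hypothetical and handle only the gi-prefixed form shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Defline:
    gi: int                  # GI number (matches /db_xref="GI:...")
    db: str                  # source database code, e.g. "gb" for GenBank
    accession: str           # accession without the version suffix
    version: Optional[int]   # sequence version, if present
    title: str               # free-text description after the identifier


def parse_defline(line: str) -> Defline:
    """Hypothetical parser for 'gi|<number>|<db>|<accession.version>| title'."""
    if not line.startswith(">"):
        raise ValueError("FASTA definition lines start with '>'")
    ident, _, title = line[1:].partition(" ")
    fields = ident.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier layout: {ident}")
    acc, _, ver = fields[3].partition(".")
    return Defline(gi=int(fields[1]), db=fields[2], accession=acc,
                   version=int(ver) if ver else None, title=title)


if __name__ == "__main__":
    d = parse_defline(">gi|463989|gb|AAC50285.1| DNA mismatch repair prote...")
    print(d.gi, d.db, f"{d.accession}.{d.version}")
```

Applied to the slide's line, the GI and accession.version it extracts are the same values that appear in the CDS feature as /db_xref="GI:463989" and /protein_id="AAC50285.1", which is the link between a GenPept entry and the GenBank record it was translated from.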
