08/03/16

WormBase ParaSite Workshop
Edinburgh
9th March 2016
parasite.wormbase.org

WormBase ParaSite Team
• Kevin Howe: WormBase Team Leader
• Bruce Bolt: Bioinformatician (web and tools)
• Jane Lomax: Bioinformatician (curation)
• Myriam Shafie: Bioinformatician (pipelines)
• Paul Kersey: PI (at EMBL-EBI)
• Matt Berriman: PI (at Sanger Institute)

The Website
• Features both nematodes (roundworms) and platyhelminthes (flatworms) genomes
• No additional curation for most genomes
• Release 5: 2,070,948 genes, 108 genomes, 99 species
• Focus on rapid availability of new data
• Automated pipelines run over all genomes

• Genome Browser
• Transcriptomic Data Display
• Gene, transcript and protein information pages
• Comparative Genomics
• Sequence Similarity Search (BLAST)
• Variant Effect Predictor (VEP) *
• Advanced Search Tool (BioMart)
• Access to BioMart data using R
• Programmatic Access (REST API)
* = Not covered today – speak to us for more information

The Data
• All genomes are shown “as supplied” by the submitter (except WormBase “core” genomes)
• Varying levels of coverage and quality
• Details of assembly and annotation displayed on information page
• “Core” parasitic genomes: Brugia malayi, Onchocerca volvulus, Pristionchus pacificus and Strongyloides ratti
  – Receive more care and attention
  – Community driven manual curation
Your Data
• Publicly available transcriptomic data annotated and displayed on browser
• Website supports ad-hoc visualisation of your own data (e.g. RNA-Seq alignments, variations)
• We welcome submissions of your own data to display on genome browser – allow readers of your papers to easily visualise your data
• Please contact us (link at bottom of website) to discuss requirements

WormBase and WormBase ParaSite
• wormbase.org is the home for highly curated data from C. elegans and other related nematodes
• Genes from “core” parasites also displayed here
• More genomic data for parasites available from parasite.wormbase.org

This afternoon’s agenda…
• 13:00 – 13:15 Introduction to WormBase ParaSite
• 13:15 – 13:45 Using the website (Part 1)
• 13:45 – 14:15 Using the website (Part 2)
• 14:15 – 15:00 Sequence Search with BLAST
• 15:00 – 15:30 Coffee Break
• 15:30 – 16:30 Data Mining with BioMart
• 16:30 – 16:45 Bulk downloads and programmatic access

After this workshop…
• Please contact us with any questions (contact form link at bottom of every page)
• Solutions to exercises on YouTube: parasite.wormbase.org/workshop

Workshop Feedback
• Your feedback helps tailor future workshops
• We would be very grateful if you could complete this before leaving

Part 1: Using the website
Part 1: Summary
1. Front page
2. Locating genomes
3. Navigating genes, transcripts and scaffolds
4. RNASeq tracks
5. Adding your own data

1. Front page

Front page
Front page: find genomes

2. Locating genomes

Locating genomes
Genomes list

Genome pages

3. Navigating genes, transcripts and scaffolds

Gene pages

Gene pages: exons
GO terms

Transcript pages: summary

Transcript pages: navigating

Transcript pages: protein domains

Navigating: tabs

Location view: zooming
Location view: gene/transcript info

Location view: jump to…

Location view: configure

Location view: export data

4. RNASeq tracks

Data tracks - RNASeq
Data tracks - RNASeq

5. Adding your own data

Adding your own data

Part 1b: Browsing the website
• Searching the website
• Comparative genomics
• User accounts
Searching

Search results

Filtering search results

Comparative Genomics

Introduction
• During each release, we compute phylogenetic trees
• Every gene is included from 120 species:
  – 99 helminths
  – 9 free-living nematodes
  – 12 comparator species (e.g. human, mouse, etc)
• Determine orthologues and paralogues

Homology types
• Orthologues: any gene pairwise relation where the ancestor node is a speciation event
  – 1-to-1 orthologue
  – 1-to-many orthologue
  – Many-to-many orthologue
• Paralogues: any pairwise relation where the ancestor node is a duplication event
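The homology types above can be expressed as a small decision rule. The following Python sketch is illustrative only: the function name, the argument names and the use of per-species gene counts to separate the three orthologue subtypes are assumptions made for this example, not a description of the ParaSite pipeline.

```python
def classify_pair(ancestor_event, copies_in_species_a=1, copies_in_species_b=1):
    """Classify a pairwise gene relation from the event at the ancestor node.

    ancestor_event: "speciation" or "duplication" (hypothetical labels).
    copies_in_species_*: how many genes each species contributes to the
    orthologue group (assumed way of telling the subtypes apart).
    """
    if ancestor_event == "duplication":
        return "paralogue"
    if ancestor_event != "speciation":
        raise ValueError(f"unknown ancestor event: {ancestor_event!r}")
    if copies_in_species_a < 1 or copies_in_species_b < 1:
        raise ValueError("each species must contribute at least one gene")
    if copies_in_species_a == 1 and copies_in_species_b == 1:
        return "1-to-1 orthologue"
    if copies_in_species_a == 1 or copies_in_species_b == 1:
        return "1-to-many orthologue"
    return "many-to-many orthologue"


if __name__ == "__main__":
    print(classify_pair("speciation", 1, 1))
    print(classify_pair("speciation", 1, 3))
    print(classify_pair("speciation", 2, 3))
    print(classify_pair("duplication"))
```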
Understanding the gene tree

Visual access to the trees

Tabular access to tree data

User Accounts

User accounts
• Saving and sharing attached data tracks
• Saving configuration settings
• Saving and sharing BLAST results
User accounts: registering

User accounts

Part 2: Sequence Similarity Search using BLAST

What is BLAST?
• BLAST = Basic Local Alignment Search Tool
• Sequence similarity tool
• Allows comparison of a query sequence against a database of sequences
• Query = your nucleotide or protein sequence
• Database = the genome or proteome of any species

What is BLAST?
• Input:
  – Nucleotide or protein sequence
  – Search Parameters
• Output:
  – List of all hits ranked in order of statistical significance

Types of BLAST

BLAST Type          Query Sequence                                   Target Database
BLASTN              Nucleotide                                       Genome (nucleotide)
BLASTP              Peptide                                          Proteome (peptide)
BLASTX              Six frame translation of a nucleotide sequence   Proteome (peptide)
TBLASTX (slowest)   Six frame translation of a nucleotide sequence   Six frame translation of genome
TBLASTN             Peptide                                          Six frame translation of genome
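The table of BLAST types can be read as a lookup from query and database type to a program name. A minimal Python sketch of that lookup follows; the function and argument names are hypothetical and exist only for this illustration.

```python
def choose_blast_program(query_type, database_type, translate=False):
    """Pick a BLAST flavour from the 'Types of BLAST' table.

    query_type: "nucleotide" or "peptide".
    database_type: "genome" or "proteome".
    translate: only matters for a nucleotide query against a genome;
    True compares six-frame translations of both (TBLASTX, slowest).
    """
    if query_type not in ("nucleotide", "peptide"):
        raise ValueError(f"unknown query type: {query_type!r}")
    if database_type not in ("genome", "proteome"):
        raise ValueError(f"unknown database type: {database_type!r}")

    if query_type == "peptide":
        # A peptide query is compared at the protein level either way.
        return "BLASTP" if database_type == "proteome" else "TBLASTN"
    if database_type == "proteome":
        # The nucleotide query is translated in six frames.
        return "BLASTX"
    return "TBLASTX" if translate else "BLASTN"


if __name__ == "__main__":
    print(choose_blast_program("peptide", "genome"))
```

For example, a protein sequence searched against a genome assembly maps to TBLASTN, because the genome has to be translated before it can be compared with a peptide.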
Using the ParaSite BLAST
• Defaults to the species you are currently browsing

Making sense of the results
• Score
  – Used to assess the biological relevance by describing the alignment quality
  – Higher score = higher similarity
• E-value
  – Similar to (but not the same as) a p-value that has been corrected for multiple testing
  – Decreases exponentially as the score increases
  – Lower E-value = more significant result
• %ID
  – Percentage of aligned positions at which your query sequence is identical to the hit in the genome/proteome database
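To see why the E-value falls exponentially as the score rises, it can help to compute it directly. The slides give no formula; the sketch below assumes the standard BLAST relationship E = m × n × 2^(−S'), where S' is the bit score, m the query length and n the database length. The function name is hypothetical, and real BLAST applies effective-length corrections that are ignored here.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score (assumed standard relationship).

    E = m * n * 2**(-S'), ignoring the effective-length corrections
    that real BLAST implementations apply.
    """
    if query_length <= 0 or database_length <= 0:
        raise ValueError("lengths must be positive")
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Each extra bit of score halves the expected number of chance hits.
    for score in (30, 40, 50, 60):
        e = expect_value(score, query_length=300, database_length=100_000_000)
        print(f"bit score {score}: E = {e:.3g}")
```

Under this relationship the same score gives a larger E-value against a larger database, which is one reason E-values from searches against different genomes are not directly comparable.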
Part 4: Data-mining with BioMart

Data-mining with BioMart

Setting filters
• SPECIES: Use this filter to select either individual genomes or nematode clades.
  – Multiple genomes can be selected by holding down the ctrl key or the option key on a Mac.
• GENE: Specify a list of genes with WormBase IDs, or one of the other ID types listed.
  – IDs should be separated by a new line.
• REGION: Restrict to a particular genomic region.
  – Should only be used where a single genome has been selected, as it is possible that a particular region is present in multiple genomes.
  – If start/end co-ordinates are being specified, a scaffold or chromosome id is always required.
  – Where multiple regions are specified, the format is 'Scaffold/Chr:Start:End:Strand' e.g. AG00032:411187:446321:1.
  – If no strand is specified, both strands are selected.
  – Regions should be separated by a comma or new line.
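The REGION format described above is easy to check before pasting a long list into the filter box. The Python sketch below parses 'Scaffold/Chr:Start:End:Strand' entries separated by commas or new lines. The names `Region` and `parse_regions` are hypothetical, and two details are assumptions not stated in the slides: that scaffold names contain no colon, and that strand is written as 1 or -1.

```python
import re
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    seq_name: str           # scaffold or chromosome id
    start: int
    end: int
    strand: Optional[int]   # None means both strands


def parse_regions(text: str) -> List[Region]:
    """Parse 'Scaffold/Chr:Start:End:Strand' entries split by comma or new line."""
    regions = []
    for token in re.split(r"[,\n]+", text):
        token = token.strip()
        if not token:
            continue
        parts = token.split(":")
        if len(parts) not in (3, 4) or not parts[0]:
            raise ValueError(f"expected Scaffold:Start:End[:Strand], got {token!r}")
        start, end = int(parts[1]), int(parts[2])
        if start > end:
            raise ValueError(f"start is after end in {token!r}")
        strand = None
        if len(parts) == 4 and parts[3] != "":
            strand = int(parts[3])
            if strand not in (1, -1):  # assumed strand encoding
                raise ValueError(f"strand must be 1 or -1 in {token!r}")
        regions.append(Region(parts[0], start, end, strand))
    return regions


if __name__ == "__main__":
    # The first entry is the example from the slide; the second is made up.
    for region in parse_regions("AG00032:411187:446321:1\nAG00032:500:900"):
        print(region)
```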
• PROTEIN DOMAINS: Allows you to restrict your query based on the presence or absence of protein domains.
  – Limit to genes... lets you choose a particular database feature set to include or exclude e.g. "restrict to all proteins containing any feature found in Pfam".
  – Limit to genes with these family or domain IDs: allows you to restrict to one or more protein domains/families.
  – Accepts IDs from several databases including InterPro, Pfam and Panther. IDs should be separated by a new line.
• GENE ONTOLOGY: Restrict by one or more Gene Ontology (GO) terms for functional descriptions.
  – Paste or upload a list of GO IDs or use the autocomplete box to populate the list.
• Alternatively restrict to a particular GO evidence type e.g. Inferred from Electronic Annotation (IEA).
  – Multiple codes can be selected by holding down the ctrl key, or option key on a Mac.

BioMart output

Setting Attributes (output): features

Setting Attributes (output): structures

Setting Attributes (output): homologues
Setting Attributes (output): sequence

Practical exercises: part 1

“I'd like to extract all C. elegans orthologs for Nippostrongylus genes involved in a particular process.”

1. In the SPECIES menu select Nippostrongylus
2. In the MULTI-SPECIES COMPARISONS menu select Orthologous C. elegans genes -> Only
3. Further refine this list by function, process or location by choosing one or more categories from the GENE ONTOLOGY list.
  – Start typing in the upper box and choose your terms of interest from the autocomplete; they will be added to the box beneath.
4. Click the Results button (top left) to see your results. By default a two-column file is returned that contains gene ID and Genome Project. To configure different options for the output, select Attributes in the left menu.

“I have a list of genes from Ascaris suum and would like to know which ones have orthologs in humans and mammals and which ones might be nematode-specific.”
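Once a BioMart result table has been downloaded, a question like the last one comes down to splitting the gene list by whether an orthologue column is filled in. The Python sketch below shows one way to do that. The column names (`gene_id`, `human_orthologue_id`), the gene identifiers and the function name are all placeholders invented for this example, not real BioMart attribute names or Ascaris suum genes.

```python
import csv
import io


def split_by_orthologue(tsv_text, gene_col="gene_id",
                        orthologue_col="human_orthologue_id"):
    """Split genes into those with and without an entry in the orthologue column.

    Column names are hypothetical; use whatever headers your export contains.
    """
    with_orthologue, without_orthologue = set(), set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        gene = row[gene_col]
        if (row.get(orthologue_col) or "").strip():
            with_orthologue.add(gene)
        else:
            without_orthologue.add(gene)
    # A gene may appear on several rows (e.g. 1-to-many); one hit is enough.
    without_orthologue -= with_orthologue
    return sorted(with_orthologue), sorted(without_orthologue)


if __name__ == "__main__":
    # Toy table with placeholder identifiers.
    toy = "\n".join([
        "gene_id\thuman_orthologue_id",
        "gene_A\thuman_1",
        "gene_A\thuman_2",
        "gene_B\t",
        "gene_C\thuman_3",
    ])
    have, lack = split_by_orthologue(toy)
    print("with orthologue:", have)
    print("candidates for nematode-specific:", lack)
```

Genes in the second list are only candidates: to support a nematode-specific claim, the same check would need repeating against the other comparator species.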