Data Driven Innovation Interoperability Tech Track (#agridata) 18 & 19 March 2015, Wageningen (@rfinkers)
Outline § Introduction “Interoperable Genetic Diversity” § Concept “Bring Your Own Data” party § Aim BYOD Green Genetics? § Outcome BYOD Green Genetics § Hands on
Climate change & Social disruption Photograph: AFP/Getty Images http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1
Select a genetically diverse collection Legacy databases (e.g. Uniprot) Genome Sequence & Genome Annotation Genome Variation Data (re-sequencing collections) & SNP annotation Accession Passport Information Accession Phenotype Information
Web-based aggregation of information
Interoperable Genetic Diversity § Genebanks should utilize genomics data ● But should not store them! § Genomics studies should make variant data available ● But need access to passport and characterization & evaluation data. § Breeders need tools to access diversity Genebank(s) Genomics provider(s) Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Intermezzo: Linked Open Data Standardization makes the information interoperable • Controlled vocabularies • Machine readable • Can all be queried by a single question vs. visiting many websites
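To make the "single question" idea concrete, here is a minimal Python sketch. It assumes two hypothetical sources (a genebank and a genomics provider) that publish subject-predicate-object triples using the same identifiers. All `ex:` identifiers, predicates and values are invented for illustration and do not come from any real dataset.

```python
# Two hypothetical sources publishing (subject, predicate, object) triples.
# Every identifier and value below is made up for illustration.
GENEBANK = [
    ("ex:acc/A1", "ex:species", "ex:taxon/T1"),
    ("ex:acc/A1", "ex:countryOfOrigin", "Peru"),
    ("ex:acc/A2", "ex:species", "ex:taxon/T1"),
]
GENOMICS = [
    ("ex:acc/A1", "ex:hasVariant", "ex:snp/S1"),
    ("ex:acc/A2", "ex:hasVariant", "ex:snp/S2"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def variants_by_origin(triples, country):
    """One question over merged data: variants found in accessions from `country`."""
    accessions = {s for s, _, _ in match(triples, p="ex:countryOfOrigin", o=country)}
    return sorted(o for s, _, o in match(triples, p="ex:hasVariant") if s in accessions)


# Because both sources share identifiers, merging is plain concatenation.
merged = GENEBANK + GENOMICS
print(variants_by_origin(merged, "Peru"))
```

Neither source can answer the question alone. The shared vocabulary is what lets the merged set be queried in one step.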
Interoperable Genetic Diversity (2) § Implications: ● Data can be stored at many different locations, but can be found by computers ● Newly published information (in the correct format) will be included automatically. ● Tools can be written to answer dedicated questions, such as assessing allelic variation, or to support collection management Genebank(s) Genomics provider(s) Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Interdisciplinary Approach Needed Genebanks Genomics provider(s)
Interdisciplinary Approach Needed Need for Data Scientists & Domain Experts Genebanks Genomics provider(s)
Format: Bring Your Own Data Workshop 1. Users define the question(s) 2. Users and Linked Data experts define concepts and ontologies 3. Experts help to create linked data and formulate the query
Bring Your Own Data Workshop Data owners Domain Experts Trainers Linked Data Experts § More Info: http://www.dtls.nl/fair-data/byod/
Example: Solanaceae Trait Ontology
BYOD in action
Select a genetically diverse collection Legacy databases (e.g. Uniprot) Genome Sequence & Genome Annotation Genome Variation Data (re-sequencing collections) & SNP annotation Accession Passport Information Accession Phenotype Information
Example Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://openlifedata.org/taxonomy_resource:>
PREFIX tdwg: <http://rs.tdwg.org/dwc/terms/>
SELECT ?acc ?label (str(?lat) AS ?latitude) (str(?long) AS ?longitude)
WHERE {
  GRAPH <http://cgngenis.wageningenur.nl> {
    ?acc taxon:species ?species .
    ?species rdfs:label ?label .
    ?acc tdwg:decimalLatitude ?lat .
    ?acc tdwg:decimalLongitude ?long
  }
}
ORDER BY ?label
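A query like this is typically sent to a SPARQL endpoint over HTTP, and the result comes back as a SPARQL JSON document. The Python sketch below shows one way to build such a request and flatten the response using only the standard library. The endpoint URL and the `SAMPLE` response are placeholders invented for illustration. The slides do not state which endpoint serves the CGN graph, and no real result data is shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: an assumption, not the real CGN service address.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(result_json, variables):
    """Flatten a SPARQL JSON result document into tuples of plain values.

    Variables that are unbound in a row are returned as None.
    """
    doc = json.loads(result_json)
    return [
        tuple(b[v]["value"] if v in b else None for v in variables)
        for b in doc["results"]["bindings"]
    ]


# Invented response in the SPARQL JSON results layout, for illustration only.
SAMPLE = json.dumps({
    "head": {"vars": ["acc", "label"]},
    "results": {"bindings": [
        {"acc": {"type": "uri", "value": "http://example.org/acc/1"},
         "label": {"type": "literal", "value": "Example species"}},
    ]},
})

request = build_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o }")
print(request.full_url)
print(rows(SAMPLE, ["acc", "label", "latitude"]))
```

To actually run the request, pass it to `urllib.request.urlopen` and feed the decoded body to `rows`. That step is left out here because it needs a live endpoint.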
Outcome: Query Graph
FAIRport * in VLPB? * More on FAIRport in the presentation of Luiz Bonino, Thursday 10:30
Summary § Blueprint “Interoperable Genetic Diversity” shown § BYOD resulted in interoperable data which could be queried ● Request your own BYOD? § Public <-> Private integration possible
Select a genetically diverse collection Legacy databases (e.g. Uniprot) Genome Sequence & Genome Annotation Genome Variation Data (re-sequencing collections) & SNP annotation Accession Passport Information Accession Phenotype Information
Working Prototype § Screenshot
Questions? Acknowledgements: BYOD team Theo van Hintum & Frank Menting (CGN) Denis Guryunov & Martijn van Kaauwen (prototype) et al.
HaploSmasher Hands On Session § HaploSmasher Prototype: ● Genomic regions as input: SL2.40ch03:10000..10200 ● Solyc gene identifiers: Solyc10g085020 ● Filter SNPs on impact type ● HIGH, MODERATE, LOW, MODIFIER (SNPEff) ● No input validation yet ● Use correct notation and existing Solyc gene IDs
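Since the prototype does not validate input, it can help to check the notation before submitting. The Python sketch below is not part of HaploSmasher. Its regular expressions are inferred only from the two examples on this slide (`SL2.40ch03:10000..10200` and `Solyc10g085020`) and may be stricter or looser than what the tool accepts. It checks notation only, not whether a gene ID actually exists.

```python
import re

# Patterns inferred from the slide's examples; they are assumptions.
REGION = re.compile(r"(SL\d+\.\d+ch\d{2}):(\d+)\.\.(\d+)")
GENE_ID = re.compile(r"Solyc\d{2}g\d{6}")
IMPACTS = ("HIGH", "MODERATE", "LOW", "MODIFIER")  # impact types listed on the slide


def parse_region(text):
    """Split 'SL2.40ch03:10000..10200' into (chromosome, start, end)."""
    m = REGION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a region in chrom:start..end notation: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError(f"region start {start} lies after end {end}")
    return chrom, start, end


def is_gene_id(text):
    """True if the text looks like a Solyc gene identifier (format check only)."""
    return GENE_ID.fullmatch(text.strip()) is not None


def check_impacts(selected):
    """Return the selected impact types, rejecting unknown ones."""
    unknown = [s for s in selected if s not in IMPACTS]
    if unknown:
        raise ValueError(f"unknown impact type(s): {unknown}")
    return list(selected)


print(parse_region("SL2.40ch03:10000..10200"))
print(is_gene_id("Solyc10g085020"))
print(check_impacts(["HIGH", "MODERATE"]))
```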
HaploSmasher
HaploSmasher § Query CGN FAIRdata graph ● The prototype currently only generates links to CGN passport data ● Graph data of three CGN accessions is available in our test set
HaploSmasher examples: § Haplotype Output
Example queries § http://www.plantbreeding.wur.nl/hs/ § Also, explore variation data & Linked resources ● http://www.tomatogenome.net § Examples: ● Beta-tubulin: Solyc10g085020 ● HIGH & MODERATE vs. ALL effects ● Glutamate dehydrogenase Solyc05g052100 ● Uridine kinase Solyc02g067880 ● Magnesium chelatase Solyc04g015750
HaploSmasher examples: § Conserved housekeeping genes: ● Beta-tubulin Solyc10g085020 439 AA ● 1 SNP (HIGH & MODERATE effect), two haplotypes
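One biallelic SNP giving two haplotypes follows from grouping accessions by their combination of alleles at the selected SNP positions. The slides do not describe how HaploSmasher derives its groups, so the Python sketch below is an assumption about the general idea, not the tool's implementation. The accession names and allele calls are invented.

```python
from collections import defaultdict


def haplotype_groups(calls):
    """Group accessions that carry the same alleles at every selected SNP.

    `calls` maps an accession name to its list of alleles, one per SNP,
    in a fixed SNP order. Returns {haplotype tuple: sorted accession names}.
    """
    groups = defaultdict(list)
    for accession, alleles in calls.items():
        groups[tuple(alleles)].append(accession)
    return {hap: sorted(names) for hap, names in groups.items()}


# Invented calls at a single SNP: two alleles, hence two haplotypes.
calls = {"acc1": ["A"], "acc2": ["G"], "acc3": ["A"]}
for haplotype, names in haplotype_groups(calls).items():
    print(haplotype, names)
```

Selecting more SNPs, for example all impact types instead of only HIGH and MODERATE, lengthens each tuple and can therefore split the accessions into more groups.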
HaploSmasher examples: ● Beta-tubulin Solyc10g085020 439 AA ● 136 SNPs (all SNPEff impact types) ● Part of haplotype groups:
HaploSmasher examples: ● Glutamate dehydrogenase Solyc05g052100 ● 13 SNPs (HIGH, MODERATE)
HaploSmasher examples: ● Uridine kinase Solyc02g067880 ● 23 SNPs (HIGH, MODERATE) ● Example haplotype groups:
HaploSmasher examples: ● Magnesium chelatase Solyc04g015750 ● 21 SNPs (HIGH, MODERATE) ● Example haplotype groups: