
Data Driven Innovation Interoperability Tech Track (#agridata), 18 & 19 March 2015, Wageningen



  1. Data Driven Innovation Interoperability Tech Track (#agridata) 18 & 19 March 2015, Wageningen (@rfinkers)

  2. Outline § Introduction “Interoperable Genetic Diversity” § Concept “Bring Your Own Data” party § Aim BYOD Green Genetics? § Outcome BYOD Green Genetics § Hands on

  3. Climate change & Social disruption (Photograph: AFP/Getty Images, http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1)

  4. Select a genetically diverse collection: Legacy databases (e.g. UniProt); Genome Sequence & Genome Annotation; Genome Variation Data (re-sequencing collections) & SNP annotation; Accession Passport Information; Accession Phenotype Information

  5. Web-based aggregation of Information

  6. Interoperable Genetic Diversity § Genebanks should utilize genomics data ● But should not store them! § Genomics studies should make variant data available ● But need access to passport and characterization & evaluation data § Breeders need tools to access diversity. Diagram: Genebank(s), Genomics provider(s). Source: Finkers, van Hintum et al. 2014, DOI: 10.1017/S1479262114000689

  7. Intermezzo: Linked Open Data Standardization makes the information interoperable • Controlled vocabularies • Machine readable • Can all be queried by a single question vs. visiting many websites
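The "single question vs. visiting many websites" point can be illustrated with a minimal sketch. It assumes two hypothetical sources that publish subject-predicate-object triples and share identifiers; every identifier and value below is made up for illustration and does not come from CGN or any genomics provider.

```python
# Hypothetical triples from two independent sources that share accession IDs.
GENEBANK = [
    ("acc:EX001", "rdfs:label", "Example accession 1"),
    ("acc:EX001", "dwc:decimalLatitude", "51.97"),
]
GENOMICS = [
    ("acc:EX001", "ex:hasVariant", "snp:0001"),
    ("acc:EX002", "ex:hasVariant", "snp:0002"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def describe(subject, *sources):
    """Merge all sources and collect predicate -> object for one subject."""
    merged = [t for src in sources for t in src]
    return {p: o for _, p, o in match(merged, s=subject)}


# One question, answered across both sources at once.
info = describe("acc:EX001", GENEBANK, GENOMICS)
print(info)
```

Because both sources use the same identifier for the accession, passport and variant information end up in one answer without any source-specific code.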

  8. Interoperable Genetic Diversity (2) § Implications: ● Data can be stored at many different locations, but can be found by computers ● Newly published information (in the correct format) will be included automatically ● Tools can be written to answer dedicated questions, such as assessing allelic variation, or to support collection management. Diagram: Genebank(s), Genomics provider(s). Source: Finkers, van Hintum et al. 2014, DOI: 10.1017/S1479262114000689

  9. Interdisciplinary Approach Needed. Diagram: Genebanks, Genomics provider(s)

  10. Interdisciplinary Approach Needed: Need for Data Scientists & Domain Experts. Diagram: Genebanks, Genomics provider(s)

  11. Format: Bring your own Data Workshop 1. Users define the question(s) 2. Users and Linked data experts define concepts and ontologies 3. Experts help to create linked data and formulate query

  12. Bring Your Own Data Workshop: Data owners, Domain Experts, Trainers, Linked Data Experts. More Info: http://www.dtls.nl/fair-data/byod/

  13. Example: Solanaceae Trait Ontology

  14. BYOD in action

  15. Select a genetically diverse collection: Legacy databases (e.g. UniProt); Genome Sequence & Genome Annotation; Genome Variation Data (re-sequencing collections) & SNP annotation; Accession Passport Information; Accession Phenotype Information

  16. Example Query
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX taxon: <http://openlifedata.org/taxonomy_resource:>
      PREFIX tdwg: <http://rs.tdwg.org/dwc/terms/>
      SELECT ?acc ?label (str(?lat) as ?latitude) (str(?long) as ?longitude)
      WHERE {
        GRAPH <http://cgngenis.wageningenur.nl> {
          ?acc taxon:species ?species .
          ?species rdfs:label ?label .
          ?acc tdwg:decimalLatitude ?lat .
          ?acc tdwg:decimalLongitude ?long
        }
      }
      order by ?label
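As a hedged sketch of how such a query could be sent to a SPARQL endpoint, the snippet below builds a GET request URL in the style of the SPARQL 1.1 Protocol (the query text goes in a `query` parameter). The endpoint URL is a placeholder, not the actual CGN service. The query text follows the slide, with the standard `rdfs` namespace URI and an explicit `WHERE` clause written out; both are assumptions where the slide text is ambiguous. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the real service address is not given on the slide.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://openlifedata.org/taxonomy_resource:>
PREFIX tdwg: <http://rs.tdwg.org/dwc/terms/>
SELECT ?acc ?label (str(?lat) AS ?latitude) (str(?long) AS ?longitude)
WHERE {
  GRAPH <http://cgngenis.wageningenur.nl> {
    ?acc taxon:species ?species .
    ?species rdfs:label ?label .
    ?acc tdwg:decimalLatitude ?lat .
    ?acc tdwg:decimalLongitude ?long
  }
}
ORDER BY ?label
"""


def build_request_url(endpoint, query):
    """Percent-encode the query into a GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Building the URL separately from sending it keeps the sketch runnable offline; a client would then fetch `url` and typically ask for a result format such as JSON via an `Accept` header.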

  17. Outcome: Query Graph

  18. FAIRport* in VLPB? (* More on FAIRport in the presentation of Luiz Bonino, Thursday 10:30)

  19. Summary § Blueprint “Interoperable Genetic Diversity” shown § BYOD resulted in interoperable data which could be queried ● Request your own BYOD? § Public <-> Private integration possible

  20. Select a genetically diverse collection: Legacy databases (e.g. UniProt); Genome Sequence & Genome Annotation; Genome Variation Data (re-sequencing collections) & SNP annotation; Accession Passport Information; Accession Phenotype Information

  21. Select a genetically diverse collection: Legacy databases (e.g. UniProt); Genome Sequence & Genome Annotation; Genome Variation Data (re-sequencing collections) & SNP annotation; Accession Passport Information; Accession Phenotype Information

  22. Working Prototype § screendump

  23. Questions? Acknowledgements: BYOD team, Theo van Hintum & Frank Menting (CGN), Denis Guryunov & Martijn van Kaauwen (prototype), et al.

  24. HaploSmasher Hands On Session § HaploSmasher Prototype: ● Genomic regions as input: SL2.40ch03:10000..10200 ● Solyc gene identifiers: Solyc10g085020 ● Filter SNPs on impact type ● HIGH, MODERATE, LOW, MODIFIER (SnpEff) ● No input validation yet ● Use correct notation and existing Solyc gene IDs
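Since the prototype does no input validation yet, the two accepted notations can be checked before submitting. The sketch below is not part of HaploSmasher; the patterns are assumptions inferred only from the examples on the slides (`SL2.40ch03:10000..10200` and `Solyc10g085020`), and a syntactically valid gene ID may still not exist in the annotation.

```python
import re

# Patterns inferred from the slide examples only; treat them as assumptions.
REGION_RE = re.compile(r"(SL\d+\.\d+ch\d{2}):(\d+)\.\.(\d+)")
GENE_RE = re.compile(r"Solyc\d{2}g\d{6}")

# Impact types listed on the slide (as reported by SnpEff).
IMPACTS = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def parse_region(text):
    """Parse 'SL2.40ch03:10000..10200' into (sequence, start, end)."""
    m = REGION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a region in seq:start..end notation: {text!r}")
    seq, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError(f"region start {start} lies after end {end}")
    return seq, start, end


def is_gene_id(text):
    """Check the notation only; it cannot tell whether the gene exists."""
    return GENE_RE.fullmatch(text.strip()) is not None


print(parse_region("SL2.40ch03:10000..10200"))
print(is_gene_id("Solyc10g085020"))
```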

  25. HaploSmasher

  26. HaploSmasher § Query CGN FAIRdata graph ● The prototype currently only generates links to CGN passport data ● Graph data for three CGN accessions is available in our test set

  27. HaploSmasher examples: § Haplotype Output

  28. Example queries § http://www.plantbreeding.wur.nl/hs/ § Also, explore variation data & Linked resources ● http://www.tomatogenome.net § Examples: ● Beta-tubulin: Solyc10g085020 ● HIGH & MODERATE vs. ALL effects ● Glutamate dehydrogenase Solyc05g052100 ● Uridine kinase Solyc02g067880 ● Magnesium chelatase Solyc04g015750

  29. HaploSmasher examples: § Conserved housekeeping genes: ● Beta-tubulin Solyc10g085020 439 AA ● 1 SNP (HIGH & MODERATE effect), two haplotypes
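The relation between the impact filter and the number of haplotypes can be sketched as follows. This is an illustration of the general idea (keep the SNPs with the selected impact types, then group accessions that carry identical alleles at those SNPs), not HaploSmasher's actual implementation. All SNP names, impacts, accessions and alleles are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: impact per SNP, and allele calls per accession.
SNP_IMPACT = {"snp1": "MODERATE", "snp2": "MODIFIER", "snp3": "LOW"}
GENOTYPES = {
    "accA": {"snp1": "A", "snp2": "C", "snp3": "G"},
    "accB": {"snp1": "A", "snp2": "T", "snp3": "G"},
    "accC": {"snp1": "G", "snp2": "C", "snp3": "G"},
}


def haplotype_groups(genotypes, snp_impact, impacts):
    """Group accessions by their alleles at SNPs with a selected impact."""
    kept = sorted(s for s, imp in snp_impact.items() if imp in impacts)
    groups = defaultdict(list)
    for acc in sorted(genotypes):
        haplotype = tuple(genotypes[acc][s] for s in kept)
        groups[haplotype].append(acc)
    return dict(groups)


# One SNP passes the HIGH/MODERATE filter, so two haplotypes remain.
strict = haplotype_groups(GENOTYPES, SNP_IMPACT, {"HIGH", "MODERATE"})
# With all impact types, every accession here becomes its own haplotype.
all_effects = haplotype_groups(
    GENOTYPES, SNP_IMPACT, {"HIGH", "MODERATE", "LOW", "MODIFIER"}
)
print(len(strict), len(all_effects))
```

A stricter filter keeps fewer SNPs, so accessions that differ only at low-impact positions collapse into the same haplotype group.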

  30. HaploSmasher examples: ● Beta-tubulin Solyc10g085020 439 AA ● 136 SNPs (all SnpEff impact types) ● Part of haplotype groups:

  31. HaploSmasher examples: ● Glutamate dehydrogenase Solyc05g052100 ● 13 SNPs (HIGH, MODERATE)

  32. HaploSmasher examples: ● Uridine kinase Solyc02g067880 ● 23 SNPs (HIGH, MODERATE) ● Example haplotype groups:

  33. HaploSmasher examples: ● magnesium chelatase Solyc04g015750 ● 21 SNPs (HIGH, MODERATE) ● Example haplotype groups:
