
In The Beginning: PowerPoint Presentation



  1. In The Beginning: Data. Lots of it, e.g. VCF and BAM files.

  2. In The Beginning: Goal. Build a web-based interface (ESV) on top of a fast backend to help navigate and explore the data.

  3. Origin: Prototype

  4. Origin: Challenges. Linking: all views should be interactive.

  5. Origin: Challenges. Scalability: creating, editing, and linking should be fast to drive data discovery.

  6. Origin: Challenges. Interface: exploring data should be natural, informative, and easy to follow.

  7. Structures
     ● View: visual representation
     ● View Filter: filter on genomic positions, genes
     ● Data Filter: filter on data parameters (e.g. threshold, experiment type)
     ● Data: underlying data source (i.e. by sample ID, project)
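
The four layers on this slide can be sketched as a small data model. This is a minimal illustration under stated assumptions, not ESV's actual code: the class names `DataSource` and `LinkedView`, the record fields, and the choice to apply data filters before view filters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical record type: one row of underlying data (e.g. a MutationSeq call).
Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


@dataclass
class DataSource:
    """Data layer (hypothetical): the records for one sample ID within one project."""
    project: str
    sample_id: str
    records: List[Record]


@dataclass
class LinkedView:
    """View layer (hypothetical): one visual representation over a data source."""
    name: str
    source: DataSource
    # Data filters: parameters of the data itself (e.g. probability threshold, experiment type).
    data_filters: List[Predicate] = field(default_factory=list)
    # View filters: what the view displays (e.g. genomic positions, genes).
    view_filters: List[Predicate] = field(default_factory=list)

    def visible_records(self) -> List[Record]:
        # Assumed order: data filters narrow the source, then view filters narrow what is drawn.
        rows = [r for r in self.source.records if all(f(r) for f in self.data_filters)]
        return [r for r in rows if all(f(r) for f in self.view_filters)]


if __name__ == "__main__":
    # Illustrative records; only the first one matches the slide's example values.
    source = DataSource("example-project", "DG1155", [
        {"chrom": "01", "position": 104589, "probability": 0.91},
        {"chrom": "01", "position": 200000, "probability": 0.95},
        {"chrom": "01", "position": 104600, "probability": 0.40},
    ])
    view = LinkedView(
        "mutations",
        source,
        data_filters=[lambda r: r["probability"] >= 0.9],
        view_filters=[lambda r: r["chrom"] == "01" and 100000 <= r["position"] <= 110000],
    )
    print([r["position"] for r in view.visible_records()])  # [104589]
```

Keeping both filter layers as plain predicates means any number of views can share one data source while filtering it differently.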

  8. Progress: Major Highlights
     ● Redesigned interface and editor
     ● New query engine
     ● Improved views / visualizations to support linking and interaction
     ● Supertable
     ● Data denormalization contributions

  9. Live Demo: ESV Demonstration

  10. Data Denormalization: Why?
     ● ElasticSearch is an extremely fast text-search engine, but it is schema-free
       ○ No set column names, no defined structure
       ○ No SQL-style joins between documents
     ● How do we find relations between datasets then?

  11. Data Denormalization: Why?
     ● TITAN dataset and MutationSeq dataset
     ● Given a genomic coordinate, how do we know which mutations (MutationSeq) fall within which copy number alteration (TITAN)?
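
The relation asked about here is an interval-overlap test: a mutation at `position` falls inside a TITAN segment when both share the sample ID and chromosome and `start <= position <= end` (with the example values on the next slide, 103,062 ≤ 104,589 ≤ 109,114). Below is a minimal sketch of the lookup a relational join would otherwise provide. It assumes inclusive coordinates and snake_case versions of the slide's field names; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def overlapping_segments(mutations: List[Record], segments: List[Record]) -> List[List[Record]]:
    """For each mutation, return the TITAN segments containing its position (hypothetical helper)."""
    # Index segments by (sample_id, chrom) so each mutation only scans segments it could overlap.
    by_key: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for seg in segments:
        by_key[(seg["sample_id"], seg["chrom"])].append(seg)

    result: List[List[Record]] = []
    for mut in mutations:
        candidates = by_key.get((mut["sample_id"], mut["chrom"]), [])
        # Inclusive containment test (assumed coordinate convention).
        result.append([s for s in candidates if s["start"] <= mut["position"] <= s["end"]])
    return result


if __name__ == "__main__":
    mutations = [{"sample_id": "DG1155", "chrom": "01", "position": 104589}]
    segments = [
        {"sample_id": "DG1155", "chrom": "01", "start": 103062, "end": 109114, "state": "GAIN"},
        # Made-up segment on another chromosome; it must not match.
        {"sample_id": "DG1155", "chrom": "02", "start": 1, "end": 1000000, "state": "GAIN"},
    ]
    hits = overlapping_segments(mutations, segments)[0]
    print([(s["start"], s["end"]) for s in hits])  # [(103062, 109114)]
```

This is exactly the cross-dataset question a schema-free store without joins cannot answer at query time, which motivates the denormalization on the following slides.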

  12. Data Denormalization: How?
     ● MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, ...
     ● TITAN record: sample id: DG1155, chrom: 01, start: 103,062, end: 109,114, state: GAIN, ...

  13. Data Denormalization: How?
     ● MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, events: {...}, ...
     ● TITAN record: sample id: DG1155, chrom: 01, start: 103,062, end: 109,114, state: GAIN, ...
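
The step from the plain MutationSeq record to the one carrying `events: {...}` can be sketched as follows. Each MutationSeq record receives an `events` list holding copies of the TITAN segments that overlap it, so no join is needed at query time. This is an assumed reconstruction, not ESV's pipeline: the function names, the `mutationseq` index name, and the inclusive overlap test are hypothetical. The second helper writes the newline-delimited action/document pairs that Elasticsearch's `_bulk` endpoint accepts.

```python
import json
from typing import Any, Dict, List

Record = Dict[str, Any]

# Fields copied from each TITAN segment into the MutationSeq document.
EVENT_FIELDS = ("chrom", "start", "end", "state")


def denormalize(mutations: List[Record], segments: List[Record]) -> List[Record]:
    """Return new MutationSeq documents with overlapping TITAN segments embedded as `events`."""
    docs = []
    for mut in mutations:
        events = [
            {k: seg[k] for k in EVENT_FIELDS}
            for seg in segments
            if seg["sample_id"] == mut["sample_id"]
            and seg["chrom"] == mut["chrom"]
            and seg["start"] <= mut["position"] <= seg["end"]  # assumed inclusive coordinates
        ]
        docs.append({**mut, "events": events})  # shallow copy; the input record is left unchanged
    return docs


def to_bulk_ndjson(docs: List[Record], index: str = "mutationseq") -> str:
    """Serialize documents as `_bulk` lines: an action line, then the document source."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # Every line, including the last, must be newline-terminated.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    mutations = [{"sample_id": "DG1155", "chrom": "01", "position": 104589,
                  "ref_allele": "A", "alt_allele": "T", "probability": 0.91}]
    segments = [{"sample_id": "DG1155", "chrom": "01", "start": 103062,
                 "end": 109114, "state": "GAIN"}]
    docs = denormalize(mutations, segments)
    print(docs[0]["events"])  # [{'chrom': '01', 'start': 103062, 'end': 109114, 'state': 'GAIN'}]
    print(to_bulk_ndjson(docs), end="")
```

The quadratic scan is kept for clarity; because the data is computed once and indexed, its cost is paid at load time rather than per query.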

  14. Data Denormalization: How?
     ● Unlike Facebook or Twitter data, our MutationSeq data is mainly static, so the TITAN segments copied into each record rarely need updating
     ● Exploit ElasticSearch’s very fast term queries
     ● Ask questions like: Find me all the TITAN segments that overlap a particular MutationSeq event
     ● Denormalized MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, events: { chrom: 01, start: 103,062, end: 109,114, state: GAIN, ... }
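
With the segments embedded, the example question on this slide becomes an exact-value lookup. The sketch below builds request bodies one might send to Elasticsearch's `_search` endpoint. It assumes the denormalized records shown on this slide are indexed with `sample_id`, `chrom`, and `events.state` mapped as `keyword` fields, `position` as an integer, and `events` as a plain object array (a `nested` mapping would require a `nested` query instead). The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def mutation_lookup_query(sample_id: str, chrom: str, position: int) -> Dict[str, Any]:
    """Body for `_search` (hypothetical helper): fetch one MutationSeq event by exact values."""
    return {
        # Filter context: exact term matches, no relevance scoring needed.
        "query": {"bool": {"filter": [
            {"term": {"sample_id": sample_id}},
            {"term": {"chrom": chrom}},
            {"term": {"position": position}},
        ]}},
        # Return only the fields needed, including the embedded TITAN segments.
        "_source": ["sample_id", "chrom", "position", "events"],
    }


def mutations_in_state_query(sample_id: str, state: str) -> Dict[str, Any]:
    """Body for `_search` (hypothetical helper): mutations lying in a segment with this state."""
    return {"query": {"bool": {"filter": [
        {"term": {"sample_id": sample_id}},
        {"term": {"events.state": state}},  # assumes `events.state` is a keyword field
    ]}}}


def overlapping_segments_from_hits(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect embedded TITAN segments from a `_search` response; no join is needed."""
    segments: List[Dict[str, Any]] = []
    for hit in response["hits"]["hits"]:
        segments.extend(hit["_source"].get("events", []))
    return segments


if __name__ == "__main__":
    print(json.dumps(mutation_lookup_query("DG1155", "01", 104589), indent=2))
```

The first body answers "which TITAN segments overlap this MutationSeq event" by reading `events` from the matching document; the second inverts the question using the same embedded field.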

  15. Data Denormalization: Result

  16. To Infinity and Beyond
     ● Applications to other areas of research and/or industry in the future, as ESV was designed to be as general as possible
     ● Addition of new datasets/datatypes (e.g. single-sample MutationSeq)
     ● User-contributed views and additional default views

  17. Summary: Over the past 3 months:
     ● Redesigned interface to support integration of complex views
     ● Added support for easily adding new views
     ● Real-time search and filtering through ElasticSearch
     ● Integrated and improved views/visualizations
     ● Used denormalized data to support linking between any number of views
     http://cbioportal.mo.bccrc.ca:8000/

  18. Acknowledgements: Sohrab Shah, Cydney Nielsen, Development Team, Daniel Machev, Kelsey Hamer, Ali Bashashati, Kevin Wagner, Shah Lab
