In The Beginning Data. Lots of it, e.g. VCF and BAM files
In The Beginning Goal. Build a web-based interface on top of a fast backend to help navigate and explore the data: ESV
Origin: Prototype
Origin: Challenges Linking: All views should be interactive
Origin: Challenges Scalability: Creating, editing, and linking should be fast to drive data discovery
Origin: Challenges Interface: Exploring data should be natural, informative, and easy to follow
Structures
● View: visual representation
● View Filter: filter on genomic positions, genes
● Data Filter: filter on data parameters (e.g. threshold, experiment type)
● Data: underlying data source (e.g. by sample ID, project)
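As a rough illustration of how the four structures could fit together, here is a minimal Python sketch. All class names, field names (`sample_id`, `position`, `probability`) and the `select` method are assumptions made for this example; they are not taken from the ESV code base.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


@dataclass
class Data:
    """Underlying data source, here chosen by sample ID (hypothetical)."""
    sample_ids: List[str]
    project: Optional[str] = None

    def matches(self, record: Record) -> bool:
        return record.get("sample_id") in self.sample_ids


@dataclass
class DataFilter:
    """Filter on a data parameter, here a simple minimum threshold."""
    field_name: str
    minimum: float

    def matches(self, record: Record) -> bool:
        value = record.get(self.field_name)
        return value is not None and value >= self.minimum


@dataclass
class ViewFilter:
    """Filter on a genomic region (chromosome plus position range)."""
    chrom: str
    start: int
    end: int

    def matches(self, record: Record) -> bool:
        position = record.get("position")
        if position is None or record.get("chrom") != self.chrom:
            return False
        return self.start <= position <= self.end


@dataclass
class View:
    """A visual representation bound to a data source and its filters."""
    kind: str
    data: Data
    data_filters: List[DataFilter] = field(default_factory=list)
    view_filter: Optional[ViewFilter] = None

    def select(self, records: List[Record]) -> List[Record]:
        """Return the records this view would display."""
        selected = []
        for record in records:
            if not self.data.matches(record):
                continue
            if not all(f.matches(record) for f in self.data_filters):
                continue
            if self.view_filter and not self.view_filter.matches(record):
                continue
            selected.append(record)
        return selected
```

Keeping the data source, the data filter, and the view filter as separate objects means that one view's selection can be reused to narrow another view, which is one way the linking between views could be approached.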
Progress Major Highlights
● Redesigned interface and editor
● New query engine
● Improved views / visualizations to support linking and interaction
● Supertable
● Data denormalization contributions
Live Demo ESV Demonstration
Data Denormalization: Why?
● ElasticSearch is an extremely fast text-search engine, but it is schema-free
  ○ No set column names, no defined structure
● How do we find relations then?
Data Denormalization: Why? Given a genomic coordinate, how do we know which mutations (MutationSeq dataset) fall within which copy number alterations (TITAN dataset)?
Data Denormalization: How?
MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, ...
TITAN record: sample id: DG1155, chrom: 01, start: 103,062, end: 109,114, state: GAIN, ...
Data Denormalization: How?
MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, events: {...}
TITAN record: sample id: DG1155, chrom: 01, start: 103,062, end: 109,114, state: GAIN, ...
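The embedding step shown above, where matching TITAN segments are copied into each MutationSeq record under `events`, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, `events` is modelled as a list because a position could in principle match more than one segment, and coordinates are stored as plain integers.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def overlapping_segments(mutation: Record, segments: List[Record]) -> List[Record]:
    """Return the segments that contain the mutation's position.

    A segment matches when it comes from the same sample and chromosome
    and its [start, end] range (inclusive) covers the mutation position.
    """
    return [
        seg
        for seg in segments
        if seg["sample_id"] == mutation["sample_id"]
        and seg["chrom"] == mutation["chrom"]
        and seg["start"] <= mutation["position"] <= seg["end"]
    ]


def denormalize(mutations: List[Record], segments: List[Record]) -> List[Record]:
    """Copy each mutation and embed its overlapping segments as `events`."""
    documents = []
    for mutation in mutations:
        doc = dict(mutation)  # shallow copy, the input stays untouched
        doc["events"] = [
            # sample_id is already on the parent document, so drop it here
            {key: value for key, value in seg.items() if key != "sample_id"}
            for seg in overlapping_segments(mutation, segments)
        ]
        documents.append(doc)
    return documents
```

The linear scan over all segments per mutation is only meant to show the idea; a large dataset would likely need sorted segments or an interval index instead. Because the data is mostly static, this work can be done once at load time rather than on every query.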
Data Denormalization: How?
● Unlike Facebook or Twitter, our MutationSeq data is mainly static
● Exploit ElasticSearch’s very fast query term search
● Ask questions like: find me all the TITAN segments that overlap a particular MutationSeq event
Denormalized MutationSeq record: sample id: DG1155, chrom: 01, position: 104,589, ref_allele: A, alt_allele: T, probability: 0.91, events: { chrom: 01, start: 103,062, end: 109,114, state: GAIN, ... }
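With the segment data embedded in each record, the question "which TITAN segments overlap this MutationSeq event" reduces to an exact-match lookup. The Python sketch below builds such a request body using the Elasticsearch query DSL (`bool` with `filter` and `term` clauses). Several things here are assumptions: the field names (`sample_id`, `chrom`, `position`, `events`), that string fields are indexed as exact values rather than analyzed text, that `events` holds a list of segment objects, and that the Elasticsearch version in use supports `bool`/`filter`. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def mutation_lookup_query(sample_id: str, chrom: str, position: int) -> Dict[str, Any]:
    """Build a query body that finds one mutation by exact term matches.

    No range arithmetic is needed at query time: the overlapping
    segments were embedded in the document when it was indexed.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"sample_id": sample_id}},
                    {"term": {"chrom": chrom}},
                    {"term": {"position": position}},
                ]
            }
        }
    }


def segments_from_hit(hit: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Read the embedded segments from one search hit's `_source`."""
    return hit.get("_source", {}).get("events", [])
```

The body returned by `mutation_lookup_query("DG1155", "01", 104589)` would be sent to the index's search endpoint, and `segments_from_hit` would then be applied to each returned hit.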
Data Denormalization: Result
To Infinity and Beyond
● Applications to other areas of research and/or industry in the future, as ESV was designed to be as general as possible
● Addition of new datasets/datatypes (e.g. single sample MutationSeq)
● User-contributed views and additional default views
Summary Over the past 3 months:
● Redesigned interface to support integration of complex views
● Added support to easily add new views
● Real-time search and filtering through ElasticSearch
● Integrated and improved views/visualizations
● Used denormalized data to support linking between any number of views
http://cbioportal.mo.bccrc.ca:8000/
Acknowledgements Sohrab Shah Cydney Nielsen Development Team Daniel Machev Kelsey Hamer Ali Bashashati Kevin Wagner Shah Lab