Interactive Data Visualization in the Wild
Challenges of Big Data in Cancer Genomics
Cydney Nielsen
University of British Columbia, British Columbia Cancer Agency

Outline
1 Visualization and its role in scientific discovery
2 Introduction to cancer genomics
3 Cancer genomics visualization: building a scalable platform
4 Summary

1 Visualization and its role in scientific discovery
Discovery loop
[Diagram: QUESTIONS lead to experiments, which produce DATA; interpretation of DATA yields INSIGHTS; INSIGHTS feed hypothesis generation (new QUESTIONS) and communication (PUBLICATIONS). Interpretation combines computer automation + human expert.]
Intelligence Amplifying System > Artificial Intelligence System
"That is, a machine and a mind can beat a mind-imitating machine working by itself."
- Frederick Brooks
Why visualization?
Visualization
• Leverages our ability to visually recognize patterns and enhances our ability to reason about data
• Can reveal a level of detail that may be missed in summary statistics alone

Anscombe's quartet (Figure 1, panel a: data table)

      I             II            III           IV
   x      y      x      y      x      y      x      y
  10   8.04     10   9.14     10   7.46      8   6.58
   8   6.95      8   8.14      8   6.77      8   5.76
  13   7.58     13   8.74     13  12.74      8   7.71
   9   8.81      9   8.77      9   7.11      8   8.84
  11   8.33     11   9.26     11   7.81      8   8.47
  14   9.96     14   8.10     14   8.84      8   7.04
   6   7.24      6   6.13      6   6.08      8   5.25
   4   4.26      4   3.10      4   5.39     19  12.50
  12  10.84     12   9.13     12   8.15      8   5.56
   7   4.82      7   7.26      7   6.42      8   7.91
   5   5.68      5   4.74      5   5.73      8   6.89
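To make the point about summary statistics concrete, the sketch below recomputes a few of them for the four data sets in the table. It is an illustration only: the helper names (`pearson`, `summarize`, `SUMMARY`) are made up for this example, and the choice of statistics (means, sample variances, Pearson correlation) is an assumption about which summaries are meant.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the table on the slide.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values of sets I, II and III
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Collect the summary statistics that look alike across the quartet."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} r={s['r']:.2f}")
```

The four rows printed by this script are nearly indistinguishable, which is why the data sets have to be plotted before their different shapes become apparent.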
Why visualization?
Visualization
• Is well suited to questions where the solution is too ill-defined to be automatically computed
[Diagram: DATA, interpretation, INSIGHTS]

Why visualization?
Visualization
• Can be further enhanced with interactivity, which is key to dynamic data exploration
Example: Visual Information-Seeking Mantra
"Overview first, zoom and filter, then details-on-demand."
- Shneiderman 1996
www.apple.com

Why visualization?
Visualization
• Reduces the computational barrier posed by many data analysis workflows

2 Cancer Genomics
Human genome
Image from UCSF School of Medicine Office of Educational Technology

Cancer: disease of the genome
Li Ding et al. Nature 2012

DNA Sequencing
Illumina HiSeq
• Input: DNA prepared from a population of cells from a tissue sample
• Output: millions of sequencing reads, for example:
AGCGCAGATACAGACAGGTGAAACAGTACAG
TGACAACAGTACCAAGTCAGAGTCCACATAG
TAGAGGAGAGGCCAACATATAGACAACAGTT
TGACAACAGTACCACAGAGTACATAGAGGAG
AGCGCAGATACAGACAGGTGACAACAGAGAG

Detecting genomic alterations from sequence reads

(reads aligned to the reference; the mutation site is marked with ^)

                            GATGACAACAGAGAGGTTACAC
                           AGATGACAACAGAGAGGTTACA
                          CAGATGACAACAGAGAGGTTAC
                        GACAGGTGACAACAGAGAGGTT
                       AGACAGATGACAACAGAGAGGT
                   ATACAGACAGGTGACAACAGAG
                 AGATACAGACAGGTGACAACAG
             GCGCAGATACAGACAGGTGACA
reference  TAGCGCAGATACAGACAGGTGACAACAGAGAGGTTACACCAG
                             ^

Mutation: the reference base at the marked site is G; the reads carry either G or A.
Coverage: 8 reads overlap the site (4 with G, 4 with A).
Allele ratio = 0.5
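The coverage and allele ratio shown on this slide can be reproduced with a small pileup calculation. The sketch below is only an illustration, not how MutationSeq or any other caller works: the 0-based start offsets were worked out by lining each read up against the reference by hand, and defining the allele ratio as "non-reference reads / coverage" is an assumption, since the slide does not spell the formula out.

```python
from collections import Counter

REFERENCE = "TAGCGCAGATACAGACAGGTGACAACAGAGAGGTTACACCAG"

# (0-based start offset on the reference, read sequence).
# The offsets are an assumption derived by aligning the slide's reads by eye.
ALIGNED_READS = [
    (17, "GATGACAACAGAGAGGTTACAC"),
    (16, "AGATGACAACAGAGAGGTTACA"),
    (15, "CAGATGACAACAGAGAGGTTAC"),
    (13, "GACAGGTGACAACAGAGAGGTT"),
    (12, "AGACAGATGACAACAGAGAGGT"),
    (8, "ATACAGACAGGTGACAACAGAG"),
    (6, "AGATACAGACAGGTGACAACAG"),
    (2, "GCGCAGATACAGACAGGTGACA"),
]


def pileup(reads, position):
    """Count the bases that the aligned reads place at one reference position."""
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):  # the read overlaps the position
            counts[seq[offset]] += 1
    return counts


def allele_ratio(reads, reference, position):
    """Return (coverage, fraction of overlapping reads that differ from the reference)."""
    counts = pileup(reads, position)
    coverage = sum(counts.values())
    non_reference = coverage - counts[reference[position]]
    return coverage, (non_reference / coverage if coverage else 0.0)


SITE = 18  # 0-based index of the highlighted G in the reference
COVERAGE, RATIO = allele_ratio(ALIGNED_READS, REFERENCE, SITE)

if __name__ == "__main__":
    print(dict(pileup(ALIGNED_READS, SITE)), COVERAGE, RATIO)
```

Real pipelines work from aligner output (with base qualities, gaps and mapping scores) rather than hand-placed offsets, so this only mirrors the arithmetic behind the slide's figure.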
Genomic alterations
• Mutation (e.g. G to A)
• Copy number (e.g. deletion)
• Rearrangement (e.g. translocation)
Revolution in DNA sequencing technologies
The promise of data
Green E. et al. Nature. February 10, 2011
Cancer genomics data interpretation
Computer automation: to predict diverse genomic alterations
• MutationSeq (Ding et al., Bioinformatics 2012): mutations
• Titan (Ha et al., Genome Research 2014): copy number, e.g. deletion
• deStruct: rearrangements, e.g. translocation
Human expert: to integrate and interpret these alterations together with relevant patient metadata
Cancer genomics data interpretation
Computer automation: MutationSeq (Ding et al., Bioinformatics 2012), Titan (Ha et al., Genome Research 2014), deStruct
Human expert: need interactive visualization tools to facilitate the human component and complement the computational one
3 Cancer Genomics Visualization
Many tools for many tasks
Schroeder et al. Genome Medicine 2013, 5:9. http://genomemedicine.com/content/5/1/9
Review: "Visualizing multidimensional cancer genomics data", Michael P Schroeder, Abel Gonzalez-Perez and Nuria Lopez-Bigas
[Figure panels: Matrix heatmaps; Genomic coordinates; Networks. Labels: genes, samples, chromosomal coordinates, interactions, omics data, clinical data]
http://www.cbioportal.org
Key Feature 1 Flexible integration of views
Integrate multiple data types into one view
Example analysis: Examine a mutation in its copy number context
[View labels: mutation, deletion]

Integrate multiple data types into one view
Example analysis: Examine a mutation in its copy number context
[View labels: mutations, copy number]

Compare data filters on a single data set
Example analysis: Examine the impact of the MutationSeq probability threshold on the coverage versus allele ratio distribution
[View labels: MutationSeq predictions]

Explore views of different data types
Example analysis: Examine both the mutations and copy number alterations for a given sample
[View labels: MutationSeq predictions, Titan copy number predictions]

Components
• View (v): visual representation
• Region Filter: on genomic range
• Data Filter: on data parameters
• Data (d): sample(s) + data type
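Integrate multiple data types into one view
[Structure: one view (v) over two data components (d, d): mutations and copy number]

Compare data filters on a single data set
[Structure: two views (v, v) over one data component (d): MutationSeq predictions]

Explore views of different data types
[Structure: two views (v, v), each over its own data component (d, d): MutationSeq predictions and Titan copy number predictions]
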
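One way to picture how the four components compose into the three structures above is the sketch below. It is not the platform's implementation (the interface is described as a D3.js web application); every class, field and record name here is hypothetical, and attaching the filters to a per-view "binding" between a view and its data is an assumption made so that two views can share one data set with different filters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Data:
    """'d' on the slides: sample(s) + a single data type."""
    samples: List[str]
    data_type: str
    records: List[dict] = field(default_factory=list)


@dataclass
class RegionFilter:
    """Filter on genomic range; point records use 'position', intervals use 'start'/'end'."""
    chrom: str
    start: int
    end: int

    def keep(self, rec: dict) -> bool:
        lo = rec.get("start", rec.get("position"))
        hi = rec.get("end", rec.get("position"))
        return rec["chrom"] == self.chrom and lo <= self.end and hi >= self.start


@dataclass
class DataFilter:
    """Filter on a data parameter, here a simple minimum threshold."""
    field_name: str
    minimum: float

    def keep(self, rec: dict) -> bool:
        return rec[self.field_name] >= self.minimum


@dataclass
class Binding:
    """Hypothetical glue object: one data set plus the filters a view applies to it."""
    data: Data
    data_filter: Optional[DataFilter] = None
    region: Optional[RegionFilter] = None

    def rows(self) -> List[dict]:
        rows = self.data.records
        if self.data_filter is not None:
            rows = [r for r in rows if self.data_filter.keep(r)]
        if self.region is not None:
            rows = [r for r in rows if self.region.keep(r)]
        return rows


@dataclass
class View:
    """'v' on the slides: a visual representation over one or more bindings."""
    kind: str
    bindings: List[Binding]

    def rows(self) -> List[dict]:
        return [r for b in self.bindings for r in b.rows()]


# Example records; the second mutation is invented purely for illustration.
mutations = Data(["SA091"], "mutation", [
    {"chrom": "1", "position": 104589, "probability": 0.91},
    {"chrom": "1", "position": 250000, "probability": 0.55},
])
copy_number = Data(["SA091"], "cnv", [
    {"chrom": "1", "start": 103062, "end": 109114, "state": "GAIN"},
])

# 1. Integrate multiple data types into one view: v over (d, d).
integrated = View("genome track", [Binding(mutations), Binding(copy_number)])
# 2. Compare data filters on a single data set: (v, v) over one d.
strict = View("scatter", [Binding(mutations, DataFilter("probability", 0.9))])
lenient = View("scatter", [Binding(mutations, DataFilter("probability", 0.5))])
# 3. The same integrated view restricted to a region of interest.
region = RegionFilter("1", 100000, 110000)
zoomed = View("genome track", [Binding(mutations, region=region),
                               Binding(copy_number, region=region)])
```

Keeping the filters on the binding rather than on the data itself is the design choice that lets `strict` and `lenient` show the same `mutations` object side by side without copying it.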
Interface
Web application implemented using D3.js

Create
Select a predefined structure

Create
Add to an existing structure

Define Data
• Sample(s): query by project name / tumour type / sample id
• Single data type: e.g. mutations, copy number, etc.

Filter Data
Data filters depend on previously selected data type

Filter Regions
Limit the view to genes or regions of interest

Select a View
View types depend on previously selected data type

Adjust View

Inspect/Modify

Key Feature 2 Dynamic linking between views
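Dynamically link views of different data types
[Structure: two views (v, v), each over its own data component (d, d): MutationSeq predictions and Titan copy number predictions]

Dynamically link views of different data types
[Structure: two linked views (v, v) over two data components (d, d)]

Dynamically link views of different data types
[Structure: two linked views (v, v) over two data components (d, d); a mutation in one view is linked to a deletion in the other]
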
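The slides do not say how linking is implemented, so the following is only a guess at the general idea: selecting a genomic position in one view is broadcast to its linked views, and each view highlights whichever of its own records overlap that position. The class `LinkedView`, the helper `overlaps` and the record fields are all hypothetical, and Python is used purely for illustration (the interface itself is a D3.js web application).

```python
def overlaps(record, chrom, position):
    """True if a point record ('position') or interval record ('start'/'end') covers the position."""
    start = record.get("start", record.get("position"))
    end = record.get("end", record.get("position"))
    return record["chrom"] == chrom and start <= position <= end


class LinkedView:
    """Hypothetical view that shares selections with the views it is linked to."""

    def __init__(self, name, records):
        self.name = name
        self.records = records
        self.peers = []
        self.highlighted = []

    def link(self, other):
        # Linking is symmetric: a selection in either view reaches the other.
        self.peers.append(other)
        other.peers.append(self)

    def select(self, chrom, position):
        # Highlight locally, then notify every linked view.
        self._highlight(chrom, position)
        for peer in self.peers:
            peer._highlight(chrom, position)

    def _highlight(self, chrom, position):
        self.highlighted = [r for r in self.records if overlaps(r, chrom, position)]


mutation_view = LinkedView("mutations", [
    {"chrom": "1", "position": 104589, "alt_allele": "T"},
])
copy_number_view = LinkedView("copy number", [
    {"chrom": "1", "start": 103062, "end": 109114, "state": "GAIN"},
    {"chrom": "2", "start": 5000, "end": 9000, "state": "LOSS"},  # invented record
])
mutation_view.link(copy_number_view)

# Selecting the mutation highlights the copy number segment that contains it.
mutation_view.select("1", 104589)
```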
Key Feature 3 Scalability
"Research on big data visualization must address two major challenges: perceptual and interactive scalability"
Zhicheng Liu, Biye Jiang, Jeffrey Heer. imMens, EuroVis 2013

Interactive scalability
How to enable dynamic querying and rendering of millions of data points in real time?

Search
• Optimized for text search across documents
• All fields are indexed for fast retrieval (bag-of-terms approach)
• Query performance is a function of the number of query matches, not the total data set size
• Scales well as the data set size grows
• Appropriate for load-once-read-many workflows
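The bullets above describe an inverted index. The toy version below is a sketch of the idea only and is far simpler than what a real search engine does: every (field, term) pair maps to the set of documents containing it, so a query touches only the posting lists of its own terms rather than scanning the whole data set. The class name, the field names and sample id SA092 are made up for the example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy bag-of-terms index: (field, term) -> ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._docs = {}

    def add(self, doc_id, doc):
        """Index every field of the document at load time (load once, read many)."""
        self._docs[doc_id] = doc
        for field_name, value in doc.items():
            for term in str(value).lower().split():
                self._postings[(field_name, term)].add(doc_id)

    def search(self, **terms):
        """Return the documents matching all field=term pairs."""
        lists = [self._postings.get((f, str(t).lower()), set()) for f, t in terms.items()]
        if not lists:
            return []
        lists.sort(key=len)        # start from the rarest term
        hits = set(lists[0])       # copy so the index itself is not modified
        for postings in lists[1:]:
            hits &= postings
        return [self._docs[i] for i in sorted(hits)]


index = InvertedIndex()
index.add(0, {"sample_id": "SA091", "type": "mutation", "chrom": "1"})
index.add(1, {"sample_id": "SA091", "type": "cnv", "state": "GAIN"})
index.add(2, {"sample_id": "SA092", "type": "cnv", "state": "LOSS"})  # hypothetical sample
```

For instance, `index.search(sample_id="SA091", type="cnv")` intersects two short posting lists and never looks at document 2.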
Elasticsearch
• Chosen for ease of use (built on top of Apache Lucene)
• Benefits include:
  o Built-in support for distributed data (manages shards across nodes)
  o Extensive caching
  o Sophisticated query language (DSL)
  o REST API
Storing data

Relational database    Elasticsearch
Database               Index
Table                  Type
Row                    Document
Column                 Field
Storing data
Documents (records) and their fields

mutation                 CNV
sample id: SA091         sample id: SA091
chrom: 1                 chrom: 1
position: 104,589        start: 103,062
ref_allele: A            end: 109,114
alt_allele: T            state: GAIN
probability: 0.91
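As a rough sketch of how such documents and a filtered query might look, the snippet below writes the two records as JSON-style dictionaries and builds a request body in the Elasticsearch query DSL using `bool`, `term` and `range` clauses. The snake_case field names, the string encoding of `chrom`, the function name `mutation_query` and the idea of sending the body to an index's `_search` endpoint are assumptions for illustration; the slides do not show the platform's actual mappings or queries. No client library is used, so nothing here contacts a server.

```python
import json

# The two example documents from the slide, with assumed field names.
MUTATION_DOC = {
    "sample_id": "SA091",
    "chrom": "1",
    "position": 104589,
    "ref_allele": "A",
    "alt_allele": "T",
    "probability": 0.91,
}
CNV_DOC = {
    "sample_id": "SA091",
    "chrom": "1",
    "start": 103062,
    "end": 109114,
    "state": "GAIN",
}


def mutation_query(sample_id, chrom, start, end, min_probability):
    """Build a query body: one sample, one genomic region, one probability threshold.

    Assumes 'sample_id' and 'chrom' are indexed as exact (keyword-style) values,
    which is what a 'term' clause expects.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"sample_id": sample_id}},
                    {"term": {"chrom": chrom}},
                    {"range": {"position": {"gte": start, "lte": end}}},
                    {"range": {"probability": {"gte": min_probability}}},
                ]
            }
        }
    }


QUERY = mutation_query("SA091", "1", 100000, 110000, 0.9)
REQUEST_BODY = json.dumps(QUERY, indent=2)  # what would be sent over the REST API

if __name__ == "__main__":
    print(REQUEST_BODY)
```

The region and threshold clauses correspond to the Region Filter and Data Filter components, which is one plausible way the interface's filters could translate into a search request.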