Re-inserting human interaction into cancer genome interpretation
CYDNEY NIELSEN, UNIVERSITY OF BRITISH COLUMBIA, BRITISH COLUMBIA CANCER AGENCY

Outline
1 Visualization and its role in scientific discovery
2 Interactive cancer genomics visualization: why now?
3 Building a cancer genomics visualization platform
• Flexible integration of views
• Dynamic linking between views
• Scalable to large data sets
4 Summary
1 Visualization and its role in scientific discovery
Discovery loop [diagram]: QUESTIONS → experiments → DATA (...01100110...) → interpretation → INSIGHTS → hypothesis generation → QUESTIONS
Discovery loop [diagram]: same loop, with INSIGHTS → communication → PUBLICATIONS
Discovery loop [diagram]: same loop, with the interpretation step labelled "computer automation + human expert"
Intelligence Amplifying System > Artificial Intelligence System
"That is, a machine and a mind can beat a mind-imitating machine working by itself." - Frederick Brooks
Why visualization?
Visualization
• Leverages our ability to visually recognize patterns and enhances our ability to reason about data
• Can reveal a level of detail that may be missed in summary statistics alone
Anscombe's quartet
a
       I             II            III            IV
   x      y      x      y      x      y      x      y
  10    8.04    10    9.14    10    7.46     8    6.58
   8    6.95     8    8.14     8    6.77     8    5.76
  13    7.58    13    8.74    13   12.74     8    7.71
   9    8.81     9    8.77     9    7.11     8    8.84
  11    8.33    11    9.26    11    7.81     8    8.47
  14    9.96    14    8.10    14    8.84     8    7.04
   6    7.24     6    6.13     6    6.08     8    5.25
   4    4.26     4    3.10     4    5.39    19   12.5
  12   10.84    12    9.13    12    8.15     8    5.56
   7    4.82     7    7.26     7    6.42     8    7.91
   5    5.68     5    4.74     5    5.73     8    6.89
b [plots of data sets I to IV]
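The point about summary statistics can be checked directly. The sketch below is an illustration, not part of the original slides: it takes the quartet values from the slide above and computes the mean, sample variance and Pearson correlation for each data set. The helper names `pearson` and `summarize` are made up for this example.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, using the values shown on the slide above.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def summarize(xs, ys):
    """Summary statistics that are often reported instead of a plot."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four data sets give nearly identical numbers (mean of x is 9, mean of y is about 7.5, correlation about 0.82), even though their scatter plots look very different.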
Why visualization?
Visualization
• Is well suited to questions where the solution is too ill-defined to be automatically computed
[diagram: a question (?) posed at the interpretation step, DATA (...01100110...) → interpretation → INSIGHTS]

Why visualization?
Visualization
• Can be further enhanced with interactivity, which is key to dynamic data exploration
Example: Visual Information-Seeking Mantra
"Overview first, zoom and filter, then details-on-demand." - Shneiderman 1996
www.apple.com

Why visualization?
Visualization
• Reduces the computational barrier posed by many data analysis workflows
2 Interactive cancer genomics visualization: why now?
Analogy: Human genome assembly
Computer automation: To reconstruct the human genome sequence from raw sequencing data
Human expert: To finish the genome: close gaps, correct mis-assemblies, improve error probabilities of the consensus bases
Consed | David Gordon and Phil Green

Analogy: Human genome assembly
Computer automation, Human expert
Consed | David Gordon and Phil Green

Analogy: Human genome assembly
Computer automation, Human expert
• Some manual tasks become automated once they are better characterized (e.g. AutoFinish)
• Computational analyses can be interactively focused by the user (e.g. local re-assembly)
Consed | David Gordon and Phil Green
Cancer genomics data interpretation
Computer automation: To predict diverse features that differ between tumor and matched-normal sample pairs
Human expert: To integrate and interpret these features together with relevant patient metadata
[panels: Mutations (A > G), Copy number (deletion), Rearrangements (translocation), Gene expression]

Cancer genomics data interpretation
Computer automation, Human expert
Need interactive visualization tools to facilitate the human component and complement the computational one
[panels: Mutations (A > G), Copy number (deletion), Rearrangements (translocation), Gene expression]
Genomics visualization
Schroeder et al. Genome Medicine 2013, 5:9 http://genomemedicine.com/content/5/1/9
REVIEW: Visualizing multidimensional cancer genomics data. Michael P Schroeder, Abel Gonzalez-Perez and Nuria Lopez-Bigas
[figure panels: Matrix heatmaps, Genomic coordinates, Networks; labels: Genes, Samples, Chromosomal coordinates, Interactions, Omics data, Clinical data]

Genomics visualization
Schroeder et al. Genome Medicine 2013, 5:9 http://genomemedicine.com/content/5/1/9
REVIEW: Visualizing multidimensional cancer genomics data. Michael P Schroeder, Abel Gonzalez-Perez and Nuria Lopez-Bigas
3 Building a cancer genomics visualization platform
Flexible integration of views
Integrate multiple data types into one view
Mutations | MutationSeq, Ding et al., Bioinformatics, 2012
Copy Number | Titan, Ha et al., Genome Research, 2014
Example analysis: Examine a mutation in its copy number context
[figure labels: mutation, deletion]

Integrate multiple data types into one view
Mutations | MutationSeq, Ding et al., Bioinformatics, 2012
Copy Number | Titan, Ha et al., Genome Research, 2014
Example analysis: Examine a mutation in its copy number context
[figure labels: mutations, copy number]

Compare data filters on a single data set
Example analysis: Examine impact of MutationSeq probability threshold on coverage versus allele ratio distribution
[figure label: MutationSeq predictions]

Explore views of different data types
Example analysis: Examine both the mutations and copy number alterations for a given sample
[figure labels: MutationSeq predictions, Titan copy number predictions]
Components
• View (v): visual representation
• Region Filter: on genomic range
• Data Filter: on data parameters
• Data (d): sample(s) + data type
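One way to picture how these four components fit together is as a small data model. The sketch below is only an assumption about how such a model could look, not the platform's actual code: the class names follow the slide, while `Layer`, the field names and the view type strings are hypothetical. The sample id SA091 is the one used in the storage slides later in the deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Data:
    """d: sample(s) + data type."""
    sample_ids: tuple[str, ...]
    data_type: str  # e.g. "mutations" or "copy_number"


@dataclass(frozen=True)
class DataFilter:
    """Filter on a data parameter (hypothetical: a minimum threshold)."""
    field_name: str
    minimum: float


@dataclass(frozen=True)
class RegionFilter:
    """Filter on a genomic range."""
    chrom: str
    start: int
    end: int


@dataclass
class Layer:
    """Hypothetical glue object: one data source plus its optional filters."""
    data: Data
    data_filter: Optional[DataFilter] = None
    region_filter: Optional[RegionFilter] = None


@dataclass
class View:
    """v: a visual representation of one or more layers."""
    view_type: str
    layers: list[Layer]


mutations = Data(("SA091",), "mutations")
copy_number = Data(("SA091",), "copy_number")

# Integrate multiple data types into one view: one v, two d.
integrated = View("genome_track", [Layer(mutations), Layer(copy_number)])

# Compare data filters on a single data set: two v, one shared d.
compared = [
    View("scatter", [Layer(mutations, DataFilter("probability", 0.5))]),
    View("scatter", [Layer(mutations, DataFilter("probability", 0.9))]),
]
```

Attaching the filters to the link between a view and its data, rather than to the data itself, is a design choice made here so that two views can share one data set while applying different thresholds.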
Integrate multiple data types into one view [structure: one view (v), two data sources (d, d): mutations, copy number]
Compare data filters on a single data set [structure: two views (v, v), one data source (d): MutationSeq predictions]
Explore views of different data types [structure: two views (v, v), two data sources (d, d): MutationSeq predictions, Titan copy number predictions]
Interface
Create: Select a predefined structure
Create: Add to an existing structure
Define Data
• Sample(s): Query by project name / tumour type / sample id
• Single data type: e.g. mutations, copy number, etc.
Filter Data: Data filters depend on previously selected data type
Filter Regions: Limit the view to genes or regions of interest
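As a rough illustration of what a region filter does, the sketch below keeps only the records that overlap a region of interest. It is an assumption-laden example rather than the tool's implementation: the record layout (dictionaries with `chrom` and either `position` or `start`/`end`), the closed-interval convention and the function names are all hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic range; start and end are treated as inclusive."""
    chrom: str
    start: int
    end: int


def record_span(record):
    """Return (start, end) for a record; point features span a single base."""
    if "position" in record:
        return record["position"], record["position"]
    return record["start"], record["end"]


def in_region(record, region):
    """True if the record is on the region's chromosome and overlaps it."""
    start, end = record_span(record)
    return (record["chrom"] == region.chrom
            and start <= region.end
            and region.start <= end)


def filter_regions(records, region):
    """Keep only the records that overlap the region of interest."""
    return [record for record in records if in_region(record, region)]


records = [
    {"type": "mutation", "chrom": "1", "position": 104_589},
    {"type": "CNV", "chrom": "1", "start": 103_062, "end": 109_114},
    {"type": "CNV", "chrom": "2", "start": 69_064, "end": 89_119},
]

region_of_interest = Region("1", 100_000, 105_000)
for record in filter_regions(records, region_of_interest):
    print(record)
```

The same overlap test works for both point features (mutations) and interval features (copy number segments), which is why the point feature is mapped to a one-base interval.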
Select a View: View types depend on previously selected data type
Adjust View
Inspect/Modify
Dynamic linking between views
Dynamically link views of different data types [structure: two linked views (v, v), two data sources (d, d): MutationSeq predictions, Titan copy number predictions]
Dynamically link views of different data types [structure: two linked views (v, v), two data sources (d, d)]
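The slides do not say how the linking is implemented. One common way to link views is a shared selection object that notifies every subscribed view when any one of them changes the selection. The sketch below shows that observer-style pattern under this assumption; `LinkedSelection`, `LinkedView` and `brush` are hypothetical names.

```python
class LinkedSelection:
    """Shared selection state that notifies every subscribed view."""

    def __init__(self):
        self.region = None
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, region, source=None):
        """Record the new selection and broadcast it to all listeners."""
        self.region = region
        for callback in self._listeners:
            callback(region, source)


class LinkedView:
    """A view that highlights whatever region is currently selected."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = None
        self._selection = selection
        selection.subscribe(self._on_select)

    def brush(self, region):
        """Called when the user selects a region in this view."""
        self._selection.select(region, source=self)

    def _on_select(self, region, source):
        # A real view would re-render here; this sketch only stores the region.
        self.highlighted = region


selection = LinkedSelection()
mutation_view = LinkedView("MutationSeq predictions", selection)
copy_number_view = LinkedView("Titan copy number predictions", selection)

# Brushing a region in one view highlights the same region in the other.
mutation_view.brush(("1", 100_000, 110_000))
print(copy_number_view.highlighted)
```

Because the views only talk to the shared selection and not to each other, further views can be added to the structure without changing existing ones.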
Scalability
"Research on big data visualization must address two major challenges: perceptual and interactive scalability." Zhicheng Liu, Biye Jiang, Jeffrey Heer, imMens, EuroVis 2013
Interactive scalability: How to enable dynamic querying and rendering of millions of data points in real time?
Search
• Optimized for text search across documents
• All fields are indexed for fast retrieval (bag-of-terms approach)
• Query performance is a function of the number of query matches, not the total data set size
• Scales well as the data set size grows
• Appropriate for load-once-read-many workflows
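The bullet about query performance follows from how an inverted index works: each (field, value) pair maps to the set of documents that contain it, so a query only touches those sets rather than scanning every document. The toy class below illustrates the idea in a few lines. It is a simplified sketch, not how Lucene or Elasticsearch are implemented, and `InvertedIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: every field of every document is indexed."""

    def __init__(self):
        self.docs = []
        # (field, value) -> ids of the documents containing that pair
        self.postings = defaultdict(set)

    def add(self, doc):
        """Store a document and index each of its fields (load once)."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field_name, value in doc.items():
            self.postings[(field_name, value)].add(doc_id)
        return doc_id

    def search(self, **terms):
        """Return documents matching all field=value terms (read many)."""
        posting_sets = [self.postings.get(pair, set()) for pair in terms.items()]
        if not posting_sets:
            return []
        # Intersect starting from the smallest set; the work depends on the
        # sizes of the posting sets, not on the total number of documents.
        posting_sets.sort(key=len)
        matching_ids = set.intersection(*posting_sets)
        return [self.docs[doc_id] for doc_id in sorted(matching_ids)]


index = InvertedIndex()
index.add({"type": "mutation", "sample_id": "SA091", "chrom": "1"})
index.add({"type": "CNV", "sample_id": "SA091", "chrom": "1", "state": "GAIN"})
index.add({"type": "CNV", "sample_id": "SA091", "chrom": "2", "state": "DEL"})

print(index.search(type="CNV", state="GAIN"))
```

The index is built once when documents are added and is then cheap to query repeatedly, which matches the load-once-read-many workflow named on the slide.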
Elasticsearch
• Chosen for ease of use (built on top of Apache Lucene)
• Benefits include:
  o Built-in support for distributed data (manages shards across nodes)
  o Extensive caching
  o Sophisticated query language (DSL)
Storing data
Documents (records) and their Fields

mutation                   CNV
sample id: SA091           sample id: SA091
chrom: 1                   chrom: 1
position: 104,589          start: 103,062
ref_allele: A              end: 109,114
alt_allele: T              state: GAIN
probability: 0.91
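To make the document model more concrete, the sketch below writes the two example records as JSON-like Python dictionaries and builds a query body in the Elasticsearch query DSL using `bool`, `filter`, `term` and `range` clauses. Several details are assumptions for illustration: the field name `sample_id` (the slide shows "sample id"), storing `chrom` as a string, and the threshold of 0.9. The `matches` function is a tiny local stand-in that understands only these clause types, so the example runs without a cluster; it is not Elasticsearch's own matching logic.

```python
import json

# The two example documents from the slide (field names partly assumed).
mutation_doc = {
    "sample_id": "SA091", "chrom": "1", "position": 104589,
    "ref_allele": "A", "alt_allele": "T", "probability": 0.91,
}
cnv_doc = {
    "sample_id": "SA091", "chrom": "1",
    "start": 103062, "end": 109114, "state": "GAIN",
}

# Query DSL body: high-confidence mutations on chromosome 1 of sample SA091.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"sample_id": "SA091"}},
                {"term": {"chrom": "1"}},
                {"range": {"probability": {"gte": 0.9}}},
            ]
        }
    }
}


def matches(doc, query_body):
    """Toy evaluator for bool/filter queries with term and range clauses."""
    for clause in query_body["query"]["bool"]["filter"]:
        kind, spec = next(iter(clause.items()))
        field_name, condition = next(iter(spec.items()))
        if field_name not in doc:
            return False
        value = doc[field_name]
        if kind == "term":
            if value != condition:
                return False
        elif kind == "range":
            if "gte" in condition and value < condition["gte"]:
                return False
            if "lte" in condition and value > condition["lte"]:
                return False
        else:
            raise ValueError(f"unsupported clause type: {kind}")
    return True


print(json.dumps(query, indent=2))
print(matches(mutation_doc, query), matches(cnv_doc, query))
```

In a real deployment the query body would be sent to the cluster's search endpoint; the filter context is used here because these clauses are yes/no conditions that do not need relevance scoring.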
Storing data
Index
[diagram: an index holding many mutation and CNV documents with the fields shown on the previous slide, for sample id SA091 on chrom 1, 2 and 5, with probabilities such as 0.91 and 0.95 and CNV states GAIN and DEL]