
CURRENT CHALLENGES IN GENOMIC DATA VISUALIZATION, Cydney Nielsen



  1. CURRENT CHALLENGES IN GENOMIC DATA VISUALIZATION. Cydney Nielsen, BC Cancer Agency Genome Sciences Centre, Vancouver, Canada

  2. The Data Deluge: ~$5,000 in 2001, ~10¢ in 2011

  3. Sequencing Experiments: De novo assembly, Re-sequencing, Enrichment. [Figure: short sequence reads (e.g. CCAGACAAGACAGACACAGTA) shown for each experiment type; panel labels: Reference Genome, Reference Genome, Genome Assembly]

  4. Drew Sheneman, New Jersey - The Newark Star Ledger

  5. Challenge 1: Large number of samples for comparison. “To systematically characterize the genomic changes in hundreds of tumors … and thousands of samples over the next five years” (The Cancer Genome Atlas, www.cancergenome.nih.gov)

  6. Genome Browsers: Stacked data tracks along a common genome x-axis. [Figure axes: data samples (vertical), genome coordinate (horizontal)]

  7. UCSC Cancer Genomics Heatmaps: Glioblastoma Copy Number Abnormality, Agilent 244A array (n=200). Heatmap provides a more condensed view. Recurrent deletion of all or part of chromosome 10, peak at PTEN locus. [Figure: heatmap of data samples against genome coordinate; annotation tracks: Tumor vs normal, Gender] Zhu et al., Nature Methods, 2009

  8. Challenge 1: Large number of samples for comparison. Consider what information is needed, e.g. replace the individual samples with a biologically meaningful summary, such as significant change between samples.

  9. UCSC Cancer Genomics Heatmaps: Glioblastoma Copy Number Abnormality, Agilent 244A array (n=200). Example: summary view (column averages). Recurrent deletion of all or part of chromosome 10, peak at PTEN locus. [Figure: annotation tracks: Tumor vs normal, Gender] Zhu et al., Nature Methods, 2009
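
The summary view on slide 9 can be sketched in a few lines. This is only an illustration, assuming the heatmap is stored as a samples × genomic-bins matrix of log2 copy-number ratios; the function name `summarize_columns` and the toy values are hypothetical and do not come from the UCSC tool.

```python
import numpy as np

def summarize_columns(matrix):
    """Collapse a samples x bins matrix into one value per genomic bin.

    Uses the column mean and ignores missing values (NaN).
    """
    return np.nanmean(matrix, axis=0)

# Hypothetical toy data: 4 samples x 5 genomic bins of log2 copy-number ratios.
log_ratios = np.array([
    [0.0, -1.0, -1.0, 0.0, 0.5],
    [0.0, -1.0, -0.5, 0.0, 0.5],
    [0.0, -1.0, -1.0, 0.0, 0.0],
    [0.0, -1.0, -1.5, 0.0, 0.0],
])

# One row replaces four: bins 1 and 2 stand out as a recurrent loss.
summary = summarize_columns(log_ratios)
```

The same idea extends to other summaries (median, fraction of samples below a threshold); which one is biologically meaningful depends on the question being asked.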

  10. Challenge 2: Large number of data types

  11. Genomic rearrangements in cancer (complex representation). SNU-C1 (colorectal), Chr 15. [Figure: rearrangement types (deletion-type, tandem dup-type, tail-to-tail inverted, head-to-head inverted) drawn in non-inverted and inverted orientation, with copy number and allelic ratio tracks; x-axis: genomic location (Mb)] Stephens et al., Cell, 2011

  12. 17 mouse genomes (more compact representation). Still difficult to represent many data types in a general tool. [Figure: tracks for SNPs, SVs, TEs and uncallable regions across chromosomes 1-19 and X for each strain; legible strain labels include CAST/EiJ, WSB/EiJ, PWK/PhJ and SPRET/EiJ] Keane et al., Nature, 2011

  13. Challenge 2: Large number of data types. Compact, customized data encoding.

  14. ABySS-Explorer. Represents sequence: connectivity, strand, length, mapping on reference. Interactively access: sequence coverage, scaffolding. [Figure panels: (a) reference human genome; (b) inversion event in a human lymphoma genome] Nielsen et al., Best Paper Award at InfoVis 2009

  15. Challenge 3: Genomic features are sparse

  16. Genome Browsers: LOCAL VIEW. "Human chr1, 1 pt corresponds to 480 kb, which is larger than 98% of all human genes!" (Martin Krzywinski)
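
The arithmetic behind the quote on slide 16 can be reproduced roughly as follows. The chromosome length (chr1 in the GRCh37/hg19 assembly) and the 520-point drawing width are assumptions chosen to match the ~480 kb figure; the 30 kb gene is a hypothetical example.

```python
# Assumption: length of human chr1 in GRCh37/hg19.
CHR1_LENGTH_BP = 249_250_621

def bp_per_point(region_length_bp, width_points):
    """Number of base pairs covered by one drawing point."""
    return region_length_bp / width_points

def points_for_feature(feature_length_bp, region_length_bp, width_points):
    """Width, in points, that a feature occupies at this zoom level."""
    return feature_length_bp / bp_per_point(region_length_bp, width_points)

# Hypothetical 520-point-wide drawing of the whole chromosome.
resolution = bp_per_point(CHR1_LENGTH_BP, 520)            # roughly 479 kb per point
gene_width = points_for_feature(30_000, CHR1_LENGTH_BP, 520)  # well below one point
```

At this scale a 30 kb gene occupies a small fraction of a point, which is why a whole-chromosome track cannot show individual genes.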

  17. Hilbert Curve: GLOBAL VIEW. [Figure panels (a) and (b): Chromosome 3L coloured by chromatin states 1-9, with labelled regions: cluster of small expressed genes, open chromatin domain, PcG domains, heterochromatin-like domain, pericentromeric heterochromatin] Kharchenko et al., Nature, 2011; Anders, Bioinformatics, 2009
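
A minimal sketch of how a Hilbert curve lays a one-dimensional genome axis onto a square image, so that bins close together on the chromosome stay close together in the picture. It uses the standard index-to-coordinate conversion; the names `d2xy` and `hilbert_image` are illustrative and are not taken from the tools cited on the slide.

```python
def d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and flip if needed) the quadrant built so far.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_image(values, order):
    """Place consecutive genomic bin values along the curve; returns a list of rows."""
    n = 1 << order
    grid = [[0.0] * n for _ in range(n)]
    for d, value in enumerate(values[: n * n]):
        x, y = d2xy(order, d)
        grid[y][x] = value
    return grid

# Hypothetical signal: 16 genomic bins drawn on a 4 x 4 image.
image = hilbert_image([float(i) for i in range(16)], order=2)
```

Because consecutive indices always land on neighbouring cells, a contiguous genomic domain appears as a compact patch rather than a thin stripe.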

  18. Challenge 3: Genomic features are sparse. Need both overview and detail. Functional axis (perhaps not full genome).

  19. Spark: a genomic data exploration tool. (1) Focus on regions of interest (e.g. transcriptional start sites). (2) Extract data matrices. (3) Cluster matrices. (4) Interactive cluster visualization. [Figure: data tracks H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE] Nielsen et al., in preparation
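
Steps 1 to 3 of the workflow on slide 19 can be sketched as below. This is not Spark's implementation: the slide does not name a clustering method, so plain k-means is used here as an assumption, and `extract_matrix`, `kmeans`, the single signal track and the region centres are all hypothetical.

```python
import numpy as np

def extract_matrix(signal, centers, half_width, n_bins):
    """One row per region: signal around each center, averaged into n_bins bins.

    Assumes every window lies inside the signal and that 2 * half_width is
    divisible by n_bins.
    """
    rows = []
    for c in centers:
        window = signal[c - half_width : c + half_width]
        rows.append(window.reshape(n_bins, -1).mean(axis=1))
    return np.array(rows, dtype=float)

def kmeans(rows, k, n_iter=20, seed=0):
    """Very small k-means (an assumed choice): returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centroids = rows[rng.choice(len(rows), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = rows[labels == j].mean(axis=0)
    return labels

# Hypothetical per-base signal with enrichment around two of four regions.
signal = np.zeros(1000)
signal[90:110] = 5.0
signal[490:510] = 5.0
centers = [100, 300, 500, 700]  # e.g. transcriptional start sites

matrix = extract_matrix(signal, centers, half_width=20, n_bins=4)
labels = kmeans(matrix, k=2)
```

Regions with a similar signal shape end up in the same cluster, so a viewer can show one summary per cluster instead of one track per region.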

  20. Challenge 4: No longer one genome but many

  21. Single nucleotide variation. Ossowski et al., Genome Research, 2008

  22. Single nucleotide variation: Integrative Genomics Viewer (IGV). Robinson et al., Nature Biotechnology, 2011

  23. Structural variation. Bhutkar et al., Genetics, 2008

  24. Challenge 4: No longer one genome but many. Capture variation on a graph.

  25. Sequence variation on a graph. Users may require more time to learn how to interpret graph representations, but such graphs are likely to scale better and may prove more powerful for analysis. Comeau et al., Mol. Biol. Evol., 2010
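
As a rough illustration of what capturing variation on a graph can mean, the toy structure below stores shared sequence once and lets alternative alleles branch and rejoin. The node contents and the function `spell_paths` are hypothetical and do not reproduce the representations used in the cited papers.

```python
# Hypothetical toy graph: a single-nucleotide variant (A or G) between two
# shared flanking segments. Keys are node ids, values are sequence segments.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

def spell_paths(nodes, edges, start):
    """Yield the sequence spelled by every path from start to a node with no successors."""
    stack = [(start, nodes[start])]
    while stack:
        node, seq = stack.pop()
        if not edges[node]:
            yield seq
        for nxt in edges[node]:
            stack.append((nxt, seq + nodes[nxt]))

haplotypes = sorted(spell_paths(nodes, edges, start=1))
```

Adding another genome that carries either allele adds no new nodes, only another path, which is one reason such graphs may scale better than stacking one track per genome.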

  26. Sequence variation on a graph. Paten et al., Genome Research, 2011

  27. Challenge 5: Human Judgement, Computational Analysis

  28. Consed: Genome Assembly and Finishing Tool (David Gordon and Phil Green). Good example of integrated visualization and computational analysis functionality.

  29. Challenge 5: Need to integrate computation. High interactivity, low memory overhead; avoid storing large data sets locally; popularity of web-based tools; evolving sequencing technologies.

  30. Summary: (1) Large number of samples for comparison. (2) Large number of data types. (3) Genomic features are sparse. (4) No longer one genome but many. (5) Need to integrate computational analysis.
