Genome Visualization with Circos INTRODUCTION TO CIRCOS MARTIN KRZYWINSKI Michael Smith Genome Sciences Center BC Cancer Research Center Vancouver, Canada EMBO PRACTICAL COURSE: BIOINFORMATICS AND COMPARATIVE GENOME ANALYSES Stazione Zoologica Anton Dohrn, Naples - Italy May 7–19, 2012 GENOME VISUALIZATION WITH CIRCOS v20120508
AVOID LINEAR LAYOUT COMPARISONS Thomson, N.R., et al., Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Res, 2008. 18(10): p. 1624-37. 2 GENOME VISUALIZATION WITH CIRCOS · Session 1 · Introduction
circos appearances LITERATURE AND MEDIA
CIRCOS IN THE LITERATURE >100 citations, 5 book covers
CIRCOS IN THE LITERATURE NYT Science, 4 May 2012
VARIETY OF VISUALIZATIONS http://www.circos.ca/images/scientific_literature/
PRIMARY LITERATURE Hillmer AM, Yao F, Inaki K et al. 2011 Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes. Genome research 21:665-675.
REVIEW LITERATURE Ledford H 2010 Big science: The cancer genome challenge. Nature 464:972-974.
POPULAR SCIENCE AQ Magazine, April 2011 (Simon Fraser University)
POPULAR CULTURE Wired, April 2010
URBAN PLANNING The town of Caceres, Spain, a UNESCO World Heritage Site, used Circos to illustrate the relationships between businesses in their urban planning strategy.
ADVERTISING
what makes circos useful? CIRCULAR LAYOUT + FLEXIBLE IMPLEMENTATION
WHY IS CIRCOS USEFUL?
SIMPLE + DEEP Large number of data tracks, which can be stacked and layered. Format of everything in the figure can be dynamically adjusted based on rules that react to data values. Utility tools assist with manipulating data files (e.g. binning links and ordering ideograms to optimize layout). Perceptual color palettes and high quality fonts are built in.
TIMELY + EFFECTIVE Circos addresses the need to visualize differences in disease genomes and assess variation in genomic content across many samples. Dynamic rules provide a way to adjust the format of figure elements based on data values. SVG output is designed for publication-quality visualizations.
COMPATIBLE Driven entirely by plain-text configuration files. Data agnostic. Simple format for data input. Highly automatable. Fits naturally into any data pipeline. Extended longevity: performs only visualization, not analysis.
EYE PREFERS CURVES Moving your eye across the curved path is faster and more comfortable.
VARIABLE RESOLUTION Bin size ranges from 50 Mb (inside) to 1 Mb (outside). Image shows the density of genes across the human genome.
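A density track at several resolutions can be thought of as the same positions counted in bins of different widths. The following is a minimal sketch under that assumption; the function name `bin_counts`, the 100 Mb chromosome length and the gene positions are illustrative and are not taken from Circos or from the slide's data.

```python
from collections import Counter


def bin_counts(positions, bin_size, chrom_length):
    """Count positions falling in each fixed-width bin along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(
        pos // bin_size for pos in positions if 0 <= pos < chrom_length
    )
    return [counts.get(i, 0) for i in range(n_bins)]


# Hypothetical gene start positions (bp) on a hypothetical 100 Mb chromosome.
genes = [500_000, 1_200_000, 1_900_000, 48_000_000, 51_000_000, 99_999_999]
chrom_length = 100_000_000

coarse = bin_counts(genes, 50_000_000, chrom_length)  # 2 bins of 50 Mb
fine = bin_counts(genes, 1_000_000, chrom_length)     # 100 bins of 1 Mb
```

Each list could then feed one concentric track, with the coarse bins on the inside and the fine bins on the outside.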
ADJACENCY, CONTINUITY & FOCUS A linear scale layout has several disadvantages: changing focus (regions in the center of the image receive more attention), broken adjacency (neighbouring points on a linear scale are separated), broken continuity (data tracks are difficult to follow from one edge of the figure to another), and non-uniform data emphasis (the center and edge of the axis are not perceived uniformly; the edge implies periphery, which may not apply).
TYPICAL CIRCOS IMAGE (A) histogram (B) ideograms (C) histogram (D) heat map (E) links (F) highlights (G) grid (H) ticks. Format of data in tracks A, C, D, E is adjusted by rules based on data values.
examples from literature
EXAMPLE FROM LITERATURE The most frequent complex rearrangements involving MLL and (A) AFF1/AF4. Localization of chromosomal breakpoints and UPN of individual patients are indicated. Colored lines indicate in-frame fusions (green), out-of-frame fusions (red), no partner gene present at the recombination site (blue). Meyer, C., E. Kowarz, et al. (2009). "New insights to the MLL recombinome of acute leukemias." Leukemia 23(8): 1490-1499. Figure by M Krzywinski.
EXAMPLE FROM LITERATURE Various types of data tracks can be stacked. Five instances of a compound track each represent copy number information from a different sample. Two histograms, a line plot and a scatter plot are used to form a compound track. Using links and highlights, attention is drawn to the progression of scale increase within chr17:53-63Mb. This region is magnified at 5x and smaller subregions are further magnified to 40x. Krzywinski, M., J. Schein, et al. (2009). "Circos: an information aesthetic for comparative genomics." Genome Res 19(9): 1639-1645.
EXAMPLE FROM LITERATURE Data sets which do not sample the genome uniformly (A) can be effectively shown by using a connector track (B) to show the remapping onto an index scale (C). In the figure, methylation values (A) for 7 tissues are summarized using stacked histograms (C), whose bins represent statistics for remapped methylation probe positions. Zimmer, C. (2008). Now: The Rest of the Genome. New York Times. Figure by M Krzywinski.
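One way to picture the remapping onto an index scale is to replace each probe position with its rank and then summarize values in bins of consecutive ranks. The sketch below assumes that interpretation; `remap_and_bin`, the choice of the mean as the per-bin statistic, and the probe values are hypothetical and do not reproduce how the published figure was made.

```python
from statistics import mean


def remap_and_bin(probes, probes_per_bin):
    """Sort (position, value) probes by position, use the rank as the new
    coordinate, and average the values within each bin of consecutive ranks."""
    ordered = sorted(probes)  # tuples sort by position first
    bins = []
    for start in range(0, len(ordered), probes_per_bin):
        chunk = ordered[start:start + probes_per_bin]
        bins.append({
            "index_start": start,
            "index_end": start + len(chunk) - 1,
            # Genomic extent covered by the bin; a connector would join
            # this span to the index interval above.
            "genomic_span": (chunk[0][0], chunk[-1][0]),
            "mean_value": mean(value for _, value in chunk),
        })
    return bins


# Hypothetical, unevenly spaced probes: (position in bp, methylation value).
probes = [(100, 0.25), (150, 0.75), (90_000, 1.0), (90_500, 0.5), (5_000_000, 0.125)]
bins = remap_and_bin(probes, probes_per_bin=2)
```

On the index scale every bin holds the same number of probes, so densely and sparsely sampled regions receive equal space in the figure.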
LINK GEOMETRY The same data set is shown in all panels. (A) each link represents one of a subset of 2,500 segmental duplications within the human genome. (B) rules are used to change link color and thickness. (C) rules are used to show only links to chrY. (D) in addition to rules in (C), other rules add a second layer of links from chr8.
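The idea of rules that react to data values can be sketched as a list of (condition, format change) pairs applied to each link. This is an illustration in Python, not Circos configuration syntax; the function `apply_rules`, the field names, the link records and the first-match-wins behaviour are all assumptions made for the example.

```python
def apply_rules(link, rules):
    """Return a formatted copy of the link, or None if a rule hides it.

    Rules are tested in order and only the first matching rule is applied
    (a simplifying assumption of this sketch)."""
    formatted = dict(link)
    for condition, changes in rules:
        if condition(link):
            if changes.get("show") is False:
                return None
            formatted.update(changes)
            break
    return formatted


# Hypothetical links between chromosomes, with default formatting.
links = [
    {"chr1": "chr1", "chr2": "chrY", "size": 12_000, "color": "grey", "thickness": 1},
    {"chr1": "chr8", "chr2": "chr3", "size": 40_000, "color": "grey", "thickness": 1},
    {"chr1": "chrY", "chr2": "chr8", "size": 2_000, "color": "grey", "thickness": 1},
]

rules = [
    # Hide links that do not touch chrY (compare panel C).
    (lambda l: "chrY" not in (l["chr1"], l["chr2"]), {"show": False}),
    # Emphasize large duplications (compare panel B).
    (lambda l: l["size"] > 10_000, {"color": "red", "thickness": 3}),
]

drawn = [f for f in (apply_rules(l, rules) for l in links) if f is not None]
```

Because the data file is never modified, a different set of rules yields a different panel from the same input.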
EXAMPLE FROM LITERATURE Regions of similarity between human and dog genomes. (A) human genome. (B) human ideograms. (C) dog genome. (D) dog ideograms, coded by most similar human chromosome. (E,F) link bundles connect similar regions. (F1) rules are used to color bundles by size. (F2) bundles twist when similarity involves opposite strands. American Scientist, Sept-Oct 2007. Cover figure by M Krzywinski.
RULES CAN CHANGE DATA GLYPH COLOR AND SIZE The size and outline of each scatter plot glyph are influenced by the data value. The data value itself can be altered, as seen in the two outermost collapsed scatter plots, where the value for each point has been set to 0 to display the glyphs at the same radius.
TRACK DEFINITION WITH TEMPLATES Each track is associated with several internal counters. The values of the counters differ for each track and can be used to drive track generation from a single template. By referencing the template multiple times, new tracks can be created automatically, without having to change the template.
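The counter-driven template idea can be illustrated as a single parameterized definition that is filled in once per track. The sketch below is written in Python rather than Circos configuration; the `TEMPLATE` keys, the `sample.N.txt` file names and the radius arithmetic are hypothetical choices for the example.

```python
# One template, with placeholders that a per-track counter fills in.
TEMPLATE = {
    "file": "sample.{counter}.txt",  # hypothetical data file per track
    "r1": "{r1:.2f}r",               # outer radius of the track
    "r0": "{r0:.2f}r",               # inner radius of the track
}


def instantiate(template, n_tracks, outer=0.95, width=0.10, gap=0.02):
    """Create n_tracks track definitions from one template.

    The counter shifts each successive track inward by width + gap."""
    tracks = []
    for counter in range(n_tracks):
        r1 = outer - counter * (width + gap)
        r0 = r1 - width
        tracks.append({
            key: text.format(counter=counter, r0=r0, r1=r1)
            for key, text in template.items()
        })
    return tracks


tracks = instantiate(TEMPLATE, 3)
```

Adding a fourth track only requires changing the count passed to `instantiate`; the template itself stays untouched.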
GLYPH TRACKS Three tracks showing sequence data. Each label corresponds to a base, colored by the identity of the base. In the first track, each base label is changed to “X” using rules. In the second track, a wingding symbol font is used, and the label is changed to “n”, which corresponds to a square glyph in this font. In the third track, the label is changed to “l”, which is a circle.
BUBBLE DENSITY TRACK A single gene density data file is used to populate four tracks. Individual density data points are assigned to one of three categories: cancer genes (red), OMIM genes (orange), and all others (green). Rules are used to show specific categories in a track and to change the label from the category name (e.g. cancer) to an “l”, which is a circle in the wingding font.
implementation CONTROL AND INTEGRATION