2-D and 3-D Coordinates For M-Mers And Dynamic Graphics For - PowerPoint PPT Presentation

2-D and 3-D Coordinates For M-Mers And Dynamic Graphics For Representing Associated Statistics By Daniel B. Carr dcarr@gmu.edu George Mason University

Overview • Background • Encoding and self-similar coordinates • Examples • Rendering software – GLISTEN • Closing remarks

Background • Task – Visualize statistics indexed by a sequence of letters • Letter-Indexing – Nucleotides: AAGTAC – Amino Acids: KTLPLCVTL – Terminology: blocks of m letters called m-mers • Statistics: counts or likelihoods for – Short DNA sequence motifs for transcription factor binding: gene regulation – Peptide docking on immune system molecules

Graphical Design Goals • Provide an overview and selective focus • Use geometric structures to – Organize statistics – Reveal patterns – Provide cognitive accessibility • Incorporate scientific knowledge in layout choices – Enhance patterns and simplify comparisons

Common Practice - Tables • Published tables – a linear list – Sorted by values of a statistic – Indexing letter sequences shown as row labels – Only a few items shown out of thousands to millions

Common Practice - Graphics • 1-D histograms – some examples – Nucleotides: Distribution of promoters by distance upstream from the start codon – Amino acids: • Sequence alignment logo plots are one variant • Docking counts by position • Cell-colored matrices? – More commonly used for microarray data and correlation matrices

HLA-A2 Molecule Peptide Docking Counts By Amino Acid Given Position [Figure: one panel per position, Pos 1 through Pos 9, showing docking counts for each of the 20 amino acids: A C D E F G H I K L M N P Q R S T V W Y]

Graphical Encoding Ideas: Use Points For M-Mers • Represent m-mers using coordinates – A point stands for an m-mer – A glyph at the point represents statistics for that m-mer, for example through point color, size, or shape • Challenge – The number of possible letter sequences grows exponentially with sequence length – Display space is limited

Self-Similar Coordinates • Self-similarity helps us keep oriented – Parallel coordinate plots are increasingly familiar • Coordinates from 3-D geometry – 4 Nucleotides => tetrahedron – 20 Amino acids • Icosahedron face centers • Familiar coordinates => hemisphere • Two kinds of self-similarity – At different scales => fractals – At the same scale => shells, surfaces

Self-Similarity At Different Scales: Nucleotide Example • Represent each 6-mer as a 3-D point – (4 nucleotides)^6 = 4096 points • Attractor: tetrahedron vertices – A=(1,1,1), C=(1,-1,-1), G=(-1,1,-1), T=(-1,-1,1) • Computation: – Hexamer position weights: 2^(5,4,3,2,1,0)/63 – ACGTTC -> (.555, .270, .206)
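The computation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the GLISTEN implementation: the vertex coordinates and the weights 2^(5,...,0)/63 come from the slide, while the names `VERTEX` and `mer_to_point` are made up for the example.

```python
# Tetrahedron vertices from the slide, one per nucleotide.
VERTEX = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def mer_to_point(mer):
    """Weighted average of vertex coordinates; the first letter gets the largest weight."""
    m = len(mer)
    weights = [2 ** (m - 1 - i) for i in range(m)]  # 32, 16, 8, 4, 2, 1 for a hexamer
    total = sum(weights)                            # 63 for a hexamer
    return tuple(
        sum(w * VERTEX[letter][axis] for w, letter in zip(weights, mer)) / total
        for axis in range(3)
    )


point = mer_to_point("ACGTTC")
# Exact values are 35/63, 17/63 and 13/63; the slide lists (.555, .270, .206).
print(tuple(round(c, 3) for c in point))
```

Because the weights halve at each position, m-mers that share a prefix land in the same sub-tetrahedron, which is what gives the layout its fractal, self-similar look.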

Application: Gene Regulation Studies • Cluster genes based on – Gene expression levels in different situations – Other criteria such as gene family • For each cluster look in gene regulation regions for recurrent nucleotide patterns – Overexpressed m-mers: potential transcription factor docking sites • Show frequencies (or multinomial likelihoods)

Nucleotides Example: Yeast Gene Regulation • 29 genes in a cluster – YBL072c – YDL130w – YDR025w – … – YCL054w • Sliding hexamer window, 300 letters upstream from open reading frames – 300 ATATGA – 299 TATGAG – 298 ATGAGT – 297 TGAGTA

Statistics • Number of genes with hexamer – TTTTTC 22 – GAAAAA 21 – TTTTTT 19 – AAAAAT 19 – TTTTCA 18 – ATTTTT 17 • Total number of appearances, etc.
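The "number of genes with hexamer" statistic can be computed with a sliding window over each upstream region. The sketch below is an assumption about how such counts might be produced; the function name and the toy sequences are hypothetical and are not the yeast data from the slides.

```python
from collections import Counter


def genes_with_hexamer(upstream_by_gene, k=6):
    """Count, for each k-mer, how many genes contain it at least once."""
    counts = Counter()
    for sequence in upstream_by_gene.values():
        # Slide a window of width k along the region; the set gives one vote per gene.
        seen = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        counts.update(seen)
    return counts


# Hypothetical toy regions, far shorter than the 300-letter regions on the slide.
toy_regions = {
    "gene1": "ATATGAGTA",
    "gene2": "TTATGAGTC",
    "gene3": "ATATGACCC",
}
gene_counts = genes_with_hexamer(toy_regions)
print(gene_counts["ATATGA"], gene_counts["TGAGTA"])
```

Counting total appearances instead of genes only requires updating the counter with every window rather than with the per-gene set.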

Extensions • 2-D version (projected gasket) – 10-mers => 1024 x 1024 pixel display • Wild card and dimer counts – TACC……GGAA • Include more scientific knowledge – Special representations for known transcription factors • More interactivity – Filtering for regions upstream – Mouseovers, etc.
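One way to read the 2-D version: dropping the z component of the tetrahedron vertices A=(1,1,1), C=(1,-1,-1), G=(-1,1,-1), T=(-1,-1,1) leaves the four corners of a square, and powers-of-two weights then give every 10-mer its own cell on a 1024 x 1024 grid. The sketch below assumes that particular projection, which the slides do not specify; `XY` and `mer_to_pixel` are illustrative names.

```python
# x and y components of the tetrahedron vertices (z dropped: an assumed projection).
XY = {"A": (1, 1), "C": (1, -1), "G": (-1, 1), "T": (-1, -1)}


def mer_to_pixel(mer):
    """Map an m-mer to (column, row) on a 2**m by 2**m grid."""
    col = row = 0
    for letter in mer:
        x, y = XY[letter]
        # Each letter adds one binary digit per axis; earlier letters are more significant.
        col = 2 * col + (x + 1) // 2
        row = 2 * row + (y + 1) // 2
    return col, row


print(mer_to_pixel("ACGTACGTAC"))
```

Each letter contributes exactly one bit to the column and one bit to the row, so the 4^10 possible 10-mers fill the 1024 x 1024 grid with no collisions.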

Self-Similarity At Different Scales: Amino Acids Sequence Coordinates • Represent each 3-mer as a 3-D point – (20 amino acids)^3 = 8000 points • Attractor: icosahedron face centers – Let x1= .539, x2=.873, x3=1.412 – A=(x1,x3,0), C=(0,x1,x3), … Y=(-x3,0,-x1) • Computation: – Position weights: 3.8^(2,1,0) scaled to sum to 1 – Letters HIT => (-1.26, -1.08, .180)
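The geometry here can be sketched as follows, with several assumptions stated up front. The 20 face centers of a regular icosahedron are taken to be the 8 points (±x2, ±x2, ±x2) plus the 12 cyclic permutations of (0, ±x1, ±x3), which matches the three letters the slide spells out. The weights are read as powers of 3.8. Only A, C and Y are assigned, because the slide elides the other letters, so the sketch does not attempt to reproduce the HIT example. All function names are illustrative.

```python
from itertools import product

X1, X2, X3 = 0.539, 0.873, 1.412  # constants from the slide


def face_centers():
    """The 20 candidate points: 8 cube-type corners plus 12 cyclic permutations."""
    points = [(sx * X2, sy * X2, sz * X2) for sx, sy, sz in product((1, -1), repeat=3)]
    for s1, s3 in product((1, -1), repeat=2):
        a, b = s1 * X1, s3 * X3
        points += [(0.0, a, b), (b, 0.0, a), (a, b, 0.0)]
    return points


def position_weights(m=3, base=3.8):
    """Weights base**(m-1), ..., base**0, scaled to sum to 1 (assumed reading of the slide)."""
    raw = [base ** (m - 1 - i) for i in range(m)]
    total = sum(raw)
    return [r / total for r in raw]


# Only the letter assignments given on the slide; the rest are elided there.
KNOWN = {"A": (X1, X3, 0.0), "C": (0.0, X1, X3), "Y": (-X3, 0.0, -X1)}


def mer_to_point(mer):
    """Weighted average of letter coordinates for letters present in KNOWN."""
    weights = position_weights(len(mer))
    return tuple(
        sum(w * KNOWN[letter][axis] for w, letter in zip(weights, mer))
        for axis in range(3)
    )


print(position_weights())
print(mer_to_point("ACY"))
```

A base larger than 2 is needed here because 20 attractor points must be kept apart rather than 4; the first letter dominates the position of the point, and later letters refine it.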

Graphical Encoding Ideas: Paths • Use paths connecting m-mer points to represent longer sequences – Path features, thickness and color can encode statistics indexed by the concatenated m-mers – Can reuse the m-mers, keeping a common framework – Three 3-mers -> two-segment path -> 9-mer • Challenges – Overplotting, path ambiguity, prime sequence lengths – Using translucent triangles for triples is poor, etc.

Letter x Position Coordinates And Paths • Merits – Few points and simple structure • 20 amino acids by 9 positions = 180 points • Challenges – Path overplotting => filtering – Avoiding path interpretation ambiguity in higher-dimensional tables => 3-D layouts

Self-Similarity At The Same Scale: Amino Acids Coordinates • Each point represents a letter and position pair – 9-mers: 20 letter x 9 positions = 180 points • Geometry: icosahedron face centers – Let x1= .539, x2=.873, x3=1.412 – A=(x1,x3,0), C=(0,x1,x3), … Y=(-x3,0,-x1) • Use scale factor for a given position – Scale factors for 9-mers: 2.2, 2.4, 2.6, …, 3.6 – A1 => 2.2*(x1,x3,0) C2=>2.4*(0,x1,x3) • Problem: overplotting of paths
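The shell construction amounts to multiplying a letter's base coordinates by a position-dependent factor. The sketch below reproduces the two worked cases on the slide (A1 and C2) and assumes a constant step of 0.2 between shells. Note that a constant step starting at 2.2 reaches 3.8 at position 9 rather than the 3.6 that ends the slide's list, so `first` and `step` should be treated as adjustable assumptions. `BASE` and `shell_point` are illustrative names.

```python
X1, X3 = 0.539, 1.412  # constants from the slide

# Only the two letters used in the slide's worked cases.
BASE = {"A": (X1, X3, 0.0), "C": (0.0, X1, X3)}


def shell_point(letter, position, first=2.2, step=0.2):
    """Scale a letter's base point onto the shell for a 1-based position."""
    scale = first + step * (position - 1)  # assumed constant spacing between shells
    return tuple(scale * c for c in BASE[letter])


print(shell_point("A", 1))  # A1 => 2.2 * (x1, x3, 0)
print(shell_point("C", 2))  # C2 => 2.4 * (0, x1, x3)
```

A 9-mer then becomes a path visiting one point on each of the nine nested shells, which is where the overplotting problem noted on the slide comes from.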

Self-Similarity At The Same Scale: Amino Acids Example • Each point represents a letter and position pair – 9-mers: 20 letter x 9 positions = 180 points • Geometry: hemisphere – Amino acid: longitude, Position: latitude – Amino acid ordering • Group by chemical properties: hydrophobic, etc. • Order to minimize path length in given application – Include gaps for perceptual grouping • Path overplotting still a problem, need filtering
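A hemisphere layout of this kind can be sketched by converting (amino acid, position) to (longitude, latitude) and then to Cartesian coordinates. Everything numeric below is an assumption for illustration: equally spaced longitudes with no perceptual gaps, an alphabetical letter order rather than a chemical grouping, and latitudes spread evenly between 10 and 80 degrees. `ORDER` and `hemisphere_point` are made-up names.

```python
import math

# Placeholder ordering; the slide suggests grouping by chemical properties instead.
ORDER = "ACDEFGHIKLMNPQRSTVWY"


def hemisphere_point(letter, position, n_positions=9, lat_min=10.0, lat_max=80.0):
    """Unit-hemisphere point: longitude from the letter, latitude from the position."""
    lon = 2 * math.pi * ORDER.index(letter) / len(ORDER)
    lat_deg = lat_min + (lat_max - lat_min) * (position - 1) / (n_positions - 1)
    lat = math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


points = {(aa, pos): hemisphere_point(aa, pos) for aa in ORDER for pos in range(1, 10)}
print(len(points))
```

Reordering `ORDER`, or inserting unused longitude slots between chemical groups, changes the layout without touching the statistics attached to each (letter, position) pair.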

Peptide Docking Example • Immune system molecules combine with peptides to form a complex recognized by T-cell receptors – Problems: • Failure to dock foreign peptides • Docking with “self” peptides • Molecule specific databases of docking peptides – MHCPEP 1997, Brusic, Rudy, and Harrison – Human leukocyte antigen (HLA) A2, class 1 molecule • Small: about 500 peptides of 20^9 ≈ ½ trillion possibilities • Mostly 9-mers (483) • Positions related to asymmetric docking groove

Peptide Docking Interests • Which amino acids appear in which position? • Characterize the space of docking, not-docking, and unknown peptides • Prediction of unknowns • Focused questions – Is there a docking peptide in a key protein common to all 23 HIV strains?

Docking Statistics • Number of the 483 peptides with the amino acid in position 2:
Amino acid:    M    Q    P    S    T    F    V    A    L    G    I    K    R    H    E    D    C    W    N    Y
Count:        45    4    1    1   23    2   16   14  294    1   71    5    2    0    2    1    1    0    0    1
• Cells from the collection of all 4-position tables: 126 tables of potentially 20^4 = 160000 cells each – G4 F5 V6 F7: 35 – L2 A7 A8 V9: 29 – …
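Both kinds of statistics on this slide are simple tallies over the peptide list. The sketch below shows one possible way to build them; the three peptides are hypothetical stand-ins for the 483 9-mers in the database, and the function names are illustrative. The figure of 126 tables is the number of ways to choose 4 of 9 positions.

```python
from collections import Counter
from itertools import combinations
from math import comb


def position_counts(peptides, position):
    """Tally the amino acid found at a 1-based position across peptides."""
    return Counter(p[position - 1] for p in peptides)


def four_position_cells(peptides):
    """Tally cells such as ('G4', 'F5', 'V6', 'F7') over all 4-position tables."""
    cells = Counter()
    for p in peptides:
        for positions in combinations(range(1, len(p) + 1), 4):
            cells[tuple(f"{p[i - 1]}{i}" for i in positions)] += 1
    return cells


# Hypothetical 9-mers for illustration only.
toy = ["KLLPLCVTL", "GLFGFVFTL", "ALAGFVFAV"]

print(comb(9, 4), 20 ** 4)  # number of 4-position tables, cells per table
print(position_counts(toy, 2))
print(four_position_cells(toy)[("G4", "F5", "V6", "F7")])
```

Only cells that actually occur are stored, which keeps the tables small even though each one has 160000 potential cells.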

Graphics Software • GLISTEN – Geometric Letter-Indexed Statistical Table Encoding – Swap out coordinates at will with tables unchanged – NSF research: second generation version in progress • Available partial alternatives – CrystalVision ftp://www.galaxy.gmu.edu/pub/software/ – GGobi www.ggobi.org/download.html

Hemisphere Plot Versus Parallel Coordinate Plots • PC plots are – Better for the many scientists preferring flatland – Straightforward to publish – Ambiguous when connecting non-adjacent axes • Hemisphere plots – 3-D curvature reduces line ambiguity and provides a general framework for tables involving non-adjacent positions – 3-D provides more neighbor options to group amino acids based on chemical properties: non-polar, etc.

Closing Remarks • Docking applications are still evolving – New procedures for inference and better databases • Graphics still need work – More scientific structure – Work on cognitive optimization • GLISTEN can address many other applications

Graphics Reference • Lee, et al. 2002. “The Next Frontier for Bio- and Cheminformatics Visualization,” IEEE Computer Graphics and Applications, Sept/Oct, pp. 6-11.

Related Scientific References (1) • Spellman, et al. 1998. “Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization,” Molecular Biology of the Cell, Vol. 9, pp. 3273-3297. • Keles, van der Laan, and Eisen. 2002. “Identification of regulatory elements using a feature selection method,” Bioinformatics, Vol. 18, No. 9, pp. 1167-1175.

Related Scientific References (2) • Segal, Cummings, and Hubbard. 2001. “Relating Amino Acid Sequences to Phenotypes: Analysis of Peptide-Binding Data,” Biometrics, Vol. 57, pp. 632-643.
