Visualization In Biology Alexander Lex CS 171 Guest Lecture, 18.04.2013
WHAT DO I MEAN: VISUALIZATION IN BIOLOGY?
Visualizing the Flight of Bats? [Bergou 2011]
Visualizing Bird Populations? [Ferreira 2011]
Visualizing Fish Swarms? [Boosherian 2012]
Visualizing CT/MRI Data? [Bruckner 2007]
NO! IN THIS LECTURE: MOLECULAR BIOLOGY (MB)
Why is MB important? Causes of Death in the USA 2011 [bar chart of deaths per cause, axis 0 to 700,000: heart disease, cancer, chronic lower…, stroke, accidents, Alzheimer's disease, diabetes, kidney-related, influenza and pneumonia, suicide] [Data from CDC Death and Mortality Report 2011]
Why is MB important? Understanding fundamentals in biology; disease prevention; targeted diagnosis (biomarkers); personalized medicine; drug development; targeted modification of organisms.
Why is Vis for MB important? Biology is experiencing a revolution: a transformation from a wet-lab/experimental science to a computational science. The challenge in MB is shifting from data acquisition to data processing and analysis.
Why is Vis for MB important?
What does this mean? We can now do very large experiments.
Why is the analysis hard? About 20,000 protein-coding genes (1.5% of the genome) among 3 billion base pairs. Gene -> protein -> function: each of these steps is influenced by many processes, resulting in a very complex interplay of functional aspects.
Major Areas for Vis in MB: genome structure; genome activity (omics data); biological networks; macromolecular structures; phylogenetics.
Genome Structure: What is the sequence of bases in a genome? Common "defects", from large to small scale: chromosomal alterations, copy-number variation, mutations and SNPs. How do these influence the phenotype?
Genome Structure Vis: "Track-based" Visualization
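A track-based view draws each data type as a horizontal track along a shared genomic coordinate axis. The sketch below is a minimal illustration, not any particular genome browser's implementation: it shows the core step of linearly mapping base-pair positions in the visible window to pixels and clipping features to that window. The function names and the `(name, start, end)` feature tuples are hypothetical.

```python
def genome_to_pixel(pos, view_start, view_end, width_px):
    """Linearly map a base-pair position inside [view_start, view_end] to an x pixel."""
    if view_end <= view_start:
        raise ValueError("view_end must be greater than view_start")
    # Multiply before dividing so integer inputs give exact results.
    return (pos - view_start) * width_px / (view_end - view_start)


def layout_track(features, view_start, view_end, width_px):
    """Return (name, x0, x1) for every feature that overlaps the visible window.

    features are (name, start, end) tuples in base pairs; parts outside the
    window are clipped so each glyph stays inside the track.
    """
    laid_out = []
    for name, start, end in features:
        if end < view_start or start > view_end:
            continue  # entirely outside the window: not drawn
        x0 = genome_to_pixel(max(start, view_start), view_start, view_end, width_px)
        x1 = genome_to_pixel(min(end, view_end), view_start, view_end, width_px)
        laid_out.append((name, x0, x1))
    return laid_out


# Hypothetical features on a 10 kb window drawn 1000 px wide.
features = [("geneA", 1000, 2000), ("geneB", 5000, 9000),
            ("geneC", 20000, 21000), ("geneD", 9500, 12000)]
print(layout_track(features, 0, 10000, 1000))
```

Stacking several such tracks (genes, copy number, mutations) under one shared axis gives the typical browser layout.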
Circular Layouts [Meyer 2009] [Krzywinski 2009]
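Circular layouts place chromosomes around a circle and draw relationships between positions as arcs or chords across it. A minimal sketch, assuming each chromosome gets an arc proportional to its length with a fixed gap after each arc; the function names, toy lengths, and `gap_deg` parameter are illustrative and not taken from either cited tool.

```python
import math


def circular_layout(chrom_lengths, gap_deg=2.0):
    """Assign each chromosome an arc (start_deg, end_deg) proportional to its length,
    leaving gap_deg degrees of empty space after each arc."""
    total = sum(chrom_lengths.values())
    usable = 360.0 - gap_deg * len(chrom_lengths)
    arcs = {}
    angle = 0.0
    for chrom, length in chrom_lengths.items():
        span = usable * length / total
        arcs[chrom] = (angle, angle + span)
        angle += span + gap_deg
    return arcs


def position_to_xy(arcs, chrom_lengths, chrom, pos, radius=1.0):
    """Convert a base-pair position on a chromosome to (x, y) on a circle of given radius."""
    start, end = arcs[chrom]
    deg = start + (end - start) * pos / chrom_lengths[chrom]
    rad = math.radians(deg)
    return radius * math.cos(rad), radius * math.sin(rad)


# Toy chromosome lengths (hypothetical units).
lengths = {"c1": 300, "c2": 100}
print(circular_layout(lengths))  # {'c1': (0.0, 267.0), 'c2': (269.0, 358.0)}
```

A link between two positions (e.g., a rearrangement) can then be drawn as a curve between the two points returned by `position_to_xy`.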
Genome Activity: Which genes are active, and how active are they? Protein expression; gene expression; epigenetics: miRNA expression, methylation. What is the function of a gene?
Heat Maps [Eisen 1999]
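Eisen et al. displayed expression log ratios as a colored matrix, with red for increased and green for decreased expression, and rows ordered by hierarchical clustering. The sketch below covers only the color-mapping step; the `saturation` cut-off (values beyond it get the most intense color) is an assumed parameter, and the clustering step is omitted.

```python
def ratio_to_rgb(log_ratio, saturation=3.0):
    """Red/green encoding: positive log ratios -> red, negative -> green, 0 -> black."""
    # Clamp to [-1, 1]; values beyond +/- saturation get the most intense color.
    t = max(-1.0, min(1.0, log_ratio / saturation))
    if t >= 0:
        return (round(255 * t), 0, 0)
    return (0, round(255 * -t), 0)


def heatmap_colors(matrix, saturation=3.0):
    """Color every cell of a genes x conditions matrix of log ratios."""
    return [[ratio_to_rgb(v, saturation) for v in row] for row in matrix]


print(heatmap_colors([[0.0, 3.0], [-3.0, 10.0]]))
```

Red/green is hard to distinguish for many color-blind viewers, which is why many current tools use a blue/red diverging scale with the same structure.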
New Approaches! [Meyer 2010]
Biological Networks: How do proteins and other (bio)chemical products interact? Protein-protein interactions; pathways. What are the processes in a cell?
Protein Interaction Networks [Cytoscape]
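A protein interaction network is an undirected graph: proteins are nodes and observed interactions are edges. A minimal sketch of the underlying data structure, an adjacency set per protein built from an edge list, plus a helper that ranks highly connected "hub" proteins by degree. The protein names are placeholders, and this is not Cytoscape's API.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency-set graph from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)  # interactions are symmetric, so store both directions
        graph[b].add(a)
    return graph


def hubs(graph, top_n=3):
    """Return the top_n proteins by degree (ties broken alphabetically)."""
    return sorted(graph, key=lambda p: (-len(graph[p]), p))[:top_n]


# Placeholder interaction list, not real data.
network = build_network([("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")])
print(hubs(network))  # ['P1', 'P2', 'P3']
```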
Pathways [KEGG]
Pathways – Free Layouts [Barsky 2008]
CASE STUDIES http://caleydo.org
What is Caleydo? Software for visualizing biomolecular data: tabular data, both numerical and categorical (e.g., mRNA, microRNA, copy-number variation, methylation, mutation status); clinical data; pathways (KEGG, WikiPathways).
Caleydo Core Features: multi-dataset analysis. Want to see relationships between multiple datasets? Relationships between tabular and graph data?
What is Caleydo? Software for doing research in visualization, developed in an academic setting; a platform for trying out radically new visualization ideas; a compromise between academic prototyping and ready-to-use software. Marc Streit & Alexander Lex
Who is Caleydo? Marc Streit (Johannes Kepler University Linz, AT), Alexander Lex (Harvard University, Cambridge, USA), Christian Partl (Graz University of Technology, AT), Samuel Gratzl (Johannes Kepler University Linz, AT), Nils Gehlenborg (Harvard Medical School, Boston, USA), Dieter Schmalstieg (Graz University of Technology, AT), Hanspeter Pfister (Harvard University, Cambridge, USA)
Case Study: CANCER SUBTYPE VISUALIZATION
Motivation: Cancer types are not homogeneous. They are divided into subtypes with different histology and different molecular alterations. Subtypes have serious implications: subtypes receive different treatments, and prognosis varies between subtypes.
Cancer Subtype Analysis: done using many different types of data, for large numbers of patients.
The Cancer Genome Atlas (TCGA): a large-scale project to catalogue genetic mutations responsible for cancer. 20 tumor types, 500 patient samples each, extensive molecular profiling for each patient.
TCGA Data: mRNA expression, microRNA expression, methylation levels, mutation status, copy number status, clinical parameters, pathways.
Subtype Identification [figure: patients grouped into subtypes]
Our goal is to support tumor subtype characterization through integrative visual analysis of cancer genomics data sets.
Data-View Integrator. Challenge 1: manage a complex setup of multiple datasets, multiple stratifications and multiple views. Challenge 2: visualize complex interdependencies between multiple, heterogeneous, large datasets. StratomeX
Subtype Identification Process. Step 1: Determine candidate subtypes. Step 2: Find supporting evidence.
Tabular Data Stratification: patients and genes, proteins, etc., grouped into candidate subtypes
Stratification of a Single Dataset: Cluster A1, Cluster A2, Cluster A3
Stratification: subtypes are identified by stratifying datasets, e.g., based on an expression pattern, a mutation status, a copy number alteration, or a combination of these.
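Stratifying a dataset means partitioning patients into groups by some rule. A minimal sketch, assuming patient records are plain dictionaries; the attribute names (`mutated`, `expr`), the toy values, and the zero expression threshold are hypothetical. A key function that returns a tuple covers the "combination" case.

```python
from collections import defaultdict


def stratify(patients, key):
    """Group patient IDs by the label that `key` computes from each patient's attributes."""
    groups = defaultdict(list)
    for pid, attrs in patients.items():
        groups[key(attrs)].append(pid)
    return dict(groups)


# Hypothetical toy records: mutation status (categorical) and an expression value (numerical).
patients = {
    "p1": {"mutated": True, "expr": 2.5},
    "p2": {"mutated": False, "expr": -1.0},
    "p3": {"mutated": True, "expr": -0.5},
    "p4": {"mutated": False, "expr": 1.2},
}
by_mutation = stratify(patients, lambda a: "mutated" if a["mutated"] else "wild type")
by_expression = stratify(patients, lambda a: "high" if a["expr"] > 0 else "low")
combined = stratify(patients, lambda a: (a["mutated"], a["expr"] > 0))
print(by_mutation)  # {'mutated': ['p1', 'p3'], 'wild type': ['p2', 'p4']}
```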
Tasks: T1 Evaluate whether stratifications support each other. T2 Review the effect of stratifications on clinical outcomes and on pathways. T3 Show expression patterns in subtypes.
Stratification of Multiple Datasets [figure: tabular data (e.g., mRNA) stratified into Clusters A1 to A3; categorical data (e.g., mutation status) stratified into B1 and B2] T1: Evaluate whether stratifications support each other.
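One way to evaluate whether two stratifications support each other (T1) is to count how many patients each pair of groups shares; in a StratomeX-style view these counts roughly correspond to the widths of the bands connecting groups. A minimal sketch, assuming each stratification is a mapping from patient ID to group label. The purity score (share of patients that fall into the dominant B group of their A cluster) is one simple agreement measure chosen here for illustration, not necessarily what StratomeX computes.

```python
from collections import Counter


def overlap(strat_a, strat_b):
    """Count shared patients for every (group in A, group in B) pair."""
    shared = strat_a.keys() & strat_b.keys()  # only patients present in both datasets
    return Counter((strat_a[p], strat_b[p]) for p in shared)


def purity(table):
    """Fraction of patients that fall in the dominant B group of their A cluster."""
    best = {}
    total = 0
    for (a, _b), n in table.items():
        best[a] = max(best.get(a, 0), n)
        total += n
    return sum(best.values()) / total if total else 0.0


# Hypothetical stratifications of five patients.
clusters_a = {"p1": "A1", "p2": "A1", "p3": "A2", "p4": "A3", "p5": "A3"}
groups_b = {"p1": "B1", "p2": "B1", "p3": "B2", "p4": "B2", "p5": "B1"}
table = overlap(clusters_a, groups_b)
print(table, purity(table))
```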
Example: Titanic Dataset. A multi-dimensional dataset with the attributes name, age, gender, class (1st class, 2nd class, 3rd class and crew) and survival status. How many male crew members survived? http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html
Mosaic Plot Matrix [Friendly 1999]: How many male crew members survived?
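A mosaic plot divides a unit square recursively: column widths are proportional to one variable's marginal counts, and each column is split vertically by the conditional distribution of a second variable, so each tile's area is proportional to its cell count. A minimal two-variable sketch; the counts below are made up for illustration and are not the real Titanic figures.

```python
def mosaic(counts):
    """Compute mosaic-plot tiles in the unit square.

    counts maps (x_category, y_category) -> count. Column widths are proportional
    to the x marginals; within each column, tile heights are proportional to the
    conditional distribution of y given x. Returns {(x, y): (x0, y0, width, height)}.
    """
    total = sum(counts.values())
    xs = list(dict.fromkeys(x for x, _ in counts))  # keep first-seen order
    ys = list(dict.fromkeys(y for _, y in counts))
    tiles = {}
    x0 = 0.0
    for x in xs:
        column = sum(counts.get((x, y), 0) for y in ys)
        width = column / total
        y0 = 0.0
        for y in ys:
            height = counts.get((x, y), 0) / column if column else 0.0
            tiles[(x, y)] = (x0, y0, width, height)
            y0 += height
        x0 += width
    return tiles


# Hypothetical counts for one passenger class.
counts = {("male", "survived"): 20, ("male", "died"): 60,
          ("female", "survived"): 15, ("female", "died"): 5}
tiles = mosaic(counts)
print(tiles[("male", "survived")])  # (0.0, 0.0, 0.8, 0.25)
```

The answer to "how many survived" is read off as tile area: 0.8 × 0.25 = 0.2 of the total, i.e., 20 of 100 in this toy example.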
Parallel Sets [Kosara 2006]: How many male crew members survived?
Stratification of Multiple Datasets [figure: tabular data (e.g., mRNA) stratified into Clusters A1 to A3; categorical data (e.g., mutation status) stratified into B1 and B2; dependent data (e.g., clinical data) shown per cluster as Dep. C1 and Dep. C2] T2: Review the effect of stratification.
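Reviewing the effect of a stratification on clinical outcome (T2) often means comparing survival between groups. A common choice, assumed here rather than taken from the slides, is the Kaplan-Meier estimator: at each time with d observed deaths among n patients still at risk, the survival estimate is multiplied by (1 - d/n), and censored patients simply leave the risk set. A minimal sketch with hypothetical follow-up times.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time per patient; events: True if death was observed,
    False if the patient was censored (lost to follow-up or still alive).
    Returns [(time, survival_probability)] at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def km_by_group(groups):
    """groups maps a stratum label -> (times, events); returns one curve per stratum."""
    return {label: kaplan_meier(t, e) for label, (t, e) in groups.items()}


# Hypothetical follow-up (e.g., months) for one cluster.
print(kaplan_meier([5, 8, 8, 12, 15], [True, True, False, True, False]))
```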
Columns = genes; rows = patients; band = subset of patients.
[figure: patients stratified by copy number; patients stratified by clustering]
[figure: table, categorical, and dependent columns] T3: Show expression patterns in subtypes.
[figure: columns for EGFR copy number status, mRNA levels, glioma pathway, and survival]
Live Demo! http://stratomex.caleydo.org
Case Study: PATHWAY & EXPERIMENTAL DATA
Experimental Data and Pathways: Pathways represent consensus knowledge for a healthy organism or a specific disease. They cannot account for variation found in real-world data: branches can be (in)activated due to mutation, changed gene expression, modulation due to drug treatment, etc.
Why use Visualization? Efficient communication of information.
[figure: example pathway with nodes A to F]
Node  Value
A     -3.4
B      2.8
C      3.1
D     -3
E      0.5
F      0.3
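On-node mapping, the most direct way to combine the two, colors each pathway node by its experimental value. A minimal sketch using the node values A to F listed above and an assumed blue-white-red diverging scale symmetric around zero; the helper names are hypothetical. Because the pathway's edges are not recoverable from the slide text, only node styling is emitted, as Graphviz DOT node statements.

```python
def diverging_hex(value, limit):
    """Map value in [-limit, limit] to a hex color: blue (negative), white (0), red (positive)."""
    t = max(-1.0, min(1.0, value / limit))
    if t >= 0:
        fade = round(255 * (1 - t))  # green and blue fade out as the value grows -> red
        return "#ff{0:02x}{0:02x}".format(fade)
    fade = round(255 * (1 + t))  # red and green fade out as the value drops -> blue
    return "#{0:02x}{0:02x}ff".format(fade)


def on_node_colors(values):
    """Color each node using a scale symmetric around 0 and sized to the largest |value|."""
    limit = max(abs(v) for v in values.values()) or 1.0  # avoid dividing by zero
    return {node: diverging_hex(v, limit) for node, v in values.items()}


def dot_nodes(values):
    """Emit one Graphviz DOT node statement per node with its fill color."""
    colors = on_node_colors(values)
    return [f'{node} [style=filled, fillcolor="{color}"];' for node, color in colors.items()]


values = {"A": -3.4, "B": 2.8, "C": 3.1, "D": -3.0, "E": 0.5, "F": 0.3}
for line in dot_nodes(values):
    print(line)
```

On-node mapping works well for one value per node; with hundreds of experiments per node (data scale), there is no room on the node, which motivates the other techniques discussed below.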
Experimental Data and Pathways [KEGG] [Lindroos 2002]
REQUIREMENTS ANALYSIS
What to Consider when Visualizing Experimental Data and Pathways: five requirements. An ideal visualization technique addresses all of them; we discuss three today.
R I: Data Scale. Large number of experiments: large datasets have more than 500 experiments, in multiple groups/conditions.
R II: Data Heterogeneity. Different types of data, e.g., mRNA expression (numerical), mutation status (categorical), copy number variation (ordered categorical), and metabolite concentration (numerical), require different visualization techniques.
R V: Supporting Multiple Tasks. Two central tasks: explore the topology of the pathway, and explore the attributes of the nodes (the experimental data). Both need to be supported! [figure: example pathway with nodes A to F]
VISUALIZATION TECHNIQUES Alexander Lex | Harvard University
Visualization Approaches: On-Node Mapping, Separate Linked Views, Small Multiples, Layout Adaption, Linearization, Path Extraction [Lindroos 2002] [Meyer 2010] [Junker 2006]