
Visualization In Biology, Alexander Lex, CS 171 Guest Lecture



  1. Visualization In Biology Alexander Lex CS 171 Guest Lecture, 18.04.2013

  2. WHAT DO I MEAN: VISUALIZATION IN BIOLOGY?

  3. Visualizing the Flight of Bats? [Bergou 2011]

  4. Visualizing Bird Populations? [Ferreira 2011]

  5. Visualizing Fish Swarms? [Boosherian 2012]

  6. Visualizing CT/MRI Data? [Bruckner 2007]

  7. NO! IN THIS LECTURE: MOLECULAR BIOLOGY (MB)

  8. Why is MB important? [Bar chart: Causes of Death in the USA 2011. Categories: Heart disease, Cancer, Chronic lower…, Stroke, Accidents, Alzheimer's disease, Diabetes, Kidney-related, Influenza and pneumonia, Suicide. Axis: 0 to 700,000.] [Data from CDC Death and Mortality Report 2011]

  9. Why is MB important? [Bar chart: Causes of Death in the USA 2011. Categories: Heart disease, Cancer, Chronic lower…, Stroke, Accidents, Alzheimer's disease, Diabetes, Kidney-related, Influenza and pneumonia, Suicide. Axis: 0 to 700,000.] [Data from CDC Death and Mortality Report 2011]

  10. Why is MB important? Understanding fundamentals in biology; disease prevention; targeted diagnosis (biomarkers); personalized medicine; drug development; targeted modification of organisms.

  11. Why is Vis for MB important? Biology is experiencing a revolution! Transformation from a wet-lab/experimental to a computational science. Challenge in MB is shifting from data acquisition to data processing & analysis.

  12. Why is Vis for MB important?

  13. What does this mean? We can now do very large experiments.

  14. Why is the Analysis Hard? 20,000 protein coding genes (1.5% of the genome). 3 billion base pairs. Gene -> Protein -> Function. Each of these steps is influenced by many processes! Very complex interplay of functional aspects.
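
The figures on this slide imply some rough magnitudes. The following is a minimal back-of-the-envelope sketch, assuming that the 1.5% refers to the 3 billion base pairs quoted here and that coding sequence is spread evenly across genes. The even split is a simplification made only for illustration, and the variable names are mine, not the lecture's.

```python
# Back-of-the-envelope arithmetic from the slide's figures.
GENOME_BASE_PAIRS = 3_000_000_000   # "3 billion base pairs"
PROTEIN_CODING_GENES = 20_000       # "20,000 protein coding genes"

# 1.5% expressed as the exact integer ratio 15 / 1000
coding_base_pairs = GENOME_BASE_PAIRS * 15 // 1000

# Assumption (illustration only): coding sequence is split evenly across genes
avg_coding_bp_per_gene = coding_base_pairs // PROTEIN_CODING_GENES

print(coding_base_pairs)        # 45000000
print(avg_coding_bp_per_gene)   # 2250
```

Under these assumptions, roughly 45 million base pairs code for proteins, which leaves the vast majority of the genome outside protein-coding sequence.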

  15. Major Areas for Vis in MB: Genome Structure; Genome Activity - Omics Data; Biological Networks; Macromolecular Structures; Phylogenetics.

  16. Genome Structure. What is the sequence of bases in a genome? Common “defects” (by scale): chromosomal alterations, copy-number variation, mutations, SNPs. How do these influence the phenotype?

  17. Genome Structure Vis: “Track-based” Visualization

  18. Circular Layouts [Meyer 2009] [Krzywinski 2009]

  19. Genome Activity. Which genes are active? How active are they? Gene expression; protein expression; miRNA expression; epigenetics: methylation. What is the function of a gene?

  20. Heat Maps [Eisen 1999]

  21. New Approaches! [Meyer 2010]

  22. [No text on this slide]

  23. Biological Networks. How do proteins and other (bio)chemical products interact? Protein-protein interaction. Pathways. What are the processes in a cell?

  24. Protein Interaction Networks [Cytoscape]

  25. Pathways [KEGG]

  26. Pathways [KEGG]

  27. Pathways – Free Layouts [Barsky 2008]

  28. CASE STUDIES http://caleydo.org

  29. What is Caleydo? Software for visualizing biomolecular data: tabular data (numerical & categorical), e.g., mRNA, microRNA, copy number variation, methylation, mutation status, etc.; clinical data; pathways (KEGG, WikiPathways).

  30. Caleydo Core Features: Multi-Dataset Analysis. Want to see… …relationships between multiple datasets? …relationships between tabular and graph data?

  31. What is Caleydo? Software for doing research in visualization: developed in an academic setting; platform for trying out radically new visualization ideas. Quest for compromise between academic prototyping and ready-to-use software. Marc Streit & Alexander Lex

  32. Who is Caleydo? Marc Streit, Johannes Kepler University Linz, AT; Alexander Lex, Harvard University, Cambridge, USA; Christian Partl, Graz University of Technology, AT; Samuel Gratzl, Johannes Kepler University Linz, AT; Nils Gehlenborg, Harvard Medical School, Boston, USA; Dieter Schmalstieg, Graz University of Technology, AT; Hanspeter Pfister, Harvard University, Cambridge, USA.

  33. Case Study: CANCER SUBTYPE VISUALIZATION

  34. Motivation. Cancer types are not homogeneous. They are divided into subtypes: different histology; different molecular alterations. Subtypes have serious implications: different treatment for subtypes; prognosis varies between subtypes.

  35. Cancer Subtype Analysis. Done using many different types of data, for large numbers of patients.

  36. Large-scale project to catalogue genetic mutations responsible for cancer: 20 tumor types; 500 patient samples each; extensive molecular profiling for each patient.

  37. TCGA Data: mRNA expression; microRNA expression; methylation levels; mutation status; copy number status; clinical parameters; pathways.

  38. Subtype Identification [figure label: Patients]

  39. Our goal is to support tumor subtype characterization through integrative visual analysis of cancer genomics data sets.

  40. Challenge 1: Manage complex setup of multiple datasets, multiple stratifications & multiple views (Data-View Integrator). Challenge 2: Visualize complex interdependencies between multiple, heterogeneous, large datasets (StratomeX).

  41. Subtype Identification Process. Step 1: Determine candidate subtypes. Step 2: Find supporting evidence.

  42. Tabular Data; Stratification. [Figure labels: Patients; Candidate Subtypes; Genes, Proteins, etc.]

  43. Stratification of a Single Dataset: Cluster A1, Cluster A2, Cluster A3.

  44. Stratification. Subtypes are identified by stratifying datasets, e.g., based on an expression pattern, a mutation status, a copy number alteration, or a combination of these.
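
As a rough illustration of what stratifying a dataset can mean in code, here is a minimal sketch that groups patients by one attribute or by a combination of attributes. The `stratify` helper, the field names, and the four patient records are hypothetical and invented for this example; they are not taken from the lecture or from Caleydo.

```python
from collections import defaultdict

def stratify(patients, key):
    """Group patient IDs by the value that key(record) returns."""
    groups = defaultdict(list)
    for patient_id, record in patients.items():
        groups[key(record)].append(patient_id)
    return dict(groups)

# Hypothetical toy records, invented for illustration only.
patients = {
    "P1": {"mutated": True,  "copy_number": "amplified"},
    "P2": {"mutated": False, "copy_number": "normal"},
    "P3": {"mutated": True,  "copy_number": "normal"},
    "P4": {"mutated": False, "copy_number": "amplified"},
}

# Stratify by a single attribute (mutation status) ...
by_mutation = stratify(patients, lambda r: r["mutated"])
# ... or by a combination of attributes.
by_combination = stratify(patients, lambda r: (r["mutated"], r["copy_number"]))

print(by_mutation)  # {True: ['P1', 'P3'], False: ['P2', 'P4']}
```

Stratifying by an expression pattern would usually need a clustering step first; that part is left out of this sketch.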

  45. Subtype Identification Process. Step 1: Determine candidate subtypes. Step 2: Find supporting evidence.

  46. Tasks. T1: Evaluate whether stratifications support each other. T2: Review effect of stratifications on clinical outcomes and on pathways. T3: Show expression patterns in subtypes.

  47. Stratification of Multiple Datasets. T1: Evaluate whether stratifications support each other. Tabular, e.g., mRNA: Cluster A1, Cluster A2, Cluster A3. Categorical, e.g., mutation status: B1, B2.
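
One way to read task T1 computationally is to count how many patients each pair of groups from two stratifications has in common. The sketch below assumes that reading. The group labels reuse the slide's A1 to A3 and B1/B2, but the patient assignments and the `overlap_counts` helper are hypothetical.

```python
from collections import Counter

def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of groups.

    Each stratification maps patient ID -> group label. Only patients
    present in both stratifications are counted.
    """
    shared = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared)

# Hypothetical assignments, invented for illustration only.
clusters = {"P1": "A1", "P2": "A1", "P3": "A2", "P4": "A2", "P5": "A3", "P6": "A3"}
mutation = {"P1": "B1", "P2": "B1", "P3": "B1", "P4": "B2", "P5": "B2", "P6": "B2"}

counts = overlap_counts(clusters, mutation)
for (a, b), n in sorted(counts.items()):
    print(a, b, n)
```

In this toy data, A1 falls entirely into B1 and A3 entirely into B2, so the two stratifications agree on those groups, while A2 is split evenly between B1 and B2.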

  48. Example: Titanic Dataset. Multi-dimensional dataset: age, name, gender, survival status, class (1st class, 2nd class, 3rd class and crew). How many male crew members survived? http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html

  49. Mosaic Plot Matrix [Friendly 1999]. How many male crew members survived?

  50. Parallel Sets [Kosara 2006]. How many male crew members survived?
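
The question asked on these slides is, computationally, a lookup in a cross-tabulation over several categorical dimensions. Here is a minimal sketch of that lookup. The six records are made up for illustration and are not the real Titanic data, so the resulting count says nothing about the actual answer.

```python
from collections import Counter

# Hypothetical toy records (class, gender, survived); NOT the real Titanic data.
records = [
    ("crew", "male", True),
    ("crew", "male", False),
    ("crew", "male", False),
    ("crew", "female", True),
    ("1st", "male", True),
    ("3rd", "female", False),
]

# Cross-tabulation: one count per combination of category values.
crosstab = Counter(records)

# "How many male crew members survived?" is a single cell of that table.
male_crew_survivors = crosstab[("crew", "male", True)]
print(male_crew_survivors)  # 1
```

Mosaic plots and Parallel Sets both draw such counts as areas or band widths, so the same cell can be read off visually instead of queried.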

  51. Stratification of Multiple Datasets. T1: Evaluate whether stratifications support each other. Tabular, e.g., mRNA: Cluster A1, Cluster A2, Cluster A3. Categorical, e.g., mutation status: B1, B2.

  52. Stratification of Multiple Datasets. T2: Review effect of stratification. Tabular, e.g., mRNA: Cluster A1, Cluster A2, Cluster A3. Categorical, e.g., mutation status: B1, B2. Dependent data, e.g., clinical data: Dep. C1, Dep. C2.

  53. Columns = Genes; Rows = Patients; Band = Subset of Patients.

  54. Patients stratified by Copy Number; Patients stratified by Clustering.

  55. Table; Categorical; Dependent. T3: Show expression patterns in subtypes.

  56. [Figure labels: Survival; EGFR Copy Number Status; mRNA Levels; Glioma Pathway; Survival]

  57. Live-Demo! http://stratomex.caleydo.org

  58. Case Study: PATHWAY & EXPERIMENTAL DATA

  59. Experimental Data and Pathways. Pathways represent consensus knowledge for a healthy organism or specific disease. Cannot account for variation found in real-world data. Branches can be (in)activated due to mutation, changed gene expression, modulation due to drug treatment, etc.

  60. Why use Visualization? Efficient communication of information. [Figure: pathway graph with nodes A to F next to a table of values: A -3.4, B 2.8, C 3.1, D -3, E 0.5, F 0.3]
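
To make the contrast between the table and the colored graph concrete, here is a minimal sketch that maps the six values from this slide onto a diverging color scale. The blue-white-red scheme, the `diverging_color` helper, and the scaling to the largest absolute value are assumptions chosen for illustration; the lecture does not specify a color mapping.

```python
def diverging_color(value, max_abs):
    """Map a value in [-max_abs, max_abs] to an (r, g, b) tuple.

    Negative values fade from white to blue, positive values from
    white to red. Values outside the range are clamped.
    """
    t = max(-1.0, min(1.0, value / max_abs))
    fade = round(255 * (1 - abs(t)))
    if t < 0:
        return (fade, fade, 255)
    return (255, fade, fade)

# Node values as listed on the slide.
values = {"A": -3.4, "B": 2.8, "C": 3.1, "D": -3.0, "E": 0.5, "F": 0.3}

# Assumption: scale symmetrically to the largest absolute value.
max_abs = max(abs(v) for v in values.values())
colors = {node: diverging_color(v, max_abs) for node, v in values.items()}

for node, rgb in colors.items():
    print(node, rgb)
```

With this encoding, strongly negative nodes such as A and D come out saturated blue, strongly positive nodes such as B and C saturated red, and E and F close to white, which is the pattern a reader would otherwise have to reconstruct from the numbers.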

  61. Experimental Data and Pathways [KEGG] [Lindroos 2002]

  62. REQUIREMENTS ANALYSIS

  63. What to Consider when Visualizing Experimental Data and Pathways. Five requirements. Ideal visualization technique addresses all. Talking about 3 today.

  64. R I: Data Scale. Large number of experiments. Large datasets have more than 500 experiments. Multiple groups/conditions.

  65. R II: Data Heterogeneity. Different types of data, e.g., mRNA expression (numerical), mutation status (categorical), copy number variation (ordered categorical), metabolite concentration (numerical). Require different visualization techniques.

  66. R V: Supporting Multiple Tasks. Two central tasks: explore topology of pathway; explore the attributes of the nodes (experimental data). Need to support both! [Figure: pathway graph with nodes A to F]

  67. VISUALIZATION TECHNIQUES. Alexander Lex | Harvard University

  68. Visualization Approaches: On-Node Mapping; Separate Linked Views; Small Multiples; Layout Adaption; Linearization; Path-Extraction. [Lindroos 2002] [Meyer 2010] [Junker 2006]
