Data - Don’t Just Visualize the Raw Data! Example when this advice is ignored Example Original (Raw) Data Derived Data T. Munzner (2014) – Visualization Design and Analysis XKCD
Tasks - How People Use the Data
Geographic Overview of Prostate Cancer
§ Useful for epidemiologists and policy makers
§ Supports surveillance tasks
Individual Prostate Cancer Risk
§ Good for patients and doctors
§ Supports treatment decision-making tasks
Source : (UT San Antonio) Source : Atlanta CDC
Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US Electoral College results Standard Map Cartogram
Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US Electoral College results Standard Map Sankey Diagram
Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US Electoral College results
How can we identify tasks and data? Examples from my own research
My research : making a clinical report for tuberculosis
• Mixed methods approach to gathering data and tasks
[Figure: study design overview. Phases: Discovery (Information Gathering), Design (Design & Evaluation), Implement (Finalize Design). Activities: Expert Consults, TB Workflow Map, Task & Data Questionnaire, Design Sprint, Design Choice Questionnaire. Data gathered: Qualitative, Quantitative. Study design: Exploratory Sequential Model, Embedded Model. Result: the Mycobacterium tuberculosis genome sequencing report (not for diagnostic use).]
My research : making a clinical report for tuberculosis
Consensus among participants (cat.: % agree): 3 (>75%), 2 (50%-75%), 1 (25%-50%), 0 (<25%)
Task groups: DIAGNOSIS TASKS, TREATMENT TASKS, SURVEILLANCE TASKS
Task columns:
T1 Diagnose Latent TB
T2 Diagnose Active TB
T3 Characterize Reactive vs New Infection
T4 Assess Transmission Risk
T5 Choose Meds
T6 Choose Tx Duration
T7 Guide Response to Tx
T8 Contact Tracing
T9 Report to Public Health
T10 Define a Cluster
T11 Connect Case to Existing Cluster
T12 Guide Public Health Response
Data | WGS equivalent | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | TOTAL SCORE
Patient Identifier | Same | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 1 | 26
Sample Collection Date | Same | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 24
Patient Prior TB Results | Same | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 23
Speciation | Speciation | 1 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 1 | 23
Sample Type (sputum, fine needle aspirate etc.) | Same | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 22
Culture results | NA | 1 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 0 | 1 | 22
Sample Collection Site (lymph node, lung etc.) | Same | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 0 | 0 | 1 | 21
Acid Fast Bacilli Smear | Speciation | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 21
Resistotype | Predicted DST | 0 | 2 | 3 | 1 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 19
Phenotypic DST | Predicted DST | 0 | 2 | 3 | 2 | 3 | 3 | 2 | 1 | 1 | 1 | 0 | 1 | 18
Chest x-ray | NA | 3 | 3 | 2 | 3 | 0 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 17
Report Release Date | Same | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 15
Requester IDs | Same | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 15
Interpretation or comments from reviewer | Same | 2 | 2 | 1 | 2 | 2 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 15
Predicted DST | Predicted DST | 0 | 2 | 2 | 1 | 3 | 3 | 2 | 1 | 0 | 1 | 0 | 0 | 15
MIRU-VNTR | SNPs | 0 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 13
Cluster Assignment | Same | 0 | 2 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 11
SNP/variant distance | SNPs | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 10
Phylogenetic Tree | Same | 0 | 2 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 9
Reviewer ID | Same | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 8
TST results | Speciation* | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 7
IGRA results | Speciation* | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 7
Lab QC | WGS Specific | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 7
Spoligotype | SNPs | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3
RFLP | SNPs | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3
My research : making a clinical report for tuberculosis
MYCOBACTERIUM TUBERCULOSIS GENOME SEQUENCING REPORT
NOT FOR DIAGNOSTIC USE
Patient Name: JOHN DOE | Barcode
Birth Date: 2000-01-01 | Patient ID: 12345678910
Location: SOMEPLACE
Sample Type: SPUTUM | Sample Source: PULMONARY
Sample Date: 2016-12-25 | Sample ID: A12345678
Sequenced From: MGIT CULTURED ISOLATE
Reporting Lab: LAB NAME | Report Date/Time: 2017-01-01, 15:36
Requested By: REQUESTER NAME | Requester Contact: REQUESTER@EMAIL.COM
Summary
The specimen was positive for Mycobacterium tuberculosis. It is resistant to isoniazid and rifampin. It belongs to a cluster, suggesting recent transmission.
Organism
The specimen was positive for Mycobacterium tuberculosis, lineage 2.2.1 (East-Asian Beijing).
Drug Susceptibility
Categories: No drug resistance predicted; Mono-resistance predicted; Multi-drug resistance predicted; Extensive drug resistance predicted
Resistance is reported when a high-confidence resistance-conferring mutation is detected. “No mutation detected” does not exclude the possibility of resistance.
Drug class | Interpretation | Drug | Resistance Gene (Amino Acid Mutation)
First Line | Susceptible | Ethambutol | No mutation detected
First Line | Susceptible | Pyrazinamide | No mutation detected
First Line | Resistant | Isoniazid | katG (S315T)
First Line | Resistant | Rifampin | rpoB (S531L)
Second Line | Susceptible | Streptomycin | No mutation detected
Second Line | Susceptible | Ciprofloxacin | No mutation detected
Second Line | Susceptible | Ofloxacin | No mutation detected
Second Line | Susceptible | Moxifloxacin | No mutation detected
Second Line | Susceptible | Amikacin | No mutation detected
Second Line | Susceptible | Kanamycin | No mutation detected
Second Line | Susceptible | Capreomycin | No mutation detected
Page 1 of 2 Patient ID: 12345678910 | Date: 2017-01-01 | Location: Someplace
Thinking Systematically about Data Visualization
[Diagram: Domain Problem; Data + Task; Visual + Interaction Design Choices; Algorithm]
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (remember this includes interaction!)
T. Munzner (2014) – Visualization Design and Analysis
Marks & Channels : Basic Building Blocks
Mark: Basic Graphical Element (basic building block)
Channel: Controls the appearance of marks
T. Munzner (2014) – Visualization Design and Analysis
Marks Vary in their Effectiveness
Example
Pie Chart: Angle & Area
Bar Chart: Position on a Common Scale
J. Heer (2010) – Crowdsourcing Graphical Perception: Using Mechanical Turk ……
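The pie versus bar comparison can be tried with a few lines of base R. This is only a sketch: the five shares are made-up values (not from the talk), chosen to be close together so the difference between reading angle and area and reading position on a common scale is easy to see.

```r
# Hypothetical shares for five categories (illustrative values only).
# They are deliberately close so that angle/area is hard to compare.
shares <- c(A = 23, B = 21, C = 20, D = 19, E = 17)

# Draw both charts side by side on the active graphics device.
old_par <- par(mfrow = c(1, 2))
pie(shares, main = "Angle & area (pie)")
barplot(shares, main = "Position on a common scale (bar)", ylab = "Share (%)")
par(old_par)

# The ordering that the bar chart makes easy to read off.
ranking <- names(sort(shares, decreasing = TRUE))
print(ranking)
```

When run non-interactively with Rscript, the plots are written to the default PDF device rather than shown on screen.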
Perception and Cognition Matter Too! Original Visualization Visualization as seen by a color-blind person with deuteranopia (color blindness impacts men more often) Colour Blind Simulator:
Perception and Cognition Here too!
Colour scales also impact interpretation! Perceptual research from Liu et al. (2018)
Liu et al. (2018) - Somewhere Over the Rainbow: An Empirical Assessment of Quantitative Colormaps
Marks & Channels : ggplot2 example
ggplot(data = mpg, aes(x = displ, y = cty, colour = class)) + geom_point()
Channel: Position (x, y)
Channel: Colour (colour)
Mark: Point (geom_point)
Note: Generally in ggplot2, aesthetics refer to channels and geoms refer to marks, but there are complex geoms that are chart types rather than simple marks (e.g. geom_density), and there are aesthetics that have little to do with the visual channels directly (e.g. group).
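To see that channels can be swapped independently of the mark, here is a small sketch using the mpg dataset that ships with ggplot2. The choice of the drv column (drive train, three levels) and the object names are assumptions made for illustration; they are not from the slide.

```r
library(ggplot2)

# Same mark (point) and same position channels in both plots;
# only the channel that encodes the drive train (drv) changes.
p_base <- ggplot(data = mpg, aes(x = displ, y = cty))

p_colour <- p_base + geom_point(aes(colour = drv))  # channel: colour
p_shape  <- p_base + geom_point(aes(shape = drv))   # channel: shape

# Render each plot on the active graphics device.
print(p_colour)
print(p_shape)
```

Comparing the two outputs side by side is a quick way to judge which channel separates the groups more clearly for a given task.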
Marks & Channels : Tableau example Marks Channels
Linking Data to Marks and Channels to Make Visualizations [Diagram: Data; Marks & Channels; Visualization]
Linking Data to Marks and Channels to Make Visualizations Data to Viz, Chart Chooser
How do people visualize data? Examples from my own research
My research: surveying visualizations in genomic epidemiology Crisan et al. (2018) “A systematic method for surveying data visualizations and a resulting genomic epidemiology visualization typology: GEViT” OXFORD BIOINFORMATICS
How can we help people visualize data? Examples from my own research
My research: simplifying the creation of data visualizations
# specify individual charts
phyloTree_chart <- specify_base(chart_type = "phylogenetic tree", data = "tree_dat")
epicurve <- specify_base(chart_type = "histogram", data = "tab_dat", x = "month")
map_chart <- specify_base("geographic map", data = "tab_dat", lat = "latitude", long = "longitude")
# specify a combination
color_combo <- specify_combination(combo_type = "color_linked", base_charts = c("phyloTree_chart", "map_chart", "epicurve"), link_by = "country")
# plot the result
plot(color_combo)
My research: automatic data visualization
Preliminary Result
# Analyze different data types automatically
harmon_obj <- data_harmonization(tab_dat, tree_dat, genomic_dat, all_spatial)
# Create specifications that compile to minCombinr
component_specs <- get_spec_list(harmon_obj)
# plot the result one view at a time
plot_view(component_specs, view_num = 1)
[Figure: preliminary output with panels A and B; case counts by country (GIN, LBR, SLE) and a latitude/longitude map; legends: case_count, count, minID]
Thinking Systematically about Data Visualization
[Diagram: Domain Problem; Data + Task; Visual + Interaction Design Choices; Algorithm]
4. Explore if other visualizations have addressed this problem and set of tasks
5. Implement your own solution (part or all of that solution could be a new algorithm)
Thinking Systematically about Data Visualization
[Diagram: Domain Problem*; Data + Task; Visual + Interaction Design Choices; Algorithm]
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data
Thinking Systematically about Data Visualization
Phases: Design, Evaluation
1. Identify a relevant problem that affects you or a group of stakeholders
2. Ask what data stakeholders use (is it available?)
3. Ask what stakeholders do with the data [tasks]
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (vis and/or algorithm)
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data
What datavis tools are available?
Data Visualization Tools to Get You Started
Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: I am presenting her figures here
Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: Analysis vs Presentation
Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: Extent of Flexibility How easy/hard it is to make data visualizations (including custom/novel visualizations)
Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: Static vs Interactive
Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: “There are no perfect tools, just good tools for people with certain goals” See a detailed table here:
Tools & Libraries for data visualization Another take with commonly used tools :
Don’t forget that pen and paper is an option too! Dear Data Project (Lupi & Posavec)
Datavis tools for (Microbial) Genomics
IGV Browser for all your genomic needs
The classic UCSC genome browser
GenVisR: Human Genomes in R
Variant Viewer: Human Genomes
Island Viewer: Microbial Genomics
Microreact: Microbial Genomics
GenGIS: Microbial Genomics (Made in Canada!)
Nextstrain: Microbial Genomics
Wrapping up
Key take-aways from this talk
§ Visualizations of data are useful
  § Helpful in instances of low numeracy
  § Can be used in communication and exploration
§ But... visualization design also matters
  § Many different alternatives, important to test
§ It’s possible to think systematically about visualizations
  § Many disciplines cross-cut information visualization research
  § At the minimum think “Why”, “What”, “How”
  § Encode data well so that others can decode it later
§ Data visualization is a research process with open and interesting problems
Additional Resources
Books to consider:
§ Interpretable Machine Learning:
§ Making Data Visual: A Practical Guide to Using Visualization for Insight by Danyel Fisher and Miriah Meyer
§ Visualization Design and Analysis by Tamara Munzner (more technical)
Online resources:
§ Distill Publication :
§ UBC Infovis Resource Page :
§ UW Interactive Data Lab :
§ Data stories podcast :
Inspiration:
§ Information is Beautiful :
§ Visualization WTF (examples of what not to do) :
Data visualization strategies and tools for microbial genomic epidemiology Anamaria Crisan Vanier Canada Scholar & UBC Public Scholar PhD Candidate, Computer Science University of British Columbia @amcrisan
Part I: Data Visualization Strategies & Tools
Part II: A brief (5 min) activity
Part III: Data Visualization Research in Practice
How many ways can we visualize these numbers? • In your head, on paper, or on a computer, sketch out as many examples as you can to visualize the following two numbers: 75 37
How many ways can we visualize these numbers? • In your head, on paper, or on a computer, sketch out as many examples as you can to visualize the following two numbers: 75 37 example: