Best Practices in Data Visualization
Jodie Jenkinson, Associate Professor + Director
Biomedical Communications, University of Toronto
bmc.med.utoronto.ca

Why visualize data?
• Because of the inexplicability of complex information or raw data
• In order to leverage visual perception
• To create an aid to understanding
• To provide insight

Best Practices in Data Visualization – ComSciComCan Jenkinson
Leveraging visual perception

Vision is high bandwidth
• 40%+ of cortex devoted to visual perception
• Information processing capacity of the visual system
  • High-bandwidth channel
  • Parallel processing
  • 10^9 bits per second*
  • = 1 billion bits per second
  • = ~120 megabytes per second
* Information Capacity of a Single Retinal Channel, DH Kelly, IRE Transactions on Information Theory, 1962, pp. 221

[Figure labels: All visual stimuli, Pre-attentive, Attend to]
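The conversion on the slide can be checked with a few lines of arithmetic. This is only a sketch: the figure of one billion bits per second is taken from the slide, and the variable names are illustrative. The slide's "~120" appears to correspond to binary megabytes, which is an assumption on our part.

```python
# Convert the slide's estimate of visual bandwidth from bits to bytes.
BITS_PER_SECOND = 10**9  # figure quoted on the slide (Kelly, 1962)

bytes_per_second = BITS_PER_SECOND / 8
megabytes_decimal = bytes_per_second / 10**6  # 1 MB  = 1,000,000 bytes
megabytes_binary = bytes_per_second / 2**20   # 1 MiB = 1,048,576 bytes

print(f"{megabytes_decimal:.1f} MB/s (decimal megabytes)")
print(f"{megabytes_binary:.1f} MiB/s (binary megabytes)")
```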
Providing insight

Why visualize? Anscombe’s quartet

    I             II            III           IV
 x     y       x     y       x     y       x     y
10    8.04    10    9.14    10    7.46     8    6.58
 8    6.95     8    8.14     8    6.77     8    5.76
13    7.58    13    8.74    13   12.74     8    7.71
 9    8.81     9    8.77     9    7.11     8    8.84
11    8.33    11    9.26    11    7.81     8    8.47
14    9.96    14    8.10    14    8.84     8    7.04
 6    7.24     6    6.13     6    6.08     8    5.25
 4    4.26     4    3.10     4    5.39    19   12.50
12   10.84    12    9.13    12    8.15     8    5.56
 7    4.82     7    7.26     7    6.42     8    7.91
 5    5.68     5    4.74     5    5.73     8    6.89

Anscombe’s quartet
• Same
  • Mean (x and y)
  • Variance
  • Correlation
  • Regression
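The claim that the four sets share their summary statistics can be checked directly. The sketch below uses the values from the slide's table and only the Python standard library; the names `quartet` and `summarize` are illustrative, and the statistics are the usual sample variance, Pearson correlation, and least-squares line, which is an assumption about what the slide means by each term.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, as tabulated on the slide. Sets I-III share x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson r, and the fitted line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed summaries are nearly identical across the four sets, even though the plotted shapes differ markedly. That is the argument for plotting the data rather than relying on summaries alone.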
Why visualize? Anscombe’s quartet
[Figure: data table for sets I–IV with a scatter plot of each set; x axes run from 4 to 20, y axes from 4 to 13]
Francis J. Anscombe, Graphs in Statistical Analysis. The American Statistician, vol. 27, no. 1, pp. 17–21, 1973

“…If a picture is only worth a thousand words, we're screwed.”
Eric Lander, Professor of Biology, MIT
VIZBI 2011, Opening Remarks

Data Visualization in a nutshell
• Visual mapping (encoding data)
• Providing adequate context
• Balancing clarity & aesthetics

Encoding Data
Visual representation of data should be consistent with the numerical representation

Data types
• Nominal
  • name, type, category
  • e.g. mammals, reptiles, birds
• Ordinal
  • integer sequence
  • e.g. first, second, third
  • happy, very happy, ecstatic
• Interval
  • gap in values
  • e.g. every three months
• Ratio
  • real numbers; zero as reference
  • e.g. 45.7 out of 100

Data features
• Spatial
  • e.g. maps, GIS, directions
  • scalar fields
• Narrative
  • e.g. assembly sequence, process
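One way to keep a visual representation consistent with the data is to record which comparisons each of the four data types supports. The sketch below follows the standard textbook definitions of the nominal, ordinal, interval, and ratio scales rather than anything stated in the talk; the names `Scale`, `OPERATIONS`, and `supports` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four data types, ordered by how many comparisons they allow."""
    NOMINAL = 1   # categories: only "same or different" is meaningful
    ORDINAL = 2   # adds a meaningful order
    INTERVAL = 3  # adds meaningful differences between values
    RATIO = 4     # adds meaningful ratios, because zero is a true reference


# The weakest scale on which each comparison is meaningful.
OPERATIONS = {
    "equality": Scale.NOMINAL,
    "order": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def supports(scale: Scale, operation: str) -> bool:
    """Return True if `operation` is meaningful for data on `scale`."""
    return scale >= OPERATIONS[operation]


print(supports(Scale.ORDINAL, "order"))       # ranking ordinal data is fine
print(supports(Scale.ORDINAL, "difference"))  # subtracting ranks is not
```

A chart that invites a comparison the data type does not support, for example implying that one ordinal category is "twice" another, is an instance of the mismatch the slide warns about.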
Data features
[Figure labels: 1d, 2d, 3d]

Visual encoding
• The way in which data is mapped to visual structures
• Every visualization can be described as a set of mappings:
  • from data items to visual marks
  • from data attributes to visual channels

• Data items map to data marks: the basic visual units that represent data objects visually
• Data attributes map to visual channels: the visual variables we can use to represent characteristics of these objects

Marks
[Figure. From Enrico Bertini]
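The idea of a visualization as a set of mappings can be written down as a small data structure. This is a minimal sketch, not a real plotting API: the `Encoding` class, the `encode` function, and the example attributes and values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Encoding:
    """A visualization described as a set of mappings."""
    mark: str                                     # visual unit drawn per data item
    channels: dict = field(default_factory=dict)  # data attribute -> visual channel


def encode(item: dict, encoding: Encoding) -> dict:
    """Describe how one data item would be drawn under `encoding`."""
    drawn = {"mark": encoding.mark}
    for attribute, channel in encoding.channels.items():
        drawn[channel] = item[attribute]
    return drawn


# Hypothetical scatter plot: each animal (data item) becomes a point (mark).
scatter = Encoding(
    mark="point",
    channels={
        "body_mass_kg": "x position",
        "lifespan_years": "y position",
        "group": "colour hue",
    },
)

# Made-up values, for illustration only.
animal = {"name": "example animal", "body_mass_kg": 4.0,
          "lifespan_years": 12, "group": "mammals"}

drawn = encode(animal, scatter)
print(drawn)
```

Attributes that have no channel assigned, such as `name` here, are simply not shown. Choosing which attributes get which channels is the design decision the following slides discuss.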
Channels
[Figure. From Enrico Bertini]

Cleveland & McGill’s Perceptual Task Scale
Allows more accurate judgments
• Position along a common scale
• Position along nonaligned scales
• Length
• Direction
• Angle
• Area
• Volume
• Curvature
• Shading
• Colour saturation
Allows more generic judgments
From The Functional Art, Alberto Cairo

Cleveland and McGill’s Perceptual Task Scale
• Tasks are grouped according to how well you can perceive differences in the data

A basic example: the pie chart
• People are not good at making visual angular distinctions
• Pie charts are sometimes rolled out to encode 1 or 2 numbers; usually a very low data density!
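The perceptual task scale can be treated as an ordered list when comparing two candidate encodings. The sketch below takes the order in which the tasks appear on the slide and treats it as a strict ranking, which is a simplifying assumption, since the slide notes that tasks are grouped; `PERCEPTUAL_TASKS` and `more_accurate` are hypothetical names.

```python
# Tasks in the order listed on the slide, from those that allow more
# accurate judgments to those that allow more generic judgments.
PERCEPTUAL_TASKS = [
    "position along a common scale",
    "position along nonaligned scales",
    "length",
    "direction",
    "angle",
    "area",
    "volume",
    "curvature",
    "shading",
    "colour saturation",
]


def more_accurate(task_a: str, task_b: str) -> str:
    """Return whichever task sits higher on the scale (earlier in the list)."""
    return min(task_a, task_b, key=PERCEPTUAL_TASKS.index)


# A pie chart relies on angle; a bar chart relies on position along a
# common scale.
print(more_accurate("angle", "position along a common scale"))
```

Under this reading, a bar chart asks the viewer to perform an easier task than a pie chart does, which is consistent with the slide's remark about angular distinctions.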
Pie charts
[Figure: two slides of pie charts for the years 2002–2007; slice labels include 7%, 8%, 10%, 11%, 29%, and 35%]

Lies, damned lies…
• Misalignments between graphic elements and the data they are intended to represent
“There are three kinds of lies: lies, damned lies, and statistics”
Benjamin Disraeli
Misleading use of area
[Four image-only slides]
Linear vs. quadratic change
• A common mistake for chart design is to scale an area by two sides at the same time, producing a quadratic effect for a linear change

Circles
• r = 80: the area of the blue circle is equal to πr² (20,106)
• r = 160: the area of the red circle is equal to πr² (80,424)

Squares
• a = 80: the area of the white square = a² = 6,400 px
• To double the area of the white square: 12,800 px, so a = √12,800 = 113; the area of the red square is twice that of the white square
• a = 160: the area of the grey square = 25,600 px, four times that of the white square

Size encoding
• Height
• Area
• Volume

How statistics lie
• Modifications to the X or Y axis in an attempt to make differences or change appear to be more dramatic
• Data represented out of context does not allow for adequate comparison
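The linear-versus-quadratic point can be checked numerically with the square and circle sizes from the slide. This is a minimal sketch; the function name `side_for_value_ratio` is hypothetical, and it assumes the goal is for the drawn area, not the side length, to be proportional to the data value.

```python
from math import pi, sqrt


def side_for_value_ratio(base_side: float, value_ratio: float) -> float:
    """Side length whose square has `value_ratio` times the base square's area."""
    return base_side * sqrt(value_ratio)


base_side = 80
base_area = base_side ** 2              # 6,400 px, the white square

# Mistake: doubling the side to show a doubled value.
naive_side = 2 * base_side              # 160
naive_area = naive_side ** 2            # 25,600 px, four times the base area

# Area-proportional scaling: multiply the side by the square root of 2.
correct_side = side_for_value_ratio(base_side, 2)
correct_area = correct_side ** 2        # about 12,800 px, twice the base area

print(naive_area / base_area)           # how much larger the naive square looks
print(round(correct_side))              # side length for a true doubling

# The same effect with circles: doubling the radius quadruples the area.
small_circle = pi * 80 ** 2
large_circle = pi * 160 ** 2
print(large_circle / small_circle)
```

The same reasoning extends to volume: scaling all three dimensions by the data value produces a cubic effect for a linear change.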
The disappearing baseline
[Two image-only slides]

Reversing the x-axis
[Image-only slide]
Reversing the x-axis
[Image-only slide]

No defined y-axis
[Two image-only slides]

Context
Context
• Without context we are unable to see the big picture
• Without context we are unable to make meaningful comparisons
[Three figures. Source: New York Times]
Data in and out of context
[Three image-only slides]

Historical context
[Image-only slide]