Week 1: Tasks and Data, Marks and Channels, Color
Tamara Munzner
Department of Computer Science, University of British Columbia
JRNL 520H, Special Topics in Contemporary Journalism: Data Visualization
Week 1: 12 September 2017
http://www.cs.ubc.ca/~tmm/courses/journ17
Visualization (vis) defined & motivated
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.
• human in the loop needs the details
  –doesn't know exactly what questions to ask in advance
  –long-term exploratory analysis
  –presentation of known results
  –stepping stone towards automation: refining, trust-building
• intended task, measurable definitions of effectiveness
more at: Visualization Analysis and Design, Chapter 1. Munzner. AK Peters Visualization Series, CRC Press, 2014.
Why use an external representation?
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
• external representation: replace cognition with perception
[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]
Why represent all the data?
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
• summaries lose information, details matter
  –confirm expected and find unexpected patterns
  –assess validity of statistical model
Anscombe’s Quartet: Identical statistics
  x mean           9
  x variance       10
  y mean           7.5
  y variance       3.75
  x/y correlation  0.816
Same Stats, Different Graphs: https://www.youtube.com/watch?v=DbJyPELmhJc
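The identical statistics listed for Anscombe’s Quartet can be checked with a few lines of Python. This is a minimal sketch: the data values are not on the slide and are taken from Anscombe's 1973 paper, the helper names `pearson` and `summarize` are made up for illustration, and the variances are population variances (dividing by n), which is the convention that yields 10 and 3.75.

```python
from math import sqrt
from statistics import fmean, pvariance

# Anscombe's quartet (Anscombe, 1973). Values are NOT from the slide.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics named on the slide, using population variance."""
    return {
        "x_mean": fmean(xs),
        "x_var": pvariance(xs),
        "y_mean": fmean(ys),
        "y_var": pvariance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, yet the four scatterplots look nothing alike, which is the point of the slide: the summary alone hides the structure.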
What resource limitations are we faced with?
Vis designers must take into account three very different kinds of resource limitations: those of computers, of humans, and of displays.
• computational limits
  –processing time
  –system memory
• human limits
  –human attention and memory
• display limits
  –pixels are precious resource, the most constrained resource
  –information density: ratio of space used to encode info vs unused whitespace
    • tradeoff between clutter and wasting space, find sweet spot between dense and sparse
Nested model: Four levels of vis design
• domain situation
  – who are the target users?
• abstraction
  – translate from specifics of domain to vocabulary of vis
    • what is shown? data abstraction
    • why is the user looking at it? task abstraction
• idiom
  – how is it shown?
    • visual encoding idiom: how to draw
    • interaction idiom: how to manipulate
• algorithm
  – efficient computation
[Figure: nested boxes labeled domain, abstraction, idiom, algorithm]
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
[A Multi-Level Typology of Abstract Visualization Tasks. Brehmer and Munzner. IEEE TVCG 19(12):2376-2385, 2013 (Proc. InfoVis 2013).]
Threats to validity differ at each level
• Domain situation: You misunderstood their needs
• Data/task abstraction: You’re showing them the wrong thing
main focus of module
• Visual encoding/interaction idiom: The way you show it doesn’t work
• Algorithm: Your code is too slow
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
Evaluate success at each level with methods from different fields
Levels: Domain situation; Data/task abstraction; Visual encoding/interaction idiom; Algorithm
Validation methods, by field:
• anthropology/ethnography
  –Observe target users using existing tools
• design
  –Justify design with respect to alternatives
• computer science
  –Measure system time/memory
  –Analyze computational complexity
• cognitive psychology
  –Analyze results qualitatively
  –Measure human time with lab experiment (lab study)
• anthropology/ethnography
  –Observe target users after deployment ( )
  –Measure adoption
Kinds of work: problem-driven design studies; technique-driven work
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
What? Data and Dataset Types
[Figure: overview of the "What?" part of the framework]
• Data Types: Items, Attributes, Links, Positions, Grids
• Dataset Types
  –Tables: Items (rows), Attributes (columns), Cell containing value; Multidimensional Table: Value in cell
  –Networks & Trees: Items (nodes), Links, Attributes
  –Fields (Continuous): Grid of positions, Cell, Attributes (columns), Value in cell
  –Geometry (Spatial): Items, Positions
  –Clusters, Sets, Lists: Items
• Attribute Types
  –Categorical
  –Ordered: Ordinal, Quantitative
• Ordering Direction: Sequential, Diverging, Cyclic
• Dataset Availability: Static, Dynamic
Three major datatypes
[Figure: Dataset Types]
• Tables: Items (rows), Attributes (columns), Cell containing value; Multidimensional Table: Value in cell
• Networks: Node (item), Link; Trees
• Spatial
  –Fields (Continuous): Grid of positions, Cell, Attributes (columns), Value in cell
  –Geometry (Spatial): Position
• visualization vs computer graphics
  –geometry is design decision
Types: Datasets and data
[Figure: Dataset Types and Attribute Types]
• Dataset Types
  –Tables: Items (rows), Attributes (columns), Cell containing value
  –Networks: Node (item), Link
  –Spatial: Fields (Continuous): Grid of positions, Cell, Attributes (columns), Value in cell; Geometry (Spatial): Position
• Attribute Types
  –Categorical
  –Ordered: Ordinal, Quantitative
Why? Actions and Targets
• Actions
  –Analyze
    • Consume: Discover, Present, Enjoy
    • Produce: Annotate, Record, Derive
  –Search
    • Lookup: target known, location known
    • Browse: target unknown, location known
    • Locate: target known, location unknown
    • Explore: target unknown, location unknown
  –Query: Identify, Compare, Summarize
• Targets
  –All Data: Trends, Outliers, Features
  –Attributes
    • One: Distribution, Extremes
    • Many: Dependency, Correlation, Similarity
  –Network Data: Topology, Paths
  –Spatial Data: Shape
• {action, target} pairs
  – discover distribution
  – compare trends
  – locate outliers
  – browse topology
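The four search actions form a two-by-two table over whether the target and its location are known. A minimal sketch of that lookup, with the function name `search_action` being a hypothetical choice for illustration:

```python
# Search actions indexed by (target_known, location_known).
SEARCH_ACTIONS = {
    (True, True): "lookup",
    (False, True): "browse",
    (True, False): "locate",
    (False, False): "explore",
}


def search_action(target_known: bool, location_known: bool) -> str:
    """Name the search action for a given state of user knowledge."""
    return SEARCH_ACTIONS[(target_known, location_known)]


# An {action, target} pair is then just a tuple of two vocabulary words.
pair = (search_action(target_known=True, location_known=False), "outliers")
print(pair)
```

Here the reader knows what they are looking for but not where it is, so the resulting pair is "locate outliers".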
Actions: Analyze, Query
• analyze
  –consume
    • discover vs present
      – aka explore vs explain
    • enjoy
      – aka casual, social
  –produce
    • annotate, record, derive
• query
  –how much data matters?
    • one, some, all
• independent choices
[Figure: Analyze (Consume: Discover, Present, Enjoy; Produce: Annotate, Record, Derive) and Query (Identify, Compare, Summarize)]
Derive: Crucial Design Choice
• don’t just draw what you’re given!
  –decide what the right thing to show is
  –create it with a series of transformations from the original dataset
  –draw that
• one of the four major strategies for handling complexity
[Figure: Original Data (exports, imports) and Derived Data (trade balance), where trade balance = exports − imports]
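The trade balance example can be sketched as a one-step transformation of the original table. The yearly figures below are invented purely for illustration, and `derive_trade_balance` is a hypothetical helper name.

```python
# Hypothetical original data: exports and imports per year, arbitrary units.
original = [
    {"year": 2014, "exports": 120.0, "imports": 100.0},
    {"year": 2015, "exports": 110.0, "imports": 125.0},
    {"year": 2016, "exports": 130.0, "imports": 130.0},
]


def derive_trade_balance(rows):
    """Return new rows with a derived attribute: exports minus imports."""
    return [
        {**row, "trade_balance": row["exports"] - row["imports"]}
        for row in rows
    ]


derived = derive_trade_balance(original)
for row in derived:
    print(row["year"], row["trade_balance"])
```

Plotting `trade_balance` as a single line shows the sign change directly, whereas plotting exports and imports as two lines leaves the reader to judge the gap between them by eye.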
Targets
• All Data: Trends, Outliers, Features
• Attributes
  –One: Distribution, Extremes
  –Many: Dependency, Correlation, Similarity
• Network Data: Topology, Paths
• Spatial Data: Shape
How?
• Encode
  –Arrange: Express, Separate, Order, Align, Use
  –Map from categorical and ordered attributes
    • Color: Hue, Saturation, Luminance
    • Size, Angle, Curvature, ...
    • Shape
    • Motion: Direction, Rate, Frequency, ...
• Manipulate: Change, Select, Navigate
• Facet: Juxtapose, Partition, Superimpose
• Reduce: Filter, Aggregate, Embed
Encoding visually
• analyze idiom structure
Definitions: Marks and channels
• marks
  – geometric primitives
  – Points, Lines, Areas
• channels
  – control appearance of marks
  – Position (Horizontal, Vertical, Both), Color, Shape, Tilt, Size (Length, Area, Volume)
Encoding visually with marks and channels
• analyze idiom structure
  –as combination of marks and channels
1: mark: line; channels: vertical position
2: mark: point; channels: vertical position, horizontal position
3: mark: point; channels: vertical position, horizontal position, color hue
4: mark: point; channels: vertical position, horizontal position, color hue, size (area)
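One way to make the "combination of marks and channels" idea concrete is to write each of the four numbered examples as a small record. This is only a sketch: the `Idiom` class and its `describe` method are hypothetical names, not part of any vis library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Idiom:
    """A visual encoding described as one mark type plus the channels it uses."""
    mark: str
    channels: Tuple[str, ...]

    def describe(self) -> str:
        return f"mark: {self.mark}; channels: {', '.join(self.channels)}"


# The four examples from the slide, in order.
EXAMPLES = {
    1: Idiom("line", ("vertical position",)),
    2: Idiom("point", ("vertical position", "horizontal position")),
    3: Idiom("point", ("vertical position", "horizontal position", "color hue")),
    4: Idiom("point", ("vertical position", "horizontal position", "color hue",
                       "size (area)")),
}

for number, idiom in EXAMPLES.items():
    print(f"{number}: {idiom.describe()}")
```

Each step from example 2 to example 4 keeps the same mark and adds one channel, so one more attribute can be shown without changing the geometry.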
Channels: Expressiveness types and effectiveness rankings
Magnitude Channels: Ordered Attributes (most to least effective)
  1. Position on common scale
  2. Position on unaligned scale
  3. Length (1D size)
  4. Tilt/angle
  5. Area (2D size)
  6. Depth (3D position)
  7. Color luminance
  8. Color saturation
  9. Curvature
  10. Volume (3D size)
Identity Channels: Categorical Attributes (most to least effective)
  1. Spatial region
  2. Color hue
  3. Motion
  4. Shape
Channels: Rankings
Magnitude Channels: Ordered Attributes (most to least effective): Position on common scale, Position on unaligned scale, Length (1D size), Tilt/angle, Area (2D size), Depth (3D position), Color luminance, Color saturation, Curvature, Volume (3D size)
Identity Channels: Categorical Attributes (most to least effective): Spatial region, Color hue, Motion, Shape
• effectiveness principle
  –encode most important attributes with highest ranked channels
• expressiveness principle
  –match channel and data characteristics
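The two principles can be read as a simple greedy procedure: pick the channel list that matches the attribute type (expressiveness), then hand out channels in rank order to attributes sorted by importance (effectiveness). The sketch below is a deliberate simplification and an assumption, not a method given in the lecture: it ignores interactions between channels, and `assign_channels` and the example attribute names are hypothetical.

```python
# Rankings copied from the slide, most effective first.
MAGNITUDE_CHANNELS = [
    "position on common scale", "position on unaligned scale",
    "length (1D size)", "tilt/angle", "area (2D size)", "depth (3D position)",
    "color luminance", "color saturation", "curvature", "volume (3D size)",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Greedily assign channels to attributes.

    attributes: list of (name, kind) pairs in decreasing order of importance,
    where kind is "ordered" or "categorical".
    """
    remaining = {
        "ordered": list(MAGNITUDE_CHANNELS),      # expressiveness: magnitude
        "categorical": list(IDENTITY_CHANNELS),   # expressiveness: identity
    }
    assignment = {}
    for name, kind in attributes:
        if kind not in remaining:
            raise ValueError(f"unknown attribute kind: {kind!r}")
        if not remaining[kind]:
            raise ValueError(f"no {kind} channels left for {name!r}")
        # effectiveness: take the highest-ranked channel still unused
        assignment[name] = remaining[kind].pop(0)
    return assignment


example = assign_channels([
    ("price", "ordered"),
    ("region", "categorical"),
    ("volume", "ordered"),
])
print(example)
```

In practice a designer would also weigh how channels interfere with each other, so the output is a starting point for discussion rather than a finished design.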
Accuracy: Fundamental Theory