CS 171: Visualization Process & Visual Variables Hanspeter Pfister pfister@seas.harvard.edu
This Week • Friday lab 10:30-11 am in MD G115 • HW1 due today, group reflection due Monday • Readings for next week Chapter 1
Group Reflection • Optional - due on Monday • Not if you are taking late days • Work in groups to improve your HW • Must write a self reflection about improvements • Grade will take both parts into account • Need serious effort on individual part
Learning Catalytics • Go to https://learningcatalytics.com/ • Go to Courses -> CS 171 • Enter session ID: 697815
Survey Results • 213 responses, 31% female, 65% male, 3% N/A • 202 registered, 115 College, 82 DCE, 5 Other
Last Week
Design Excellence “Well-designed presentations of interesting data are a matter of substance, of statistics, and of design.” E. Tufte
Graphical Integrity • Missing scales • Distortions • Lie factor
Washington Post, 2012
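The lie factor listed under Graphical Integrity can be sketched as a small calculation. This is a minimal sketch, assuming Tufte's usual definition (size of effect shown in the graphic divided by size of effect in the data). The function names are hypothetical, and the numbers are the fuel-economy example commonly quoted from Tufte's book, not figures from this lecture.

```python
def effect_size(first, second):
    """Relative change between two values: |second - first| / |first|."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Effect shown in the graphic divided by the effect in the data.

    A value near 1 means the graphic is faithful; values far above or
    below 1 mean the graphic exaggerates or understates the change.
    """
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Data: 18 mpg rising to 27.5 mpg. Graphic: a 0.6 inch line growing to 5.3 inches.
tufte_example = lie_factor(18.0, 27.5, 0.6, 5.3)
print(round(tufte_example, 1))
```

A lie factor this far above 1 means the drawn change is many times larger than the change in the underlying numbers.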
Design Principles • Maximize Data-Ink Ratio • Avoid Chartjunk • Increase Data Density • Subjective Dimensions
Graphic Design • Contrast • Repetition • Alignment • Proximity
Design Critique
Outline • Process • Data Model • Image Model • Psychophysics • Graphical Perception
Process
Reading
Tamara Munzner • Associate Professor at UBC, Canada • Ph.D. Stanford 2000 • Worked at Geometry Center, Compaq Research • Widely published in InfoVis
user-centered design target usability engineering participatory design translate design evaluate implement validate
Miriah Meyer • NSF CI Postdoctoral Fellow at Harvard • Ph.D. Utah 2008 • Works with genomics and molecular biology data
target → translate → design → implement → validate • target: choose a specific domain, define research question(s), find & clean the data
Pathline - A Tool for Comparative Functional Genomics Data
target → translate → design → implement → validate • translate: formulate data analysis tasks, exploratory data analysis, transform & summarize data
Exploratory Data Analysis “The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey
Anscombe’s Quartet • Four datasets with the same mean, variance, correlation coefficient, and linear regression line • http://upload.wikimedia.org/wikipedia/commons/b/b6/Anscombe.svg
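The claim on this slide can be checked numerically. The sketch below uses the standard published values of Anscombe's quartet (they are not listed in these slides) and computes the summary statistics by hand; the helper name `summarize` is hypothetical.

```python
from math import sqrt

# Anscombe's four datasets; sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean, sample variance, Pearson r, and least-squares line for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows print nearly identical numbers, even though the scatterplots look completely different. That is the argument for plotting the data before trusting summary statistics.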
Interactive Exploration • Construct visualization to address questions • Inspect “answers” and pose new questions • Transform the data appropriately • Repeat!
Data (figure) • Gene expression: 6000 genes and 140 metabolites, 6 time points, 14 species of yeast (3D table) • Metabolic pathways: 10 to 50 pathways of interest, e.g., glycolysis, TCA cycle (directed graph; inputs/outputs called metabolites) • Phylogeny: evolutionary relationship of the species S. cer, S. mik, S. bay, S. bayuv, C. gla, S. cas, K. pol, K. wal, K. lac, S. klu, D. han, C. alb, Y. lip, S. jap, S. pom (binary tree) • Similarity scores: aggregate the time series for a gene/metabolite over species into a quantitative value for similarity of expression across species, e.g., 0.83 (aggregate: Pearson, Spearman, others)
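One way to picture the "3D table" and the aggregate similarity score is the sketch below. It is only an illustration under stated assumptions: the nested-dictionary layout, the function names, and all expression values are hypothetical, and the aggregate shown (mean pairwise Pearson correlation) is just one of the options the slide lists, not necessarily what Pathline computes.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# 3D table: species -> gene -> expression at 6 time points (made-up values).
expression = {
    "S. cer": {"g1": [0.2, 0.4, 1.0, 1.0, 1.0, 1.0]},
    "S. mik": {"g1": [0.1, 0.5, 0.9, 1.0, 1.0, 0.9]},
    "S. bay": {"g1": [0.3, 0.3, 0.8, 1.0, 0.9, 1.0]},
}


def similarity_across_species(table, gene):
    """Mean pairwise Pearson correlation of one gene's time series over all species."""
    series = [genes[gene] for genes in table.values()]
    scores = [pearson(a, b) for a, b in combinations(series, 2)]
    return sum(scores) / len(scores)


score = similarity_across_species(expression, "g1")
print(round(score, 2))
```

Collapsing each gene's time series into one quantitative score like this is what makes thousands of genes comparable at a glance.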
target → translate → design → implement → validate • design: design visual encodings, design interactions, sketch many ideas!
Blake Walsh, Gabriel Trevino, Antony Bett
Bang Wong
target → translate → design → implement → validate • implement: use code “sketches”, define data structures, find efficient algorithms
target what? 80% translate how? 20% design implement validate
target → translate → design → implement → validate • validate: is the abstraction right? does it support the tasks? does it provide new insights?
Nested Validation target translate design implement T. Munzner, A Nested Model for Visualization Design and Validation
Process Books
“A methodological approach to visualization development makes effective design decisions salient.” - Miriah Meyer
Data Model
Nominal (Categorical, Qualitative) • Ordinal • Interval • Ratio • On the Theory of Scales of Measurement [S. Stevens, 46]
Data Types • Nominal (categorical) (N) Are = or ≠ to other values Apples, Oranges, Bananas,... • Ordinal (ordered) (O) Obey a < relationship Small, medium, large • Quantitative (Q) Can do arithmetic on them 10 inches, 23 inches, etc.
Quantitative • Q - Interval (location of zero arbitrary) Dates: Jan 19; Location: (Lat, Long) Only differences (i.e., intervals) can be compared • Q - Ratio (zero fixed) Measurements: Length, Mass, Temp, ... Origin is meaningful, can measure ratios & proportions • On the Theory of Scales of Measurement [S. Stevens, 46]
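The hierarchy of scales can be written down as the set of operations each level permits. The sketch below is an illustration rather than part of the lecture: the `Scale` enum, the operation table, and the example year are hypothetical choices.

```python
from datetime import date
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' levels of measurement, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that each level adds on top of the levels below it.
NEW_OPERATIONS = {
    Scale.NOMINAL: {"==", "!="},   # apples vs. oranges
    Scale.ORDINAL: {"<", ">"},     # small < medium < large
    Scale.INTERVAL: {"-"},         # differences are meaningful
    Scale.RATIO: {"/"},            # ratios and proportions are meaningful
}


def allowed_operations(scale):
    """All operations that are meaningful for data measured on the given scale."""
    operations = set()
    for level in Scale:
        if level <= scale:
            operations |= NEW_OPERATIONS[level]
    return operations


# Interval example: dates can be subtracted, but "twice Jan 19" means nothing.
gap = date(2015, 2, 19) - date(2015, 1, 19)
# Ratio example: 23 inches is 2.3 times as long as 10 inches.
ratio = 23 / 10
print(gap.days, ratio, sorted(allowed_operations(Scale.INTERVAL)))
```

Reading the table from top to bottom, every level inherits the operations of the weaker levels, which is why a quantitative attribute can always be treated as ordinal or nominal but not the other way around.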
Item
Attribute
1 = Quantitative 2 = Nominal 3 = Ordinal
Nominal/Ordinal = Dimensions: describe the data, independent variables • Quantitative = Measures: numbers to be analyzed, dependent variables
Data vs. Conceptual Models • Data Model: Low-level description of the data Set with operations, e.g., floats with +, -, /, * • Conceptual Model: Mental construction Includes semantics, supports reasoning • Data → Conceptual: 1D floats → temperature; 3D vector of floats → space
Data vs. Conceptual Model • From data model... 32.5, 54.0, -17.3, … (floats) • using conceptual model... Temperature • to data type Continuous to 4 significant figures (Q) Hot, warm, cold (O) Burned vs. Not burned (N) Based on slide from Munzner
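The same floats can end up as any of the three data types, depending on the conceptual model and the task. A minimal sketch, assuming made-up thresholds (the cut-offs for cold, hot, and burned are hypothetical and not from the slide):

```python
# Data model: plain floats, as on the slide.
readings = [32.5, 54.0, -17.3]


def as_quantitative(temperature):
    """Q: keep the number, rounded to 4 significant figures."""
    return float(f"{temperature:.4g}")


ORDER = ["cold", "warm", "hot"]  # O: the categories have a meaningful order


def as_ordinal(temperature, cold_below=0.0, hot_from=40.0):
    """O: bin into ordered categories (thresholds are illustrative)."""
    if temperature < cold_below:
        return "cold"
    if temperature < hot_from:
        return "warm"
    return "hot"


def as_nominal(temperature, burn_from=50.0):
    """N: two unordered categories (threshold is illustrative)."""
    return "burned" if temperature >= burn_from else "not burned"


for t in readings:
    print(as_quantitative(t), as_ordinal(t), as_nominal(t))
```

Each step down from Q to O to N throws information away, so the choice should follow from the task, not from how the values happen to be stored.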
Image Model
Jacques Bertin • French cartographer [1918-2010] • Semiology of Graphics [1967] • Theoretical principles for visual encodings
Bertin’s Visual Variables • Marks: Points, Lines, Areas • Channels: Position, Size, (Grey)Value, Texture, Color, Orientation, Shape • Semiology of Graphics [J. Bertin, 67]
Mapping to Data Types
Channel        Nominal   Ordinal   Quantitative
Position       ✔         ✔         ✔
Size           ✔         ✔         ~
(Grey)Value    ✔         ✔         ~
Texture        ✔         ~         ✖
Color          ✔         ✖         ✖
Orientation    ✔         ✖         ✖
Shape          ✔         ✖         ✖
✔ = Good   ~ = OK   ✖ = Bad
Jock Mackinlay, 1986 Decreasing [Mackinlay, Automating the Design of Graphical Presentations of Relational Information, 1986]
Stolte & Hanrahan, 2002 [“Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases” Chris Stolte, Diane Tang, and Pat Hanrahan, 2002]
Psychophysics
Weber’s Law (1795–1878) ∆I / I = k, where ∆I is the just-noticeable difference, I is the base intensity, and k is the Weber fraction (constant!) • Sensitivity to changes in stimulus decreases when stimulus magnitude increases • True for intensity, length, weight, sound, time, etc.
Figure (J. H. Krantz): ∆I = k vs. ∆I / I = k
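Weber's law says the just-noticeable difference grows in proportion to the base intensity. The sketch below shows this with a hypothetical Weber fraction of 0.1; real fractions depend on the kind of stimulus, and the function names are illustrative.

```python
def just_noticeable_difference(base_intensity, weber_fraction):
    """Smallest detectable change under Weber's law: delta_I = k * I."""
    return weber_fraction * base_intensity


def is_noticeable(base_intensity, new_intensity, weber_fraction):
    """True if the change from base to new is at least one JND."""
    change = abs(new_intensity - base_intensity)
    return change >= just_noticeable_difference(base_intensity, weber_fraction)


K = 0.1  # hypothetical Weber fraction

# The JND scales with the base intensity, so the ratio delta_I / I stays at K.
JNDS = {base: just_noticeable_difference(base, K) for base in (10, 100, 1000)}
print(JNDS)

# The same absolute change of 2 units is visible on a weak stimulus
# but lost on a strong one.
print(is_noticeable(10, 12, K), is_noticeable(1000, 1002, K))
```

For visualization, this suggests that small absolute differences between large marks (long bars, bright patches) are hard to see, while the same difference between small marks stands out.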
Fechner’s Law (1801–1887) S = k log(I), where S is sensation and I is intensity • The relationship between stimulus and perception is logarithmic • I.e., we perceive brightness on a logarithmic scale
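A logarithmic response means that equal ratios of intensity, not equal differences, feel like equal steps. A minimal sketch of S = k log(I), assuming k = 1 and the natural logarithm (both are arbitrary choices here; a different base only rescales k):

```python
from math import log


def sensation(intensity, k=1.0):
    """Perceived magnitude under Fechner's law: S = k * log(I)."""
    return k * log(intensity)


# Doubling the intensity adds the same amount of sensation at every level.
step_low = sensation(20) - sensation(10)
step_high = sensation(200) - sensation(100)
print(round(step_low, 4), round(step_high, 4))

# Adding a fixed amount matters less and less as the base grows.
add_low = sensation(20) - sensation(10)
add_high = sensation(110) - sensation(100)
print(round(add_low, 4), round(add_high, 4))
```

The first pair of numbers is identical, while in the second pair the same +10 change produces a much smaller perceived step at the higher base.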
Based on slide from Mazur