201ab Quantitative methods Visualization E D V UL | UCSD Psychology

• Visualization failure modes
• Cool vs informative visualizations
• Ways graphs can mislead
• Making a graph pretty
• ggplot: grammar of graphics

Entirely made up.

Nonsense variables.

Graph independent of data.

Multiple variables graphed as one.

Credit: xkcd

Not labeled (or mislabeled).

Misleading or useless axis scales.

Misleading binning.

Illegible.

Visualization failure modes
• Completely made up.
• Nonsense variables/relationships.
• Graph independent of data.
• Multiple variables treated as one.
• Not labeled, or mislabeled.
• Misleading / unusable scales.
• Misleading binning.
• Illegible.
• Crazy mapping from variables to visual properties.


• Visualization failure modes
• Cool vs scientific visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• How to graph common data types.

From dynamicdiagrams.com

The first graph:
- Looks cooler!
- Provides a visual puzzle.
- Misrepresents magnitudes.
- Does not adhere to (modern!) convention.
- Makes it difficult to make quantitative comparisons, or extract numbers.
This is a bad scientific data display, but it is a cool visualization.

The second graph:
- Looks a bit more boring.
- Is much easier to parse and understand.
- Accurately, quantitatively represents magnitudes.
- Adheres to modern convention.
- Makes it easy to make quantitative comparisons, and extract numbers.
This is a good scientific data display, but it might not be as interesting a visualization.

• Visualization failure modes
• Cool vs scientific visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• How to graph common data types.

This may have gone a bit overboard into “visualization” territory: it looks good, but it starts violating some conventions:
- No y axis.
- The y-axis label is used as the title.

• Visualization failure modes
• Cool vs informative visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• Graphs for common types of data.

    library(ggplot2)
    Fig <- ggplot(data = ..., mapping = aes(...)) +
      facet_*() +
      geom_*() +
      stat_*() +
      scale_*() +
      theme*()

Basic operation:
• Take a tidy data frame.
• Map its variables onto aesthetics (e.g., x, y, color, fill, size, shape, alpha, group).
• Draw some geom(etric entity) according to that mapping (e.g., point, line, tile, area, ribbon).
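A minimal sketch of that template with one layer of each kind filled in, assuming ggplot2 is installed. The data frame `df` and its columns `x`, `y`, `site`, and `group` are simulated and hypothetical, not course data.

```r
library(ggplot2)

# Simulated tidy data (hypothetical): one row per observation, one column per variable.
set.seed(1)
df <- data.frame(
  x     = rep(1:10, times = 4),
  site  = rep(c("site 1", "site 2"), each = 20),
  group = rep(rep(c("a", "b"), each = 10), times = 2)
)
df$y <- 2 * df$x + ifelse(df$group == "b", 5, 0) + rnorm(nrow(df))

fig <- ggplot(data = df, mapping = aes(x = x, y = y, color = group)) +
  facet_wrap(~ site) +                                  # facet_*(): one panel per site
  geom_point() +                                        # geom_*(): one point per row
  stat_smooth(method = "lm", formula = y ~ x) +         # stat_*(): fitted line per group
  scale_color_manual(values = c(a = "steelblue", b = "firebrick")) +  # scale_*(): colors
  theme_minimal()                                       # theme*(): non-data appearance
```

Each `+` adds one component; the data and aesthetic mapping are declared once and inherited by every layer.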

• Visualization failure modes
• Cool vs informative visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• Graphs for common types of data.
• Practice in R.
• More exotic graph types / considerations

Goal: show how the response/dependent variable(s) change with the explanatory/independent variable(s). What kind of variables are they? Categorical? Numerical? It helps to think of the graph as an abstract formula of sorts. For example, how does height (numerical response) vary across sex (categorical), nationality (categorical), and parents’ income (numerical)?
    numerical ~ 2*categorical + numerical
This abstraction helps you pick starting points for graphs.

categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
Stacked bar plot:
+ easy-ish comparisons
+ easy-ish proportion
+ socially acceptable pie chart
Histogram (barplot of counts):
++ easiest comparisons
- hardest proportion
Pie chart:
- hardest comparisons
++ easiest proportion
- waste of ink
- considered tacky
Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv
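A minimal sketch of the three displays for one categorical variable, using a simulated variable `response` (hypothetical) rather than the SPSP demographics file. As is usual in ggplot2, the pie chart is built by wrapping a single stacked bar into polar coordinates.

```r
library(ggplot2)

# Simulated categorical response (hypothetical, not the SPSP data).
set.seed(2)
d <- data.frame(response = sample(c("A", "B", "C"), 200, replace = TRUE,
                                  prob = c(0.5, 0.3, 0.2)))

# Barplot of counts: one bar per category, bars share a common baseline.
p_bar <- ggplot(d, aes(x = response)) + geom_bar()

# Stacked bar: all categories in a single bar (a constant x value).
p_stacked <- ggplot(d, aes(x = "", fill = response)) + geom_bar()

# Pie chart: the same stacked bar, with bar height mapped to angle.
p_pie <- p_stacked + coord_polar(theta = "y")
```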

categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
Counts: highlight the sample size when n is small.
Proportions: easier interpretation.
Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv
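A sketch of the same bar chart with counts versus proportions on the y axis, on simulated data (hypothetical). It assumes ggplot2 3.3.0 or later for `after_stat()`; the proportion is each bar's count divided by the total over all bars.

```r
library(ggplot2)

set.seed(3)
d <- data.frame(response = sample(c("A", "B", "C"), 50, replace = TRUE))

# Counts: the sample size stays visible on the y axis.
p_counts <- ggplot(d, aes(x = response)) +
  geom_bar() +
  labs(y = "Count")

# Proportions: count / total, computed after the count statistic.
p_props <- ggplot(d, aes(x = response)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  labs(y = "Proportion")
```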

numerical ~ 0 (1 numerical response variable, with 0 explanatory variables)
Histogram:
+ portrays noisiness
- impression sensitive to bins
Smoothed density:
- obscures noisiness
+ not too sensitive to reasonable kernel width
Data: http://vulstats.ucsd.edu/data/cal1020.cleaned.Rdata
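A sketch comparing two bin widths with a smoothed density, on a simulated variable `value` (hypothetical, not the cal1020 data). The bin widths are arbitrary illustrations; `adjust` scales the default kernel bandwidth of `geom_density()`.

```r
library(ggplot2)

set.seed(4)
d <- data.frame(value = rnorm(300, mean = 100, sd = 15))

# Same data, two bin widths: the impression can change with the binning.
p_hist_fine   <- ggplot(d, aes(x = value)) + geom_histogram(binwidth = 2)
p_hist_coarse <- ggplot(d, aes(x = value)) + geom_histogram(binwidth = 10)

# Smoothed density: adjust = 1 keeps the default bandwidth.
p_dens <- ggplot(d, aes(x = value)) + geom_density(adjust = 1)
```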

numerical ~ 0 (1 numerical response variable, with 0 explanatory variables)

numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
- Mean + error: easy statistical comparison.
- Jitter: useful when n is small.
- Violin: useful when n is large.
- Boxplot.
- Densities (coords flipped).
- Empirical CDF (coords flipped).
Densities and empirical CDFs are best when the coords are not flipped, and best for few categories (<4?).
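A sketch of four of these displays for one numerical response across two groups, on simulated data (hypothetical `group` and `score`). `mean_se` is ggplot2's own mean plus or minus one standard error summary.

```r
library(ggplot2)

set.seed(5)
d <- data.frame(
  group = rep(c("ctrl", "treat"), each = 100),
  score = c(rnorm(100, mean = 50, sd = 10), rnorm(100, mean = 55, sd = 10))
)
base <- ggplot(d, aes(x = group, y = score))

p_mean   <- base + stat_summary(fun.data = mean_se, geom = "pointrange")  # mean +/- 1 SE
p_jitter <- base + geom_jitter(width = 0.2, height = 0)  # every point; y left unjittered
p_violin <- base + geom_violin()                          # mirrored density per group
p_box    <- base + geom_boxplot()                         # median, quartiles, outliers
```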

numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
– Always put error bars on bar charts (standard error or CI are fine).
– Look at rawer data (e.g., strip charts) before going to more compressed plots.
– By removing the solid bar from a bar chart, you can add a good visualization of the data distribution. This is better.
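A sketch of the "bar removed" version: a jittered strip chart of the raw data with the mean and a plus-or-minus one standard error bar drawn on top. The data are simulated (hypothetical), and `fun = mean` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(6)
d <- data.frame(
  group = rep(c("ctrl", "treat"), each = 30),
  score = c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 56, sd = 10))
)

p <- ggplot(d, aes(x = group, y = score)) +
  geom_jitter(width = 0.15, height = 0, alpha = 0.5) +                # raw data
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +  # mean +/- 1 SE
  stat_summary(fun = mean, geom = "point", size = 3)                  # group mean
```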

numerical ~ categorical (my suggestions)
With small n: show all the data points with jitter (here, the data are subsampled to generate a low-n scenario).
With large n: show the distribution with a violin or density plot.

numerical ~ categorical (eclectic plots, useful with large n and weird distributional differences)
Cumulative distribution functions: highlight differences in the tails. With large n, they can show weird differences (so the tails aren’t just noise).
Overlaid densities/histograms: only useful with really large n.
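A sketch of both eclectic plots on simulated data (hypothetical): two groups with the same center, where one is drawn from a t distribution with 3 degrees of freedom so it has heavier tails. `position = "identity"` overlays the histograms instead of stacking them.

```r
library(ggplot2)

set.seed(7)
n <- 5000
d <- data.frame(
  group = rep(c("ctrl", "treat"), each = n),
  score = c(rnorm(n), rt(n, df = 3))   # treat: heavier tails, same center
)

# Empirical CDFs: tail differences appear near 0 and 1 on the y axis.
p_ecdf <- ggplot(d, aes(x = score, color = group)) +
  stat_ecdf() +
  labs(y = "Cumulative proportion")

# Overlaid histograms: drawn on top of each other, with transparency.
p_overlay <- ggplot(d, aes(x = score, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.4, binwidth = 0.25)
```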

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable); also 2 x numerical ~ 0
Scatterplot: best option with small n; hard to make legible with large n.
2D histogram heatmap: useless for small n; best option with large n.
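A sketch of a scatterplot and a 2D histogram heatmap of the same large simulated sample (hypothetical `x` and `y`). The 30 bins per axis is an arbitrary choice, and `geom_bin2d()` maps the count in each rectangular bin to fill color.

```r
library(ggplot2)

set.seed(8)
n <- 5000
d <- data.frame(x = rnorm(n))
d$y <- 0.5 * d$x + rnorm(n)

# Scatterplot: transparency helps a little with overplotting at large n.
p_scatter <- ggplot(d, aes(x = x, y = y)) + geom_point(alpha = 0.1)

# 2D histogram heatmap: counts per bin shown as fill.
p_heat <- ggplot(d, aes(x = x, y = y)) + geom_bin2d(bins = 30)
```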

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
Fitted conditional means: very rarely should you show these on their own, without the raw data. Generally, use method=lm rather than loess.
Conditional means: this will require binning by x.
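A sketch of both kinds of conditional means on simulated data (hypothetical). Binned conditional means cut x into 10 equal-width bins (an arbitrary choice that affects the picture) and average x and y within each bin; fitted conditional means use a linear fit via `geom_smooth(method = "lm")`.

```r
library(ggplot2)

set.seed(9)
d <- data.frame(x = runif(500, 0, 10))
d$y <- 1 + 0.8 * d$x + rnorm(500, sd = 2)

# Conditional means: bin x, then average x and y within each bin.
d$x_bin <- cut(d$x, breaks = seq(0, 10, by = 1), include.lowest = TRUE)
bin_means <- aggregate(cbind(x, y) ~ x_bin, data = d, FUN = mean)
p_means <- ggplot(bin_means, aes(x = x, y = y)) + geom_point() + geom_line()

# Fitted conditional means: linear fit rather than loess.
p_fit <- ggplot(d, aes(x = x, y = y)) + geom_smooth(method = "lm", formula = y ~ x)
```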

numerical ~ numerical (my recommendation): show the data, and show the fit.
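A sketch of "show data, show fit" on simulated data (hypothetical): the raw points plus a linear fit whose shaded band is the standard-error region that `geom_smooth()` draws by default.

```r
library(ggplot2)

set.seed(10)
d <- data.frame(x = runif(100, 0, 10))
d$y <- 1 + 0.8 * d$x + rnorm(100, sd = 2)

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +                       # show the data
  geom_smooth(method = "lm", formula = y ~ x)     # show the fit, with its SE band
```

The fitted line is the same one `lm(y ~ x, data = d)` would give, evaluated on a grid of x values.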

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
Normalizing by x is useful when you don’t care about the distribution over x. Note: you are unlikely to luxuriate in this much data.
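A sketch of normalizing a 2D histogram by x, assuming "normalization by x" means dividing each cell count by the total count in its x bin, so that each x column shows the conditional distribution of y given x. The data are simulated (hypothetical) with many more points at small x, and 20 bins per axis is an arbitrary choice.

```r
library(ggplot2)

set.seed(11)
n <- 20000
d <- data.frame(x = runif(n)^2)           # uneven: many more points at small x
d$y <- sqrt(d$x) + rnorm(n, sd = 0.1)

d$x_bin <- cut(d$x, breaks = 20)
d$y_bin <- cut(d$y, breaks = 20)

# Count each (x_bin, y_bin) cell, then divide by its x-bin total so that
# every x column sums to 1, however many points that column holds.
tab <- as.data.frame(table(x_bin = d$x_bin, y_bin = d$y_bin))
tab$x_total <- ave(tab$Freq, tab$x_bin, FUN = sum)
tab <- tab[tab$x_total > 0, ]             # drop empty x bins (avoids 0/0)
tab$p_y_given_x <- tab$Freq / tab$x_total

p <- ggplot(tab, aes(x = x_bin, y = y_bin, fill = p_y_given_x)) +
  geom_tile() +
  labs(x = "x (binned)", y = "y (binned)", fill = "P(y | x)")
```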

numerical ~ numerical + categorical (1 numerical response, with a numerical and a categorical explanatory variable)
Color-coded scatterplot: hard to parse with lots of data. Note the importance of the explanatory variable on the x axis!
Fitted lines / conditional means: show error bars. If y is smooth in x, show conditional means (as here). Bin width matters.
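A sketch of a color-coded scatterplot with one fitted line per category, on simulated data (hypothetical `x`, `y`, `group`). Mapping `color` in the top-level `aes()` also splits `geom_smooth()` into one fit, with its own standard-error band, per group.

```r
library(ggplot2)

set.seed(12)
d <- data.frame(
  x     = runif(200, 0, 10),
  group = rep(c("a", "b"), each = 100)
)
d$y <- ifelse(d$group == "a", 1 + 0.5 * d$x, 3 + 0.2 * d$x) + rnorm(200)

p <- ggplot(d, aes(x = x, y = y, color = group)) +   # numerical explanatory variable on x
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)        # one line and SE band per group
```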

numerical ~ numerical + categorical (1 numerical response, with a numerical and a categorical explanatory variable)
If the scatterplots are important, split them into facets when n is large. If the line comparison is important, keep the lines in the same panel.
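A sketch of the two layouts on a larger simulated sample (hypothetical): `facet_wrap(~ group)` gives each group's point cloud its own panel, while a single panel with `color = group` puts the fitted lines side by side for direct comparison.

```r
library(ggplot2)

set.seed(13)
d <- data.frame(
  x     = runif(2000, 0, 10),
  group = rep(c("a", "b"), each = 1000)
)
d$y <- ifelse(d$group == "a", 1 + 0.5 * d$x, 3 + 0.2 * d$x) + rnorm(2000, sd = 2)

# Scatterplots matter: one facet per group, so the point clouds do not overlap.
p_facets <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ group)

# Line comparison matters: both fitted lines in one panel, told apart by color.
p_lines <- ggplot(d, aes(x = x, y = y, color = group)) +
  geom_smooth(method = "lm", formula = y ~ x)
```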


General pointers
• Label your axes.
• Follow conventions:
  – Explanatory variable on the x axis.
  – Don’t get creative; respect variable types.
  – Don’t make visualization puzzles.
• Convey information clearly, numerically.
• Represent uncertainty! (distribution, error, confidence)
• Be wary of binning artifacts / thresholding.
• Cool visualizations are not good science graphs.
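A sketch that applies several of these pointers at once, on a simulated and hypothetical dose-response experiment (`dose_mg`, `rt_ms`): the explanatory variable goes on the x axis, the raw data are shown, uncertainty appears as the mean plus or minus one standard error, and both axes are labeled with units.

```r
library(ggplot2)

# Hypothetical experiment, simulated: dose (explanatory) and reaction time (response).
set.seed(14)
d <- data.frame(dose_mg = rep(c(0, 10, 20, 40), each = 25))
d$rt_ms <- 400 + 2 * d$dose_mg + rnorm(nrow(d), sd = 40)

p <- ggplot(d, aes(x = dose_mg, y = rt_ms)) +                  # explanatory variable on x
  geom_jitter(width = 1, height = 0, alpha = 0.5) +             # show the raw data
  stat_summary(fun.data = mean_se, geom = "pointrange",
               color = "red") +                                 # mean +/- 1 SE per dose
  labs(x = "Dose (mg)", y = "Reaction time (ms)")               # labeled axes, with units
```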
