201ab Quantitative methods: Visualization


  1. 201ab Quantitative methods: Visualization. Ed Vul | UCSD Psychology

  2. • Visualization failure modes
     • Cool vs informative visualizations
     • Ways graphs can mislead
     • Making a graph pretty
     • ggplot: grammar of graphics

  3. Entirely made up.

  4. Nonsense variables.

  5. Graph independent of data.

  6. Multiple variables graphed as one.

  7. Credit: xkcd

  8. Not labeled (or mislabeled).

  9. Misleading or useless axis scales.

  10. Misleading binning.

  11. Illegible.

  12. Credit: xkcd

  13. Visualization failure modes
      • Completely made up.
      • Nonsense variables/relationships.
      • Graph independent of data.
      • Multiple variables treated as one.
      • Not labeled, or mislabeled.
      • Misleading / unusable scales.
      • Misleading binning.
      • Illegible.
      • Crazy mapping from variables -> visual properties.

  18. • Visualization failure modes
      • Cool vs scientific visualizations
      • Making a graph pretty
      • ggplot: grammar of graphics
      • How to graph common data types.

  20. From dynamicdiagrams.com

  21. From dynamicdiagrams.com

  22. From dynamicdiagrams.com

  23. From dynamicdiagrams.com

  24. This one (first graph):
      - Looks cooler!
      - Provides a visual puzzle.
      - Misrepresents magnitudes.
      - Does not adhere to (modern!) convention.
      - Makes it difficult to make quantitative comparisons, or extract numbers.
      This is a bad scientific data display. But it is a cool visualization.

      This one (second graph):
      - Looks a bit more boring.
      - Is much easier to parse and understand.
      - Accurately, quantitatively represents magnitudes.
      - Adheres to modern convention.
      - Makes it easy to make quantitative comparisons, and extract numbers.
      This is a good scientific data display. But it might not be as interesting a visualization.

  27. • Visualization failure modes
      • Cool vs scientific visualizations
      • Making a graph pretty
      • ggplot: grammar of graphics
      • How to graph common data types.

  29. May have gone a bit overboard into “visualization” territory: it looks good, but starts violating some conventions:
      - No y axis
      - Y axis label used as title

  30. • Visualization failure modes
      • Cool vs informative visualizations
      • Making a graph pretty
      • ggplot: grammar of graphics
      • Graphs for common types of data.

  31. library(ggplot2)
      Fig <- ggplot(data=..., mapping=aes(...)) +
               facet_*() +
               geom_*() +
               stat_*() +
               scale_*() +
               theme*()
      Basic operation: take a tidy data frame and map its variables onto different aesthetic variables (e.g., x, y, color, fill, size, shape, alpha, group). Then draw some geom(etric entity) according to that mapping (e.g., point, line, tile, area, ribbon, etc.).
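A minimal sketch of that template filled in, assuming simulated data. The data frame `d` and its columns (`hours`, `score`, `group`, `site`) are made up for illustration and do not come from the course data.

```r
library(ggplot2)

# Simulated tidy data frame: one row per observation, one column per variable.
# All column names are hypothetical.
set.seed(1)
d <- data.frame(
  hours = runif(120, 0, 10),
  group = rep(c("control", "treatment"), each = 60),
  site  = rep(c("A", "B", "C"), times = 40)
)
d$score <- 50 + 3 * d$hours +
  ifelse(d$group == "treatment", 5, 0) + rnorm(120, sd = 6)

fig <- ggplot(data = d, mapping = aes(x = hours, y = score, color = group)) +
  facet_wrap(~ site) +                     # one panel per site
  geom_point(alpha = 0.7) +                # one point per row of d
  scale_color_brewer(palette = "Set1") +   # how group values map to colors
  labs(x = "Hours of practice", y = "Test score", color = "Group") +
  theme_minimal()

print(fig)
```

Each `+` adds one layer of the grammar: the `aes()` call declares the mapping, and the geom decides what gets drawn for it.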

  34. • Visualization failure modes
      • Cool vs informative visualizations
      • Making a graph pretty
      • ggplot: grammar of graphics
      • Graphs for common types of data.
      • Practice in R.
      • More exotic graph types / considerations

  35. Goal: show how response/dependent variable(s) change with explanatory/independent variable(s).
      What kind of variables? Categorical? Numerical?
      It helps to think of it as an abstract formula of sorts. For example, how does height (numerical response) vary across sex (categorical), nationality (categorical), and parents’ income (numerical)?
          numerical ~ 2*categorical + numerical
      This abstraction helps you pick starting points for graphs.

  36. categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
      Stacked bar plot:
        + Easy-ish comparisons
        + Easy-ish proportion
        + Socially acceptable pie chart
      Histogram (bar plot of counts):
        ++ Easiest comparisons
        - Hardest proportion
        - Waste of ink
      Pie chart:
        - Hardest comparisons
        ++ Easiest proportion
        - Considered tacky
      Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv

  37. categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
      Counts: highlight sample size when n is small.
      Proportions: easier interpretation.
      Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv
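A rough sketch of the two bar-based options for a single categorical variable, using simulated data rather than the SPSP file, whose column names are not given here. The variable `stage` and its levels are hypothetical.

```r
library(ggplot2)

# Made-up categorical responses; 'stage' is a hypothetical variable name.
set.seed(2)
d <- data.frame(
  stage = sample(c("Undergrad", "Grad student", "Postdoc", "Faculty"),
                 size = 300, replace = TRUE, prob = c(0.2, 0.4, 0.15, 0.25))
)

# Bar plot of counts: categories sit side by side on a common baseline.
fig_counts <- ggplot(d, aes(x = stage)) +
  geom_bar() +
  labs(x = "Career stage", y = "Count")

# Single stacked bar, rescaled so the segments sum to 1 (proportions).
fig_stacked <- ggplot(d, aes(x = "", fill = stage)) +
  geom_bar(position = "fill", width = 0.5) +
  labs(x = NULL, y = "Proportion", fill = "Career stage")

print(fig_counts)
print(fig_stacked)
```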

  38. numerical ~ 0 (1 numerical response variable, with 0 explanatory variables)
      Histogram:
        + Portrays noisiness.
        - Impression sensitive to bins.
      Smoothed density:
        - Obscures noisiness.
        + Not too sensitive to reasonable kernel width.
      Data: http://vulstats.ucsd.edu/data/cal1020.cleaned.Rdata
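To see the bin sensitivity for yourself, something like the following sketch can help. It assumes simulated bimodal data in a made-up column `x`, and the bin widths are arbitrary choices for illustration.

```r
library(ggplot2)

# Simulated bimodal sample; 'x' is a hypothetical variable.
set.seed(3)
d <- data.frame(x = c(rnorm(150, mean = 0, sd = 1), rnorm(50, mean = 4, sd = 0.7)))

# Narrow bins: noisy, but both modes are visible.
fig_narrow <- ggplot(d, aes(x = x)) +
  geom_histogram(binwidth = 0.25) +
  labs(x = "x", y = "Count", title = "binwidth = 0.25")

# Wide bins: smooth-looking, but the shape changes a lot.
fig_wide <- ggplot(d, aes(x = x)) +
  geom_histogram(binwidth = 2) +
  labs(x = "x", y = "Count", title = "binwidth = 2")

# Kernel density with the default bandwidth (adjust = 1).
fig_density <- ggplot(d, aes(x = x)) +
  geom_density(adjust = 1) +
  labs(x = "x", y = "Density")

print(fig_narrow)
print(fig_wide)
print(fig_density)
```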

  39. numerical ~ 0 (1 numerical response variable, with 0 explanatory variables)

  40. numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
      Plot types shown: mean + error, jitter, violin, boxplot, densities, empirical CDF (two of the panels have coords flipped).
      • Mean + error: easy statistical comparison.
      • Jitter: useful when n is small.
      • Violin: useful when n is large.
      • Best when coords not flipped; best for few categories (<4?).

  41. Credit: xkcd

  42. numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
      – Always put error bars on bar charts (std. error or CI are fine).
      – Look at rawer data (e.g., strip charts) before going to more compressed plots.
      – By removing the solid bar from a bar chart, you can add a good visualization of the data distribution. This is better.

  43. numerical ~ categorical (my suggestions)
      With small n: show all the data points with jitter (here, data are subsampled to generate a low-n scenario).
      With large n: show the distribution with violin or density.
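One way these suggestions might look in ggplot2, as a sketch on simulated data. The names `condition` and `rt` and the helper `simulate()` are hypothetical; `mean_se` is ggplot2's helper for mean plus or minus one standard error.

```r
library(ggplot2)

# Simulated reaction times; 'condition' and 'rt' are made-up names.
set.seed(4)
simulate <- function(n_per_group) {
  data.frame(
    condition = rep(c("A", "B", "C"), each = n_per_group),
    rt = c(rnorm(n_per_group, 500, 60),
           rnorm(n_per_group, 540, 60),
           rnorm(n_per_group, 600, 80))
  )
}
d_small <- simulate(20)     # low-n scenario
d_large <- simulate(2000)   # high-n scenario

# Small n: every data point (jittered horizontally), plus mean +/- 1 SE.
fig_small_n <- ggplot(d_small, aes(x = condition, y = rt)) +
  geom_jitter(width = 0.15, height = 0, alpha = 0.5) +
  stat_summary(fun.data = mean_se, geom = "pointrange", color = "red") +
  labs(x = "Condition", y = "Reaction time (ms)")

# Large n: distribution as a violin, with mean +/- 1 SE on top.
fig_large_n <- ggplot(d_large, aes(x = condition, y = rt)) +
  geom_violin(fill = "grey85") +
  stat_summary(fun.data = mean_se, geom = "pointrange", color = "red") +
  labs(x = "Condition", y = "Reaction time (ms)")

print(fig_small_n)
print(fig_large_n)
```

Setting `height = 0` in `geom_jitter()` keeps the jitter horizontal only, so the plotted y values stay equal to the data.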

  44. numerical ~ categorical (eclectic plots, useful with large n, weird distributional differences)
      Cumulative distribution functions: highlight differences in the tails. Only useful with really large n (so tails aren’t just noise).
      Overlaid densities/histograms: with large n, can show weird differences.
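A sketch of both plot types on simulated data, assuming two hypothetical groups that differ mainly in their tails (a normal sample versus a heavier-tailed t-distributed sample).

```r
library(ggplot2)

# Two made-up groups with similar centers but different tails.
set.seed(5)
d <- data.frame(
  group = rep(c("A", "B"), each = 5000),
  value = c(rnorm(5000, mean = 0, sd = 1), rt(5000, df = 3))
)

# Empirical cumulative distribution functions, one step curve per group.
fig_ecdf <- ggplot(d, aes(x = value, color = group)) +
  stat_ecdf(geom = "step") +
  labs(x = "Value", y = "Cumulative proportion", color = "Group")

# Overlaid densities; transparency keeps both groups visible.
fig_density <- ggplot(d, aes(x = value, fill = group)) +
  geom_density(alpha = 0.4) +
  coord_cartesian(xlim = c(-6, 6)) +   # zoom without dropping data
  labs(x = "Value", y = "Density", fill = "Group")

print(fig_ecdf)
print(fig_density)
```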

  45. numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable), or 2 x numerical ~ 0
      2D histogram heatmap: useless for small n. Best option with large n.
      Scatterplot: best option with small n. Hard to make legible with large n.
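A sketch contrasting the two options, assuming simulated correlated data. The sample sizes (50 and 20000) and the number of bins are arbitrary illustrative choices.

```r
library(ggplot2)

# Simulated correlated variables; names are hypothetical.
set.seed(6)
n <- 20000
big <- data.frame(x = rnorm(n))
big$y <- 0.6 * big$x + rnorm(n, sd = 0.8)
small <- big[1:50, ]   # low-n subset of the same data

# Small n: a scatterplot shows every observation legibly.
fig_scatter <- ggplot(small, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Scatterplot, n = 50")

# Large n: a 2D histogram heatmap shows density instead of overplotted points.
fig_heatmap <- ggplot(big, aes(x = x, y = y)) +
  geom_bin2d(bins = 40) +
  labs(title = "2D histogram, n = 20000", fill = "Count")

print(fig_scatter)
print(fig_heatmap)
```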

  46. numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
      Fitted conditional means: very rarely should you show these on their own, without the raw data. Generally: use method=lm, rather than loess.
      Conditional means: this will require binning by x.
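Conditional means obtained by binning x could be sketched with `stat_summary_bin()`, as below. The data are simulated, and the choice of 8 bins is an assumption; a different bin count can change the impression.

```r
library(ggplot2)

# Simulated data with a curved relationship; names are hypothetical.
set.seed(7)
d <- data.frame(x = runif(500, 0, 10))
d$y <- 2 * sqrt(d$x) + rnorm(500, sd = 1)

fig <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +                      # raw data underneath
  stat_summary_bin(fun.data = mean_se, bins = 8, # mean +/- 1 SE of y per x bin
                   geom = "pointrange", color = "red") +
  labs(x = "x", y = "y")

print(fig)
```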

  47. Credit: xkcd

  48. numerical ~ numerical (my recommendation)
      Show data, show fit.
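"Show data, show fit" might look like the following sketch: raw points with a linear fit and its confidence band on top. The data and variable names are simulated assumptions.

```r
library(ggplot2)

# Simulated linear relationship; names are hypothetical.
set.seed(8)
d <- data.frame(x = runif(100, 0, 10))
d$y <- 1 + 2 * d$x + rnorm(100, sd = 3)

fig <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +                             # the data
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +  # the fit, with CI band
  labs(x = "Explanatory variable", y = "Response variable")

print(fig)
```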

  49. numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
      Normalization by x is useful when you don’t care about the distribution over x.
      Note: you are unlikely to luxuriate in this much data.

  50. numerical ~ numerical + categorical (1 numerical response, with numerical & categorical explanatory variables)
      Color-coded scatterplot: hard to parse with lots of data. Note the importance of putting the explanatory variable on the x axis!
      Fitted lines / conditional means: show error bars. If y is smooth in x, show conditional means (as in here). Bin width matters.

  51. numerical ~ numerical + categorical (1 numerical response, with numerical & categorical explanatory variables)
      If scatterplots are important, split into facets with large n.
      If line comparison is important, keep in same panel.
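Both layouts can be sketched on the same simulated data. The column names (`x`, `y`, `g`) and the group slopes are made up for illustration.

```r
library(ggplot2)

# Simulated data: three hypothetical groups with different slopes.
set.seed(9)
n_per_group <- 1000
d <- data.frame(
  g = rep(c("low", "mid", "high"), each = n_per_group),
  x = runif(3 * n_per_group, 0, 10)
)
slopes <- c(low = 0.5, mid = 1, high = 1.5)
d$y <- slopes[d$g] * d$x + rnorm(nrow(d), sd = 2)

# Scatterplots matter: split into facets so points do not overlap across groups.
fig_facets <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ g) +
  labs(x = "x", y = "y")

# Line comparison matters: keep the fitted lines in one panel, coded by color.
fig_same_panel <- ggplot(d, aes(x = x, y = y, color = g)) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "x", y = "y", color = "Group")

print(fig_facets)
print(fig_same_panel)
```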

  52. General pointers

  53. General pointers
      • Label your axes.
      • Follow conventions
        – Explanatory variable on x axis.
        – Don’t get creative – respect variable types.
        – Don’t make visualization puzzles
      • Convey information clearly, numerically
      • Represent uncertainty! (distribution, error, confidence)
      • Be wary of binning artifacts / thresholding
      • Cool visualizations are not good science graphs
