
CS171 Visualization: Tables. Alexander Lex, alex@seas.harvard.edu



  1. CS171 Visualization Alexander Lex alex@seas.harvard.edu Tables Part II [xkcd]

  2. Next Week. Reading: VAD, Chapter 9. Lecture 11: Text & Documents. Lecture 12: Homework 3 Design Studio. Sections: view coordination, linking & brushing. Updates: Design Studio moved to Thursday; Project Proposal moved to HW 4.

  3. Tables & Multi-Dimensional Data

  4. Comparisons

  5. Direction Nicolas Rapp

  6. Plot Change Instead https://eagereyes.org/basics/baselines

  7. Trends Over Time http://xkcd.com/605/

  8. Bars vs. Lines: Lines imply connections and sampling from continuous data. Do not use them for categorical data. Zacks 1999

  9. Baseline Problem (again). Panels: True Baseline; Clipped Baseline; Plotting Change. https://eagereyes.org/basics/baselines

  10. Linear vs. Logarithmic Scale. Panels: Linear Scale; Log Scale. http://xkcd.com/1162/ Apple Stock Price: http://finance.yahoo.com/echarts?s=AAPL

  11. Aspect Ratios. Rule of thumb: banking to 45° (average line slope: 45°). eagereyes.org
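
One simple reading of this rule of thumb: choose the chart's height-to-width ratio so that the average absolute slope of the line segments, measured on screen, equals 1 (that is, 45°). The sketch below is only an illustration under stated assumptions, not the method of the cited source. The function name `bank_to_45` is hypothetical, x values are assumed to be strictly increasing, and the plain mean of absolute slopes is just one of several banking variants.

```python
def bank_to_45(xs, ys):
    """Return a height/width ratio that makes the mean absolute on-screen slope 1.

    Hypothetical helper; assumes xs is strictly increasing and ys is not constant.
    """
    # Absolute slopes of consecutive line segments, in data units.
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    mean_slope = sum(slopes) / len(slopes)
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # On-screen slope = data slope * (x_range / y_range) * (height / width).
    # Setting the mean on-screen slope to 1 and solving for height / width:
    return y_range / (x_range * mean_slope)


# A zigzag with four segments: a wide, flat chart puts each segment at 45 degrees.
print(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```

A single straight line yields a ratio of 1 (a square plot), while data with many oscillations yields a wide, flat chart.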

  12. Correlations

  13. Scatterplots

  14. Overplotting alpha = 1/100
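
Why such a low alpha helps: with standard "over" alpha blending, n identical marks of opacity alpha drawn on top of each other accumulate to an opacity of 1 - (1 - alpha)^n. Isolated points stay faint while dense regions become visible. A small sketch, assuming identical marks and the usual "over" operator (the function name is hypothetical):

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping marks, each drawn with opacity alpha."""
    return 1.0 - (1.0 - alpha) ** n


# With alpha = 1/100, compare a lone point with increasingly dense overlaps.
for n in (1, 10, 100, 500):
    print(n, round(stacked_opacity(1 / 100, n), 3))
```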

  15. Compositions

  16. Stacked Bar Chart

  17. Comparison of bar chart types: Pie Chart; Stacked Bar Chart; Layered Bar Chart; Small Multiples; Grouped Bar Chart. Streit & Gehlenborg, PoV, Nature Methods, 2014

  18. Stacked Area Chart http://stackoverflow.com/questions/2225995/how-can-i-create-stacked-line-graph-with-matplotlib

  19. 100% Stacked Area Chart http://stackoverflow.com/questions/16875546/create-a-100-stacked-area-chart-with-matplotlib
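
A 100% stacked chart differs from a plain stacked chart only in a normalization step: at each time step, every series is divided by the total across series. A minimal sketch with made-up numbers (the function name and the dict-of-lists layout are assumptions for illustration):

```python
def to_percent_stack(series):
    """Convert {name: [values per time step]} to percentage shares per time step."""
    names = list(series)
    steps = len(series[names[0]])
    totals = [sum(series[name][t] for name in names) for t in range(steps)]
    return {
        name: [100.0 * series[name][t] / totals[t] if totals[t] else 0.0
               for t in range(steps)]
        for name in names
    }


# Made-up example data: two series over two time steps.
print(to_percent_stack({"a": [1, 2], "b": [3, 2]}))
```

The shares can then be stacked as usual; every column sums to 100, so absolute totals are no longer visible.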

  20. Stacked Area vs. Line Graphs leancrew.com & Practically Efficient

  21. Distributions

  22. Histogram: #bins hard to predict; make interactive! Rule of thumb: #bins = sqrt(n). Figure panels: 10 Bins and 20 Bins (x-axis: age, y-axis: # passengers).
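
The sqrt(n) rule of thumb can be sketched as follows. The ages are made-up sample data, the function names are hypothetical, and equal-width bins spanning the data range are an assumption. Rounding sqrt(n) up is also a choice made here.

```python
import math


def sqrt_bin_count(n):
    """Rule of thumb: number of bins = sqrt(n), rounded up here."""
    return max(1, math.ceil(math.sqrt(n)))


def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return counts


ages = [22, 25, 19, 31, 40, 35, 28, 52, 61, 47, 33, 29, 24, 38, 45, 57]  # made-up data
k = sqrt_bin_count(len(ages))
print(k, histogram(ages, k))
```

Because the rule is only a starting point, an interactive control that lets the viewer change `bins` matches the slide's advice.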

  23. Box Plots aka Box-and-Whisker Plot Wikipedia
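
The numbers behind a box plot can be computed with the standard library. This sketch assumes the common convention that whiskers extend to the most extreme data points within 1.5 × IQR of the box and that points beyond are drawn as outliers. The slide does not say which convention it uses, and the function name is hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers for one box plot (1.5 * IQR convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # made-up data
```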

  24. Comparison Streit & Gehlenborg, PoV, Nature Methods, 2014

  25. Showing Expected Values & Uncertainty. "Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error", Michael Correll and Michael Gleicher

  26. High-Dimensional Data

  27. What is High-dimensional Data? Tabular data, containing rows (items) and columns (attributes or dimensions); rows >> columns. Example table:
             Age  Gender  Height
      Bob    25   M       181
      Alice  22   F       185
      Chris  19   M       175

  28. High-Dimensional Data Visualization. How many dimensions? ~50: tractable with “just” vis; ~1000: need analytical methods. How many records? ~1000: “just” vis is fine; >> 10,000: need analytical methods. Homogeneity: Same data type? Same scales? Example tables:
             Age  Gender  Height
      Bob    25   M       181
      Alice  22   F       185
      Chris  19   M       175

             BPM 1  BPM 2  BPM 3
      Bob    65     120    145
      Alice  80     135    185
      Chris  45     115    135

  29. Analytic Component (axis from “no / little analytics” to “strong analytics component”). Techniques shown: Multidimensional Scaling; Scatterplot Matrices; Pixel-based visualizations / heat maps; Parallel Coordinates. Image credits: [Doerk 2011], [Bostock], [Chuang 2012]

  30. Geometric Methods

  31. Parallel Coordinates (PC), Inselberg 1985: Axes represent attributes; lines connecting axes represent items. [Figure: items A and B shown against axes X and Y]

  32. Parallel Coordinates: Each axis represents a dimension; lines connecting axes represent records. Suitable for all tabular data types, including heterogeneous data.
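
The mapping from table rows to polylines can be sketched in a few lines. Each attribute gets its own axis at x = 0, 1, 2, ..., and each record becomes one polyline whose y position on an axis is the value rescaled to [0, 1] for that attribute. The per-axis min-max scaling and the function name are assumptions for illustration. The rows reuse the Bob/Alice/Chris example from the earlier slides, restricted to the numeric columns.

```python
def to_polylines(rows, attributes):
    """Return one polyline (list of (axis_index, scaled_value)) per record."""
    # Each axis is scaled independently, which is what lets heterogeneous
    # attributes share one plot.
    ranges = {a: (min(r[a] for r in rows), max(r[a] for r in rows))
              for a in attributes}
    polylines = []
    for row in rows:
        points = []
        for axis_index, a in enumerate(attributes):
            lo, hi = ranges[a]
            y = 0.5 if hi == lo else (row[a] - lo) / (hi - lo)
            points.append((axis_index, y))
        polylines.append(points)
    return polylines


rows = [
    {"name": "Bob", "Age": 25, "Height": 181},
    {"name": "Alice", "Age": 22, "Height": 185},
    {"name": "Chris", "Age": 19, "Height": 175},
]
for row, line in zip(rows, to_polylines(rows, ["Age", "Height"])):
    print(row["name"], line)
```

Reordering the `attributes` list changes which axes are adjacent, and therefore which pairwise relationships are visible.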

  33. PC Limitation: Scalability to Many Dimensions (example: 500 axes)

  34. PC Limitation: Scalability to Many Items. Solutions: transparency; bundling, clustering; sampling.

  35. PC Limitation: Correlations are visible only between adjacent axes. Solution: interaction (brushing; let the user change the axis order).

  36. PC Limitation: Ambiguity. Solutions: brushing; curves. Graham and Kennedy 2003

  37. Parallel Coordinates: Shows primarily relationships between adjacent axes. Limited scalability (~50 dimensions, ~1-5k records). Transparency of lines. Interaction is crucial: axis reordering, brushing, filtering. Algorithmic support: choosing dimensions; choosing order; clustering & aggregating records. http://bl.ocks.org/jasondavies/1341281

  38. Star Plot [Coekin 1969]: Similar to parallel coordinates; axes radiate from a common origin. http://www.itl.nist.gov/div898/handbook/eda/section3/starplot.htm http://bl.ocks.org/kevinschaul/raw/8833989/ http://start1.jpl.nasa.gov/caseStudies/autoTool.cfm

  39. Multiple Line Charts http://square.github.io/cubism/

  40. Combining Various Charts

  41. Scatterplot Matrices (SPLOM): Matrix of size d*d. Each row/column is one dimension. Each cell plots a scatterplot of two dimensions.
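
The d*d layout can be enumerated directly, which also makes the quadratic growth in the number of panels concrete. In this small sketch the attribute names are just examples and the function name is hypothetical.

```python
from itertools import product


def splom_cells(attributes):
    """Return the (row attribute, column attribute) pair for every SPLOM cell."""
    return list(product(attributes, repeat=2))


attributes = ["Age", "Height", "BPM 1"]
cells = splom_cells(attributes)
# Off-diagonal cells come in mirrored pairs, so only d*(d-1)/2 plots are distinct.
distinct = [(a, b) for i, a in enumerate(attributes) for b in attributes[i + 1:]]
print(len(cells), len(distinct))
```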

  42. Scatterplot Matrices: Limited scalability (~20 dimensions, ~500-1k records). Brushing is important. Often combined with “Focus Scatterplot” as F+C technique. Algorithmic approaches: clustering & aggregating records; choosing dimensions; choosing order.

  43. SPLOM Aggregation - Heat Map Datavore: http://vis.stanford.edu/projects/datavore/splom/

  44. SPLOM F+C, Navigation [Elmqvist]

  45. Flexible Linked Axes (FLINA) Claessen & van Wijk 2011

  46. Web-based implementation of FLINA concept http://vis.pku.edu.cn/mddv/val/

  47. Connected Charts Viau & McGuffin 2012

  48. Domino [Figure: linked views of an ARTISTS table (origin, continent, studio albums, first album, number one hits, start of career, career status, gender, sold albums) and a COUNTRIES table (population, in business at first album)] Gratzl et al. 2014

  49. Data Reduction. Sampling: Don’t show every element, show a (random) subset. Efficient for large datasets. Apply only for display purposes. Outlier-preserving approaches [Ellis & Dix, 2006]. Filtering: Define criteria to remove data, e.g., minimum variability; > / < / = specific value for one dimension; consistency in replicates, … Can be interactive, combined with sampling.
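
Both reduction strategies are a few lines each. This minimal sketch uses made-up records and hypothetical function names. The sample is plain uniform random sampling, not the outlier-preserving variant that the slide cites.

```python
import random


def sample_rows(rows, k, seed=0):
    """Uniform random subset of at most k rows, for display purposes only."""
    rng = random.Random(seed)  # fixed seed so the displayed subset is reproducible
    return rng.sample(rows, min(k, len(rows)))


def filter_rows(rows, attribute, predicate):
    """Keep only the rows whose value for `attribute` satisfies `predicate`."""
    return [row for row in rows if predicate(row[attribute])]


rows = [{"id": i, "age": 18 + (i * 7) % 50} for i in range(1000)]  # made-up data
adults_over_60 = filter_rows(rows, "age", lambda age: age > 60)
shown = sample_rows(adults_over_60, 25)
print(len(rows), len(adults_over_60), len(shown))
```

Filtering first and sampling second, as here, is one way to combine the two.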

  50. Filter Example http://square.github.io/crossfilter/

  51. Pixel Based Methods

  52. Pixel Based Displays: Each cell is a “pixel”, value encoded in color / value. Meaning derived from ordering. If no ordering is inherent, clustering is used. Scalable: 1 px per item. Good for homogeneous data (same scale & type). [Gehlenborg & Wong 2012]

  53. 3D Pitfall: Occlusion & Perspective [Gehlenborg and Wong, Nature Methods, 2012]

  54. 3D Pitfall: Occlusion & Perspective [Gehlenborg and Wong, Nature Methods, 2012]

  55. Heterogeneous Data? [Verhaak 2012]

  56. Bad Color Mapping

  57. Good Color Mapping

  58. Color is relative!

  59. Clustering: Classification of items into “similar” bins. Based on similarity measures: Euclidean distance, Pearson correlation, ... Partitional algorithms: divide data into a set of bins; # bins either manually set (e.g., k-means) or automatically determined (e.g., affinity propagation). Hierarchical algorithms: produce a “similarity tree” (dendrogram). Bi-clustering: clusters dimensions & records. Fuzzy clustering: allows occurrence of elements in multiple clusters.
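
As an example of a partitional algorithm with a manually chosen number of bins, here is a bare-bones k-means (Lloyd's algorithm) that uses Euclidean distance as the similarity measure. It is a teaching sketch. Initializing the centroids with the first k points is a simplifying assumption (practical implementations use better seeding), and the points are made up.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (tuples of numbers) into k bins; returns (assignment, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # made-up data
print(kmeans(points, k=2))
```

The resulting cluster labels can then drive ordering, brushing, or aggregation.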

  60. Clustering Applications: Clusters can be used to order (pixel-based techniques), brush (geometric techniques), and aggregate. Aggregation: a cluster is more homogeneous than the whole dataset, so statistical measures, distributions, etc. are more meaningful.

  61. Clustered Heat Map

  62. F+C Approach, with Dendrograms [Lex, PacificVis 2010]

  63. Cluster Comparison

  64. Aggregation

  65. Design Critique

  66. EdgeMaps: http://goo.gl/q8Cv7t http://mariandoerk.de/edgemaps/demo/#music

  67. Dimensionality Reduction

  68. Dimensionality Reduction: Reduce high-dimensional to lower-dimensional space. Preserve as much of the variation as possible. Plot the lower-dimensional space. Principal Component Analysis (PCA): linear mapping, by order of variance.
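
To make "linear mapping, by order of variance" concrete, the sketch below finds the first principal component, the direction of largest variance, by power iteration on the covariance matrix and projects the centered data onto it. It is a dependency-free illustration rather than how a numerical library does it. The function name and the data are made up, and power iteration can fail if the start vector happens to be orthogonal to the dominant direction.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction of largest variance, 1D projection of each row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector (assumption: not orthogonal to the answer)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    projection = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projection


# Made-up points lying exactly on the line y = 2x.
direction, projected = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
print(direction, projected)
```

Further components would be found the same way after removing the variance already explained; plotting the first two projections gives the usual 2D PCA scatterplot.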

  69. PCA Example – CS 171 Project 2013 http://mu-8.com/ [Mercer & Pandian]

  70. Multidimensional Scaling: Nonlinear, better suited for some data sets. Popular for text analysis. [Doerk 2011]

  71. Can we Trust Dimensionality Reduction? Left: topical distances between departments in a 2D projection. Right: topical distances between the selected Petroleum Engineering and the others. [Chuang et al., 2012] http://www-nlp.stanford.edu/projects/dissertations/browser.html
