
CS 171: Visualization High-Dimensional Data Hanspeter Pfister - PowerPoint PPT Presentation



  1. CS 171: Visualization High-Dimensional Data Hanspeter Pfister pfister@seas.harvard.edu

  2. This Week • PII due Monday, April 8 • Friday lab (10-11:30 am, MD G115): Hands-On D3 with Azalea, Sofia, and Billy (our last lab!)

  3. Thursday: Alberto Cairo, “A Functional Art: Storytelling with Data, Graphs, Maps, and Diagrams”

  4. High-Dimensional Data

  5. Item

  6. Attribute

  7. Taxonomy • Based on number of attributes • 1: Univariate • 2: Bivariate • 3: Trivariate • >3: Multivariate

  8. Tableau

  9. Linked Views

  10. Multivariate Plots ggplot2

  11. Multivariate Plots R

  12. Heatmap ggplot2

  13. Hierarchical Heatmap A. Lex

  14. 3D Scatter Plots R, lattice

  15. 3D Continuous Plots R, lattice

  16. Small Multiples Tableau

  17. Small Multiples Protovis

  18. Horizon Graphs

  19. Becker 1996

  20. D3

  21. EnRoute A. Lex

  22. Parallel Coordinates

  23. Use more than two axes “Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990 Based on slide from Munzner
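A minimal sketch of how items map to parallel coordinates, assuming each attribute is min-max normalized to [0, 1] and axis j sits at x = j (both layout choices are ours, not from the slide). Each item becomes a polyline with one vertex per axis; the function name `parallel_coordinates` and the sample items are hypothetical.

```python
import numpy as np

def parallel_coordinates(data):
    """Map each row (item) of `data` to a polyline in parallel coordinates.

    Axis j is drawn at x = j; the item's value on attribute j is min-max
    normalized to a y position in [0, 1].
    Returns an array of shape (n_items, n_attributes, 2) of (x, y) vertices.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant attributes
    y = (data - lo) / span
    x = np.broadcast_to(np.arange(data.shape[1], dtype=float), y.shape)
    return np.stack([x, y], axis=-1)

# Three hypothetical items with four attributes each
items = [[1.0, 200.0, 3.0, 0.5],
         [2.0, 100.0, 9.0, 0.1],
         [3.0, 150.0, 6.0, 0.9]]
lines = parallel_coordinates(items)
print(lines[0])  # vertices of the first item's polyline
```

Drawing a line segment between consecutive vertices of each item yields the familiar parallel-coordinates plot; any number of axes works the same way.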

  24. Parallel Coordinates

  25. Parallel Coordinates

  26. Parallel Coordinates

  27. Correlation “Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990 Based on slide from Munzner
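One way to quantify the correlation pattern on this slide (our framing, not taken from Wegman): two items' segments between adjacent axes cross exactly when their order on one axis is the reverse of their order on the other. Positively correlated attributes therefore give nearly parallel segments with few crossings, while negatively correlated ones produce many crossings. The helper `count_crossings` is hypothetical.

```python
import numpy as np

def count_crossings(a, b):
    """Count pairs of segments that cross between two adjacent axes.

    Segments i and j cross exactly when (a_i - a_j) * (b_i - b_j) < 0.
    Per-axis min-max normalization is monotone, so it does not change the count.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    da = a[:, None] - a[None, :]
    db = b[:, None] - b[None, :]
    # Upper triangle only, so each unordered pair is counted once
    return int(np.sum(np.triu(da * db < 0, k=1)))

x = np.arange(10.0)
print(count_crossings(x, 2 * x + 1))  # perfectly positive correlation
print(count_crossings(x, -x))         # perfectly negative correlation
```

For the perfectly negative case every pair of the 10 items crosses (45 pairs), and the segments all meet at a single point between the axes.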

  28. Filtering & Brushing D3
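A minimal sketch of the filtering logic behind brushing, assuming a brush is an inclusive (low, high) interval on one attribute and that multiple brushes combine with AND (a common convention, but an assumption here). The `brush` function and the sample data are hypothetical; a D3 implementation would recompute this mask on every brush event and restyle the matching lines.

```python
import numpy as np

def brush(data, ranges):
    """Return a boolean mask of items that fall inside every brushed range.

    `ranges` maps an attribute (column) index to an inclusive (low, high)
    interval; attributes without a brush are left unconstrained.
    """
    data = np.asarray(data, dtype=float)
    mask = np.ones(len(data), dtype=bool)
    for col, (low, high) in ranges.items():
        mask &= (data[:, col] >= low) & (data[:, col] <= high)
    return mask

items = np.array([[1.0, 200.0, 3.0],
                  [2.0, 100.0, 9.0],
                  [3.0, 150.0, 6.0]])
selected = brush(items, {1: (120.0, 250.0), 2: (0.0, 7.0)})
print(selected)  # [ True False  True]
```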

  29. Parallel Sets D3

  30. StratomeX A. Lex

  31. Glyphs

  32. Star Plots • Space variables around a circle • Encode values on “spokes” • Data point is now a shape
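The star-plot construction on this slide can be sketched as follows, assuming values are already normalized to [0, 1] and spokes are evenly spaced starting at angle 0 (both are our assumptions). Connecting the returned vertices in order closes the shape for one data point; `star_plot_vertices` is a hypothetical name.

```python
import math

def star_plot_vertices(values, radius=1.0):
    """Place one spoke per variable, evenly spaced around a circle.

    Each value (assumed in [0, 1]) sets how far along its spoke the vertex
    sits; the polygon through the vertices is the data point's shape.
    """
    n = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n  # direction of spoke i
        vertices.append((radius * v * math.cos(angle), radius * v * math.sin(angle)))
    return vertices

shape = star_plot_vertices([1.0, 0.5, 0.0, 0.5])
```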

  33. C. Nussbaumer

  34. Velocity (magnitude & direction) • Vorticity (scalar, CW/CCW) • Turbulent Charge (vector & scalar) • Strain Tensor (second order) M. Kirby, H. Marmanis, and D. Laidlaw

  35. M. Kirby, H. Marmanis, and D. Laidlaw

  36. M. Kirby, H. Marmanis, and D. Laidlaw

  37. M. Kirby, H. Marmanis, and D. Laidlaw

  38. G. Kindlmann 2006

  39. G. Kindlmann 2006

  40. G. Kindlmann 2006

  41. Chernoff Faces

  42. Dimensionality Reduction

  43. What about very high-dimensional data? Based on slide from P. Liang

  44. Basic Idea: Project the high-dimensional data onto a lower-dimensional subspace using linear or non-linear transformations: $x \in \mathbb{R}^{64 \times 64} = \mathbb{R}^{4096}$, $y \in \mathbb{R}^{10}$, $y = Ux$. Based on slide from P. Liang
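The shapes in the linear case of $y = Ux$ can be checked with a short sketch: a 64 × 64 image flattens to $x \in \mathbb{R}^{4096}$, so $U$ must be a 10 × 4096 matrix. The random image and the random orthonormal $U$ below are placeholders we made up; in practice $U$ would come from a method such as PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64 x 64 grayscale image, flattened into x in R^4096
image = rng.random((64, 64))
x = image.reshape(-1)

# Hypothetical linear map U (10 x 4096) with orthonormal rows; a placeholder
# for a learned projection such as the top 10 principal components
Q, _ = np.linalg.qr(rng.standard_normal((4096, 10)))
U = Q.T

y = U @ x  # the 10-dimensional representation
print(x.shape, U.shape, y.shape)
```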

  45. Linear Methods • Does the data lie mostly in a hyperplane? • If so, what is its dimensionality? Based on slide from F. Sha

  46. http://www.youtube.com/watch?v=4pnQd6jnCWk

  47. PCA: Project data onto a subspace so as to maximize the variance of the projected data. PC vectors are orthogonal. Based on slide from J. Leskovec
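A minimal PCA sketch via eigendecomposition of the covariance matrix (one standard way to compute it; SVD of the centered data is another). The eigenvectors with the largest eigenvalues are the variance-maximizing directions, and each eigenvalue is the variance of the data projected onto its direction. The `pca` helper and the synthetic data are our own.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components.

    Returns (scores, components, variances): components are the
    covariance-matrix eigenvectors with the largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)                # center each variable
    C = np.cov(Xc, rowvar=False)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1][:k]  # largest variance first
    W = eigvecs[:, order]                  # d x k, orthonormal columns
    return Xc @ W, W, eigvals[order]

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=500)
scores, W, var = pca(X, 2)
print(np.round(W.T @ W, 6))  # identity matrix: the PC vectors are orthogonal
```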

  48. MusicBox [Anita Lillie]

  49. Variance and Covariance • Variance: • How far are data points spread? • Covariance: • How much do variables change together?
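Both quantities can be written directly from their definitions. This sketch uses the sample (n − 1) denominator, which matches NumPy's `np.var(..., ddof=1)` and `np.cov` defaults; the function names and data are illustrative.

```python
import numpy as np

def variance(x):
    """Sample variance: average squared deviation from the mean (n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / (len(x) - 1)

def covariance(x, y):
    """Sample covariance: positive when x and y tend to move in the same direction."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0]
print(variance(x))
print(covariance(x, [2.0, 4.0, 6.0, 8.0]))  # positive: the variables increase together
print(covariance(x, [8.0, 6.0, 4.0, 2.0]))  # negative: one rises as the other falls
```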

  50. Covariance Matrix (rows and columns x1, x2): $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (scatter plot of x2 against x1)

  51. Covariance Matrix (rows and columns x1, x2): $\begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}$ (scatter plot of x2 against x1)

  52. Covariance Matrix (rows and columns x1, x2): $\begin{pmatrix} 1 & -0.7 \\ -0.7 & 1 \end{pmatrix}$ (scatter plot of x2 against x1)
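To connect the three matrices to the scatter plots, one can sample 2D Gaussian data with unit variances and off-diagonal value rho (0, 0.7, or −0.7, as on the slides) and check that the sample covariance matrix recovers it. The Gaussian distribution, sample size, and seed are our assumptions.

```python
import numpy as np

def sample_covariance(rho, n=5000, seed=2):
    """Draw n points with covariance [[1, rho], [rho, 1]] and estimate it back."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return np.cov(samples, rowvar=False)  # 2 x 2 sample covariance matrix

for rho in (0.0, 0.7, -0.7):
    print(rho, np.round(sample_covariance(rho), 2))
```

With rho = 0.7 the point cloud stretches along the diagonal x1 = x2; with −0.7 it stretches along x1 = −x2.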

  53. PCA (axes x1 and x2)

  54. PCA (axes x1 and x2)

  55. PCA (axes x1 and x2; principal directions PC 1 and PC 2)

  56. How many PC vectors? Enough PC vectors to cover 80-90% of the variance (scree plot). Based on slide from J. Leskovec
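The 80-90% rule can be applied mechanically: sort the eigenvalues (the variance along each PC), take the cumulative fraction of total variance, and pick the smallest k that reaches the target. The eigenvalues below are hypothetical, and `num_components` is our own helper.

```python
import numpy as np

def num_components(eigvals, target=0.9):
    """Smallest k such that the top-k eigenvalues explain >= `target` of the variance."""
    ratios = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratios = ratios / ratios.sum()
    cumulative = np.cumsum(ratios)  # cumulative explained variance per PC
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical eigenvalues: cumulative fractions are 0.4, 0.7, 0.85, 0.95, 1.0
eigvals = [4.0, 3.0, 1.5, 1.0, 0.5]
print(num_components(eigvals, 0.8))
print(num_components(eigvals, 0.9))
```

Here 80% needs 3 components and 90% needs 4; a scree plot shows the same eigenvalues so the "elbow" can be judged visually.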

  57. PCA for Handwritten Digits Hastie et al., “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer (2009)

  58. Eigenfaces Gunnar Grimnes: http://www.flickr.com/photos/gromgull/3329844591/in/photostream/

  59. PCA Visualization • Mondrian paintings • http://www.youtube.com/watch?v=xiWpZ5jhvx4 http://www.youtube.com/watch?v=7jLXDyQxck

  60. Gene Array Data: first two PC directions; first three PC directions

  61. Text Documents >45 features, projected onto two PC dimensions

  62. Multidimensional Scaling (MDS)

  63. Multi-Dimensional Scaling • A different goal: – Find a set of points whose pairwise distances match a given distance matrix

          p1  p2  p3  p4  p5
      p1   0   1   2   3   1
      p2   1   0   2   4   1
      p3   2   2   0   1   3
      p4   3   4   1   0   1
      p5   1   1   3   1   0
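A minimal sketch of classical (Torgerson) MDS, one standard MDS variant; the slide does not say which variant it uses. It double-centers the squared distances and keeps the top eigenvectors. Note that this example matrix is not exactly Euclidean (d(p2, p4) = 4 exceeds d(p2, p5) + d(p5, p4) = 2), so the 2D result can only approximate it.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: k-D points whose Euclidean distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]  # k largest eigenvalues
    # Negative eigenvalues mean D is not exactly Euclidean; clip them to 0
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Distance matrix for p1..p5 from the slide
D = np.array([[0, 1, 2, 3, 1],
              [1, 0, 2, 4, 1],
              [2, 2, 0, 1, 3],
              [3, 4, 1, 0, 1],
              [1, 1, 3, 1, 0]])
points = classical_mds(D, k=2)
print(points.round(2))
```

When the input distances do come from points in the plane, this method recovers them exactly up to rotation, reflection, and translation.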

  64. European Cities Data • Distances between European cities:

  65. Result of MDS. Based on slide from T. Yang

  66. Color Images N. Bonneel

  67. Facebook Friends – Distance = 1 for friends – Distance = 2 for friends of friends; etc. N. Bonneel
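The friend distances described here are hop counts in the friendship graph, which breadth-first search computes. Running it from every person would give a full distance matrix that could be passed to MDS. The graph and names below are hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: number of friendship hops from `source` to each reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in dist:  # first visit in BFS is the shortest hop count
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist

# Hypothetical undirected friendship graph as adjacency lists
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
print(hop_distances(friends, "ann"))
```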

  68. IN-SPIRE, PNNL

  69. What if data is non-linear? • Classic “Swiss Roll” example (figure: data points x_i and their PCA projection). Based on slide from F. Sha

  70. Non-Linear Methods • Intuition: Distortion in local areas, but faithful in the global structure Based on slide from F. Sha

  71. Dimensionality Reduction • Linear methods: • Principal Component Analysis (PCA) – Hotelling[33] • Multidimensional Scaling (MDS) – Young[38] • Nonnegative Matrix Factorization (NMF) – Lee[99] • Nonlinear methods: • Locally Linear Embeddings (LLE) – Roweis[00] • IsoMap – Tenenbaum[00]
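A minimal Isomap-style sketch on the Swiss roll, following the method's general recipe: build a k-nearest-neighbor graph, approximate geodesic distances with shortest paths (Floyd-Warshall here), then run classical MDS. The grid sampling of the roll, the neighbor count of 5, and all function names are our own choices for a small, deterministic example, not the settings from Tenenbaum et al.

```python
import numpy as np

def swiss_roll_grid(n_t=60, n_h=10):
    """Hypothetical Swiss roll sampled on a regular (t, height) grid."""
    t = np.linspace(1.5 * np.pi, 4.5 * np.pi, n_t)
    h = np.linspace(0.0, 21.0, n_h)
    T, H = np.meshgrid(t, h, indexing="ij")
    T, H = T.ravel(), H.ravel()
    return np.column_stack([T * np.cos(T), H, T * np.sin(T)]), T

def classical_mds(D, k=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def isomap(X, n_neighbors=5, k=2):
    n = X.shape[0]
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Euclidean distances
    G = np.full((n, n), np.inf)
    nn = np.argsort(E, axis=1)[:, 1:n_neighbors + 1]  # skip self at column 0
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nn.ravel()] = E[rows, nn.ravel()]          # keep only neighbor edges
    G = np.minimum(G, G.T)                             # make the graph undirected
    np.fill_diagonal(G, 0.0)
    for m in range(n):                                 # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return classical_mds(G, k)

X, t = swiss_roll_grid()
Y = isomap(X)
```

Because the neighbor graph only links points on the same layer of the roll, shortest paths follow the sheet, and the first embedding coordinate should order points by their position along the roll, which a linear projection such as PCA cannot do.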
