CS 171: Visualization High-Dimensional Data Hanspeter Pfister pfister@seas.harvard.edu
This Week • PII due Monday, April 8 • Friday lab (10-11:30 am, MD G115): Hands-On D3 with Azalea, Sofia, and Billy (our last lab!)
Th: Alberto Cairo A Functional Art: Storytelling with Data, Graphs, Maps, and Diagrams
High-Dimensional Data
Item
Attribute
Taxonomy • Based on number of attributes • 1: Univariate • 2: Bivariate • 3: Trivariate • >3: Multivariate
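The taxonomy counts how many attributes each item has. A minimal sketch, assuming each item is a Python dict whose keys are its attributes. The function name `attribute_taxonomy` and the car records are hypothetical illustrations, not from the slides.

```python
def attribute_taxonomy(num_attributes: int) -> str:
    """Classify a dataset by how many attributes (columns) each item has."""
    if num_attributes < 1:
        raise ValueError("a dataset needs at least one attribute")
    names = {1: "univariate", 2: "bivariate", 3: "trivariate"}
    # Anything with more than three attributes is multivariate.
    return names.get(num_attributes, "multivariate")

# Items are rows; attributes are the fields of each row (hypothetical data).
cars = [
    {"mpg": 18.0, "cylinders": 8, "horsepower": 130, "weight": 3504},
    {"mpg": 15.0, "cylinders": 8, "horsepower": 165, "weight": 3693},
]
print(attribute_taxonomy(len(cars[0])))  # multivariate
```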
Tableau
Linked Views
Multivariate Plots ggplot2
Multivariate Plots R
Heatmap ggplot2
Hierarchical Heatmap A. Lex
3D Scatter Plots R, lattice
3D Continuous Plots R, lattice
Small Multiples Tableau
Small Multiples Protovis
Horizon Graphs
Becker 1996
D3
EnRoute A. Lex
Parallel Coordinates
Use more than two axes “Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990 Based on slide from Munzner
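Parallel coordinates give each attribute its own vertical axis and draw an item as a polyline through its values on those axes. A minimal sketch, assuming a per-attribute linear scale and SVG-style screen coordinates (y grows downward). The function and parameter names are hypothetical; in D3 the same mapping would usually be expressed with one linear scale per axis.

```python
from typing import Sequence

def parallel_coordinates_polyline(
    item: Sequence[float],
    mins: Sequence[float],
    maxs: Sequence[float],
    axis_spacing: float = 100.0,
    height: float = 300.0,
) -> list[tuple[float, float]]:
    """Map one item to the vertices of its polyline in parallel coordinates.

    Axis i is a vertical line at x = i * axis_spacing. Each value is scaled
    linearly into [0, height] using that attribute's min and max, with the
    minimum at the bottom (y = height in screen coordinates).
    """
    points = []
    for i, (v, lo, hi) in enumerate(zip(item, mins, maxs)):
        t = 0.5 if hi == lo else (v - lo) / (hi - lo)  # normalize to [0, 1]
        points.append((i * axis_spacing, height * (1.0 - t)))
    return points

print(parallel_coordinates_polyline([2, 50, 7], mins=[0, 0, 0], maxs=[4, 100, 7]))
# [(0.0, 150.0), (100.0, 150.0), (200.0, 0.0)]
```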
Correlation “Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990 Based on slide from Munzner
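Between two adjacent axes, positively correlated attributes tend to give roughly parallel segments, while negatively correlated ones give segments that cross. A small sketch that counts crossings, assuming each attribute is mapped monotonically onto its axis, since the crossing test depends only on the order of values. `count_crossings` is a hypothetical helper.

```python
import numpy as np

def count_crossings(a: np.ndarray, b: np.ndarray) -> int:
    """Count pairs of line segments that cross between two adjacent axes.

    Item k is drawn from (axis 1, a[k]) to (axis 2, b[k]). Segments j and k
    cross strictly between the axes exactly when their order flips:
    (a[j] - a[k]) * (b[j] - b[k]) < 0.
    """
    da = a[:, None] - a[None, :]
    db = b[:, None] - b[None, :]
    # Each crossing pair appears twice in the full matrix, so halve the count.
    return int(np.sum(da * db < 0) // 2)

x = np.arange(10, dtype=float)
print(count_crossings(x, 2 * x + 1))  # positive correlation: 0 crossings
print(count_crossings(x, -x))         # negative correlation: all 45 pairs cross
```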
Filtering & Brushing D3
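Brushing in parallel coordinates means selecting a value range on one or more axes and highlighting (or filtering to) the items whose polylines pass through every selected range. A minimal NumPy sketch, assuming one column per axis and that multiple brushes combine with logical AND. That is a common convention, not a universal one, and `brush` is a hypothetical name.

```python
import numpy as np

def brush(data: np.ndarray, brushes: dict[int, tuple[float, float]]) -> np.ndarray:
    """Return a boolean mask of rows that fall inside every active brush.

    `brushes` maps a column (axis) index to an inclusive [low, high] range;
    axes without a brush do not filter anything.
    """
    mask = np.ones(len(data), dtype=bool)
    for col, (low, high) in brushes.items():
        mask &= (data[:, col] >= low) & (data[:, col] <= high)
    return mask

data = np.array([[1.0, 10.0, 0.2],
                 [2.0, 20.0, 0.4],
                 [3.0, 30.0, 0.6],
                 [4.0, 40.0, 0.8]])
selected = brush(data, {0: (1.5, 4.0), 2: (0.0, 0.7)})
print(selected.tolist())  # [False, True, True, False]
```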
Parallel Sets D3
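Parallel Sets adapts the parallel-coordinates layout to categorical dimensions. Each axis is split into boxes sized by category frequency, and ribbons between adjacent axes are sized by how many items share that pair of categories. The counting behind such a layout can be sketched with `collections.Counter`. The records below are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical records: (class, sex, survived)
records = [
    ("first", "female", "yes"),
    ("first", "male", "no"),
    ("third", "male", "no"),
    ("third", "male", "no"),
    ("third", "female", "yes"),
]

# Each axis shows the frequency of every category of one dimension...
axis_counts = [Counter(r[i] for r in records) for i in range(len(records[0]))]

# ...and the ribbons between the first two axes are sized by joint counts.
ribbons = Counter((r[0], r[1]) for r in records)

print(axis_counts[0])  # Counter({'third': 3, 'first': 2})
print(ribbons[("third", "male")])  # 2
```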
StratomeX A. Lex
Glyphs
Star Plots • Space variables around a circle • Encode values on “spokes” • Data point is now a shape
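The star-plot construction fits in a few lines. There is one spoke per variable, evenly spaced around a circle, and each value sets how far along its spoke the polygon vertex sits. A minimal sketch, assuming the values are already normalized to [0, 1]. The function name and the top-start, clockwise spoke order are illustrative choices.

```python
import math
from typing import Sequence

def star_plot_vertices(
    values: Sequence[float], radius: float = 1.0
) -> list[tuple[float, float]]:
    """Turn one data point into the vertices of its star-plot polygon.

    Each of the n variables gets a spoke at angle 2*pi*i/n, starting at the
    top and going clockwise. `values` are assumed to be pre-normalized to
    [0, 1], so a value's distance from the center is value * radius.
    """
    n = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n
        # x = sin, y = cos puts spoke 0 straight up (math coordinates, y up).
        vertices.append((v * radius * math.sin(angle), v * radius * math.cos(angle)))
    return vertices

# Four variables: full value on top, half on the right and bottom, zero on the left.
print(star_plot_vertices([1.0, 0.5, 0.5, 0.0]))
```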
C. Nussbaumer
Velocity (magnitude & direction), Vorticity (scalar, CW/CCW), Turbulent Charge (vector & scalar), Strain Tensor (second order). M. Kirby, H. Marmanis, and D. Laidlaw
G. Kindlmann 2006
Chernoff Faces
Dimensionality Reduction
What about very high-dimensional data? Based on slide from P. Liang
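Basic Idea: Project the high-dimensional data onto a lower-dimensional subspace using linear or non-linear transformations. Example: $\mathbf{x} \in \mathbb{R}^{64 \times 64} = \mathbb{R}^{4096}$, $\mathbf{y} \in \mathbb{R}^{10}$, and for a linear method $\mathbf{y} = U\mathbf{x}$ with $U \in \mathbb{R}^{10 \times 4096}$. Based on slide from P. Liang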
Linear Methods • Does the data lie mostly in a hyperplane? • If so, what is its dimensionality? Based on slide from F. Sha
http://www.youtube.com/watch?v=4pnQd6jnCWk
PCA: Project the data onto a subspace chosen to maximize the variance of the projected data. The PC vectors are orthogonal. Based on slide from J. Leskovec
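The PCA definition (maximize the variance of the projection, with orthogonal PC vectors) is solved by the eigenvectors of the data's covariance matrix, sorted by eigenvalue. A minimal NumPy sketch, assuming the samples are the rows of `X`. Library implementations such as scikit-learn's `PCA` typically use an SVD of the centered data instead, which yields the same directions.

```python
import numpy as np

def pca(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Project the rows of X (n samples x d features) onto the top-k PCs.

    Returns (Y, components, explained_variance): Y is n x k, components is
    k x d with orthonormal rows, and explained_variance holds the k largest
    eigenvalues of the sample covariance matrix.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = Xc.T @ Xc / (len(X) - 1)          # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    components = eigvecs[:, order].T        # each row is one PC direction
    Y = Xc @ components.T                   # y = U x for centered x
    return Y, components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Y, U, var = pca(X, 2)
print(Y.shape)  # (200, 2)
```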
MusicBox [Anita Lillie]
Variance and Covariance • Variance: • How far are the data points spread out? • Covariance: • How much do variables change together?
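The two definitions translate directly into formulas. A plain-Python sketch, assuming the usual sample normalization (divide by n - 1). That is a convention choice; dividing by n gives the population versions.

```python
def variance(xs: list[float]) -> float:
    """Sample variance: average squared distance from the mean (n - 1 denominator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance: positive when x and y tend to sit above/below their means together."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [1.0, 2.0, 3.0, 4.0]
print(variance(xs))                          # 1.6666666666666667
print(covariance(xs, [2.0, 4.0, 6.0, 8.0]))  # 3.3333333333333335 (move together)
print(covariance(xs, [8.0, 6.0, 4.0, 2.0]))  # -3.3333333333333335 (move oppositely)
```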
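Covariance Matrix: $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (rows and columns $x_1$, $x_2$) [Figure: scatter plot of $x_2$ vs. $x_1$]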
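Covariance Matrix: $\Sigma = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}$ (rows and columns $x_1$, $x_2$) [Figure: scatter plot of $x_2$ vs. $x_1$]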
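Covariance Matrix: $\Sigma = \begin{pmatrix} 1 & -0.7 \\ -0.7 & 1 \end{pmatrix}$ (rows and columns $x_1$, $x_2$) [Figure: scatter plot of $x_2$ vs. $x_1$]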
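To connect the three covariance matrices to their scatter plots, one can draw samples from a 2D Gaussian with each matrix and check that the sample covariance comes back close to it. A sketch assuming NumPy and Gaussian data (the slides do not say how their points were generated). The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
estimates = {}
for rho in (0.0, 0.7, -0.7):
    # Unit variances on the diagonal and covariance rho off the diagonal,
    # matching the three matrices above.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)
    # np.cov treats rows as variables by default; here variables are columns.
    estimates[rho] = np.cov(samples, rowvar=False)

for rho, est in estimates.items():
    print(f"rho={rho:+.1f}  estimated covariance={est[0, 1]:+.3f}")
```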
PCA [Figure: data plotted on axes $x_1$ and $x_2$]
PCA [Figure: axes $x_1$, $x_2$ with principal directions PC 1 and PC 2]
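For the covariance matrix with off-diagonal entries 0.7, the principal components can be worked out by hand. Assuming this is the kind of data behind the PC 1 / PC 2 picture (the slide does not say), the PCs point along the diagonals, and PC 1 alone carries 85% of the variance:

```latex
\Sigma = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}, \qquad
\det(\Sigma - \lambda I) = (1 - \lambda)^2 - 0.49 = 0
\;\Rightarrow\; \lambda_1 = 1.7, \quad \lambda_2 = 0.3

\mathbf{u}_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\mathbf{u}_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1.7}{2} = 0.85
```

One can check directly that $\Sigma \mathbf{u}_1 = 1.7\,\mathbf{u}_1$ and $\Sigma \mathbf{u}_2 = 0.3\,\mathbf{u}_2$.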
How many PC vectors? Enough PC vectors to cover 80-90% of the total variance. [Figure: scree plot] Based on slide from J. Leskovec
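The 80-90% rule of thumb can be applied mechanically to the eigenvalues behind a scree plot. A sketch assuming the eigenvalues (the variance along each PC) are already computed. The function name and the example eigenvalues are made up.

```python
import numpy as np

def components_for_variance(eigenvalues: np.ndarray, target: float = 0.9) -> int:
    """Smallest k whose top-k eigenvalues explain at least `target` of the variance."""
    ratios = np.sort(eigenvalues)[::-1] / np.sum(eigenvalues)  # scree-plot heights
    cumulative = np.cumsum(ratios)
    # searchsorted (side="left") finds the first index where cumulative >= target;
    # min() guards against rounding pushing the index past the end.
    return min(int(np.searchsorted(cumulative, target)) + 1, len(eigenvalues))

eigenvalues = np.array([4.0, 2.5, 1.5, 1.0, 0.5, 0.5])
print(components_for_variance(eigenvalues, 0.85))  # 4
```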
PCA for Handwritten Digits: Hastie et al., "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer (2009)
Eigenfaces. Gunnar Grimnes: http://www.flickr.com/photos/gromgull/3329844591/in/photostream/
PCA Visualization • Mondrian paintings • http://www.youtube.com/watch?v=xiWpZ5jhvx4 http://www.youtube.com/watch?v=7jLXDyQxck
Gene Array Data [Figure panels: first two PC directions; first three PC directions]
Text Documents >45 features, projected onto two PC dimensions
Multidimensional Scaling (MDS)
Multi-Dimensional Scaling • A different goal: – Find a set of points whose pairwise distances match a given distance matrix
Distance matrix (rows and columns p1 to p5):
p1: 0 1 2 3 1
p2: 1 0 2 4 1
p3: 2 2 0 1 3
p4: 3 4 1 0 1
p5: 1 1 3 1 0
[Figure: points p1 to p5 with edges labeled by their distances]
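The simplest way to find points whose pairwise distances match a given matrix is classical (Torgerson) MDS: square and double-center the distance matrix, then take its top eigenvectors. A NumPy sketch. Classical MDS is one of several MDS variants (stress-based metric and non-metric versions also exist), and the slides do not say which variant produced their figures.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS: n points in R^k whose distances approximate D.

    D is an n x n symmetric matrix of pairwise distances. The result is exact
    (up to rotation/reflection) when D comes from points in R^k; otherwise it
    is the best rank-k fit to the double-centered matrix B (minimal "strain").
    """
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]    # k largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))  # clip negative eigenvalues
    return eigvecs[:, order] * scale

# Distance matrix from the slide (rows/columns p1..p5).
D = np.array([[0, 1, 2, 3, 1],
              [1, 0, 2, 4, 1],
              [2, 2, 0, 1, 3],
              [3, 4, 1, 0, 1],
              [1, 1, 3, 1, 0]], dtype=float)
coords = classical_mds(D, k=2)
print(coords.shape)  # (5, 2)
```

Note that this particular matrix cannot be matched exactly by any set of points: p2 to p5 to p4 gives 1 + 1 = 2, shorter than the direct distance 4, which violates the triangle inequality. The result is therefore a best-fit layout, not an exact one.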
European Cities Data • Distances between European cities:
Result of MDS. Based on slide from T. Yang
Color Images N. Bonneel
Facebook Friends – Distance = 1 for friends – Distance = 2 for friends of friends; etc. N. Bonneel
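The Facebook example replaces geometric distance with graph distance, i.e. the number of friendship hops between two people. A sketch of building those distances with breadth-first search, assuming an undirected friendship graph stored as adjacency lists. The names and the graph are hypothetical. The resulting distances can then be fed to any MDS routine.

```python
from collections import deque

def hop_distances(friends: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """All-pairs hop counts via breadth-first search from every person.

    1 = friends, 2 = friends of friends, and so on. Unreachable pairs are
    left out of the inner dictionaries.
    """
    dist = {}
    for source in friends:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            person = queue.popleft()
            for other in friends[person]:
                if other not in seen:
                    seen[other] = seen[person] + 1
                    queue.append(other)
        dist[source] = seen
    return dist

# Hypothetical friendship graph (undirected, so each edge is listed both ways).
friends = {
    "ana": ["ben"],
    "ben": ["ana", "cho"],
    "cho": ["ben", "dev"],
    "dev": ["cho"],
}
d = hop_distances(friends)
print(d["ana"]["dev"])  # 3
```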
IN-SPIRE, PNNL
What if the data is non-linear? • Classic “Swiss Roll” example [Figure: PCA applied to the points $x_i$] Based on slide from F. Sha
Non-Linear Methods • Intuition: Distortion in local areas, but faithful in the global structure Based on slide from F. Sha
Dimensionality Reduction • Linear methods: • Principal Component Analysis (PCA) – Hotelling[33] • Multidimensional Scaling (MDS) – Young[38] • Nonnegative Matrix Factorization (NMF) – Lee[99] • Nonlinear methods: • Locally Linear Embeddings (LLE) – Roweis[00] • IsoMap – Tenenbaum[00]
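Of the nonlinear methods listed, Isomap is the easiest to sketch because it reuses classical MDS. It replaces straight-line distances with shortest-path distances through a nearest-neighbor graph, so points are compared along the data's surface (e.g. along the Swiss roll) rather than through the surrounding space. A minimal NumPy sketch for small datasets, assuming Euclidean input and a connected neighbor graph. Real implementations such as scikit-learn's `Isomap` use faster graph searches than the Floyd-Warshall loop here.

```python
import numpy as np

def isomap(X: np.ndarray, n_neighbors: int = 5, k: int = 2) -> np.ndarray:
    """Minimal Isomap: kNN graph -> geodesic distances -> classical MDS."""
    n = len(X)
    # 1. Euclidean distances between all pairs of points.
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # 2. Keep only edges to each point's nearest neighbors (symmetrized).
    G = np.full((n, n), np.inf)
    nearest = np.argsort(E, axis=1)[:, 1 : n_neighbors + 1]  # column 0 is normally the point itself
    for i in range(n):
        G[i, nearest[i]] = E[i, nearest[i]]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # 3. Geodesic distances = shortest paths through the graph (Floyd-Warshall).
    for m in range(n):
        G = np.minimum(G, G[:, m : m + 1] + G[m : m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # 4. Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# A 2D stand-in for the Swiss roll: points on three quarters of a circle.
t = np.linspace(0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
y = isomap(X, n_neighbors=4, k=1)[:, 0]
print(abs(np.corrcoef(y, t)[0, 1]))  # close to 1: the arc is "unrolled" onto a line
```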
Recommend
More recommend