
Spatial Data: Dimensionality Reduction (CS444 Techniques, Lecture 3)



  1. Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3

  2. In this subfield, we think of a data point as a vector in R^n (what could possibly go wrong?)

  3. “Linear” dimensionality reduction: reduction is achieved by a single matrix applied to every point.

  4. Regular Scatterplots • Every data point is a vector: v = (v0, v1, v2, v3)^T • Every scatterplot is produced by a very simple matrix, e.g. [1 0 0 0; 0 1 0 0] (the v0-v1 plot) or [1 0 0 0; 0 0 1 0] (the v0-v2 plot)
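
A minimal NumPy sketch (not from the slides; the data values are made up) of the point on this slide: a scatterplot is just a matrix applied to each data vector.

```python
import numpy as np

# One 4-dimensional data point, written as a column vector on the slide.
v = np.array([1.0, 2.0, 3.0, 4.0])

# The matrix that produces the (v0, v1) scatterplot...
P01 = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0]])
# ...and the one that produces the (v0, v2) scatterplot.
P02 = np.array([[1, 0, 0, 0],
                [0, 0, 1, 0]])

print(P01 @ v)  # [1. 2.]
print(P02 @ v)  # [1. 3.]
```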

  5. What about other matrices?

  6. Grand Tour (Asimov, 1985) http://cscheid.github.io/lux/demos/tour/tour.html
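
The linked demo animates through a continuous family of projections. Below is a sketch of a single tour frame, i.e. a projection onto one random 2-plane (assuming NumPy; the QR orthonormalization is an implementation convenience, not necessarily Asimov's construction).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))    # 100 points in R^6

# One frame of a grand tour: an orthonormal basis for a random 2-plane.
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # Q is 6x2, orthonormal columns
frame = X @ Q                                 # 100x2 scatterplot coordinates
# The full tour interpolates smoothly between such planes over time.
```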

  7. Is there a best matrix? How do we think about that?

  8. Linear Algebra review • Vectors • Inner Products • Lengths • Angles • Bases • Linear Transformations and Eigenvectors

  9. Principal Component Analysis [Figure: PCA biplot of the iris data, PC1 vs. PC2, with points colored by Species (setosa, versicolor, virginica) and loadings for Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]

  10. Principal Component Analysis • Algorithm: • Given the data set as a matrix X in R^(d x n) • Center it: X~ = X(I − (1/n) 1 1^T) = XH • Compute the eigendecomposition X~^T X~ = U Σ U^T • The principal components are the first few rows of Σ^(1/2) U^T
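
A sketch of that algorithm in NumPy (my translation of the slide's formulas; the eigenvalue sorting is needed because `eigh` returns eigenvalues in ascending order):

```python
import numpy as np

def pca_embed(X, k=2):
    """X is d x n, one column per data point, as on the slide."""
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - (1/n) 1 1^T
    Xc = X @ H                            # centered data X~
    # Eigendecomposition of the Gram matrix: X~^T X~ = U Sigma U^T
    evals, U = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1]       # largest eigenvalues first
    evals, U = evals[order], U[:, order]
    # First k rows of Sigma^(1/2) U^T: one k-vector of coordinates per point
    return np.sqrt(np.clip(evals[:k], 0, None))[:, None] * U[:, :k].T
```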

  11. What if we don’t have coordinates, but distances? “Classical” Multidimensional Scaling

  12. http://www.math.pku.edu.cn/teachers/yaoy/Fall2011/ lecture11.pdf

  13. Borg and Groenen, Modern Multidimensional Scaling

  14. Borg and Groenen, Modern Multidimensional Scaling

  15. “Classical” Multidimensional Scaling • Algorithm: • Given X, create D_ij = |X_i − X_j|_2^2 • Compute B = −(1/2) H D H^T • PCA of B is equal to the PCA of X • Huh?!
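
The “Huh?!” resolves because double centering turns squared distances back into inner products: D_ij = |X_i|^2 + |X_j|^2 − 2 X_i^T X_j, and H annihilates the two norm terms, so B = X~^T X~, exactly the Gram matrix PCA decomposes on the previous slides. A minimal NumPy sketch mirroring the PCA sketch above:

```python
import numpy as np

def classical_mds(D, k=2):
    """D is the n x n matrix of squared pairwise distances."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D @ H                  # double centering recovers X~^T X~
    evals, U = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    # Same read-out as PCA: first k rows of Sigma^(1/2) U^T
    return np.sqrt(np.clip(evals[:k], 0, None))[:, None] * U[:, :k].T
```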

  16. “Nonlinear” dimensionality reduction (i.e., the projection is not a matrix operation)

  17. Data might have “high-order” structure

  18. http://isomap.stanford.edu/Supplemental_Fig.pdf

  19. We might want to minimize something else besides “difference between squared distances”. t-SNE: difference between neighbor ordering. Why not distances?
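
A usage sketch with scikit-learn's t-SNE implementation (the library choice and the parameter values are mine, not named on the slide):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # 200 points in R^50

# t-SNE matches neighborhood structure rather than raw distances;
# perplexity sets the effective neighborhood size it tries to preserve.
Y = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)
```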

  20. The Curse of Dimensionality • High-dimensional space looks nothing like low-dimensional space • Most distances become meaningless
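
A quick demonstration of that distance concentration (assuming NumPy and SciPy; the point counts and dimensions are arbitrary): as the dimension grows, the spread of pairwise distances shrinks relative to their mean, so “nearest” and “farthest” neighbors become nearly indistinguishable.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))   # 500 random points in the d-dim unit cube
    dists = pdist(X)                 # all pairwise Euclidean distances
    print(d, dists.std() / dists.mean())  # ratio shrinks as d grows
```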
