Dimensionality Reduction & Embedding
Prof. Mike Hughes, Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/


  1. Tufts COMP 135: Introduction to Machine Learning
     https://www.cs.tufts.edu/comp/135/2019s/
     Dimensionality Reduction & Embedding
     Prof. Mike Hughes
     Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)

  2. What will we learn?
     • Supervised Learning: data examples {x_n}_{n=1}^N, a performance measure, and a task
     • Unsupervised Learning: a summary of the data x
     • Reinforcement Learning

  3. Task: Embedding
     Embedding is an unsupervised learning task.
     (figure: 2D data with features x_1, x_2 mapped to a lower-dimensional embedding)

  4. Dim. Reduction/Embedding Unit Objectives
     • Goals of dimensionality reduction
       • Reduce feature vector size (keep signal, discard noise)
       • “Interpret” features: visualize/explore/understand
     • Common approaches
       • Principal Component Analysis (PCA)
       • t-SNE (“tee-snee”)
       • word2vec and other neural embeddings
     • Evaluation metrics
       • Storage size
       • Reconstruction error
       • “Interpretability”
       • Prediction error

  5. Example: 2D viz. of movies

  6. Example: Genes vs. geography

  7. Example: Eigen Clothing

  8. (figure-only slide)

  9. Principal Component Analysis

  10. Linear Projection to 1D

  11. Reconstruction from 1D to 2D

  12. 2D Orthogonal Basis

  13. Which 1D projection is best?

  14. PCA Principles
     • Minimize reconstruction error
       • Should be able to recreate x from z
     • Equivalent to maximizing variance
       • Want z to retain maximum information

  15. Best Direction related to Eigenvalues of Data Covariance
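
A minimal NumPy sketch of this connection (function and variable names are illustrative, not from the slides): the 1D direction that maximizes the variance of the projected data is the eigenvector of the data covariance matrix with the largest eigenvalue.

```python
import numpy as np

def best_1d_direction(X):
    """Return the unit vector w maximizing the variance of X @ w.

    X : array of shape (N, F), one example per row.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # F x F sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    return eigvecs[:, np.argmax(eigvals)]   # eigenvector with largest eigenvalue
```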

  16. Principal Component Analysis: Training step, .fit()
     • Input:
       • X : training data, N x F (N high-dim. example vectors)
       • K : int, number of dimensions to discover (satisfies 1 <= K <= F)
     • Output:
       • m : mean vector, size F
       • V : learned eigenvector basis, K x F
         • One F-dimensional vector for each component
         • Each of the K vectors is orthogonal to every other
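
A minimal NumPy sketch of the training step described above (a simplified stand-in for a library implementation such as sklearn.decomposition.PCA; the function name pca_fit is my own):

```python
import numpy as np

def pca_fit(X, K):
    """Learn PCA parameters from training data X (N x F).

    Returns m (mean vector, size F) and V (top-K eigenvector basis, K x F).
    """
    N, F = X.shape
    m = X.mean(axis=0)                       # mean vector, size F
    Xc = X - m                               # center the data
    cov = Xc.T @ Xc / (N - 1)                # F x F covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:K]    # indices of the top-K eigenvalues
    V = eigvecs[:, order].T                  # K x F, rows are orthonormal
    return m, V
```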

  17. Principal Component Analysis: Transformation step, .transform()
     • Input:
       • X : training data, N x F (N high-dim. example vectors)
       • Trained PCA “model”:
         • m : mean vector, size F
         • V : learned eigenvector basis, K x F
           • One F-dimensional vector for each component
           • Each of the K vectors is orthogonal to every other
     • Output:
       • Z : projected data, N x K
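
Continuing the sketch above (it assumes the pca_fit helper defined earlier), the transform step is a centered matrix multiply, and the reconstruction used for the error metric inverts it:

```python
def pca_transform(X, m, V):
    """Project data X (N x F) onto the learned basis; returns Z (N x K)."""
    return (X - m) @ V.T

def pca_reconstruct(Z, m, V):
    """Map low-dim codes Z (N x K) back to the original F-dim space."""
    return Z @ V + m

# Usage sketch:
# m, V = pca_fit(X_train, K=2)
# Z = pca_transform(X_train, m, V)
# X_hat = pca_reconstruct(Z, m, V)   # compare to X_train for reconstruction error
```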

  18. PCA Demo
     • http://setosa.io/ev/principal-component-analysis/

  19. Example: EigenFaces

  20. PCA: How to Select K?
     • 1) Use a downstream supervised task metric
       • e.g., regression error
     • 2) Use memory constraints of the task
       • Can’t store more than 50 dims for 1M examples? Take K=50
     • 3) Plot cumulative “variance explained”
       • Take the K that captures ~90% of all variance
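
A short sketch of option 3 using scikit-learn’s PCA (the 90% threshold is the heuristic from the slide, not a fixed rule; the function name choose_K is my own):

```python
import numpy as np
from sklearn.decomposition import PCA

def choose_K(X, target=0.90):
    """Smallest K whose cumulative explained-variance ratio reaches target."""
    pca = PCA().fit(X)                                   # keep all components
    cum_var = np.cumsum(pca.explained_variance_ratio_)   # cumulative fraction of variance
    return int(np.searchsorted(cum_var, target) + 1)
```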

  21. PCA Summary
     PRO
     • Usually fast to train, fast to test
       • Slow only if finding K eigenvectors of an F x F matrix is slow
     • Nested model
       • PCA with K=5 has a subset of params equal to PCA with K=4
     CON
     • Learned basis identifiable only up to a sign flip (+/-)
     • Not often best for supervised tasks

  22. Visualization with t-SNE

  23. (figure) Credit: Luuk Derksen (https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b)

  24. (figure) Credit: Luuk Derksen, same source as above

  25. (figure-only slide)

  26. Practical Tips for t-SNE
     • If dim is very high, preprocess with PCA to ~30 dims, then apply t-SNE (see the sketch below)
     • Beware: non-convex cost function
     • https://distill.pub/2016/misread-tsne/
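
A minimal scikit-learn sketch of the PCA-then-t-SNE tip above (parameter values such as 30 PCA dims and perplexity=30 are illustrative defaults, not prescriptions from the slides):

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def tsne_visualize(X, pca_dims=30, perplexity=30.0, random_state=0):
    """Reduce very high-dim X with PCA first, then embed to 2D with t-SNE."""
    X_reduced = PCA(n_components=pca_dims).fit_transform(X)
    Z = TSNE(n_components=2, perplexity=perplexity,
             random_state=random_state).fit_transform(X_reduced)
    return Z   # N x 2 coordinates for plotting
```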

  27. Matrix Factorization as Learned “Embedding”

  28. Matrix Factorization (MF)
     • User i represented by vector u_i ∈ R^K
     • Item j represented by vector v_j ∈ R^K
     • Inner product u_i^T v_j approximates the utility y_ij
     • Intuition:
       • Two items with similar vectors get similar utility scores from the same user
       • Two users with similar vectors give similar utility scores to the same item
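
A minimal NumPy sketch of the prediction rule above (the names U, V, K follow the notation on the slide; the simple squared-error SGD update shown is one common way to fit MF, not necessarily the one used in lecture):

```python
import numpy as np

K = 10                                        # embedding dimension
n_users, n_items = 100, 50
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, K))   # row i is user vector u_i
V = 0.1 * rng.standard_normal((n_items, K))   # row j is item vector v_j

def predict_utility(i, j):
    """Predicted utility y_ij is the inner product u_i . v_j."""
    return U[i] @ V[j]

def sgd_step(i, j, y_ij, lr=0.01):
    """One gradient step on the squared error (y_ij - u_i . v_j)^2."""
    err = y_ij - predict_utility(i, j)
    u_i, v_j = U[i].copy(), V[j].copy()
    U[i] += lr * err * v_j
    V[j] += lr * err * u_i
```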

  29. (figure-only slide)

  30. Word Embeddings

  31. Word Embeddings (word2vec)
     Goal: map each word in the vocabulary to an embedding vector
     • Preserve semantic meaning in this new vector space
     • vec(swimming) - vec(swim) + vec(walk) ≈ vec(walking)
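
A tiny NumPy sketch of how such an analogy query is typically answered: build the query vector and return the nearest vocabulary word by cosine similarity. The embedding table here is hypothetical; real vectors would come from a trained word2vec model.

```python
import numpy as np

# Hypothetical, tiny embedding table; real word2vec vectors have 100-1000 dims.
emb = {
    "swim":     np.array([1.0, 0.1, 0.0]),
    "swimming": np.array([1.0, 0.9, 0.0]),
    "walk":     np.array([0.0, 0.1, 1.0]),
    "walking":  np.array([0.0, 0.9, 1.0]),
}

def analogy(a, b, c):
    """Answer 'a - b + c = ?' by nearest cosine similarity in the vocabulary."""
    query = emb[a] - emb[b] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = [w for w in emb if w not in (a, b, c)]   # exclude query words
    return max(candidates, key=lambda w: cos(emb[w], query))

print(analogy("swimming", "swim", "walk"))   # expected: "walking"
```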

  32. Word Embeddings (word2vec)
     Goal: map each word in the vocabulary to an embedding vector
     • Preserve semantic meaning in this new vector space

  33. How to embed?
     Goal: learn embedding weights W that reward predicting nearby words in the sentence.
     (figure: embedding matrix W maps each word in a fixed vocabulary, typically 1,000-100,000 words, to an embedding vector, typically 100-1000 dimensions)
     Credit: https://www.tensorflow.org/tutorials/representation/word2vec
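
A small sketch of the “predict nearby words” idea, assuming a skip-gram-style setup (the window size and example sentence are illustrative): training pairs are built from each word and its neighbors, and the embedding for a word is rewarded when it helps score its actual neighbors highly.

```python
def skipgram_pairs(sentence, window=2):
    """Yield (center_word, context_word) training pairs from one sentence."""
    words = sentence.lower().split()
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, words[j]

# Usage sketch:
pairs = list(skipgram_pairs("the quick brown fox jumps"))
# e.g. ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ...
```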
