

  1. CS246: Mining Massive Datasets. Jure Leskovec, Stanford University. http://cs246.stanford.edu

  2. High-dimension == many features. Find concepts/topics/genres:
     - Documents. Features: thousands of words, millions of word pairs
     - Surveys. Netflix: 480k users x 177k movies

  3. Compress / reduce dimensionality:
     - 10^6 rows; 10^3 columns; no updates
     - Random access to any cell(s); small error is OK

  4. Assumption: data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.

  5. Why reduce dimensionality?
     - Discover hidden correlations/topics (words that occur commonly together)
     - Remove redundant and noisy features (not all words are useful)
     - Interpretation and visualization
     - Easier storage and processing of the data

  6. A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T
     - A: input data matrix; an n x m matrix (e.g., n documents, m terms)
     - U: left singular vectors; an n x r matrix (n documents, r concepts)
     - Σ: singular values; an r x r diagonal matrix ('strength' of each concept), where r is the rank of the matrix
     - V: right singular vectors; an m x r matrix (m terms, r concepts)
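     A minimal sketch of this decomposition in NumPy (my illustration, not from the slides; the random matrix is hypothetical and only demonstrates the shapes):

```python
import numpy as np

n, m = 7, 5                        # e.g., n users/documents, m movies/terms
A = np.random.rand(n, m)

# full_matrices=False returns the "economy" SVD with r = min(n, m) factors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)  # (7, 5) (5,) (5, 5)
assert np.allclose(A, U @ np.diag(s) @ Vt)   # A = U Sigma V^T
assert np.all(s[:-1] >= s[1:])               # singular values come sorted
```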

  7. [Diagram: the n x m matrix A is approximated by the product of U (n x r), Σ (r x r), and V^T (r x m).]

  8. A ≈ σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ..., where each σ_i is a scalar and u_i, v_i are vectors (the i-th left and right singular vectors).

  9. It is always possible to decompose a real matrix A into A = U Σ V^T, where:
     - U, Σ, V: unique
     - U, V: column-orthonormal: U^T U = I; V^T V = I (I: identity matrix); i.e., the columns are orthogonal unit vectors
     - Σ: diagonal; its entries (the singular values) are positive and sorted in decreasing order (σ_1 ≥ σ_2 ≥ σ_3 ≥ ...)
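     These properties are easy to verify numerically; a small sketch (the test matrix is random, my own example, not from the slides):

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Column-orthonormality: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))  # rows of V^T are columns of V

# Singular values are non-negative and sorted in decreasing order.
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])
```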

  10. A = U Σ V^T - example. Rows of A are users, columns are the movies Matrix, Alien, Serenity, Casablanca, Amelie; the first four users are SciFi fans, the last three Romance fans:

      A (users x movies)     U                Σ             V^T
      1 1 1 0 0              0.18 0           9.64 0        0.58 0.58 0.58 0    0
      2 2 2 0 0              0.36 0           0    5.29     0    0    0    0.71 0.71
      1 1 1 0 0              0.18 0
      5 5 5 0 0              0.90 0
      0 0 0 2 2              0    0.53
      0 0 0 3 3              0    0.80
      0 0 0 1 1              0    0.27

      A = U x Σ x V^T
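      The same numbers fall out of NumPy; a sketch reusing the example matrix (note that the signs of a matched u_i/v_i pair may both flip):

```python
import numpy as np

# Columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s, 2))         # ~ [9.64 5.29 0. 0. 0.] -- two concepts
print(np.round(U[:, :2], 2))  # user-to-concept columns (up to sign)
print(np.round(Vt[:2], 2))    # movie-to-concept rows (up to sign)
```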

  11. A = U Σ V^T - example: the two concepts are a SciFi-concept (first column of U, first row of V^T) and a Romance-concept (second column of U, second row of V^T).

  12. A = U Σ V^T - example: U is the user-to-concept similarity matrix.

  13. A = U Σ V^T - example: the diagonal of Σ gives the 'strength' of each concept; 9.64 is the strength of the SciFi-concept.

  14. A = U Σ V^T - example: V is the movie-to-concept similarity matrix; its first column is the SciFi-concept.


  16. 'Movies', 'users' and 'concepts':
     - U: user-to-concept similarity matrix
     - V: movie-to-concept similarity matrix
     - Σ: its diagonal elements give the 'strength' of each concept

  17. [Plot: users as points in the plane of (Movie 1 rating, Movie 2 rating), with the first singular vector v_1 drawn as the projection axis.] SVD gives the best axis to project on: 'best' = minimum sum of squares of projection errors, i.e., minimum reconstruction error.
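      A sketch of this claim on toy 2-D data (my own hypothetical ratings, not the slides' figure): the first right singular vector beats any other direction on summed squared projection error.

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 correlated 2-D "rating" points (hypothetical data).
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                   # first right singular vector

def sse(v):
    """Sum of squared errors when X is projected onto direction v."""
    v = v / np.linalg.norm(v)
    return np.sum((X - np.outer(X @ v, v)) ** 2)

best_random = min(sse(rng.normal(size=2)) for _ in range(1000))
print(sse(v1) <= best_random + 1e-9)         # True: v1 is the optimal axis
```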

  18. A = U Σ V^T - example: v_1, the first row of V^T (0.58, 0.58, 0.58, 0, 0), is the projection axis from the previous slide.

  19. A = U Σ V^T - example: the first singular value (9.64) measures the variance ('spread') of the data along the v_1 axis.

  20. A = U Σ V^T - example: U Σ gives the coordinates of the points on the projection axes.
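      Equivalently, projecting the rows of A onto the concept axes gives U Σ, since A V = U Σ V^T V = U Σ; a quick check on the example matrix:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

coords = A @ Vt.T                    # project each user onto the concept axes
assert np.allclose(coords, U @ np.diag(s))
```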

  21. More details. Q: How exactly is dimensionality reduction done?

  22. More details. Q: How exactly is dimensionality reduction done? A: Set the smallest singular values to zero.

  23. Zeroing the smaller singular value in the example (5.29 → 0): A ≈ U diag(9.64, 0) V^T.


  25. Dropping the zeroed concept altogether leaves a rank-1 product: A ≈ u_1 σ_1 v_1^T, with u_1 = (0.18, 0.36, 0.18, 0.90, 0, 0, 0)^T, σ_1 = 9.64, v_1^T = (0.58, 0.58, 0.58, 0, 0).

  26. More details. Setting the smallest singular values to zero gives B ≈ A:

      A =  1 1 1 0 0      B =  1 1 1 0 0
           2 2 2 0 0           2 2 2 0 0
           1 1 1 0 0           1 1 1 0 0
           5 5 5 0 0           5 5 5 0 0
           0 0 0 2 2           0 0 0 0 0
           0 0 0 3 3           0 0 0 0 0
           0 0 0 1 1           0 0 0 0 0
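      A sketch of this truncation in NumPy (reusing the example matrix; the rank-1 result reproduces B above):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
s_trunc = s.copy()
s_trunc[k:] = 0.0                    # zero out the smallest singular values
B = U @ np.diag(s_trunc) @ Vt

print(np.round(B, 2))                # SciFi block survives; Romance block -> 0
```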

  27. Theorem: Let A = U Σ V^T with σ_1 ≥ σ_2 ≥ ... and rank(A) = n. Let B = U S V^T, where S is the n x n diagonal matrix with s_i = σ_i for i = 1...k and s_i = 0 otherwise. Then B is a best rank-k approximation to A: B solves min_{B: rank(B)=k} ||A - B||_F. Why? Because U and V are column-orthonormal, multiplying by them does not change the Frobenius norm, so

      min_{B: rank(B)=k} ||A - B||_F² = min_S ||Σ - S||_F² = min Σ_{i=1}^n (σ_i - s_i)²
      = min [ Σ_{i=1}^k (σ_i - s_i)² + Σ_{i=k+1}^n σ_i² ] = Σ_{i=k+1}^n σ_i²,

      attained by choosing s_i = σ_i for the k largest singular values.
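      A numerical spot-check of the theorem (random matrix, my own sketch): the Frobenius error of the rank-k truncation equals the square root of the sum of the dropped σ_i².

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]        # best rank-k approximation

# ||A - B||_F = sqrt(sum of the dropped singular values squared).
assert np.isclose(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```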

  28. Equivalent: 'spectral decomposition' of the matrix: A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T, where u_1, u_2 are the columns of U and v_1, v_2 the columns of V from the example above.
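      The same identity as a sketch: summing the rank-1 outer products σ_i u_i v_i^T rebuilds A exactly.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-1 terms sigma_i * u_i v_i^T over all singular triplets.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)
```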
