  1. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

  2. High-dimensional == many features
     Find concepts/topics/genres:
     - Documents: features are thousands of words, millions of word pairs
     - Surveys: Netflix, 480k users x 177k movies

  3. Compress / reduce dimensionality:
     - 10^6 rows; 10^3 columns; no updates
     - Random access to any cell(s); small error: OK

  4. Assumption: data lies on or near a low d-dimensional subspace
     - Axes of this subspace are an effective representation of the data

  5. Why reduce dimensions?
     - Discover hidden correlations/topics (words that occur commonly together)
     - Remove redundant and noisy features (not all words are useful)
     - Interpretation and visualization
     - Easier storage and processing of the data

  6. A [m x n] = U [m x r] Σ [r x r] (V [n x r])^T
     - A: input data matrix; m x n matrix (e.g., m documents, n terms)
     - U: left singular vectors; m x r matrix (m documents, r concepts)
     - Σ: singular values; r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix A)
     - V: right singular vectors; n x r matrix (n terms, r concepts)
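
As an illustration of these shapes, here is a minimal NumPy sketch (not part of the slides; the random matrix A is just a stand-in for the data matrix) that computes the reduced SVD and checks the dimensions:

    import numpy as np

    # A random stand-in for the m x n data matrix (hypothetical example data)
    m, n = 6, 4
    A = np.random.rand(m, n)

    # full_matrices=False gives the reduced SVD: U is m x k, Vt is k x n,
    # with k = min(m, n); the first r = rank(A) singular values are nonzero
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    Sigma = np.diag(sigma)                 # diagonal matrix of singular values

    print(U.shape, Sigma.shape, Vt.shape)  # (6, 4) (4, 4) (4, 4)
    assert np.allclose(A, U @ Sigma @ Vt)  # A = U Sigma V^T up to rounding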

  7. [Diagram: the m x n matrix A drawn as the product of U (m x r), Σ (r x r), and V^T (r x n)]

  8. Equivalently, A is a weighted sum of rank-1 matrices:
     A = σ1 u1 v1^T + σ2 u2 v2^T + ...
     - σi: scalar (singular value)
     - ui: vector (left singular vector)
     - vi: vector (right singular vector)
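
A small NumPy sketch of this rank-1 expansion (an illustration, not from the slides):

    import numpy as np

    A = np.random.rand(5, 3)               # hypothetical data
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

    # np.outer(u, v) is the rank-1 matrix u v^T; summing the sigma_i-weighted
    # terms recovers A exactly
    A_sum = sum(s * np.outer(U[:, i], Vt[i, :]) for i, s in enumerate(sigma))
    assert np.allclose(A, A_sum)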

  9. It is always possible to decompose a real matrix A into A = U Σ V^T, where:
     - U, Σ, V: unique
     - U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix); i.e., columns are orthogonal unit vectors
     - Σ: diagonal; entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
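
These properties are easy to check numerically; a sketch (assuming NumPy, not part of the slides):

    import numpy as np

    A = np.random.rand(6, 4)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T

    assert np.allclose(U.T @ U, np.eye(U.shape[1]))  # U^T U = I
    assert np.allclose(V.T @ V, np.eye(V.shape[1]))  # V^T V = I
    assert np.all(np.diff(sigma) <= 0)               # sorted in decreasing order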

  10. A = U Σ V^T, example. Rows of A: users; columns: movies (Matrix, Alien, Serenity, Casablanca, Amelie). The first concept column is SciFi, the second Romance.

          A                  U               Σ                V^T
      1 1 1 0 0          0.18  0
      2 2 2 0 0          0.36  0
      1 1 1 0 0          0.18  0        9.64  0        0.58 0.58 0.58 0    0
      5 5 5 0 0    =     0.90  0    x   0     5.29  x  0    0    0    0.71 0.71
      0 0 0 2 2          0     0.53
      0 0 0 3 3          0     0.80
      0 0 0 1 1          0     0.27
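
The numbers in this example can be reproduced with NumPy (a sketch; note that the SVD is unique only up to sign flips of paired singular-vector columns, so printed signs may differ):

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0],
                  [2, 2, 2, 0, 0],
                  [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 2, 2],
                  [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

    print(np.round(sigma, 2))     # [9.64 5.29 0.   0.   0.  ] -- rank 2
    print(np.round(U[:, :2], 2))  # the two user-to-concept columns above
    print(np.round(Vt[:2, :], 2)) # the two movie-to-concept rows above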

  11. A = U Σ V^T, example (same decomposition as slide 10): the first concept is the SciFi concept, the second the Romance concept.

  12. A = U Σ V^T, example (same decomposition): U is the "user-to-concept" similarity matrix.

  13. A = U Σ V^T, example (same decomposition): the singular value 9.64 is the 'strength' of the SciFi concept.

  14. A = U Σ V^T, example (same decomposition): V is the "movie-to-concept" similarity matrix.

  15. (Repeat of slide 14: V is the "movie-to-concept" similarity matrix.)

  16. 'Movies', 'users' and 'concepts':
      - U: user-to-concept similarity matrix
      - V: movie-to-concept similarity matrix
      - Σ: its diagonal elements give the 'strength' of each concept

  17. SVD gives the best axis to project on:
      - 'best' = minimizes the sum of squares of the projection errors, i.e., minimum reconstruction error
      [Plot: user points in the (Movie 1 rating, Movie 2 rating) plane, with the first singular vector v1 drawn as the projection axis]
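
A sketch of this optimality claim (my own illustration, not from the slides): among all unit-vector axes through the origin, the first right singular vector v1 minimizes the total squared projection error.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]])  # points near a line
    X += 0.1 * rng.normal(size=X.shape)                     # plus some noise

    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v1 = Vt[0]                                              # best projection axis

    def sq_error(axis):
        # squared reconstruction error after projecting every point onto axis
        proj = X @ np.outer(axis, axis)
        return np.sum((X - proj) ** 2)

    # v1 beats every other candidate axis
    for angle in np.linspace(0.0, np.pi, 7):
        axis = np.array([np.cos(angle), np.sin(angle)])
        assert sq_error(v1) <= sq_error(axis) + 1e-9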

  18. A = U Σ V^T, example (same decomposition as slide 10): v1, the first row of V^T, is the first projection axis.

  19. A = U Σ V^T, example (same decomposition): σ1 measures the variance ('spread') along the v1 axis.

  20. A = U Σ V^T, example (same decomposition): U Σ gives the coordinates of the points along the projection axes.
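
In code (a sketch using the same example), U Σ is the same as projecting the rows of A onto the axes in V:

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

    coords = U @ np.diag(sigma)           # coordinates in concept space
    assert np.allclose(coords, A @ Vt.T)  # identical to projecting A onto V
    print(np.round(coords[:, :2], 2))     # one (SciFi, Romance) pair per user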

  21. More details. Q: How exactly is dimensionality reduction done? (Decomposition repeated from slide 10.)

  22. More details. Q: How exactly is dimensionality reduction done? A: Set the smallest singular values to zero.

  23. Setting the smallest singular value σ2 = 5.29 to zero gives the approximation A ≈ U diag(9.64, 0) V^T.

  24. (Build of slide 23: with σ2 zeroed, the second column of U and the second row of V^T no longer contribute.)

  25. Dropping the zeroed singular value together with the corresponding column of U and row of V^T leaves a rank-1 approximation:

          1 1 1 0 0        0.18
          2 2 2 0 0        0.36
          1 1 1 0 0        0.18
          5 5 5 0 0   ~    0.90   x   9.64   x   0.58 0.58 0.58 0 0
          0 0 0 2 2        0
          0 0 0 3 3        0
          0 0 0 1 1        0
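
The whole procedure in a few lines of NumPy (a sketch; truncated_svd is a hypothetical helper name, not a library function):

    import numpy as np

    def truncated_svd(A, k):
        # zero out all but the k largest singular values, i.e., keep only
        # the top-k terms of the decomposition
        U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

    A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)
    B = truncated_svd(A, 1)   # keep only the SciFi concept
    print(np.round(B, 2))     # the Romance ratings collapse to zero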

  26. More details. Setting the smallest singular values to zero gives an approximation B ≈ A:

          A = 1 1 1 0 0        B = 1 1 1 0 0
              2 2 2 0 0            2 2 2 0 0
              1 1 1 0 0            1 1 1 0 0
              5 5 5 0 0   ~        5 5 5 0 0
              0 0 0 2 2            0 0 0 0 0
              0 0 0 3 3            0 0 0 0 0
              0 0 0 1 1            0 0 0 0 0

      Frobenius norm: ‖M‖_F = √(Σ_ij M_ij²), and ‖A − B‖_F = √(Σ_ij (A_ij − B_ij)²) is "small".
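
For this example the error is easy to compute by hand: ‖A − B‖_F = √(2·(2² + 3² + 1²)) = √28 ≈ 5.29, which is exactly the discarded singular value σ2. A sketch checking this (an illustration, not from the slides):

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    B = sigma[0] * np.outer(U[:, 0], Vt[0, :])  # the rank-1 approximation

    err = np.linalg.norm(A - B, 'fro')          # Frobenius norm of the error
    print(round(err, 2), round(sigma[1], 2))    # 5.29 5.29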

  27. Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ ..., rank(A) = r) and let B = U S V^T, where S is the diagonal matrix with s_i = σ_i for i = 1...k and s_i = 0 otherwise. Then B is a best rank-k approximation to A:
      B is a solution to min_B ‖A − B‖_F where rank(B) = k.
      We will need two facts:
      - ‖M‖_F² = Σ_i q_ii², where M = P Q R is the SVD of M (Q diagonal, with entries q_ii)
      - U Σ V^T − U S V^T = U (Σ − S) V^T

  28. We will need two facts:
      - ‖M‖_F² = Σ_i q_ii², where M = P Q R is the SVD of M. This holds because P is column orthonormal, R is row orthonormal, and Q is diagonal, so the orthonormal factors do not change the Frobenius norm.
      - U Σ V^T − U S V^T = U (Σ − S) V^T
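
Fact 1 can be sanity-checked numerically (a sketch, not from the slides):

    import numpy as np

    M = np.random.rand(6, 4)
    _, q, _ = np.linalg.svd(M, full_matrices=False)  # diagonal entries of Q

    # ||M||_F^2 equals the sum of squared singular values, because the
    # orthonormal factors P and R leave the Frobenius norm unchanged
    assert np.isclose(np.linalg.norm(M, 'fro') ** 2, np.sum(q ** 2))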
