CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu
High-dimensional == many features. Goal: find concepts/topics/genres:
- Documents: features are thousands of words, millions of word pairs
- Surveys, e.g. Netflix: 480k users x 17.7k movies
Compress / reduce dimensionality:
- 10^6 rows; 10^3 columns; no updates
- Random access to any cell(s); small error: OK
Assumption: Data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.
Why reduce dimensions?
- Discover hidden correlations/topics (words that occur commonly together)
- Remove redundant and noisy features (not all words are useful)
- Interpretation and visualization
- Easier storage and processing of the data
A [m x n] = U [m x r] Σ [r x r] (V [n x r])^T
- A: input data matrix, an m x n matrix (e.g., m documents, n terms)
- U: left singular vectors, an m x r matrix (m documents, r concepts)
- Σ: singular values, an r x r diagonal matrix (strength of each 'concept'); r is the rank of the matrix A
- V: right singular vectors, an n x r matrix (n terms, r concepts)
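The factorization comes from any standard linear-algebra library. A minimal sketch using NumPy (not part of the slides; the random matrix is only a placeholder, and the shapes mirror the notation above):

```python
import numpy as np

m, n = 7, 5                # e.g., 7 users (rows) x 5 movies (columns)
A = np.random.rand(m, n)   # placeholder data matrix

# full_matrices=False gives the thin SVD: U is m x r, s holds the r
# singular values (sorted, descending), Vt is r x n, with r = min(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)         # Sigma as an r x r diagonal matrix

# A is recovered exactly (up to floating point): A = U Sigma V^T
assert np.allclose(A, U @ Sigma @ Vt)
```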
[Diagram: A (m x n) factored as U (m x r) times Σ (r x r) times V^T (r x n)]
Equivalently, A is a sum of rank-1 matrices:
A = σ1 u1 v1^T + σ2 u2 v2^T + ...
σi ... scalar; ui ... vector; vi ... vector
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
- U, Σ, V: unique
- U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix); columns are orthogonal unit vectors
- Σ: diagonal; entries (singular values) are positive, sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
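These properties are easy to check numerically. A quick sketch (random test matrix, my own variable names):

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = len(s)

assert np.allclose(U.T @ U, np.eye(r))    # U column orthonormal: U^T U = I
assert np.allclose(Vt @ Vt.T, np.eye(r))  # V column orthonormal: V^T V = I
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)  # sigma_1 >= ... >= 0
```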
A = U Σ V^T, example. Rows of A are users, columns are movies (Matrix, Alien, Serenity, Casablanca, Amelie); the first four users are SciFi fans, the last three Romance fans:

    1 1 1 0 0        0.18  0
    2 2 2 0 0        0.36  0
    1 1 1 0 0        0.18  0         9.64  0         0.58  0.58  0.58  0     0
    5 5 5 0 0   =    0.90  0     x   0     5.29  x   0     0     0     0.71  0.71
    0 0 0 2 2        0     0.53
    0 0 0 3 3        0     0.80
    0 0 0 1 1        0     0.27

- U is the 'user-to-concept' similarity matrix: its first column is the SciFi-concept, its second column the Romance-concept.
- The diagonal of Σ gives the 'strength' of each concept: 9.64 for SciFi, 5.29 for Romance.
- V is the 'movie-to-concept' similarity matrix: the first row of V^T loads on the SciFi movies, the second on the Romance movies.
'movies', 'users' and 'concepts':
- U: user-to-concept similarity matrix
- V: movie-to-concept similarity matrix
- Σ: its diagonal elements give the 'strength' of each concept
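A sketch reproducing the example above in NumPy. Note that SVD implementations may flip the signs of singular-vector pairs, but the singular values should match the slide (about 9.64 and 5.29):

```python
import numpy as np

# rows: users; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))         # [9.64 5.29 0. 0. 0.]; two non-zero concepts
print(np.round(U[:, :2], 2))  # user-to-concept columns (up to sign)
print(np.round(Vt[:2], 2))    # movie-to-concept rows (up to sign)
```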
SVD gives the best axis to project on: 'best' = minimum sum of squares of projection errors, i.e. minimum reconstruction error. [Figure: user ratings plotted in the (Movie 1 rating, Movie 2 rating) plane; the first singular vector v1 is the best projection axis.]
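A sketch of this 'best axis' claim on the example matrix, assuming the least-squares sense above. Projecting the rows of A onto v1 gives a smaller sum of squared reconstruction errors than projecting onto any other unit vector; here we only compare against one arbitrary alternative:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def sq_proj_error(v):
    """Sum of squared errors after projecting each row of A onto unit axis v."""
    reconstruction = np.outer(A @ v, v)   # rank-1 reconstruction from axis v
    return np.sum((A - reconstruction) ** 2)

v1 = Vt[0]                       # first right singular vector
other = np.ones(5) / np.sqrt(5)  # an arbitrary competing unit axis
assert sq_proj_error(v1) < sq_proj_error(other)
```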
A = U Σ V^T, example (same decomposition as above):
- v1, the first row of V^T, is the first projection axis.
- σ1 = 9.64 measures the variance ('spread') of the data on the v1 axis.
- U Σ gives the coordinates of the points along the projection axes (see the sketch below).
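A sketch of the coordinate claim: for the example matrix, U Σ equals A V, i.e. the rows of A projected onto the concept axes:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

coords = U @ np.diag(s)               # coordinates in concept space
assert np.allclose(coords, A @ Vt.T)  # same as projecting A onto the axes
print(np.round(coords[:, :2], 2))     # each user's two concept coordinates
```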
More details. Q: How exactly is dim. reduction done?
A: Set the smallest singular values to zero. In the example, zero out σ2 = 5.29:

         0.18  0
         0.36  0
         0.18  0          9.64  0        0.58  0.58  0.58  0     0
    A ~  0.90  0      x   0     0    x   0     0     0     0.71  0.71
         0     0.53
         0     0.80
         0     0.27

Dropping the zeroed concept removes the corresponding column of U, row and column of Σ, and row of V^T:

         0.18
         0.36
         0.18
    A ~  0.90   x   9.64   x   0.58  0.58  0.58  0  0
         0
         0
         0
Multiplying out gives the rank-1 approximation B:

         1 1 1 0 0
         2 2 2 0 0
         1 1 1 0 0
    B =  5 5 5 0 0
         0 0 0 0 0
         0 0 0 0 0
         0 0 0 0 0

Frobenius norm: ǁMǁF = √(Σij Mij^2)
ǁA - BǁF = √(Σij (Aij - Bij)^2) is "small"
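A sketch of this rank-k truncation on the example matrix, assuming NumPy; the Frobenius error of the rank-1 approximation comes out to about 5.29, i.e. the dropped σ2:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # keep only the top-k concepts
print(np.round(B, 2))                   # SciFi block survives, Romance -> 0
print(np.linalg.norm(A - B, 'fro'))     # ~5.29: exactly the dropped sigma_2
```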
Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ ..., rank(A) = r) and let B = U S V^T, where S is a diagonal n x n matrix with si = σi (i = 1...k) and si = 0 otherwise. Then B is a best rank-k approximation to A: B is a solution to min_B ǁA - BǁF where rank(B) = k.
We will need 2 facts:
- ǁMǁF = √(Σi qii^2), where M = P Q R is the SVD of M
- U Σ V^T - U S V^T = U (Σ - S) V^T

We apply:
- P: column orthonormal
- R: row orthonormal
- Q: diagonal
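A sketch checking fact 1 numerically: the Frobenius norm of a matrix equals the square root of the sum of its squared singular values (random test matrix, my own names):

```python
import numpy as np

M = np.random.rand(6, 4)
q = np.linalg.svd(M, compute_uv=False)  # singular values only
assert np.isclose(np.linalg.norm(M, 'fro'), np.sqrt(np.sum(q ** 2)))
```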