CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
High dimension == many features
Find concepts/topics/genres:
Documents: features are thousands of words, millions of word pairs
Surveys, e.g. Netflix: 480k users x 177k movies
Compress / reduce dimensionality:
10^6 rows; 10^3 columns; no updates
Random access to any cell(s); small error: OK
Assumption: Data lies on or near a low d-dimensional subspace
Axes of this subspace are an effective representation of the data
Why reduce dimensionality?
Discover hidden correlations/topics (words that occur commonly together)
Remove redundant and noisy features (not all words are useful)
Interpretation and visualization
Easier storage and processing of the data
A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T
A: input data matrix, n x m (e.g., n documents, m terms)
U: left singular vectors, n x r matrix (n documents, r concepts)
Σ: singular values, r x r diagonal matrix (strength of each ‘concept’); r = rank of the matrix
V: right singular vectors, m x r matrix (m terms, r concepts)
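A minimal numpy sketch of these shapes, assuming a small random matrix as stand-in data (np.linalg.svd with full_matrices=False returns exactly this ‘economy’ factorization, with r = min(n, m)):

```python
import numpy as np

n, m = 6, 4                    # e.g., 6 documents, 4 terms (toy sizes)
A = np.random.rand(n, m)       # stand-in data

# Economy SVD: U is n x r, s holds the r singular values, Vt is r x m,
# where r = min(n, m); for rank-deficient A the trailing values of s are ~0.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)             # r x r diagonal matrix of singular values

print(U.shape, Sigma.shape, Vt.shape)    # (6, 4) (4, 4) (4, 4)
assert np.allclose(A, U @ Sigma @ Vt)    # A = U Sigma V^T
```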
[Diagram: the n x m matrix A is factored as U (n x r) times Σ (r x r) times V^T (r x m), so A ≈ U Σ V^T.]
A ≈ σ1 u1 v1^T + σ2 u2 v2^T + …
σi … scalar, ui … vector, vi … vector
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
U, Σ, V: unique
U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix; columns are orthogonal unit vectors)
Σ: diagonal; entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ σ3 ≥ …)
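These properties are easy to check numerically; a small sketch on random stand-in data:

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and V are orthonormal: U^T U = I and V^T V = I
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))

# Singular values are non-negative and sorted in decreasing order
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])
```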
A = U Σ V^T, example. Rows of A are users, columns are movies (Matrix, Alien, Serenity, Casablanca, Amelie); the first three movies are SciFi, the last two Romance:

A (user-to-movie ratings):
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

U (user-to-concept similarity matrix):
  0.18 0
  0.36 0
  0.18 0
  0.90 0
  0    0.53
  0    0.80
  0    0.27

Σ (‘strength’ of each concept):
  9.64 0
  0    5.29

V^T (rows give movie-to-concept similarities):
  0.58 0.58 0.58 0    0
  0    0    0    0.71 0.71

The first column of U, the first singular value 9.64, and the first row of V^T together describe the SciFi concept; the second column, 5.29, and the second row describe the Romance concept.
‘movies’, ‘users’ and ‘concepts’:
U: user-to-concept similarity matrix
V: movie-to-concept similarity matrix
Σ: its diagonal elements give the ‘strength’ of each concept
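A sketch reproducing the example with numpy (note that singular vectors are only determined up to sign, so numpy may return the negatives of the columns shown above):

```python
import numpy as np

# Rows: 7 users; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))          # [9.64 5.29 0.   0.   0.  ] -- two concepts
print(np.round(U[:, :2], 2))   # user-to-concept similarities
print(np.round(Vt[:2], 2))     # concept-to-movie similarities
```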
SVD gives the best axis to project on:
‘best’ = minimizes the sum of squares of the projection errors, i.e., minimum reconstruction error
[Plot: user points in the (Movie 1 rating, Movie 2 rating) plane; the first singular vector v1 is the best axis to project onto.]
A = U Σ V^T, example (same factorization as above):
The first right singular vector v1 points along the direction of largest variance (‘spread’) of the user points.
U Σ gives the coordinates of the points in the projection axes.
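A short sketch of those coordinates on the movie example (the identity A V = U Σ V^T V = U Σ is checked at the end):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U * Sigma: the coordinates of each user (row of A) on the concept axes
coords = U[:, :2] * s[:2]          # keep the two non-zero concepts
print(np.round(coords, 2))         # SciFi fans load on axis 1, romance fans on axis 2

# Equivalently A V, since A V = U Sigma V^T V = U Sigma
assert np.allclose(coords, A @ Vt[:2].T)
```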
More details. Q: How exactly is dim. reduction done?
A: Set the smallest singular values to zero.
In the example, zero out σ2 = 5.29, keeping only the strongest concept (σ1 = 9.64); the zeroed rows/columns of Σ, U, V can then be dropped, leaving a rank-1 factorization:

A ≈ u1 σ1 v1^T, with
u1 = (0.18, 0.36, 0.18, 0.90, 0, 0, 0)^T, σ1 = 9.64, v1^T = (0.58, 0.58, 0.58, 0, 0)

B = u1 σ1 v1^T =
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 0 0
  0 0 0 0 0
  0 0 0 0 0
≈ A
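A sketch of the truncation in numpy (keeping only the strongest concept, k = 1):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                                     # keep only the strongest concept
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # equivalent to zeroing sigma_2, sigma_3, ...
print(np.round(B, 2))                     # SciFi block survives; romance block -> 0
```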
Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = n), and let B = U S V^T, where S is the diagonal n x n matrix with s_i = σ_i for i = 1…k and s_i = 0 otherwise. Then B is a best rank-k approximation to A: B solves min_{B, rank(B)=k} ‖A − B‖_F.

Why? Since U and V are column orthonormal, they preserve the Frobenius norm, so

$$\min_{B,\,\mathrm{rank}(B)=k} \|A - B\|_F^2 = \min_S \|\Sigma - S\|_F^2 = \min_S \sum_i (\sigma_i - s_i)^2$$
$$= \min_S \Big[ \sum_{i=1}^{k} (\sigma_i - s_i)^2 + \sum_{i=k+1}^{n} \sigma_i^2 \Big] = \sum_{i=k+1}^{n} \sigma_i^2,$$

attained by choosing s_i = σ_i for i ≤ k: zeroing the smallest singular values loses the least.
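A quick numerical check of this identity, on random stand-in data:

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # best rank-k approximation

# ||A - B||_F^2 equals the sum of the squared dropped singular values
err2 = np.linalg.norm(A - B, 'fro') ** 2
assert np.isclose(err2, np.sum(s[k:] ** 2))
```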
Equivalent: ‘spectral decomposition’ of the matrix:
A = σ1 u1 v1^T + σ2 u2 v2^T
In the example, the 7 x 5 ratings matrix A is written as [u1 u2] x diag(σ1, σ2) x [v1 v2]^T, a weighted sum of rank-1 ‘concept’ matrices.
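A sketch of the same identity in numpy, rebuilding A as a sum of rank-1 outer products:

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a weighted sum of rank-1 'concept' matrices sigma_i * u_i * v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)
```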