SVD and PCA
Derek Onken and Li Xiong
Feature Extraction
Create new features (attributes) by combining/mapping existing ones. Common methods:
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Other compression methods (time-frequency analysis):
  - Fourier transform (e.g., time series)
  - Discrete Wavelet Transform (e.g., 2D images)
January 29, 2018
Principal Component Analysis (PCA)
Principal component analysis finds the dimensions that capture the most variance: a linear mapping of the data to a new coordinate system such that the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.
Steps:
- Normalize the input data so each attribute falls within the same range
- Compute k orthonormal (unit) vectors, the principal components; each input vector is a linear combination of the k principal component vectors
- The principal components are sorted in order of decreasing "significance"
- Weak components, i.e., those with low variance, can be eliminated
Dimensionality Reduction: PCA Mathematically
- Compute the covariance matrix of the data
- Find the eigenvectors of the covariance matrix that correspond to large eigenvalues
[Figure: 2D data in the X-Y plane with principal direction v]
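The steps above can be sketched in NumPy. This is a minimal illustration, not the slides' own code; the function name `pca` and the random data are made up for the example.

```python
import numpy as np

def pca(X, k):
    X = X - X.mean(axis=0)          # center each attribute
    C = np.cov(X, rowvar=False)     # covariance matrix
    vals, vecs = np.linalg.eigh(C)  # eigh: symmetric input -> real eigenpairs
    order = np.argsort(vals)[::-1]  # sort by decreasing eigenvalue ("significance")
    W = vecs[:, order[:k]]          # keep the k strongest components
    return X @ W                    # project data onto those components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca(X, 2)
print(Y.shape)                      # (100, 2)
```

By construction, the first projected coordinate carries at least as much variance as the second.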
PCA: Illustrative Example
[Sequence of figures only]
Eigendecomposition
How the eigenvalues and eigenvectors create a matrix decomposition: A = Q Λ Q⁻¹, where
- Q is a matrix whose columns are the eigenvectors of A
- Λ is the diagonal matrix containing the eigenvalues
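A small numerical check of this decomposition (the 2×2 example matrix is chosen arbitrarily, not taken from the slides):

```python
import numpy as np

# A = Q Λ Q^{-1}: rebuild a matrix from its eigenpairs.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, Q = np.linalg.eig(A)          # columns of Q are eigenvectors
L = np.diag(vals)                   # Λ: diagonal matrix of eigenvalues
A_rebuilt = Q @ L @ np.linalg.inv(Q)
print(np.allclose(A, A_rebuilt))    # True
```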
Singular Value Decomposition (SVD)
Similarity of Eigendecomposition and SVD
- Eigendecomposition: columns of Q are eigenvectors; Λ contains the eigenvalues. A must be square, and here we define A = MᵀM.
- SVD: columns of U are left-singular vectors; columns of V are right-singular vectors; Σ contains the ordered singular values τⱼ.
The vⱼ are eigenvectors of MᵀM; the uᵢ are eigenvectors of MMᵀ. The eigenvalues are the squares of the singular values (μⱼ = τⱼ²).
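This relationship can be verified numerically (random test matrix for illustration, nothing from the slides):

```python
import numpy as np

# Claim: eigenvalues of MᵀM equal the squared singular values of M.
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is sorted descending
mu = np.linalg.eigvalsh(M.T @ M)[::-1]            # eigenvalues, descending
print(np.allclose(mu, s**2))                      # True
```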
An Application Example
From: Dimensionality Reduction: SVD & CUR. CS246: Mining Massive Datasets, Jure Leskovec, Stanford University. http://cs246.stanford.edu
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where:
- U, Σ, V: unique
- U, V: column-orthonormal, i.e., UᵀU = I and VᵀV = I (I: identity matrix; the columns are orthogonal unit vectors)
- Σ: diagonal; its entries (the singular values) are positive and sorted in decreasing order (σ₁ ≥ σ₂ ≥ ... ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
Jure Leskovec, Stanford CS246: Mining Massive Datasets, 1/29/2018
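These properties can be checked directly with NumPy's SVD routine (random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(7, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U.T @ U, np.eye(5)))              # U is column-orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(5)))            # V is column-orthonormal
print(bool(np.all(np.diff(s) <= 0) and np.all(s >= 0)))  # sorted, nonnegative
print(np.allclose(A, U @ np.diag(s) @ Vt))          # A = U Σ Vᵀ
```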
SVD – Example: Users-to-Movies
Consider a ratings matrix A (rows: users; columns: the movies Matrix, Alien, Serenity, Casablanca, Amelie). What does SVD do?

           Matrix  Alien  Serenity  Casablanca  Amelie
SciFi        1      1       1          0          0
users        3      3       3          0          0
             4      4       4          0          0
             5      5       5          0          0
Romance      0      2       0          4          4
users        0      0       0          5          5
             0      1       0          2          2

SVD factors A = U Σ Vᵀ into "concepts", a.k.a. latent dimensions or latent factors.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ:

  1 1 1 0 0       0.13  0.02 -0.01
  3 3 3 0 0       0.41  0.07 -0.03      12.4   0    0        0.56  0.59  0.56  0.09  0.09
  4 4 4 0 0       0.55  0.09 -0.04  ×     0   9.5   0   ×    0.12 -0.02  0.12 -0.69 -0.69
  5 5 5 0 0   =   0.68  0.11 -0.05        0    0   1.3       0.40 -0.80  0.40  0.09  0.09
  0 2 0 4 4       0.15 -0.59  0.65
  0 0 0 5 5       0.07 -0.73 -0.67
  0 1 0 2 2       0.07 -0.29  0.32
SVD – Example: Users-to-Movies
A = U Σ Vᵀ, with the same factors as on the previous slide. The first latent dimension is the "SciFi-concept"; the second is the "Romance-concept".
SVD – Example: Users-to-Movies
A = U Σ Vᵀ: U is the "user-to-concept" factor matrix. Each row of U gives one user's affinity for the SciFi-concept and the Romance-concept.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ: the diagonal of Σ gives the "strength" of each concept; e.g., 12.4 is the strength of the SciFi-concept.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ: V is the "movie-to-concept" factor matrix. Its first column (the first row of Vᵀ, [0.56 0.59 0.56 0.09 0.09]) is the SciFi-concept.
SVD - Interpretation #1
'movies', 'users' and 'concepts':
- U: user-to-concept matrix
- V: movie-to-concept matrix
- Σ: its diagonal elements give the 'strength' of each concept
SVD – Best Low-Rank Approximation
Fact: SVD gives the 'best' axes to project on, where 'best' means minimizing the sum of squared reconstruction errors:

‖A − B‖_F = √( Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)² )

If A = U Σ Vᵀ, then the best rank-k approximation B of A is B = U Σ' Vᵀ, where Σ' keeps the k largest singular values of Σ and zeroes out the rest.
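A sketch of this truncation using the users-to-movies matrix from the earlier example; k = 2 keeps the two strong concepts, and the Frobenius error of the residual equals the discarded singular value:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-2 approximation
err = np.linalg.norm(A - B)                  # Frobenius norm of A - B
print(np.isclose(err, s[2]))                 # True: A has rank 3, so the
                                             # error is the dropped σ₃
```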
Example of SVD
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the query into the 'concept space' – how? (Using the A = U Σ Vᵀ decomposition of the users-to-movies matrix above.)
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the query into the 'concept space' – how?
q = [5 0 0 0 0]  (a new user who rated only 'Matrix')
Project q into concept space by taking the inner product of q with each 'concept' vector vᵢ, e.g., q·v₁ for the SciFi-concept.
[Figure: q in the Matrix–Alien plane with concept axes v₁, v₂ and the projection q·v₁]
Case study: How to query?
Compactly, we have: q_concept = q V
E.g., with the movie-to-concept factor matrix V (two concept columns):

                    0.56  0.12
                    0.59 -0.02
q = [5 0 0 0 0]  ×  0.56  0.12   =  [2.8  0.6]
                    0.09 -0.69
                    0.09 -0.69

i.e., 2.8 on the SciFi-concept and 0.6 on the Romance-concept.
Case study: How to query?
How would a user d who rated ('Alien', 'Serenity') be handled? The same way: d_concept = d V
E.g.:

                    0.56  0.12
                    0.59 -0.02
d = [0 4 5 0 0]  ×  0.56  0.12   =  [5.2  0.4]
                    0.09 -0.69
                    0.09 -0.69
Case study: How to query?
Observation: User d, who rated ('Alien', 'Serenity'), will be similar to user q, who rated ('Matrix'), although d and q have zero ratings in common!
d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]
q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]
Zero ratings in common, yet similarity > 0 in concept space.
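The query mapping and the resulting similarity can be reproduced with the V factors from the slides (two concept columns only; the cosine computation is the natural similarity choice here, not something the slides specify):

```python
import numpy as np

# Movie-to-concept factors V (SciFi and Romance columns, from the slides).
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])
q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity'
q_c, d_c = q @ V, d @ V         # map both users into concept space
cos = q_c @ d_c / (np.linalg.norm(q_c) * np.linalg.norm(d_c))
print(np.round(q_c, 2))         # [2.8 0.6]
print(cos > 0)                  # True: similar despite no common ratings
```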