compsci 514: algorithms for data science

Cameron Musco
University of Massachusetts Amherst. Fall 2019.
Lecture 13
logistics

• Pass/Fail Deadline is 10/29 for undergraduates and 10/31 for graduates. We will have your Problem Set 2 and midterm grades back before then.
• Will release Problem Set 3 next week, due ∼ 11/11.
• MAP Feedback:
  • Going to adjust a bit how I take questions in class.
  • Will try to more clearly identify important information (what will appear on exams or problem sets) vs. motivating examples.
  • Will try to use iPad more to write out proofs in class.
summary

Last Few Classes: Low-Rank Approximation and PCA
• Discussed how to compress a dataset that lies close to a k-dimensional subspace.
• Optimal compression by projecting onto the top k eigenvectors of the covariance matrix X^T X (PCA).
• Saw how to calculate the error of the approximation – interpret the spectrum of X^T X (see the sketch below).

This Class: Low-rank approximation and connection to singular value decomposition.
• Show how PCA can be interpreted in terms of the singular value decomposition (SVD) of X.
• Applications to word embeddings, graph embeddings, document classification, recommendation systems.
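A minimal numpy sketch of the recap above, assuming an illustrative data matrix X and target rank k (names, sizes, and the synthetic data are placeholders, not from the slides): project onto the top-k eigenvectors of X^T X and read the approximation error off the rest of the spectrum, using the fact from last class that ∥X − XVV^T∥_F^2 equals the sum of the d − k smallest eigenvalues of X^T X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 50, 5                          # illustrative sizes
# Synthetic data that is approximately rank k, plus a little noise.
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

# Top-k eigenvectors of X^T X span the optimal k-dimensional subspace (PCA).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)    # eigenvalues in ascending order
V = eigvecs[:, -k:]                           # d x k matrix of top-k eigenvectors

# Low-rank approximation by projection, and its squared Frobenius error.
err = np.linalg.norm(X - X @ V @ V.T, 'fro') ** 2

# The same error read off the spectrum: sum of the d - k smallest eigenvalues of X^T X.
err_from_spectrum = eigvals[:d - k].sum()
print(err, err_from_spectrum)                 # the two values should (nearly) agree
```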
review

Set Up: Assume that data points x⃗_1, …, x⃗_n lie close to a k-dimensional subspace V of R^d. Let X ∈ R^{n×d} be the data matrix. Let v⃗_1, …, v⃗_k be an orthonormal basis for V and V ∈ R^{d×k} be the matrix with these vectors as its columns.

• VV^T ∈ R^{d×d} is the projection matrix onto V (see the sketch below).
• X ≈ X(VV^T). Gives the closest approximation to X with rows in V.

Notation: x⃗_1, …, x⃗_n ∈ R^d: data points, X ∈ R^{n×d}: data matrix, v⃗_1, …, v⃗_k ∈ R^d: orthonormal basis for subspace V, V ∈ R^{d×k}: matrix with columns v⃗_1, …, v⃗_k.
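A minimal numpy sketch of these two bullets, under an assumed random subspace (all names and dimensions are illustrative): build an orthonormal V with QR, check that VV^T behaves as a projection (symmetric and idempotent), and form the approximation X(VV^T).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 3                               # illustrative sizes

# Orthonormal basis for a k-dimensional subspace of R^d (columns of V).
V, _ = np.linalg.qr(rng.standard_normal((d, k)))   # V has shape d x k, orthonormal columns

P = V @ V.T                                        # d x d projection matrix onto the subspace
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # projection: symmetric and idempotent

X = rng.standard_normal((n, d))                    # data matrix with rows x_1, ..., x_n
X_approx = X @ P                                   # each row of X replaced by its projection
print(X_approx.shape)                              # still n x d, but every row lies in the subspace
```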
review of last time

Low-Rank Approximation: Approximate X ≈ XVV^T.
• XVV^T is a rank-k matrix – all its rows fall in V.
• X's rows are approximately spanned by the columns of V.
• X's columns are approximately spanned by the columns of XV (see the sketch below).
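A short numpy check of the bullets above, continuing with the same kind of illustrative X and orthonormal V as in the previous sketch: XVV^T has rank k, and because XVV^T = (XV)V^T, its columns lie in the span of the k columns of XV.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 20, 3                                    # illustrative sizes
V, _ = np.linalg.qr(rng.standard_normal((d, k)))        # orthonormal d x k basis matrix
X = rng.standard_normal((n, d))

B = X @ V @ V.T                                         # the rank-k approximation XVV^T
print(np.linalg.matrix_rank(B))                         # k (here 3)

# B = (XV) V^T, so every column of B is a combination of the k columns of XV:
# stacking XV and B side by side does not increase the rank.
print(np.linalg.matrix_rank(np.hstack([X @ V, B])))     # still k
```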
dual view of low-rank approximation
optimal low-rank approximation

Given x⃗_1, …, x⃗_n (the rows of X) we want to find an orthonormal span V ∈ R^{d×k} (spanning a k-dimensional subspace V). Writing the total squared error as ∑_{i=1}^n ∥x⃗_i − VV^T x⃗_i∥_2^2 = ∥X − XVV^T∥_F^2, the best subspace solves

arg min_{orthonormal V ∈ R^{d×k}} ∥X − XVV^T∥_F^2 = arg max_{orthonormal V ∈ R^{d×k}} ∥XVV^T∥_F^2.

The two problems are equivalent because, for any orthonormal V, ∥X∥_F^2 = ∥X − XVV^T∥_F^2 + ∥XVV^T∥_F^2 (the projection and the residual are orthogonal), so minimizing one term is the same as maximizing the other (see the sketch below).
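A small numpy sketch of this equivalence under an illustrative X (names and sizes are assumptions): for any orthonormal V the projected and residual energies sum to ∥X∥_F^2, and the top-k eigenvectors of X^T X maximize the projected energy (equivalently, minimize the error).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 30, 4                                   # illustrative sizes
X = rng.standard_normal((n, d))

def energies(V):
    """Squared Frobenius norms of the projection XVV^T and the residual X - XVV^T."""
    P = X @ V @ V.T
    return np.linalg.norm(P, 'fro')**2, np.linalg.norm(X - P, 'fro')**2

# Any orthonormal V: projected + residual energy equals ||X||_F^2 (Pythagorean identity).
V_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))
proj_rand, resid_rand = energies(V_rand)
print(np.isclose(proj_rand + resid_rand, np.linalg.norm(X, 'fro')**2))   # True

# Optimal V: top-k eigenvectors of X^T X -- largest projected energy, smallest residual.
_, eigvecs = np.linalg.eigh(X.T @ X)
proj_opt, resid_opt = energies(eigvecs[:, -k:])
print(proj_opt >= proj_rand, resid_opt <= resid_rand)                    # True True
```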