compsci 514: algorithms for data science


1. compsci 514: algorithms for data science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 11.

2. logistics
• Problem Set 2 is due this Friday 10/11. Will allow submissions until Sunday 10/13 at midnight with no penalty.
• Midterm next Thursday 10/17. Will give some review exercises before the midterm.
• Mean was 32.74/40 = 81%.
• Mostly seem to have mastered Markov's, Chebyshev's, etc.
• Some difficulties with exponential tail bounds (Chernoff and Bernstein).

3. summary
Last Two Classes: Randomized Dimensionality Reduction.
• The Johnson-Lindenstrauss Lemma: reduce n data points in any dimension d to O(log(n/δ)/ϵ^2) dimensions and preserve (with probability ≥ 1 − δ) all pairwise distances up to 1 ± ϵ.
• Compression is linear, via multiplication with a random, data-oblivious matrix (linear compression).
Next Two Classes: Low-rank approximation, the SVD, and principal component analysis.
• Compression is still linear, by applying a matrix.
• Choose this matrix carefully, taking into account the structure of the dataset.
• Can give better compression than random projection.
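
As an aside, a minimal numpy sketch of the random projection described above (my illustration, not from the lecture; the dimensions, the Gaussian construction, and the constant 8 in the target dimension are assumptions chosen for demonstration):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps = 200, 10_000, 0.2

    # Target dimension on the order of log(n)/eps^2 (constant is illustrative).
    m = int(np.ceil(8 * np.log(n) / eps**2))

    X = rng.normal(size=(n, d))                # n data points in R^d
    Pi = rng.normal(size=(d, m)) / np.sqrt(m)  # random, data-oblivious matrix

    Y = X @ Pi                                 # linearly compressed points in R^m

    # Pairwise distances are preserved up to roughly 1 +/- eps.
    for i, j in [(0, 1), (5, 17), (42, 99)]:
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        print(f"pair ({i},{j}): distance ratio = {ratio:.3f}")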

4. embedding with assumptions
Assume that data points x_1, ..., x_n lie in a k-dimensional subspace 𝒱 of R^d.
Recall: Let v_1, ..., v_k be an orthonormal basis for 𝒱 and V ∈ R^{d×k} be the matrix with these vectors as its columns. For all x_i, x_j: ∥V^T x_i − V^T x_j∥_2 = ∥x_i − x_j∥_2, i.e., the embedding has no distortion.
• V^T ∈ R^{k×d} is a linear embedding of x_1, ..., x_n into k dimensions.
• An actual projection, analogous to a JL random projection Π.
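
A small numpy sketch of this exact case (an illustration under assumed toy dimensions, not code from the lecture): points generated to lie exactly in a k-dimensional subspace, embedded by V^T with no distortion of pairwise distances.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 100, 50, 5

    # Orthonormal basis V (d x k) for a random k-dimensional subspace.
    V, _ = np.linalg.qr(rng.normal(size=(d, k)))

    # Points lying exactly in the subspace: x_i = V c_i.
    C = rng.normal(size=(n, k))
    X = C @ V.T                          # data matrix, n x d

    Y = X @ V                            # rows are the embeddings V^T x_i in R^k

    i, j = 3, 7
    print(np.linalg.norm(X[i] - X[j]))   # distance in R^d
    print(np.linalg.norm(Y[i] - Y[j]))   # same distance (up to floating point)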

5. embedding with assumptions
Main Focus of Today: Assume that data points x_1, ..., x_n lie close to a k-dimensional subspace 𝒱 of R^d. Letting v_1, ..., v_k be an orthonormal basis for 𝒱 and V ∈ R^{d×k} be the matrix with these vectors as its columns, V^T x_i ∈ R^k is still a good embedding for x_i ∈ R^d. This is the key idea behind low-rank approximation and principal component analysis (PCA).
• How do we find 𝒱 (and V)?
• How good is the embedding?
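
To see the "close to" case numerically, the following toy sketch (my assumptions: noise level 0.05, small dimensions) perturbs points off the subspace and checks that distances between the k-dimensional embeddings V^T x_i are now approximately, rather than exactly, preserved.

    import numpy as np

    rng = np.random.default_rng(2)
    n, d, k, noise = 100, 50, 5, 0.05

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal basis of the subspace
    X = rng.normal(size=(n, k)) @ V.T              # points in the subspace
    X = X + noise * rng.normal(size=(n, d))        # plus small off-subspace noise

    Y = X @ V                                      # k-dimensional embeddings

    i, j = 0, 1
    print(np.linalg.norm(X[i] - X[j]))   # original distance in R^d
    print(np.linalg.norm(Y[i] - Y[j]))   # close, but no longer exactly equal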

6. low-rank factorization
Claim: x_1, ..., x_n lie in a k-dimensional subspace 𝒱 ⇔ the data matrix X ∈ R^{n×d} has rank ≤ k.
• Letting v_1, ..., v_k be an orthonormal basis for 𝒱, we can write any x_i as: x_i = c_{i,1}·v_1 + c_{i,2}·v_2 + ... + c_{i,k}·v_k.
• So v_1, ..., v_k span the rows of X and thus rank(X) ≤ k.
Notation: x_1, ..., x_n ∈ R^d: data points; X ∈ R^{n×d}: data matrix; v_1, ..., v_k ∈ R^d: orthonormal basis for subspace 𝒱; V ∈ R^{d×k}: matrix with columns v_1, ..., v_k.
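
A quick check of the claim (an illustrative sketch with assumed toy dimensions): building X whose rows are combinations of k orthonormal vectors yields rank(X) ≤ k.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, k = 200, 40, 6

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal basis v_1, ..., v_k
    C = rng.normal(size=(n, k))                    # coefficients c_{i,1}, ..., c_{i,k}

    X = C @ V.T                       # row i is c_{i,1} v_1 + ... + c_{i,k} v_k

    print(np.linalg.matrix_rank(X))   # prints 6, i.e., rank(X) <= k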

7. low-rank factorization
Claim: x_1, ..., x_n lie in a k-dimensional subspace 𝒱 ⇔ the data matrix X ∈ R^{n×d} has rank ≤ k.
• Every data point x_i (row of X) can be written as x_i = c_{i,1}·v_1 + ... + c_{i,k}·v_k = c_i V^T, where the row c_i of a coefficient matrix C ∈ R^{n×k} holds c_{i,1}, ..., c_{i,k}.
• So X can be represented by (n + d)·k parameters vs. n·d.
• The columns of X are spanned by k vectors: the columns of C.

8. low-rank factorization
Claim: If x_1, ..., x_n lie in a k-dimensional subspace with orthonormal basis V ∈ R^{d×k}, the data matrix can be written as X = CV^T. What is this coefficient matrix C?
• X = CV^T ⇒ XV = CV^T V.
• V^T V = I, the identity (since V has orthonormal columns) ⇒ XV = C.

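A sketch verifying the identity on this slide (same toy setup as above, dimensions assumed for illustration): since V^T V = I, right-multiplying X = CV^T by V recovers the coefficient matrix, C = XV.

    import numpy as np

    rng = np.random.default_rng(4)
    n, d, k = 100, 30, 4

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal columns
    C = rng.normal(size=(n, k))
    X = C @ V.T

    print(np.allclose(V.T @ V, np.eye(k)))   # True: V^T V is the identity
    print(np.allclose(X @ V, C))             # True: XV recovers C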

11. projection view
Claim: If x_1, ..., x_n lie in a k-dimensional subspace 𝒱 with orthonormal basis V ∈ R^{d×k}, the data matrix can be written as X = X(VV^T).
• VV^T is a projection matrix, which projects the rows of X (the data points x_1, ..., x_n) onto the subspace 𝒱.

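A sketch of the projection view (again with assumed toy dimensions): VV^T is symmetric and idempotent, i.e., a projection matrix, and for data whose rows lie in the subspace, X(VV^T) = X.

    import numpy as np

    rng = np.random.default_rng(5)
    n, d, k = 50, 20, 3

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))
    P = V @ V.T                            # d x d projection onto the subspace

    print(np.allclose(P, P.T))     # symmetric
    print(np.allclose(P @ P, P))   # idempotent: projecting twice = projecting once

    X = rng.normal(size=(n, k)) @ V.T      # rows lie in the subspace
    print(np.allclose(X @ P, X))   # projection leaves such rows unchanged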

14. low-rank approximation
Claim: If x_1, ..., x_n lie close to a k-dimensional subspace 𝒱 with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as: X ≈ X(VV^T) = X P_𝒱.
Note: X(VV^T) has rank k. It is a low-rank approximation of X. In fact, it is the best approximation with rows in 𝒱:
X(VV^T) = arg min_{B with rows in 𝒱} ∥X − B∥_F^2, where ∥X − B∥_F^2 = ∑_{i,j} (X_{i,j} − B_{i,j})^2.
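
A sketch of the approximation view (the noise level and dimensions are arbitrary assumptions): for data lying near a k-dimensional subspace, X(VV^T) has rank k and the Frobenius-norm error ∥X − XVV^T∥_F^2 is small relative to ∥X∥_F^2.

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, k, noise = 200, 50, 5, 0.1

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))
    X = rng.normal(size=(n, k)) @ V.T + noise * rng.normal(size=(n, d))

    X_approx = X @ V @ V.T                         # the rank-k approximation X(VV^T)

    print(np.linalg.matrix_rank(X_approx))         # k
    err = np.linalg.norm(X - X_approx, 'fro')**2   # sum_{i,j} (X_ij - (XVV^T)_ij)^2
    print(err, np.linalg.norm(X, 'fro')**2)        # error is a small fraction of ||X||_F^2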

15. low-rank approximation
So Far: If x_1, ..., x_n lie close to a k-dimensional subspace 𝒱 with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as X ≈ X(VV^T). This is the closest approximation to X with rows in 𝒱 (i.e., in the column span of V).
• Letting (XVV^T)_i and (XVV^T)_j be the i-th and j-th projected data points, ∥(XVV^T)_i − (XVV^T)_j∥_2 = ∥[(XV)_i − (XV)_j] V^T∥_2 = ∥(XV)_i − (XV)_j∥_2 (the last step since V has orthonormal columns).
• So we can use XV ∈ R^{n×k} as a compressed approximate data set, as sketched below.
The key question is how to find the subspace 𝒱 and, correspondingly, V.
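
A toy illustration of using XV as a compressed data set (my example, with assumed dimensions and noise; nearest-neighbor search is just one stand-in for "downstream use"): neighbors found among the k-dimensional rows of XV typically match those found among the full d-dimensional rows of X.

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, k, noise = 300, 100, 5, 0.05

    V, _ = np.linalg.qr(rng.normal(size=(d, k)))
    X = rng.normal(size=(n, k)) @ V.T + noise * rng.normal(size=(n, d))

    Y = X @ V        # compressed approximate data set, n x k

    def nearest(data, q):
        """Index of the row of `data` closest to row q (excluding q itself)."""
        dist = np.linalg.norm(data - data[q], axis=1)
        dist[q] = np.inf
        return int(np.argmin(dist))

    q = 0
    print(nearest(X, q), nearest(Y, q))   # typically the same index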

16. why low-rank approximation?
Question: Why might we expect x_1, ..., x_n to lie close to a k-dimensional subspace?
• The rows of X can be approximately reconstructed from a basis of k vectors.

17. why low-rank approximation?
Question: Why might we expect x_1, ..., x_n to lie close to a k-dimensional subspace?
Linearly Dependent Variables:
• Equivalently, the columns of X are approximately spanned by k vectors.


20. best fit subspace
If x_1, ..., x_n are close to a k-dimensional subspace 𝒱 with orthonormal basis V ∈ R^{d×k}, the data matrix can be approximated as XVV^T, and XV gives the optimal embedding of X in 𝒱. How do we find 𝒱 (and V)? Find the orthonormal V ∈ R^{d×k} minimizing:
∥X − XVV^T∥_F^2 = ∑_{i,j} (X_{i,j} − (XVV^T)_{i,j})^2 = ∑_{i=1}^{n} ∥x_i − VV^T x_i∥_2^2.
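
The slides leave "how do we find V?" for the upcoming SVD/PCA lectures. As a forward-looking sketch only (my code, with assumed toy data; not something shown in this lecture): the top-k right singular vectors of X give an orthonormal V minimizing ∥X − XVV^T∥_F^2, so its error is no larger than that of the true planted subspace.

    import numpy as np

    rng = np.random.default_rng(8)
    n, d, k, noise = 200, 50, 5, 0.1

    V_true, _ = np.linalg.qr(rng.normal(size=(d, k)))
    X = rng.normal(size=(n, k)) @ V_true.T + noise * rng.normal(size=(n, d))

    # Top-k right singular vectors of X span the best-fit k-dimensional subspace.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                                   # d x k, orthonormal columns

    err_svd = np.linalg.norm(X - X @ V @ V.T, 'fro')**2
    err_true = np.linalg.norm(X - X @ V_true @ V_true.T, 'fro')**2
    print(err_svd, err_true)   # err_svd <= err_true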
