Singular Value Decomposition CS3220 - Summer 2008 Jonathan Kaldor
Another Factorization? • We’ve already looked at A = LU (for n x n matrices) and A = QR (for m x n matrices) • Both of them are exceedingly useful, but somewhat specialized
Extending QR • We factored A = QR because we wanted an “easy” system to solve for the least squares problem (namely, upper triangular system) • Recall also that when solving n x n systems, we observed that diagonal systems were even easier to solve • Can we come up with a factorization where we only have orthogonal and diagonal matrices?
The SVD • For any m x n matrix A, we can factor it into A = UΣV^T, where: U: m x m orthogonal matrix, V: n x n orthogonal matrix, Σ: m x n diagonal matrix with Σ_{i,i} = σ_i ≥ 0 • The σ_i’s are typically ordered so that σ_i ≥ σ_{i+1}
Terminology • The σ_i’s are called the singular values of the matrix A • The columns u_i of U are called the left singular vectors of A • The columns v_i of V are called the right singular vectors of A
Existence • Note: we have not made any qualifications about A • In particular, A doesn’t need to be full rank, and m can be less than, equal to, or greater than n: the factorization works for matrices A of any size • Every matrix has an SVD
Uniqueness • The SVD is “mostly” unique. The singular values σ_i are unique, and when the σ_i are distinct, the singular vectors u_i and v_i are unique up to a simultaneous sign flip. If σ_i = σ_{i+1} for some i, then the SVD is not unique • Example: the identity matrix
How to Compute? • Short answer: in MATLAB, with [U, S, V] = svd(A); • Longer answer: algorithm is beyond the scope of this course • Sufficient to know that factorization exists, we can compute it, and that computing it is more expensive than LU or QR factorization.
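For instance, here is a minimal MATLAB sketch of computing the SVD and sanity-checking its pieces (the matrix A below is just an arbitrary example):

% Compute the SVD of an arbitrary 4 x 3 matrix and verify its properties
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
[U, S, V] = svd(A);          % U is 4x4, S is 4x3, V is 3x3
norm(U'*U - eye(4))          % ~ 0: U is orthogonal
norm(V'*V - eye(3))          % ~ 0: V is orthogonal
diag(S)'                     % singular values, in nonincreasing order
norm(U*S*V' - A)             % ~ 0: the factors reproduce A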
What does it mean? • Let A = UΣV^T. What is Av_1 (A multiplied by the first column of V)? • V^T v_1 = e_1 = [1; 0; 0; ...; 0] • Σ e_1 = σ_1 e_1 = [σ_1; 0; 0; ...; 0] • U(σ_1 e_1) = σ_1 U e_1 = σ_1 u_1 • Extending this shows that Av_i = σ_i u_i for all i
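A quick MATLAB check of the relation Av_i = σ_i u_i (the matrix here is arbitrary):

% Verify A*v_i = sigma_i*u_i for each i
A = randn(5, 3);
[U, S, V] = svd(A);
for i = 1:3
    norm(A*V(:,i) - S(i,i)*U(:,i))   % ~ 0 for every i
end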
What does it mean? • Take A to be a 2 x 2 matrix. Then U and V are both 2 x 2 matrices • [Figure sequence: the unit circle, with the directions v_1 and v_2 marked, is transformed step by step] • Multiply by V^T: rotates v_1 and v_2 to the coordinate axes • Multiply by Σ: scales the 1st axis by σ_1 and the 2nd axis by σ_2, turning the circle into an ellipse • Multiply by U: rotates the ellipse axes to u_1 and u_2 • Multiplying by UΣV^T = A therefore maps the unit circle to an ellipse with axes σ_1 u_1 and σ_2 u_2
What does it mean? • In 2D, this says that A takes the directions v_1 and v_2, scales them by σ_1 and σ_2, and then rotates them to u_1 and u_2 • It maps the unit circle defined by v_1 and v_2 to the ellipse whose axes point along u_1 and u_2 • This geometric picture is true in general (but best not to try to imagine it for n > 3!)
Rank Deficiency • Suppose A is 2 x 2, not the zero matrix, and not full rank (i.e., it is singular). This means that A maps the 2D plane onto a proper subspace of it (a line). Equivalently, it maps the unit circle to a degenerate ellipse in which one axis has length 0 • In terms of our rotation and scaling operations, this is equivalent to having σ_2 = 0 (why σ_2?)
Rank Deficiency • Thus, we can use the SVD to determine the rank of a matrix: Compute the SVD, and count the number of singular values > 0 (in practice, need to count number of singular values > some small epsilon to account for floating point issues)
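A sketch of this in MATLAB; the tolerance below is one reasonable choice, and MATLAB’s built-in rank uses a similar test:

% Numerical rank: count singular values above a small tolerance
A = [1 2; 2 4; 3 6];                % rank-1 example (column 2 = 2 * column 1)
s = svd(A);                         % vector of singular values
tol = max(size(A)) * eps(s(1));     % assumed tolerance, similar to what rank() uses
r = sum(s > tol)                    % numerical rank: 1
rank(A)                             % MATLAB's built-in answer, also 1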
Rank Deficiency • If the matrix has rank r, the SVD can be partitioned into blocks: A = [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T], where U_1 is m x r, U_2 is m x (m-r), Σ_1 = diag(σ_1, ..., σ_r) is r x r (the nonzero singular values), V_1 is n x r, and V_2 is n x (n-r)
Range and Null Space • Suppose v_i is a column of V_2. What is Av_i? • If v_i is a column of V_2, then σ_i = 0. That means that Av_i = σ_i u_i = 0 • Thus, v_i is in the null space of A • This means that the columns of V_2 form an orthonormal basis for the null space of A (or null(A))
Range and Null Space • Similarly, let x be any vector. Then Ax = U(ΣV^T x), and ΣV^T x is zero in its last m - r entries, so Ax is a linear combination of the columns of U_1 • This means that the columns of U_1 form an orthonormal basis for range(A)
Range and Null Space • Since the columns of V_1 are orthogonal to those of V_2, and V_2 is an orthonormal basis for null(A), it follows that V_1 is an orthonormal basis for the orthogonal complement of null(A) (or null(A)^⊥) • Similarly, the columns of U_2 are orthogonal to those of U_1, so U_2 is an orthonormal basis for range(A)^⊥
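A small MATLAB sketch checking these bases on an arbitrary rank-deficient example:

% Orthonormal bases for null(A) and range(A) from the SVD
A = [1 2 3; 2 4 6; 1 1 1];          % 3x3 matrix of rank 2 (row 2 = 2 * row 1)
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank (here r = 2)
V2 = V(:, r+1:end);                 % basis for null(A)
U1 = U(:, 1:r);                     % basis for range(A)
norm(A * V2)                        % ~ 0: columns of V2 are mapped to zero
x = randn(3, 1);
norm(A*x - U1*(U1'*(A*x)))          % ~ 0: A*x lies in the span of U1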
Skinny SVD • Just like QR, there are some columns of U and V that we don’t need • Namely, the columns of V_2 are multiplied by zero singular values, and the columns of U_2 only ever multiply the zero rows of Σ • We can reconstruct A using only U_1, Σ_1 = Σ(1:r, 1:r), and V_1, where U_1 is m x r and V_1 is n x r: A = U_1 Σ_1 V_1^T
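A sketch of the skinny reconstruction in MATLAB (the rank-2 matrix below is made up for illustration):

% Reconstruct A from only the first r columns of U and V
A = randn(6, 2) * randn(2, 4);      % a 6x4 matrix of rank 2
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank, r = 2
U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
norm(U1*S1*V1' - A)                 % ~ 0: A is reconstructed exactly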
Generalizing Least Squares • We know how to solve least squares as long as A is of full rank • Suppose A is instead of rank r (σ_{r+1} = ... = σ_n = 0). Can we still find an acceptable least squares solution? • Note: we have to combine both the overdetermined and underdetermined strategies (there may not be an exact solution, and we can add any vector in the null space without changing the residual)
Rank-Deficient Least Squares • ‖Ax − b‖_2^2 = ‖UΣV^T x − b‖_2^2 = ‖ [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T] x − b ‖_2^2 • Multiplying by the orthogonal matrix U^T = [U_1^T; U_2^T] does not change the 2-norm, so this equals ‖ [Σ_1 0; 0 0] [V_1^T; V_2^T] x − [U_1^T b; U_2^T b] ‖_2^2 = ‖ Σ_1 V_1^T x − U_1^T b ‖_2^2 + ‖ U_2^T b ‖_2^2
Rank-Deficient Least Squares • Let y = [y_1; y_2] = [V_1^T x; V_2^T x] and a = U_1^T b • Then we need to minimize ‖ Σ_1 y_1 − a ‖_2^2 over y (the term ‖ U_2^T b ‖_2^2 does not depend on y at all)
Rank-Deficient Least Squares • We can choose y_2 to be anything and still minimize ‖ Σ_1 y_1 − a ‖_2^2; let it be 0 to minimize the norm of the solution • Then our solution is simply Σ_1 y_1 = a and y_2 = 0. Recall that y_1 = V_1^T x and a = U_1^T b. Substituting everything, we get x = V_1 Σ_1^{-1} U_1^T b
Rank-Deficient Least Squares • Note that this combined both of the observations we made while solving overdetermined and underdetermined systems (ignoring ‖U_2^T b‖ since there was no way to minimize it, and seeing that we could set part of our solution vector (V_2^T x) arbitrarily without changing the residual)
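A minimal MATLAB sketch of this minimum-norm solution (the matrix and right-hand side are made up); MATLAB’s pinv performs the same truncation, so it gives the same answer:

% Minimum-norm least squares solution x = V1 * inv(Sigma1) * U1' * b
A = [1 1; 1 1; 2 2];                % rank-deficient 3x2 matrix (rank 1)
b = [1; 2; 3];
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank
U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
x = V1 * (S1 \ (U1' * b))           % minimum-norm minimizer of ||Ax - b||_2
pinv(A) * b                         % same result via the pseudoinverse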
Matrix Inverse • Let A = UΣV^T be an n x n matrix of full rank. Then A^{-1} = VΣ^{-1}U^T • What can we observe about this? • The roles of U and V are swapped • The singular values of the inverse are the reciprocals of the singular values of A: 1/σ_i (note that they appear in reverse order because of the way we order singular values)
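A quick MATLAB check of the inverse formula (A here is an arbitrary nonsingular matrix):

% Inverse via the SVD: inv(A) = V * inv(Sigma) * U'
A = [4 1; 2 3];
[U, S, V] = svd(A);
norm(V * diag(1 ./ diag(S)) * U' - inv(A))   % ~ 0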
Matrix (Pseudo)Inverse • We can generalize this notion of the matrix inverse to come up with the pseudoinverse, which exists for m x n matrices of rank r: A^+ = V_1 Σ_1^{-1} U_1^T, where V_1, Σ_1, and U_1 are defined from the skinny SVD • This is, in a sense, the closest thing to an inverse for matrices that don’t have one
Matrix (Pseudo)Inverse • Note that when A is n x n and full rank, U = U_1, V = V_1, and Σ = Σ_1, so the pseudoinverse is the inverse • Note that the pseudoinverse is exactly what we came up with in the general rank-deficient least squares case: x = A^+ b is the minimum-norm least squares solution of Ax ≈ b
Sidestep: Matrix Norms • When introducing least squares, we introduced the concept of vector norms, which measure size • There is a corresponding notion of norms for matrices as well, with properties similar to those of vector norms: ‖A‖ > 0 if A ≠ 0, ‖cA‖ = |c| ‖A‖, and ‖A + B‖ ≤ ‖A‖ + ‖B‖
Sidestep: Matrix Norms • Frobenius norm of a matrix: ‖A‖_F is the square root of the sum of the squares of all entries of A (in MATLAB: sqrt(sum(sum(A.^2))))
Sidestep: Matrix Norms • If ‖·‖_n is some vector norm, then there is a corresponding (induced) matrix norm defined as ‖A‖_n = max over x ≠ 0 of ‖Ax‖_n / ‖x‖_n • Note: this does not necessarily give us a rule for how to compute the matrix norm
Sidestep: Matrix Norms • ‖A‖_1 happens to be easy to compute directly: it’s max_i ‖A(:,i)‖_1, i.e. the maximum one-norm of the columns of A • ‖A‖_2 unfortunately is rather difficult to compute directly... at least, without our fancy new SVD. Namely, ‖A‖_2 = σ_1
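These characterizations are easy to check in MATLAB (the matrix is arbitrary); the Frobenius norm from a couple of slides back is included too:

% Matrix norms via their direct characterizations
A = [1 -2; 3 4];
[norm(A, 1),     max(sum(abs(A), 1))]    % 1-norm = max column sum
[norm(A, 2),     max(svd(A))]            % 2-norm = largest singular value
[norm(A, 'fro'), sqrt(sum(sum(A.^2)))]   % Frobenius norm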
Sidestep: Matrix Norms • Induced matrix norms measure the maximum factor by which any vector x can be scaled when multiplied by A • We’ll see on Wednesday that they are also useful for measuring error propagation in linear systems
Rank-k Approximation • A = UΣV^T can be expressed as a sum of rank-1 matrices: A = sum of σ_i u_i v_i^T for i = 1 ... min(m, n) • Idea: cut off the summation at some value k < min(m, n) • This gives us the rank-k approximation of A
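A sketch of the truncated sum in MATLAB (the matrix and the choice k = 2 are arbitrary); the 2-norm error of the rank-k approximation is σ_{k+1}:

% Rank-k approximation: keep only the k largest singular value terms
A = randn(8, 6);
[U, S, V] = svd(A);
k = 2;
Ak = zeros(size(A));
for i = 1:k
    Ak = Ak + S(i,i) * U(:,i) * V(:,i)';   % sum of rank-1 terms
end
% Equivalently: Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'
[norm(A - Ak, 2), S(k+1, k+1)]             % the error equals sigma_{k+1}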