Singular Value Decomposition CS3220 - Summer 2008 Jonathan Kaldor
Another Factorization? • We’ve already looked at A = LU (for n x n matrices) and A = QR (for m x n matrices) • Both of them are exceedingly useful, but somewhat specialized
Extending QR • We factored A = QR because we wanted an “easy” system to solve for the least squares problem (namely, upper triangular system) • Recall also that when solving n x n systems, we observed that diagonal systems were even easier to solve • Can we come up with a factorization where we only have orthogonal and diagonal matrices?
The SVD • For any m x n matrix A, we can factor it into A = UΣV^T, where: U: m x m orthogonal matrix, V: n x n orthogonal matrix, Σ: m x n diagonal matrix with Σ_{i,i} = σ_i ≥ 0 • The σ_i’s are typically ordered so that σ_i ≥ σ_{i+1}
Terminology • The σ_i’s are called the singular values of the matrix A • The columns u_i of U are called the left singular vectors of A • The columns v_i of V are called the right singular vectors of A
Existence • Note: we have not made any qualifications about A • In particular, A doesn’t need to be full rank, and m can be less than, equal to, or greater than n: the factorization works for matrices A of any size • Every matrix has an SVD
Uniqueness • The SVD is “mostly” unique. The singular values σ_i are unique, and when the σ_i are distinct, the singular vectors u_i and v_i are unique up to a simultaneous sign flip. If σ_i = σ_{i+1} for some i, then the SVD is not unique • Example: the identity matrix
How to Compute? • Short answer: in MATLAB, with [U, S, V] = svd(A); • Longer answer: algorithm is beyond the scope of this course • Sufficient to know that factorization exists, we can compute it, and that computing it is more expensive than LU or QR factorization.
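For instance, here is a minimal MATLAB sketch of computing the SVD and sanity-checking its pieces (the matrix A below is just an arbitrary example):

% Compute the SVD of an arbitrary 4 x 3 matrix and verify its properties
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
[U, S, V] = svd(A);          % U is 4x4, S is 4x3, V is 3x3
norm(U'*U - eye(4))          % ~ 0: U is orthogonal
norm(V'*V - eye(3))          % ~ 0: V is orthogonal
diag(S)'                     % singular values, in nonincreasing order
norm(U*S*V' - A)             % ~ 0: the factors reproduce A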
What does it mean? • Let A = UΣV^T. What is Av_1 (A multiplied by the first column of V)? • V^T v_1 = e_1 = [1; 0; 0; ...; 0] • Σ e_1 = σ_1 e_1 = [σ_1; 0; 0; ...; 0] • U(σ_1 e_1) = σ_1 U e_1 = σ_1 u_1 • Extending this shows that Av_i = σ_i u_i for all i
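A quick MATLAB check of the relation Av_i = σ_i u_i (the matrix here is arbitrary):

% Verify A*v_i = sigma_i*u_i for each i
A = randn(5, 3);
[U, S, V] = svd(A);
for i = 1:3
    norm(A*V(:,i) - S(i,i)*U(:,i))   % ~ 0 for every i
end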
What does it mean? • Take A to be a 2 x 2 matrix. Then U and V are both 2 x 2 matrices • [Figure sequence: the unit circle, with the directions v_1 and v_2 marked, is transformed step by step] • Multiply by V^T: rotates v_1 and v_2 to the coordinate axes • Multiply by Σ: scales the 1st axis by σ_1 and the 2nd axis by σ_2, turning the circle into an ellipse • Multiply by U: rotates the ellipse axes to u_1 and u_2 • Multiplying by UΣV^T = A therefore maps the unit circle to an ellipse with axes σ_1 u_1 and σ_2 u_2
What does it mean? • In 2D, this says that A takes the directions v_1 and v_2, scales them by σ_1 and σ_2, and then rotates them to u_1 and u_2 • It maps the unit circle defined by v_1 and v_2 to the ellipse whose axes point along u_1 and u_2 • This geometric picture is true in general (but best not to try to imagine it for n > 3!)
Rank Deficiency • Suppose A is 2 x 2, not the zero matrix, and not full rank (i.e., it is singular). This means that A maps the 2D plane onto a proper subspace of it (a line). Equivalently, it maps the unit circle to a degenerate ellipse in which one axis has length 0 • In terms of our rotation and scaling operations, this is equivalent to having σ_2 = 0 (why σ_2?)
Rank Deficiency • Thus, we can use the SVD to determine the rank of a matrix: Compute the SVD, and count the number of singular values > 0 (in practice, need to count number of singular values > some small epsilon to account for floating point issues)
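A sketch of this in MATLAB; the tolerance below is one reasonable choice, and MATLAB’s built-in rank uses a similar test:

% Numerical rank: count singular values above a small tolerance
A = [1 2; 2 4; 3 6];                % rank-1 example (column 2 = 2 * column 1)
s = svd(A);                         % vector of singular values
tol = max(size(A)) * eps(s(1));     % assumed tolerance, similar to what rank() uses
r = sum(s > tol)                    % numerical rank: 1
rank(A)                             % MATLAB's built-in answer, also 1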
Rank Deficiency • If the matrix has rank r, the SVD can be partitioned into blocks: A = [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T], where U_1 is m x r, U_2 is m x (m-r), Σ_1 = diag(σ_1, ..., σ_r) is r x r (the nonzero singular values), V_1 is n x r, and V_2 is n x (n-r)
Range and Null Space • Suppose v_i is a column of V_2. What is Av_i? • If v_i is a column of V_2, then σ_i = 0. That means that Av_i = σ_i u_i = 0 • Thus, v_i is in the null space of A • This means that the columns of V_2 form an orthonormal basis for the null space of A (or null(A))
Range and Null Space • Similarly, let x be any vector. Then Ax = U(ΣV^T x), and ΣV^T x is zero in its last m - r entries, so Ax is a linear combination of the columns of U_1 • This means that the columns of U_1 form an orthonormal basis for range(A)
Range and Null Space • Since the columns of V_1 are orthogonal to those of V_2, and V_2 is an orthonormal basis for null(A), it follows that V_1 is an orthonormal basis for the orthogonal complement of null(A) (or null(A)^⊥) • Similarly, the columns of U_2 are orthogonal to those of U_1, so U_2 is an orthonormal basis for range(A)^⊥
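A small MATLAB sketch checking these bases on an arbitrary rank-deficient example:

% Orthonormal bases for null(A) and range(A) from the SVD
A = [1 2 3; 2 4 6; 1 1 1];          % 3x3 matrix of rank 2 (row 2 = 2 * row 1)
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank (here r = 2)
V2 = V(:, r+1:end);                 % basis for null(A)
U1 = U(:, 1:r);                     % basis for range(A)
norm(A * V2)                        % ~ 0: columns of V2 are mapped to zero
x = randn(3, 1);
norm(A*x - U1*(U1'*(A*x)))          % ~ 0: A*x lies in the span of U1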
Skinny SVD • Just like QR, there are some columns of U and V that we don’t need • Namely, the columns of V_2 are multiplied by zero singular values, and the columns of U_2 only ever multiply the zero rows of Σ • We can reconstruct A using only U_1, Σ_1 = Σ(1:r, 1:r), and V_1, where U_1 is m x r and V_1 is n x r: A = U_1 Σ_1 V_1^T
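A sketch of the skinny reconstruction in MATLAB (the rank-2 matrix below is made up for illustration):

% Reconstruct A from only the first r columns of U and V
A = randn(6, 2) * randn(2, 4);      % a 6x4 matrix of rank 2
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank, r = 2
U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
norm(U1*S1*V1' - A)                 % ~ 0: A is reconstructed exactly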
Generalizing Least Squares • We know how to solve least squares as long as A is of full rank • Suppose A is instead of rank r (σ_{r+1} = ... = σ_n = 0). Can we still find an acceptable least squares solution? • Note: we have to combine both the overdetermined and underdetermined strategies (there may not be an exact solution, and we can add any vector in the null space without changing the residual)
Rank-Deficient Least Squares • ‖Ax − b‖_2^2 = ‖UΣV^T x − b‖_2^2 = ‖ [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T] x − b ‖_2^2 • Multiplying by the orthogonal matrix U^T = [U_1^T; U_2^T] does not change the 2-norm, so this equals ‖ [Σ_1 0; 0 0] [V_1^T; V_2^T] x − [U_1^T b; U_2^T b] ‖_2^2 = ‖ Σ_1 V_1^T x − U_1^T b ‖_2^2 + ‖ U_2^T b ‖_2^2
Rank-Deficient Least Squares • Let y = [y_1; y_2] = [V_1^T x; V_2^T x] and a = U_1^T b • Then we need to minimize ‖ Σ_1 y_1 − a ‖_2^2 over y (the term ‖ U_2^T b ‖_2^2 does not depend on y at all)
Rank-Deficient Least Squares • We can choose y_2 to be anything and still minimize ‖ Σ_1 y_1 − a ‖_2^2; let it be 0 to minimize the norm of the solution • Then our solution is simply Σ_1 y_1 = a and y_2 = 0. Recall that y_1 = V_1^T x and a = U_1^T b. Substituting everything, we get x = V_1 Σ_1^{-1} U_1^T b
Rank-Deficient Least Squares • Note that this combined both of the observations we made while solving overdetermined and underdetermined systems (ignoring ‖U_2^T b‖ since there was no way to minimize it, and seeing that we could set part of our solution vector (V_2^T x) arbitrarily without changing the residual)
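A minimal MATLAB sketch of this minimum-norm solution (the matrix and right-hand side are made up); MATLAB’s pinv performs the same truncation, so it gives the same answer:

% Minimum-norm least squares solution x = V1 * inv(Sigma1) * U1' * b
A = [1 1; 1 1; 2 2];                % rank-deficient 3x2 matrix (rank 1)
b = [1; 2; 3];
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-10);           % numerical rank
U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
x = V1 * (S1 \ (U1' * b))           % minimum-norm minimizer of ||Ax - b||_2
pinv(A) * b                         % same result via the pseudoinverse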
Matrix Inverse • Let A = UΣV^T be an n x n matrix of full rank. Then A^{-1} = VΣ^{-1}U^T • What can we observe about this? • The roles of U and V are swapped • The singular values of the inverse are the reciprocals of the singular values of A: 1/σ_i (note that they appear in reverse order because of the way we order singular values)
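A quick MATLAB check of the inverse formula (A here is an arbitrary nonsingular matrix):

% Inverse via the SVD: inv(A) = V * inv(Sigma) * U'
A = [4 1; 2 3];
[U, S, V] = svd(A);
norm(V * diag(1 ./ diag(S)) * U' - inv(A))   % ~ 0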
Matrix (Pseudo)Inverse • We can generalize this notion of the matrix inverse to come up with the pseudoinverse, which exists for m x n matrices of rank r: A^+ = V_1 Σ_1^{-1} U_1^T, where V_1, Σ_1, and U_1 are defined from the skinny SVD • This is, in a sense, the closest thing to an inverse for matrices that don’t have one
Matrix (Pseudo)Inverse • Note that when A is n x n and full rank, U = U_1, V = V_1, and Σ = Σ_1, so the pseudoinverse is the inverse • Note that the pseudoinverse is exactly what we came up with in the general rank-deficient least squares case: x = A^+ b is the minimum-norm least squares solution of Ax ≈ b
Sidestep: Matrix Norms • When introducing least squares, we introduced the concept of vector norms, which measure size • There is a corresponding notion of norms for matrices as well, with properties similar to those of vector norms: ‖A‖ > 0 if A ≠ 0, ‖cA‖ = |c| ‖A‖, and ‖A + B‖ ≤ ‖A‖ + ‖B‖
Sidestep: Matrix Norms • Frobenius norm of a matrix: ‖A‖_F is the square root of the sum of the squares of all entries of A (in MATLAB: sqrt(sum(sum(A.^2))))
Sidestep: Matrix Norms • If ‖·‖_n is some vector norm, then there is a corresponding (induced) matrix norm defined as ‖A‖_n = max over x ≠ 0 of ‖Ax‖_n / ‖x‖_n • Note: this does not necessarily give us a rule for how to compute the matrix norm
Sidestep: Matrix Norms • ‖A‖_1 happens to be easy to compute directly: it’s max_i ‖A(:,i)‖_1, i.e. the maximum one-norm of the columns of A • ‖A‖_2 unfortunately is rather difficult to compute directly... at least, without our fancy new SVD. Namely, ‖A‖_2 = σ_1
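These characterizations are easy to check in MATLAB (the matrix is arbitrary); the Frobenius norm from a couple of slides back is included too:

% Matrix norms via their direct characterizations
A = [1 -2; 3 4];
[norm(A, 1),     max(sum(abs(A), 1))]    % 1-norm = max column sum
[norm(A, 2),     max(svd(A))]            % 2-norm = largest singular value
[norm(A, 'fro'), sqrt(sum(sum(A.^2)))]   % Frobenius norm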
Sidestep: Matrix Norms • Induced matrix norms measure the maximum factor by which any vector x can be scaled when multiplied by A • We’ll see on Wednesday that they are also useful for measuring error propagation in linear systems
Rank-k Approximation • A = UΣV^T can be expressed as a sum of rank-1 matrices: A = sum of σ_i u_i v_i^T for i = 1 ... min(m, n) • Idea: cut off the summation at some value k < min(m, n) • This gives us the rank-k approximation of A
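A sketch of the truncated sum in MATLAB (the matrix and the choice k = 2 are arbitrary); the 2-norm error of the rank-k approximation is σ_{k+1}:

% Rank-k approximation: keep only the k largest singular value terms
A = randn(8, 6);
[U, S, V] = svd(A);
k = 2;
Ak = zeros(size(A));
for i = 1:k
    Ak = Ak + S(i,i) * U(:,i) * V(:,i)';   % sum of rank-1 terms
end
% Equivalently: Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'
[norm(A - Ak, 2), S(k+1, k+1)]             % the error equals sigma_{k+1}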