Low Rank Approximation, Lecture 1
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
Organizational aspects
◮ Lectures: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exercises: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exam: Miniproject + oral exam.
◮ Webpage: https://anchp.epfl.ch/lowrank
◮ daniel.kressner@epfl.ch, lana.perisa@epfl.ch
From http://www.niemanlab.org:
“... his [Aleksandr Kogan’s] message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.”
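Rank and basic properties
For a field $\mathbb{F}$, let $A \in \mathbb{F}^{m\times n}$. Then $\operatorname{rank}(A) := \dim(\operatorname{range}(A))$. For simplicity, $\mathbb{F} = \mathbb{R}$ throughout the lecture, and often $m \ge n$.
Lemma. Let $A \in \mathbb{R}^{m\times n}$. Then
1. $\operatorname{rank}(A^T) = \operatorname{rank}(A)$;
2. $\operatorname{rank}(PAQ) = \operatorname{rank}(A)$ for invertible matrices $P \in \mathbb{R}^{m\times m}$, $Q \in \mathbb{R}^{n\times n}$;
3. $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$ for any matrix $B \in \mathbb{R}^{n\times p}$;
4. $\operatorname{rank}\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \ge \operatorname{rank}(A_{11}) + \operatorname{rank}(A_{22})$ for $A_{11} \in \mathbb{R}^{m_1\times n_1}$, $A_{12} \in \mathbb{R}^{m_1\times n_2}$, $A_{22} \in \mathbb{R}^{m_2\times n_2}$, with equality if $A_{12} = 0$. (The inequality can be strict: $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has rank 1 although $A_{11} = A_{22} = 0$.)
Proof: See Linear Algebra 1 / Exercises.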
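A minimal numerical check of properties 1 to 3, written in Python/NumPy rather than the MATLAB used elsewhere in these slides. The helper `numerical_rank` and its relative cutoff `1e-10` are assumptions of this sketch: in floating point, "rank" always means "number of singular values above some threshold". Orthogonal $P$, $Q$ are used as a well-conditioned special case of invertible matrices.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-10):
    """Count singular values above rel_tol * sigma_1 (the cutoff is an assumption)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0])) if s.size else 0

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2
B = rng.standard_normal((4, 5))
# Orthogonal matrices: a well-conditioned special case of invertible P, Q.
P, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

print(numerical_rank(A), numerical_rank(A.T))                               # property 1
print(numerical_rank(P @ A @ Q))                                            # property 2
print(numerical_rank(A @ B) <= min(numerical_rank(A), numerical_rank(B)))   # property 3
```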
Rank and matrix factorizations
Let $\mathcal{B} = \{b_1, \dots, b_r\} \subset \mathbb{R}^m$ with $r = \operatorname{rank}(A)$ be a basis of $\operatorname{range}(A)$. Then each of the columns of $A = [a_1, a_2, \dots, a_n]$ can be expressed as a linear combination of $\mathcal{B}$:
$a_i = b_1 c_{i1} + b_2 c_{i2} + \cdots + b_r c_{ir} = [b_1, \dots, b_r] \begin{bmatrix} c_{i1} \\ \vdots \\ c_{ir} \end{bmatrix}$
for some coefficients $c_{ij} \in \mathbb{R}$ with $i = 1, \dots, n$, $j = 1, \dots, r$. Stacking these relations column by column gives
$[a_1, \dots, a_n] = [b_1, \dots, b_r] \begin{bmatrix} c_{11} & \cdots & c_{n1} \\ \vdots & & \vdots \\ c_{1r} & \cdots & c_{nr} \end{bmatrix}.$
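The stacking argument can be sketched in NumPy (an assumption; the slides use MATLAB). Here the basis $b_1, \dots, b_r$ is taken to be the first $r$ columns of a random rank-$r$ matrix, which are linearly independent with probability 1 for this construction; any other basis of $\operatorname{range}(A)$ would do. The coefficients $c_i$ are obtained by solving $[b_1, \dots, b_r]\, c_i = a_i$ for all columns at once.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 6, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

# Basis of range(A): the first r columns (independent for this random A).
B = A[:, :r]

# Solve B c_i = a_i for every column a_i simultaneously; the solution is the
# r x n matrix whose i-th column is c_i, i.e. C^T in the notation of the slide.
Ct, *_ = np.linalg.lstsq(B, A, rcond=None)
C = Ct.T                               # row i of C holds c_{i1}, ..., c_{ir}

print(np.allclose(A[:, 4], B @ C[4]))  # a_5 = b_1 c_{51} + ... + b_r c_{5r}
print(np.allclose(A, B @ C.T))         # stacked: [a_1, ..., a_n] = [b_1, ..., b_r] C^T
```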
Rank and matrix factorizations
Lemma. A matrix $A \in \mathbb{R}^{m\times n}$ of rank $r$ admits a factorization of the form
$A = BC^T, \quad B \in \mathbb{R}^{m\times r}, \quad C \in \mathbb{R}^{n\times r}.$
We say that $A$ has low rank if $\operatorname{rank}(A) \ll m, n$.
[Figure: illustration of the low-rank factorization $A = BC^T$; number of entries $mn$ for $A$ versus $mr + nr$ for $B$ and $C$.]
◮ Generically (and in most applications), $A$ has full rank, that is, $\operatorname{rank}(A) = \min\{m, n\}$.
◮ Aim instead at approximating $A$ by a low-rank matrix.
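A short NumPy sketch (NumPy and the matrix sizes are assumptions) comparing storage for $A$ and for $B, C$. It obtains one valid choice of factors from the SVD introduced later in this lecture, keeping only the $r$ nonzero singular triplets, and then contrasts this with a random matrix, which generically has full rank.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 200, 100, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

# One possible choice of B and C: keep the r nonzero singular triplets of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :r] * s[:r]          # m x r, columns scaled by sigma_1, ..., sigma_r
C = Vt[:r].T                  # n x r
print(np.allclose(A, B @ C.T))

print(A.size, B.size + C.size)   # mn = 20000 versus mr + nr = 1500

# A generic (random) matrix, by contrast, has full rank min(m, n).
G = rng.standard_normal((m, n))
print(np.linalg.matrix_rank(G))
```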
Questions addressed in this lecture series
What? Theoretical foundations of low-rank approximation.
When? A priori and a posteriori estimates for low-rank approximation. Situations that allow for low-rank approximation techniques.
Why? Applications in engineering, scientific computing, data analysis, ... where low-rank approximation plays a central role.
How? State-of-the-art algorithms for performing and working with low-rank approximations.
We will cover both matrices and tensors.
Literature for Lecture 1
Golub/Van Loan 2013: Golub, Gene H.; Van Loan, Charles F. Matrix Computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013.
Horn/Johnson 2013: Horn, Roger A.; Johnson, Charles R. Matrix Analysis. Second edition. Cambridge University Press, 2013.
+ References on slides.
1. Fundamental tools
◮ SVD
◮ Relation to eigenvalues
◮ Norms
◮ Best low-rank approximation
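The singular value decomposition
Theorem (SVD). Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. Then there are orthogonal matrices $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ such that
$A = U\Sigma V^T, \quad \text{with } \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_n) \\ 0 \end{bmatrix} \in \mathbb{R}^{m\times n}$
and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.
◮ $\sigma_1, \dots, \sigma_n$ are called singular values.
◮ $u_1, \dots, u_n$ are called left singular vectors.
◮ $v_1, \dots, v_n$ are called right singular vectors.
◮ $Av_i = \sigma_i u_i$, $A^T u_i = \sigma_i v_i$ for $i = 1, \dots, n$.
◮ Singular values are always uniquely defined by $A$.
◮ Singular vectors are never unique. If $\sigma_1 > \sigma_2 > \cdots > \sigma_n > 0$, then they are unique up to $u_i \leftarrow \pm u_i$, $v_i \leftarrow \pm v_i$ (the same sign for both vectors of a pair).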
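A minimal NumPy check of the theorem and the listed properties (NumPy is an assumption; MATLAB's `svd` would serve equally well). Note that `np.linalg.svd` returns $V^T$ rather than $V$, and the singular values as a vector; $\Sigma$ is assembled explicitly to match the $m \times n$ form above. The last lines flip the sign of one pair $(u_1, v_1)$ to illustrate non-uniqueness.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 7, 4
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A)        # full SVD: U is m x m, Vt = V^T is n x n
V = Vt.T
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)         # Sigma = [diag(sigma_1, ..., sigma_n); 0]

print(np.allclose(A, U @ Sigma @ Vt))
print(np.all(np.diff(s) <= 0), np.all(s >= 0))    # sigma_1 >= ... >= sigma_n >= 0
for i in range(n):
    # A v_i = sigma_i u_i and A^T u_i = sigma_i v_i
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * V[:, i])

# Flipping the sign of the pair (u_1, v_1) gives another valid SVD.
U2, V2 = U.copy(), V.copy()
U2[:, 0] *= -1
V2[:, 0] *= -1
print(np.allclose(A, U2 @ Sigma @ V2.T))
```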
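SVD: Sketch of proof
Induction over $n$; $n = 1$ is trivial. For general $n$, let $v_1$ solve $\max\{\|Av\|_2 : \|v\|_2 = 1\} =: \|A\|_2$. Set $\sigma_1 := \|A\|_2$ and $u_1 := Av_1/\sigma_1$.¹ By definition, $Av_1 = \sigma_1 u_1$.
After completion to orthogonal matrices $U_1 = [u_1, U_\perp] \in \mathbb{R}^{m\times m}$ and $V_1 = [v_1, V_\perp] \in \mathbb{R}^{n\times n}$:
$U_1^T A V_1 = \begin{bmatrix} u_1^T A v_1 & u_1^T A V_\perp \\ U_\perp^T A v_1 & U_\perp^T A V_\perp \end{bmatrix} = \begin{bmatrix} \sigma_1 & w^T \\ 0 & A_1 \end{bmatrix},$
with $w := V_\perp^T A^T u_1$ and $A_1 = U_\perp^T A V_\perp$. The $(2,1)$ block vanishes because $U_\perp^T A v_1 = \sigma_1 U_\perp^T u_1 = 0$.
Since $\|\cdot\|_2$ is invariant under orthogonal transformations, and the 2-norm of a matrix is at least the 2-norm of its first row,
$\sigma_1 = \|A\|_2 = \|U_1^T A V_1\|_2 = \left\| \begin{bmatrix} \sigma_1 & w^T \\ 0 & A_1 \end{bmatrix} \right\|_2 \ge \sqrt{\sigma_1^2 + \|w\|_2^2}.$
Hence, $w = 0$. The proof is completed by applying induction to $A_1$; note that $\|A_1\|_2 \le \|A\|_2 = \sigma_1$, so the ordering of the singular values is preserved.
¹ If $\sigma_1 = 0$, choose an arbitrary unit vector $u_1$.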
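The key step of the proof, that the block $w$ vanishes, can be observed numerically. This NumPy sketch (NumPy, the sizes, and the helper `complete_to_orthogonal` are assumptions for illustration) obtains $v_1$ as a dominant eigenvector of $A^T A$, which maximizes the Rayleigh quotient $v^T A^T A v = \|Av\|_2^2$ over unit vectors, and completes $u_1$, $v_1$ to orthogonal matrices via a QR decomposition.

```python
import numpy as np

def complete_to_orthogonal(x, rng):
    """Hypothetical helper: orthogonal matrix whose first column is the unit vector x."""
    k = x.size
    Q, _ = np.linalg.qr(np.column_stack([x, rng.standard_normal((k, k - 1))]))
    Q[:, 0] = x          # QR may return -x as first column; restore the sign
    return Q

rng = np.random.default_rng(4)
m, n = 8, 5
A = rng.standard_normal((m, n))

# ||A v||_2^2 = v^T A^T A v is maximized by an eigenvector for the largest
# eigenvalue of A^T A (eigh returns eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(A.T @ A)
v1 = evecs[:, -1]
sigma1 = np.linalg.norm(A @ v1)
u1 = A @ v1 / sigma1

U1 = complete_to_orthogonal(u1, rng)
V1 = complete_to_orthogonal(v1, rng)
M = U1.T @ A @ V1                    # expected form: [[sigma_1, w^T], [0, A_1]]

w = M[0, 1:]
A1 = M[1:, 1:]
print(np.isclose(M[0, 0], sigma1), np.allclose(M[1:, 0], 0), np.allclose(w, 0))
print(np.linalg.norm(A1, 2) <= sigma1 + 1e-12)   # ordering kept in the induction
```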
Very basic properties of the SVD
◮ $r = \operatorname{rank}(A)$ is the number of nonzero singular values of $A$.
◮ $\operatorname{kernel}(A) = \operatorname{span}\{v_{r+1}, \dots, v_n\}$.
◮ $\operatorname{range}(A) = \operatorname{span}\{u_1, \dots, u_r\}$.
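Reading off rank, kernel and range from a computed SVD, as a NumPy sketch (an assumption, as is the relative cutoff `1e-10`): in floating point, "nonzero" has to be replaced by "above a threshold", since singular values that are exactly zero in theory come out at roundoff level.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 6, 5, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

U, s, Vt = np.linalg.svd(A)
tol = 1e-10 * s[0]                    # assumed cutoff for "nonzero"
r_numerical = int(np.sum(s > tol))
kernel_basis = Vt[r_numerical:].T     # columns v_{r+1}, ..., v_n
range_basis = U[:, :r_numerical]      # columns u_1, ..., u_r

print(r_numerical)
print(np.allclose(A @ kernel_basis, 0))                    # A maps the kernel to zero
# Orthogonal projection onto span{u_1, ..., u_r} leaves every column of A unchanged.
print(np.allclose(range_basis @ (range_basis.T @ A), A))
```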
SVD: Computation (for small dense matrices)
Computation of the SVD proceeds in two steps:
1. Reduction to bidiagonal form: By applying $n$ Householder reflectors from the left and $n - 2$ Householder reflectors from the right, compute orthogonal matrices $U_1$, $V_1$ such that
$U_1^T A V_1 = B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix},$
that is, $B_1 \in \mathbb{R}^{n\times n}$ is an upper bidiagonal matrix (nonzero entries only on the diagonal and the first superdiagonal).
2. Reduction to diagonal form: Use divide-and-conquer to compute orthogonal matrices $U_2$, $V_2$ such that $\Sigma = U_2^T B_1 V_2$ is diagonal. Set $U = U_1 \begin{bmatrix} U_2 & 0 \\ 0 & I_{m-n} \end{bmatrix}$ and $V = V_1 V_2$.
Step 1 is usually the most expensive.
Remarks on Step 1:
◮ If $m$ is significantly larger than $n$, say, $m \ge 3n/2$, first computing a QR decomposition of $A$ reduces the cost.
◮ Most modern implementations reduce $A$ successively via banded form to bidiagonal form.²
² Bischof, C. H.; Lang, B.; Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Software 26 (2000), no. 4, 581–601.
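A textbook-style sketch of Step 1 (Householder bidiagonalization) in NumPy, assuming $m \ge n$. It is meant to show the structure of the reduction, not the blocked or band-based variants used by modern libraries. The function names `householder` and `bidiagonalize` are hypothetical. The loop applies a left reflector at every step and a right reflector only while row $k$ still has entries beyond the superdiagonal; Step 2 is not implemented, and instead the singular values of $B$ are compared with those of $A$.

```python
import numpy as np

def householder(x):
    """Return (v, beta) with (I - beta v v^T) x = -sign(x_0) ||x|| e_1."""
    v = np.array(x, dtype=float)
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v, 0.0                       # nothing to annihilate
    v[0] += np.copysign(normx, v[0])        # sign choice avoids cancellation
    return v, 2.0 / (v @ v)

def bidiagonalize(A):
    """Hypothetical helper: U1^T A V1 = [B1; 0] with B1 upper bidiagonal (m >= n)."""
    m, n = A.shape
    B = np.array(A, dtype=float)
    U1, V1 = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflector on rows k:, zeroing B[k+1:, k].
        v, beta = householder(B[k:, k])
        B[k:, :] -= beta * np.outer(v, v @ B[k:, :])
        U1[:, k:] -= beta * np.outer(U1[:, k:] @ v, v)
        if k < n - 2:
            # Right reflector on columns k+1:, zeroing B[k, k+2:].
            v, beta = householder(B[k, k + 1:])
            B[:, k + 1:] -= beta * np.outer(B[:, k + 1:] @ v, v)
            V1[:, k + 1:] -= beta * np.outer(V1[:, k + 1:] @ v, v)
    return U1, B, V1

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 5))
U1, B, V1 = bidiagonalize(A)
print(np.allclose(U1.T @ A @ V1, B))               # orthogonal equivalence
print(np.allclose(np.triu(np.tril(B, 1)), B))      # only diagonal and superdiagonal
print(np.allclose(np.linalg.svd(B, compute_uv=False),
                  np.linalg.svd(A, compute_uv=False)))
```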
SVD: Computation (for small dense matrices)
In most applications, the vectors $u_{n+1}, \dots, u_m$ are not of interest. Omitting these vectors yields the following variant of the SVD.
Theorem (Economy size SVD). Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. Then there is a matrix $U \in \mathbb{R}^{m\times n}$ with orthonormal columns and an orthogonal matrix $V \in \mathbb{R}^{n\times n}$ such that
$A = U\Sigma V^T, \quad \text{with } \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n) \in \mathbb{R}^{n\times n}$
and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.
Computed by MATLAB's [U,S,V] = svd(A,'econ').
Complexity (memory / operations):
◮ singular values only: $O(mn)$ / $O(mn^2)$
◮ economy size SVD: $O(mn)$ / $O(mn^2)$
◮ (full) SVD: $O(m^2 + mn)$ / $O(m^2 n + mn^2)$
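For readers working in Python, NumPy's counterpart of MATLAB's svd(A,'econ') is `np.linalg.svd(A, full_matrices=False)`, and `compute_uv=False` returns the singular values only. A short sketch (sizes chosen arbitrarily) for a tall matrix, where the full SVD would additionally form a $1000 \times 1000$ matrix $U$:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 1000, 50
A = rng.standard_normal((m, n))

# Economy size SVD: U is m x n with orthonormal columns, V is n x n orthogonal.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)          # (1000, 50) (50,) (50, 50)
print(np.allclose(U.T @ U, np.eye(n)), np.allclose(A, (U * s) @ Vt))

# Singular values only: no singular vectors are formed.
s_only = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s, s_only))
```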
SVD: Computation (for small dense matrices)
Beware of roundoff error when interpreting singular value plots. Example: semilogy(svd(hilb(100)))
[Figure: computed singular values of hilb(100) on a logarithmic scale (from 10^0 down to 10^-20) versus their index 1 to 100.]
◮ The kink is caused by roundoff error and does not reflect the true behavior of the singular values.
◮ The exact singular values are known to decay exponentially.³
◮ Sometimes more accuracy is possible.⁴
³ Beckermann, B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer. Math. 85 (2000), no. 4, 553–577.
⁴ Drmač, Z.; Veselić, K. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1322–1342.
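The same experiment in NumPy, as a sketch (NumPy has no `hilb`, so the Hilbert matrix $H_{ij} = 1/(i + j - 1)$ is built directly). The threshold $\varepsilon \sigma_1$, with $\varepsilon$ the machine precision, is used here as a rough indicator of the roundoff level: computed singular values below it are dominated by roundoff, both from the algorithm and from storing the entries of $H$ in floating point, and sit far above the exact values.

```python
import numpy as np

n = 100
i = np.arange(1, n + 1)
H = 1.0 / (i[:, None] + i[None, :] - 1)     # Hilbert matrix, H[i, j] = 1 / (i + j - 1)

s = np.linalg.svd(H, compute_uv=False)
floor = np.finfo(float).eps * s[0]          # rough size of the roundoff error in H
print(s[0])
print(np.sum(s < floor))                    # how many computed values lie below it
print(s[-1])                                # tiny, yet far above the exact smallest value
```

Plotting `s` on a logarithmic axis (for example, with matplotlib) reproduces the kink shown in the figure.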
Singular/eigenvalue relations: symmetric matrices
A symmetric matrix $A = A^T \in \mathbb{R}^{n\times n}$ admits a spectral decomposition $A = U \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n) U^T$ with an orthogonal matrix $U$. After reordering, we may assume $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. The spectral decomposition can be turned into an SVD $A = U\Sigma V^T$ by defining
$\Sigma = \operatorname{diag}(|\lambda_1|, \dots, |\lambda_n|), \quad V = U \operatorname{diag}(\operatorname{sign}(\lambda_1), \dots, \operatorname{sign}(\lambda_n)),$
with the convention $\operatorname{sign}(0) := 1$, so that $V$ is orthogonal.
Remark: This extends to the more general case of normal matrices (e.g., orthogonal or symmetric) via complex spectral or real Schur decompositions.
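A NumPy sketch of this construction (NumPy and the random test matrix are assumptions). `np.linalg.eigh` returns eigenvalues in ascending order, so they are first reordered by decreasing magnitude; `np.where` implements the convention $\operatorname{sign}(0) := 1$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
X = rng.standard_normal((n, n))
A = X + X.T                                  # symmetric, typically indefinite

lam, U = np.linalg.eigh(A)                   # A = U diag(lam) U^T
order = np.argsort(-np.abs(lam))             # reorder: |lam_1| >= ... >= |lam_n|
lam, U = lam[order], U[:, order]

signs = np.where(lam >= 0, 1.0, -1.0)        # sign(0) := 1 keeps V orthogonal
Sigma = np.diag(np.abs(lam))
V = U * signs                                # V = U diag(sign(lam_1), ..., sign(lam_n))

print(np.allclose(A, U @ Sigma @ V.T))
print(np.allclose(V.T @ V, np.eye(n)))
print(np.allclose(np.abs(lam), np.linalg.svd(A, compute_uv=False)))
```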
Singular/eigenvalue relations: general matrices
Consider the SVD $A = U\Sigma V^T$ of $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. We then have:
1. Spectral decomposition of the Gramian $A^T A = V\Sigma^T\Sigma V^T = V \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) V^T$ ⇒ $A^T A$ has eigenvalues $\sigma_1^2, \dots, \sigma_n^2$, and the right singular vectors of $A$ are eigenvectors of $A^T A$.
2. Spectral decomposition of the Gramian $AA^T = U\Sigma\Sigma^T U^T = U \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2, 0, \dots, 0) U^T$ ⇒ $AA^T$ has eigenvalues $\sigma_1^2, \dots, \sigma_n^2$ and, additionally, $m - n$ zero eigenvalues; the first $n$ left singular vectors of $A$ are eigenvectors of $AA^T$.
3. Decomposition of the Golub-Kahan matrix
$\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix} \begin{bmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix}^T$
⇒ the eigenvalues of the Golub-Kahan matrix are $\pm\sigma_j$, $j = 1, \dots, n$, and zero ($m - n$ times).
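All three relations can be checked numerically; a NumPy sketch (an assumption; sizes arbitrary). `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order, so both sides are sorted before comparison.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 7, 4
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)            # sigma_1 >= ... >= sigma_n

# 1. Eigenvalues of A^T A are sigma_i^2.
print(np.allclose(np.linalg.eigvalsh(A.T @ A), np.sort(s**2)))

# 2. Eigenvalues of A A^T are sigma_i^2 plus m - n zeros.
expected = np.concatenate([s**2, np.zeros(m - n)])
print(np.allclose(np.linalg.eigvalsh(A @ A.T), np.sort(expected)))

# 3. The Golub-Kahan matrix [[0, A], [A^T, 0]] has eigenvalues +-sigma_j and m - n zeros.
GK = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
expected = np.concatenate([s, -s, np.zeros(m - n)])
print(np.allclose(np.linalg.eigvalsh(GK), np.sort(expected)))
```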
Norms: Spectral and Frobenius norm
Given the SVD $A = U\Sigma V^T$, one defines:
◮ Spectral norm: $\|A\|_2 = \sigma_1$.
◮ Frobenius norm: $\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$.
Basic properties:
◮ $\|A\|_2 = \max\{\|Av\|_2 : \|v\|_2 = 1\}$ (see the proof of the SVD).
◮ $\|\cdot\|_2$ and $\|\cdot\|_F$ are both (submultiplicative) matrix norms.
◮ $\|\cdot\|_2$ and $\|\cdot\|_F$ are both unitarily invariant, that is, $\|QAZ\|_2 = \|A\|_2$, $\|QAZ\|_F = \|A\|_F$ for any orthogonal matrices $Q$, $Z$.
◮ $\|A\|_2 \le \|A\|_F \le \sqrt{r}\,\|A\|_2$, where $r = \operatorname{rank}(A)$.
◮ $\|AB\|_F \le \min\{\|A\|_2\|B\|_F, \|A\|_F\|B\|_2\}$.
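A NumPy sketch (an assumption; random test matrices) checking the definitions and basic properties of both norms. `np.linalg.norm(A, 2)` returns the largest singular value of a matrix and `np.linalg.norm(A, 'fro')` the Frobenius norm; random orthogonal $Q$, $Z$ are taken from QR decompositions.

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, r = 9, 6, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))    # rank r
B = rng.standard_normal((n, 5))
s = np.linalg.svd(A, compute_uv=False)

norm2, normF = np.linalg.norm(A, 2), np.linalg.norm(A, 'fro')
print(np.isclose(norm2, s[0]), np.isclose(normF, np.sqrt(np.sum(s**2))))

# Orthogonal invariance with random orthogonal Q (m x m) and Z (n x n).
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.isclose(np.linalg.norm(Q @ A @ Z, 2), norm2),
      np.isclose(np.linalg.norm(Q @ A @ Z, 'fro'), normF))

# Norm equivalence with r = rank(A): ||A||_2 <= ||A||_F <= sqrt(r) ||A||_2.
print(norm2 <= normF <= np.sqrt(r) * norm2)

# ||AB||_F <= min(||A||_2 ||B||_F, ||A||_F ||B||_2).
lhs = np.linalg.norm(A @ B, 'fro')
print(lhs <= min(norm2 * np.linalg.norm(B, 'fro'), normF * np.linalg.norm(B, 2)))
```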