Parallel Numerical Algorithms
Chapter 6 – Matrix Models
Section 6.2 – Low Rank Approximation

Edgar Solomonik
Department of Computer Science
University of Illinois at Urbana-Champaign
CS 554 / CSE 512
Outline

1 Low Rank Approximation by SVD
  Truncated SVD
  Fast Algorithms with Truncated SVD
2 Computing Low Rank Approximations
  Direct Computation
  Indirect Computation
3 Randomness and Approximation
  Randomized Approximation Basics
  Structured Randomized Factorization
4 Hierarchical Low-Rank Structure
  HSS Matrix–Vector Multiplication
  Parallel HSS Matrix–Vector Multiplication
Rank-k Singular Value Decomposition (SVD)

For any matrix A ∈ R^(m×n) of rank k there exists a factorization

  A = U D V^T

- U ∈ R^(m×k) is a matrix of orthonormal left singular vectors
- D ∈ R^(k×k) is a nonnegative diagonal matrix of singular values in decreasing order σ_1 ≥ ··· ≥ σ_k
- V ∈ R^(n×k) is a matrix of orthonormal right singular vectors
Truncated SVD

Given A ∈ R^(m×n), seek its best approximation of rank k < rank(A):

  B = argmin_{B ∈ R^(m×n), rank(B) ≤ k} ||A − B||_2

Eckart–Young theorem: given the SVD

  A = [U_1 U_2] [D_1 0; 0 D_2] [V_1 V_2]^T, where D_1 is k × k,

the minimizer is B = U_1 D_1 V_1^T, the rank-k truncated SVD of A, and

  ||A − U_1 D_1 V_1^T||_2 = min_{B ∈ R^(m×n), rank(B) ≤ k} ||A − B||_2 = σ_{k+1}
Computational Cost

Given a rank-k truncated SVD A ≈ U D V^T of A ∈ R^(m×n) with m ≥ n:

- Computing y = Ax approximately requires O(mk) work: y ≈ U(D(V^T x))
- Solving Ax = b approximately requires O(mk) work via x ≈ V D^(−1) U^T b
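Both O(mk) operations can be checked with a small NumPy sketch (the matrix sizes and variable names here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10

# Build a matrix of exact rank k, so the rank-k truncated SVD reproduces it.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# Rank-k truncated SVD: keep the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

x = rng.standard_normal(n)
# y = A x in O(mk) work: three skinny products, never touching A itself.
y = U @ (s * (Vt @ x))

b = A @ x
# Approximate solve x ≈ V D^(-1) U^T b (the pseudoinverse solution).
x_hat = Vt.T @ ((U.T @ b) / s)
```

Because A here has exact rank k, both the product and the solve agree with the dense computation; for a matrix of higher rank the error is governed by σ_{k+1}.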
Computing the Truncated SVD

- Reduction to bidiagonal form via two-sided orthogonal updates allows computation of the full SVD
- Given the full SVD, the truncated SVD is obtained by keeping only the largest singular value/vector pairs
- Given the set of transformations Q_1, ..., Q_s so that U = Q_1 ··· Q_s, the leading k columns of U can be obtained by computing

    U_1 = Q_1 ··· Q_s [I; 0]

- This method requires O(mn^2) work for the computation of the singular values and O(mnk) work for k singular vectors
Computing the Truncated SVD by Krylov Subspace Methods

- Seek the k ≪ m, n leading right singular vectors of A
- Find a basis for the Krylov subspace of B = A^T A
- Rather than forming B, compute products Bx = A^T(Ax)
- For instance, do k' ≥ k + O(1) iterations of Lanczos and compute k Ritz vectors to estimate the right singular vectors V
- Left singular vectors can then be obtained via AV = UD
- This method requires O(mnk) work for k singular vectors
- However, Θ(k) sparse-matrix–vector multiplications are needed (high latency and low flop/byte ratio)
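The steps above can be sketched in NumPy; this is a simplified teaching version (with full reorthogonalization, a fixed iteration count, and illustrative sizes), not the production algorithm:

```python
import numpy as np

def lanczos_ata(A, kp, rng):
    """Lanczos on B = A^T A without ever forming B: each iteration applies
    Bx as A^T (A x). Full reorthogonalization keeps the basis stable.
    Returns the kp x kp tridiagonal T and the orthonormal Krylov basis Q."""
    n = A.shape[1]
    Q = np.zeros((n, kp))
    alpha = np.zeros(kp)
    beta = np.zeros(kp - 1)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(kp):
        Q[:, j] = q
        w = A.T @ (A @ q)                          # implicit product with B = A^T A
        alpha[j] = q @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # (re)orthogonalize against basis
        if j < kp - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1), Q

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
T, Q = lanczos_ata(A, 20, rng)     # k' = 20 iterations

# Ritz pairs of T approximate eigenpairs of A^T A, i.e. sigma_i^2 and the
# right singular vectors; left singular vectors then follow from AV = UD.
theta, Y = np.linalg.eigh(T)       # eigenvalues in ascending order
v1 = Q @ Y[:, -1]                  # leading Ritz vector
```

The leading Ritz value theta[-1] converges to σ_1^2 and v1 to the leading right singular vector; each iteration costs two products with A, which is the Θ(k) matrix–vector multiplication bottleneck noted above.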
Generic Low-Rank Factorizations

A matrix A ∈ R^(m×n) has rank k if A = X Y^T for some X ∈ R^(m×k), Y ∈ R^(n×k) with k ≤ min(m, n)

If A = X Y^T (exact low-rank factorization), we can obtain the reduced SVD A = U D V^T via

1. [U_1, R] = QR(X)
2. [U_2, D, V] = SVD(R Y^T)
3. U = U_1 U_2

with cost O((m + n)k^2), using an SVD of a k × n rather than an m × n matrix

- If instead ||A − X Y^T||_2 ≤ ε, then ||A − U D V^T||_2 ≤ ε
- So we can obtain a truncated SVD given an optimal generic low-rank approximation
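The three steps translate directly to NumPy (sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 120, 80, 8
X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))
A = X @ Y.T                              # exact rank-k factorization

# Step 1: QR of the tall m x k factor, O(mk^2) work.
U1, R = np.linalg.qr(X)                  # U1: m x k, R: k x k
# Step 2: SVD of the small k x n matrix R Y^T, O(nk^2) work.
U2, d, Vt = np.linalg.svd(R @ Y.T, full_matrices=False)
# Step 3: combine the orthonormal factors.
U = U1 @ U2                              # A = U diag(d) Vt is a reduced SVD
```

The expensive m × n SVD is avoided entirely; only skinny or small matrices are factorized.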
Rank-Revealing QR

If A is of rank k and its first k columns are linearly independent,

  A = Q [R_11 R_12; 0 0]

where R_11 is upper-triangular and k × k, and Q = I − Y T Y^T for an m × k matrix Y

For arbitrary A we need a column-ordering permutation P: AP = QR

QR with column pivoting (due to Gene Golub) is an effective method for this:
- pivot so that the leading column has the largest 2-norm
- the method can break in the presence of roundoff error (see the Kahan matrix), but is very robust in practice
Low-Rank Factorization by QR with Column Pivoting

QR with column pivoting can be used to either
- determine the (numerical) rank of A, or
- compute a low-rank approximation with a bounded error,
performing only O(mnk) rather than the O(mn^2) work of a full QR or SVD
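A minimal sketch of QR with column pivoting stopped after k steps, using modified Gram–Schmidt for clarity (production codes such as LAPACK's dgeqp3 use Householder transformations and norm-update tricks instead):

```python
import numpy as np

def qrcp(A, k):
    """QR with column pivoting, truncated after k steps: at step j the
    remaining (deflated) column of largest 2-norm is swapped into position j,
    then eliminated. Returns Q (m x k), R (k x n), and the pivot order perm,
    so that A[:, perm] = Q @ R when A has rank k."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, n))
    perm = np.arange(n)
    for j in range(k):
        # Pivot: largest 2-norm among the deflated trailing columns.
        p = j + np.argmax(np.linalg.norm(A[:, j:], axis=0))
        A[:, [j, p]] = A[:, [p, j]]
        R[:j, [j, p]] = R[:j, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        # Eliminate column j and deflate the trailing columns.
        R[j, j] = np.linalg.norm(A[:, j])
        Q[:, j] = A[:, j] / R[j, j]
        R[j, j + 1:] = Q[:, j] @ A[:, j + 1:]
        A[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    return Q, R, perm

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 12)) @ rng.standard_normal((12, 40))  # rank 12
Q, R, perm = qrcp(A, 12)   # k steps cost O(mnk), not O(mn^2)
```

Stopping after k steps is what yields the O(mnk) cost: only k columns of Q and k rows of R are ever formed.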
Parallel QR with Column Pivoting

- In distributed memory, column pivoting poses further challenges
- At least one message is needed to decide on each pivot column, which leads to Ω(k) synchronizations
- Existing work tries to pivot many columns at a time by finding subsets of them that are sufficiently linearly independent
- Randomized approaches provide alternatives and flexibility
Randomization Basics

Intuition: consider a random vector w of dimension n; all of the following hold with high probability in exact arithmetic

- Given any basis Q for the n-dimensional space, a random w is not orthogonal to any row of Q^T
- Let A = U D V^T, where V^T ∈ R^(k×n)
- The vector w is at a random angle with respect to any row of V^T, so z = V^T w is a random vector
- Aw = U D z is a random linear combination of the columns of U D

Given k random vectors, i.e., a random matrix W ∈ R^(n×k):
- The columns of B = AW give k random linear combinations of the columns of U D
- B has the same span as U!
Using the Basis to Compute a Factorization

If B has the same span as the range of A:

- [Q, R] = QR(B) gives an orthogonal basis Q for B = AW
- Q Q^T A = Q Q^T U D V^T = (Q Q^T U) D V^T; now Q^T U is orthogonal, so Q Q^T U is a basis for the range of A
- so compute H = Q^T A with H ∈ R^(k×n)
- compute [U_1, D, V] = SVD(H)
- then compute U = Q U_1, and we have a rank-k truncated SVD of A: A = U D V^T
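The whole pipeline fits in a few lines of NumPy; this sketch assumes A has (numerical) rank k, whereas practical implementations add a few columns of oversampling to W:

```python
import numpy as np

def randomized_svd(A, k, rng):
    """Randomized range finder followed by a small SVD (illustrative sketch)."""
    n = A.shape[1]
    W = rng.standard_normal((n, k))   # random test matrix
    B = A @ W                         # k random combinations of A's columns
    Q, _ = np.linalg.qr(B)            # orthonormal basis for range(A)
    H = Q.T @ A                       # small k x n matrix
    U1, d, Vt = np.linalg.svd(H, full_matrices=False)
    return Q @ U1, d, Vt              # A = U diag(d) V^T

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 15)) @ rng.standard_normal((15, 150))  # rank 15
U, d, Vt = randomized_svd(A, 15, rng)
```

Note that A appears only inside matrix–matrix products (AW and Q^T A); the factorizations touch only k-column or k-row matrices.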
Cost of the Randomized Method

- The matrix multiplications (e.g., AW) all require O(mnk) operations
- The QR and SVD require O((m + n)k^2) operations
- If k ≪ min(m, n), the bulk of the computation is within matrix multiplication, which can be done with fewer synchronizations and higher efficiency than QR with column pivoting or Arnoldi