Low Rank Approximation Lecture 10 Daniel Kressner Chair for - PowerPoint PPT Presentation

Low Rank Approximation Lecture 10 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1

Time-dependent low-rank approximation Goal: Approximate given matrix A ( t ) at every time t ∈ [ 0 , T ] by rank- r matrix. Motivation for taking time-dependence into account: ◮ If σ r ( t ) > σ r + 1 ( t ) for all t ∈ [ 0 , T ] then T r ( A ( t )) inherits smoothness of A wrt t [Baumann/Helmke’2003; Mehrmann/Rath’1993]. ◮ Low-rank approximation at t i may yield valuable information for low-rank approximation at nearby t i + 1 . ◮ Allows us to work with tangent space (linear) instead of manifold (nonlinear). Much easier to impose additional structure. More general setting: Approximate solution of ODE ˙ A ( t ) = F ( A ( t )) , A ( 0 ) = A 0 . by trajectory X ( t ) in M r . Main application for low-rank tensors: Discretized time-dependent high-dimensional PDEs. 2

Dynamical low-rank approximation Recall definition of tangent vectors. If X : [ 0 , T ] → R m × n is a smooth curve in M r then ˙ X ( t ) ∈ T X ( t ) M r , ∀ t ∈ [ 0 , T ] . Dynamical low-rank approximation: Given A : [ 0 , T ] → R m × n , for every t construct X ( t ) ∈ M r satisfying ˙ � ˙ X ( t ) − ˙ X ( t ) ∈ T X ( t ) M r such that A ( t ) � = min ! ◮ Differential equation needs to be supplemented by initial value, say X ( 0 ) = T r ( A ( 0 )) , which is in M r unless A ( 0 ) has rank less than r . ◮ Hope: X stays close to A if A admits good rank- r approximation for all t . ◮ For efficiency, aim at finding equivalent system of ≈ dim M r differential equations. 3

Dynamical low-rank approximation Counterexample to hope:   1 0 0  , 10 − 5 e t − 1 A ( t ) = 0 0 t ∈ [ 0 , 10 ] .  10 − 5 e 1 − t 0 0 For r = 2, dynamical low-rank approximation yields  1 0 0   . 0 0 0 X ( t ) =  10 − 5 e 1 − t 0 0 � A ( T ) − X ( T ) � F = 10 4 � A ( T ) − T r ( A ( T )) � F = 10 − 14 . but Dynamical low-rank approximation is “blind” to evolution of 10 − 5 e t − 1 . Can only hope for good approximation on short time intervals and if σ r ( t ) > σ r + 1 ( t ) . 4

Dynamical low-rank approximation ˙ � ˙ X ( t ) − ˙ X ( t ) ∈ T X ( t ) M r such that A ( t ) � = min ! is equivalent to differential equation X ( t ) = P X ( t ) ( ˙ ˙ A ( t )) , (1) where P X ( t ) is the orthogonal projection onto T X ( t ) M r . Will now omit dependence on time. Given X = USV T with S ∈ R r × r (not necessarily diagonal), U ∈ St ( m , r ) and V ∈ St ( n , r ) , we have P X ( ˙ P U ˙ U ˙ AP V + P U ˙ AP V + P ⊥ AP ⊥ A ) = V SV T + ˙ USV T + US ˙ U ˙ V T , = where U T ˙ ˙ S = AV ˙ U ˙ P ⊥ AVS − 1 U = ˙ V ˙ P ⊥ A T US − T . V = 5

Dynamical low-rank approximation Dynamical low-rank approximation is equivalent to system of differential equations U T ˙ ˙ S = AV ˙ U ˙ P ⊥ AVS − 1 U = ˙ V ˙ P ⊥ A T US − T V = with initial values S ( 0 ) = S 0 ∈ R r × r , U ( 0 ) = U 0 ∈ St ( m , r ) , V ( 0 ) = V 0 ∈ St ( n , r ) . Remarks: ◮ ˙ U ∈ T U St ( m , r ) and thus U stays in St ( m , r ) ; ˙ V ∈ T V St ( n , r ) and thus V stays in St ( n , r ) . This can be preserved using geometric integration [Hairer/Lubich/Wanner’2006] or combining standard integrators with retractions. ◮ Step size control should aim at controlling error for USV T and not for individual factors ◮ Presence of S − 1 makes differential equations very stiff for ill-conditioned S ( σ r close to zero); results in small step sizes of explicit integrators. 6

Dynamical low-rank approximation Error bound from Theorem 5.1 in [Koch/Lubich’2007]. Theorem Suppose that for t ∈ [ 0 , T ] we have ◮ σ r ( A ( t )) > σ r + 1 ( A ( t )) ; ◮ � A ( t ) − T r ( A ( t )) � F ≤ ρ/ 16 with ρ = min t ∈ [ 0 , T ] σ r ( A ( t )) . Then, with X ( 0 ) = T r ( A ( 0 )) , we have � t � X ( t ) − T r ( A ( t )) � F ≤ 2 β e β t � A ( s ) − T r ( A ( s )) � F d s 0 where β = 8 µ/ρ . 7

Dynamical low-rank approximation A = F ( A ) with F : R m × n → R m × n . Extension to dynamical system ˙ Dynamical low-rank approximation: Construct X ( t ) ∈ M r satisfying ˙ � ˙ X ( t ) ∈ T X ( t ) M r such that X ( t ) − F ( X ( t )) � = min ! Equivalent to dynamical system ˙ X ( t ) = P X ( t ) ( F ( X ( t ))) . Equivalent to ˙ U T F ( X ( t )) V S = ˙ P ⊥ U F ( X ( t )) VS − 1 U = ˙ P ⊥ V F ( X ( t )) T US − T V = EFY. Consider the linear matrix differential equation ˙ A = LA + AR for L ( t ) ∈ R m × m and R ( t ) ∈ R n × n . Show that A ( t ) ∈ M r for all t > 0 if A ( 0 ) ∈ M r . Write down the differential equations for U , S , V . EFY. Consider the nonlinear matrix differential equation ˙ A = F ( A ) := LA + AR + A ◦ A , where ◦ denotes the elementwise product. Develop an efficient method for evaluating F ( A ( t )) for A ( t ) ∈ M r when r ≪ m , n . 8

Dynamical low-rank approximation For error analysis assume for Y ( t ) := T r ( A ( t )) that ◮ � F ( Y ( t )) � F ≤ µ , � F ( X ( t )) � F ≤ µ for t ∈ [ 0 , T ] ; ◮ � F ( Z 1 ) − F ( Z 2 ) , Z 1 − Z 2 � ≤ λ � Z 1 − Z 2 � 2 F for all Z 1 , Z 2 ∈ M r and some fixed λ ∈ R ; ◮ � F ( Y ( t )) − F ( A ( t )) � F ≤ L � Y ( t ) − A ( t ) � F for t ∈ [ 0 , T ] . Theorem 6.1 in [Koch/Lubich’2007]: Theorem Under assumptions above, suppose that for t ∈ [ 0 , T ] we have ◮ σ r ( A ( t )) > σ r + 1 ( A ( t )) ; ◮ � A ( t ) − T r ( A ( t )) � F ≤ ρ/ 16 with ρ = min t ∈ [ 0 , T ] σ r ( A ( t )) . Then, with X ( 0 ) = T r ( A ( 0 )) , we have � t � X ( t ) − T r ( A ( t )) � F ≤ ( 2 β + L ) e ( 2 β + λ ) t � A ( s ) − T r ( A ( s )) � F d s 0 9

Dynamical low-rank approximation Numerical integrators face severe problems as σ r → 0. Unfortunately, σ r ≈ 0 is a very likely situation, as problems of interest typically feature quick singular value decay. Regularization ( = increasing small singular values of S by ǫ > 0) introduces additional errors whose effect is poorly understood. Fundamental underlying problem: U , V ill-conditioned in the presence of small σ r . Idea by [Lubich/Oseledets’2014]: Splitting integrator that is insensitive to σ r → 0. 10

Projector-splitting integrator Recall that X ( t ) = P X ( t ) ( ˙ ˙ A ( t )) , with P U ZP V + P ⊥ U ZP V + P ⊥ P X ( Z ) = V ZP U ZVV T − UU T ZVV T + UU T Z = = ZP range ( X T ) − P range ( X ) ZP range ( X T ) + P range ( X ) Z . One step of Lie-Trotter splitting integrator from t 0 to t 1 = t 0 + h applied to this decomposition: 1. Solve ˙ X I = ˙ AP range ( X T I ) on [ t 0 , t 1 ] with i.v. X I ( t 0 ) = X 0 ∈ M r . 2. Solve ˙ X II = − P range ( X II ) ˙ II ) on [ t 0 , t 1 ] with i.v. AP range ( X T X II ( t 0 ) = X I ( t 1 ) . 3. Solve ˙ X III = P range ( X III ) ˙ A on [ t 0 , t 1 ] with i.v. X III ( t 0 ) = X II ( t 1 ) . Approximation returned: X 1 := X III ( t 1 ) . Standard theory [Hairer/Lubich/Wanner’2006]: Lie-Trotter splitting integrator has convergence order one. 11

Projector-splitting integrator Consider first step: Rhs ˙ AP range ( X T I ) ∈ T X I M r and hence X I ∈ M r for t ∈ [ t 0 , t 1 ] . � factorization X I = U I S I V T I . Differentiation � 1 ! ∂ t ( U I S I ) V T I + U I S I ˙ V T I = ˙ = ˙ AV I V T X I I . Satisfied if 1 ∂ t ( U I S I ) = ˙ ˙ AV I , V I = 0 . In particular, primitive of ˙ AV I is AV I . In turn, this equation is solved by � � U I ( t ) S I ( t ) = U I ( t 0 ) S I ( t 0 ) + A ( t ) − A ( t 0 ) V I ( t 0 ) . Similar considerations for second and third step. 12

Projector-splitting integrator Solution of three split equations given by � � 1. U I ( t ) S I ( t ) = U I ( t 0 ) S I ( t 0 ) + A ( t ) − A ( t 0 ) V I ( t 0 ) , � T V II ( t 0 ) , 2. S II ( t ) = S II ( t 0 ) − U II ( t 0 ) T � A ( t ) − A ( t 0 ) 3. V III ( t ) S III ( t ) T = V III ( t 0 ) S III ( t 0 ) T + � T U III ( t 0 ) , � A ( t ) − A ( t 0 ) Unassigned Quantities ( V I , U II , V II , U III ) remain constant. Practical algorithm: Given X 0 = U 0 S 0 V T 0 and with ∆ A := A ( t 1 ) − A ( t 0 ) , compute 1. QR factorization K I = U I S I for K I = U 0 S 0 + ∆ A V 0 . 2. S II = S I − U T I ∆ A V 0 3. QR factorization L III = V III S T III for L III = V 0 S T II + ∆ T A U I . Returns U 1 S 1 V T 1 = U I S III V T III 13

Projector-splitting integrator Remarks: ◮ KSL splitting is exact if A ( t ) has rank at most r (Theorem 4.1 by Lubich/Oseledets) � robustness in presence of small singular value σ r [Kieri/Lubich/Wallach’2016]. ◮ Symmetrization (KSLSK) leads to second order. ◮ Splitting extends to ˙ A = F ( A ) by setting △ A := hF ( Y 0 ) (but exactness result does not hold). 14

Further literature ◮ Extension to Tucker [Koch/Lubich’2010, Lubich/Vandereycken/Wallach’2018], tensor train [Lubich/Oseledets/Vandereycken’2015] ◮ Application to 3D nonlinear PDEs [Nonnenmacher/Lubich’2009], matrix differential equations [Mena et al.’2018], Vlasov-Poisson equation [Einkemmer/Lubich’2018], time-dependent PDEs with uncertainties [Musharbash/Nobile/Zhou’2014], quantum chemistry [Kloss/Burghardt/Lubich’2017] and physics [Haegeman et al.’2016]. ◮ [Kieri/Vandereycken’2017]: Conceptually simpler approach by combining standard integrators with low-rank truncation. 15

Low Rank Approximation Lecture 10 Daniel Kressner Chair for - PowerPoint PPT Presentation

Low Rank Approximation Lecture 10 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Time-dependent low-rank approximation Goal: Approximate given matrix A ( t ) at every time t

2 3 4 5 8 9 MINNEAPOLIS MILWAUKEE MSA RANK #16 MSA RANK #39 CHICAGO MSA RANK #3

Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation

Low Rank Approximation Lecture 4 Daniel Kressner Chair for Numerical Algorithms and HPC

Low Rank Approximation Lecture 5 Daniel Kressner Chair for Numerical Algorithms and HPC

ECS231 Low-rank approximation revisited (Introduction to Randomized Algorithms) May 23, 2019

On the minimum rank of a graph Jisu Jeong June 21, 2013 Jisu Jeong On the minimum rank of a

Low Rank Approximation Lecture 3 Daniel Kressner Chair for Numerical Algorithms and HPC

6. Approximation and fitting norm approximation least-norm problems regularized

1 SVD applications: rank, column, row, and null spaces Rank : the rank of a matrix is equal to:

A new family of maximum rank distance codes or: Maximum rank distance codes and finite semifields

Predictive low-rank decomposition for kernel methods Francis Bach Michael Jordan Ecole des

Recitations for 10-701 Randomized Algorithm for matrices Mu Li April 9, 2013 Low-rank

Computing the Best Rank ( r 1 , r 2 , r 3 ) Approximation of a Tensor Lars Eld en

Optimal Rank-1 Hankel Approximation of Matrices Gerlind Plonka University of Gttingen CodEx

Bayesian Estimation of Low-rank Matrices Pierre Alquier Journes de Statistique du Sud,

Low Rank Approximation Lecture 8 Daniel Kressner Chair for Numerical Algorithms and HPC

Ischemic Heart Failure Trial Commentary : Alec Vahanian Bichat Hospital, Paris, France

Slides by Mr. Zindman NEW YORK STATE STANDARD 7.5 THE CONSTITUTION IN PRACTICE: The United States

Electron Cloud studies in J- PARC MR and Fermilab RR K. Ohmi (KEK) Mar 19-20, 2019 US-Japan

Structured Databases of Named Entities from Bayesian Nonparametrics Dr. Jacob Eisenstein

Frontiers of Materials Research: A Decadal Survey* *This is the fourth materials research decadal

Learning MR-Sort rules with coalitional veto Olivier Sobrie 1,2 Vincent Mousseau 1 Marc Pirlot 2 1

Learning From Data Lecture 27 Learning Aides Input Preprocessing Dimensionality Reduction and

Reminder: Next week you have 1 Monday Tuesday Wednesday Thursday Friday Session 9