  1. Low Rank Approximation, Lecture 8
  Daniel Kressner
  Chair for Numerical Algorithms and HPC
  Institute of Mathematics, EPFL
  daniel.kressner@epfl.ch

  2. Linear operators in TT decomposition
  Consider a linear operator L : R^{n_1 × ··· × n_d} → R^{n_1 × ··· × n_d} and the corresponding matrix representation L((i_1, ..., i_d), (j_1, ..., j_d)). L is a TT operator if it takes the form
  L((i_1, ..., i_d), (j_1, ..., j_d)) = Σ_{k_1=1}^{R_1} ··· Σ_{k_{d−1}=1}^{R_{d−1}} A_1(1, i_1, j_1, k_1) A_2(k_1, i_2, j_2, k_2) ··· A_d(k_{d−1}, i_d, j_d, 1)
  with tensors A_µ of size R_{µ−1} × n_µ × n_µ × R_µ. The operator TT rank of L is the smallest possible value of (R_1, ..., R_{d−1}).
  Matrix Product Operator (MPO):
  L((i_1, ..., i_d), (j_1, ..., j_d)) = A_1(i_1, j_1) A_2(i_2, j_2) ··· A_d(i_d, j_d),   with A_µ(i_µ, j_µ) = A_µ(:, i_µ, j_µ, :).
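  As a concrete illustration (not from the slides), here is a minimal NumPy sketch of entrywise MPO evaluation: each core A_µ is stored as an R_{µ−1} × n_µ × n_µ × R_µ array, and an entry of L is the product of the corresponding matrix slices. The identity cores in the demo all have operator TT rank 1, which is the EFY on the next slide.

```python
import numpy as np
from functools import reduce

def mpo_entry(cores, ii, jj):
    """Entry L((i_1,...,i_d),(j_1,...,j_d)) of a TT operator given by MPO cores
    of shape (R_{mu-1}, n_mu, n_mu, R_mu): a product of the slices A_mu(i_mu, j_mu)."""
    slices = [core[:, i, j, :] for core, i, j in zip(cores, ii, jj)]
    return reduce(np.matmul, slices)[0, 0]

# demo: the identity operator has operator TT ranks (1, ..., 1)
n, d = 3, 4
id_core = np.eye(n).reshape(1, n, n, 1)      # A_mu(:, i, j, :) = delta_{ij}
cores = [id_core] * d
print(mpo_entry(cores, (0, 1, 2, 1), (0, 1, 2, 1)),   # 1.0 on the diagonal
      mpo_entry(cores, (0, 1, 2, 1), (0, 1, 2, 2)))   # 0.0 off the diagonal
```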

  3. Linear operators in TT decomposition
  Alternative view: Reinterpret the matrix representation of L as a tensor ten(L) ∈ R^{n_1^2 × ··· × n_d^2} by merging the indices (i_µ, j_µ), applying the TT format, and separating the indices (i_µ, j_µ) again. In practice: perfect shuffle permutation of modes.
  [Tensor network diagram for the operator TT decomposition.]
  EFY. Show that the identity matrix has operator TT ranks 1.
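  A possible NumPy sketch of this merging step, assuming a row-major (C-order) linearization of the multi-indices; the lecture may use the column-major (Matlab-style) convention, in which case the reshapes have to be adapted:

```python
import numpy as np

def operator_to_tensor(M, dims):
    """Reinterpret the matrix M of a linear operator on R^{n_1 x ... x n_d}
    as a tensor of shape (n_1^2, ..., n_d^2) by merging row/column indices."""
    d = len(dims)
    T = M.reshape(dims + dims)                 # modes: i_1, ..., i_d, j_1, ..., j_d
    # perfect-shuffle style permutation: interleave as i_1, j_1, i_2, j_2, ...
    perm = [k for mu in range(d) for k in (mu, d + mu)]
    T = T.transpose(perm)
    # merge each pair (i_mu, j_mu) into a single mode of size n_mu^2
    return T.reshape([n * n for n in dims])

# tiny usage example: identity operator on R^{2 x 3}
dims = (2, 3)
T = operator_to_tensor(np.eye(np.prod(dims)), dims)
print(T.shape)   # (4, 9)
```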

  4. Linear operators in TT decomposition
  Example: A discrete Laplace-like operator takes the form
  L : X ↦ L_1 ∘_1 X + ··· + L_d ∘_d X
  for matrices L_µ ∈ R^{n_µ × n_µ}. Matrix representation:
  L = I_{n_d} ⊗ ··· ⊗ I_{n_2} ⊗ L_1 + ··· + L_d ⊗ I_{n_{d−1}} ⊗ ··· ⊗ I_{n_1}.
  TT representation with cores
  A_1(i_1, j_1) = [ L_1(i_1, j_1)   I_{n_1}(i_1, j_1) ],
  A_µ(i_µ, j_µ) = [ I_{n_µ}(i_µ, j_µ)   0 ;   L_µ(i_µ, j_µ)   I_{n_µ}(i_µ, j_µ) ]   for µ = 2, ..., d−1,
  A_d(i_d, j_d) = [ I_{n_d}(i_d, j_d) ;   L_d(i_d, j_d) ].
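  A small NumPy check of this construction (the matrices L_µ are random placeholders): build the cores as above, assemble the operator matrix entry by entry from the MPO, and compare with the Kronecker-sum representation.

```python
import numpy as np
from functools import reduce
from itertools import product

d, n = 3, 3
Ls = [np.random.randn(n, n) for _ in range(d)]   # placeholder matrices L_mu
Id = np.eye(n)

# MPO cores A_mu of size R_{mu-1} x n x n x R_mu, operator TT ranks (2, ..., 2)
A = [np.zeros((1, n, n, 2))] + [np.zeros((2, n, n, 2)) for _ in range(d - 2)] + [np.zeros((2, n, n, 1))]
A[0][0, :, :, 0], A[0][0, :, :, 1] = Ls[0], Id              # A_1 = [L_1  I]
for mu in range(1, d - 1):
    A[mu][0, :, :, 0] = Id                                   # A_mu = [I 0; L_mu I]
    A[mu][1, :, :, 0] = Ls[mu]
    A[mu][1, :, :, 1] = Id
A[-1][0, :, :, 0], A[-1][1, :, :, 0] = Id, Ls[-1]            # A_d = [I; L_d]

# assemble the full n^d x n^d matrix entrywise from the MPO
# (i_1 / j_1 chosen as the fastest-varying indices, matching the Kronecker form below)
M_tt = np.zeros((n**d, n**d))
for ii in product(range(n), repeat=d):
    for jj in product(range(n), repeat=d):
        entry = reduce(np.matmul, [A[mu][:, ii[mu], jj[mu], :] for mu in range(d)])[0, 0]
        M_tt[sum(ii[mu] * n**mu for mu in range(d)), sum(jj[mu] * n**mu for mu in range(d))] = entry

# Kronecker-sum representation I_{n_d} (x) ... (x) L_1 + ... + L_d (x) ... (x) I_{n_1}
M_kron = sum(reduce(np.kron, [Ls[mu] if nu == mu else Id for nu in range(d - 1, -1, -1)])
             for mu in range(d))
print(np.allclose(M_tt, M_kron))   # expected: True
```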

  5. Multiplication with linear operators in TT decomposition
  Consider the matrix-vector product x̃ = L x or, equivalently, X̃ = L(X). Assume that L and X are given in (operator) TT decomposition. Then
  X̃(i_1, ..., i_d) = Σ_{j_1,...,j_d} L((i_1, ..., i_d), (j_1, ..., j_d)) X(j_1, ..., j_d)
                    = Σ_{j_1,...,j_d} A_1(i_1, j_1) ··· A_d(i_d, j_d) · U_1(j_1) ··· U_d(j_d)
                    = Σ_{j_1,...,j_d} (A_1(i_1, j_1) ⊗ U_1(j_1)) ··· (A_d(i_d, j_d) ⊗ U_d(j_d))
                    = (Σ_{j_1} A_1(i_1, j_1) ⊗ U_1(j_1)) ··· (Σ_{j_d} A_d(i_d, j_d) ⊗ U_d(j_d))
                    =: Ũ_1(i_1) ··· Ũ_d(i_d),
  that is, the TT ranks of X̃ are bounded by the TT ranks of X multiplied with the TT operator ranks of L.
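  A minimal NumPy sketch of this core-wise multiplication rule, Ũ_µ(i_µ) = Σ_{j_µ} A_µ(i_µ, j_µ) ⊗ U_µ(j_µ); the function and array layouts are illustrative assumptions, not taken from a particular TT library.

```python
import numpy as np

def apply_mpo_to_tt(A_cores, U_cores):
    """Apply a TT operator (MPO) with cores A_mu of shape (R_{mu-1}, n, n, R_mu)
    to a tensor in TT decomposition with cores U_mu of shape (r_{mu-1}, n, r_mu).
    Returns TT cores of the result; its TT ranks are the products R_mu * r_mu."""
    result = []
    for A, U in zip(A_cores, U_cores):
        # Utilde_mu(i) = sum_j A_mu(i, j) kron U_mu(j), assembled for all i at once
        core = np.einsum('aijb,cjd->acibd', A, U)          # shape (R_, r_, n, R, r)
        R0, r0, n, R1, r1 = core.shape
        result.append(core.reshape(R0 * r0, n, R1 * r1))
    return result

# the TT ranks of the result are the products of the operator and tensor TT ranks
A_cores = [np.random.randn(1, 4, 4, 3), np.random.randn(3, 4, 4, 1)]
U_cores = [np.random.randn(1, 4, 2), np.random.randn(2, 4, 1)]
print([c.shape for c in apply_mpo_to_tt(A_cores, U_cores)])   # [(1, 4, 6), (6, 4, 1)]
```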

  6. Multiplication with linear operators in TT decomposition
  Illustration by tensor network diagrams.
  ◮ Carrying out the contractions requires O(d n r^4) operations for tensors of order d.

  7. TT decomposition – Summary of operations
  Easy:
  ◮ (partial) contractions
  ◮ multiplication with TT operators
  ◮ recompression/truncation
  Medium:
  ◮ entrywise products
  Hard:
  ◮ almost everything else
  Software:
  ◮ TT-Toolbox (Matlab, Python), TTeMPS (Matlab), ...

  8. Alternating least-squares / linear scheme
  General setting: Solve the optimization problem min_X f(X), where X is a (large) matrix or tensor and f is "simple" (e.g., convex). Constrain X to M_r, the set of rank-r matrices or tensors, and aim at solving
  min_{X ∈ M_r} f(X).
  Set X = i(U_1, U_2, ..., U_d) (e.g., X = U_1 U_2^T). Low-rank formats are multilinear ⇒ there is hope that optimizing for each component is simple:
  min_{U_µ} f(i(U_1, U_2, ..., U_d)).

  9. Alternating least-squares / linear scheme
  Set f(U_1, ..., U_d) := f(i(U_1, ..., U_d)). ALS:
  1: while not converged do
  2:   U_1 ← arg min_{U_1} f(i(U_1, U_2, ..., U_d))
  3:   U_2 ← arg min_{U_2} f(i(U_1, U_2, ..., U_d))
  4:   ...
  5:   U_d ← arg min_{U_d} f(i(U_1, U_2, ..., U_d))
  6: end while
  Examples:
  ◮ ALS for fitting the CP decomposition (see the sketch below)
  ◮ Subspace iteration
  Closely related: block Gauss-Seidel, block coordinate descent.
  Difficulties:
  ◮ The representation (U_1, U_2, ..., U_d) is often non-unique; parameters may become unbounded.
  ◮ M_r is not closed.
  ◮ Convergence (analysis).
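  As an illustration of the first example, here is a minimal ALS sketch for fitting a rank-R CP decomposition of a 3-way tensor (random placeholder data); each half-step is a linear least-squares problem in one factor.

```python
import numpy as np

def cp_als(X, R, n_iter=50, seed=0):
    """ALS for fitting a rank-R CP decomposition of a 3-way tensor X:
    X(i,j,k) ~ sum_r A(i,r) B(j,r) C(k,r)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((s, R)) for s in (I, J, K))
    for _ in range(n_iter):
        # each update solves the normal equations of a least-squares problem in one factor
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# tiny usage example on an exactly rank-2 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.standard_normal((5, 2)), rng.standard_normal((6, 2)), rng.standard_normal((7, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R=2)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))   # should be small
```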

  10. Subspace iteration and ALS
  Given A ∈ R^{m×n}, consider the computation of a best rank-r approximation:
  min_{U ∈ R^{m×r}, V ∈ R^{n×r}} f(U, V),   f(U, V) := ‖A − UV^T‖_F^2.
  ◮ Representation UV^T: unique for each of U, V individually if U, V have rank r.
  ◮ f is convex with respect to U and V individually.
  ⟨∇_U f(U, V), H⟩ = f(U + H, V) − f(U, V) + O(‖H‖_2^2) = −2 ⟨AV − UV^T V, H⟩.
  Hence,
  0 = ∇_U f(U, V) = −2 (AV − UV^T V)   ⇔   U = AV (V^T V)^{−1}.
  For stability it is advisable to choose V such that it has orthonormal columns.

  11. Subspace iteration and ALS
  ALS for low-rank matrix approximation:
  1: while not converged do
  2:   Compute economy-size QR factorization V = QR and update V ← Q.
  3:   U ← AV
  4:   Compute economy-size QR factorization U = QR and update U ← Q.
  5:   V ← A^T U
  6: end while
  Returns an approximation A ≈ UV^T. This is the subspace iteration from Lecture 1!
  EFY. Develop an ALS method for solving the weighted low-rank approximation problem
  min_{U,V} ‖W_L (A − UV^T) W_R‖_F
  with square and invertible matrices W_L, W_R.
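  A minimal NumPy implementation of this iteration (the test matrix is a random placeholder); as the subspace iteration converges, its error approaches that of the truncated SVD.

```python
import numpy as np

def als_low_rank(A, r, n_iter=30, seed=0):
    """ALS for the best rank-r approximation min ||A - U V^T||_F.
    With the QR re-orthonormalizations this is exactly a subspace iteration."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = rng.standard_normal((n, r))
    for _ in range(n_iter):
        V, _ = np.linalg.qr(V)      # economy-size QR, V <- Q
        U = A @ V                   # argmin_U ||A - U V^T||_F for orthonormal V
        U, _ = np.linalg.qr(U)      # economy-size QR, U <- Q
        V = A.T @ U                 # argmin_V ||A - U V^T||_F for orthonormal U
    return U, V                     # A ~ U V^T

# usage: compare with the truncated SVD on a random matrix
A = np.random.default_rng(1).standard_normal((100, 80))
U, V = als_low_rank(A, r=10)
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - U @ V.T), np.sqrt((s[10:]**2).sum()))   # nearly equal
```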

  12. Linear matrix equations
  For a linear operator L : R^{m×n} → R^{m×n}, consider the linear system
  L(X) = C,   C, X ∈ R^{m×n}.
  Examples (see [V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016), pp. 377–441] for details and references):
  ◮ Sylvester matrix equation:
  AX + XB = C,   A ∈ R^{m×m}, B ∈ R^{n×n}, C, X ∈ R^{m×n}.
  Applications: discretized 2D Laplace on a rectangle, stability analysis, optimal control, model reduction of linear control systems. Special case Lyapunov equations: m = n, A = B^T, C symmetric (and often negative semi-definite).
  ◮ Stochastic Galerkin methods in uncertainty quantification.
  ◮ Stochastic control.
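  For the Sylvester equation, dense solvers are readily available; a quick check with SciPy's solve_sylvester on random placeholder matrices:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
m, n = 50, 40
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

X = solve_sylvester(A, B, C)                  # solves A X + X B = C
print(np.linalg.norm(A @ X + X @ B - C))      # residual near machine precision
```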

  13. Linear matrix equations
  Using the matrix M_L representing L in canonical bases, we can rewrite L(X) = C as the linear system M_L vec(X) = vec(C).
  Assumption: M_L has low Kronecker rank:
  M_L = B_1 ⊗ A_1 + ··· + B_R ⊗ A_R,   R ≪ m, n.
  Equivalently,
  L(X) = A_1 X B_1^T + ··· + A_R X B_R^T.
  EFY. Develop a variant of ACA (from Lecture 3) that aims at approximating a given sparse matrix A by a matrix of low Kronecker rank for given m, n.
  EFY. Show that if m = n and M_L is symmetric and has Kronecker rank R, one can find symmetric matrices A_1, ..., A_R, B_1, ..., B_R such that L(X) = A_1 X B_1 + ··· + A_R X B_R. Is it always possible to choose all A_k, B_k positive semi-definite if M_L is positive definite?
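  A small NumPy check of this correspondence, using the column-major vec and the identity vec(A X B^T) = (B ⊗ A) vec(X) (random placeholder matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 4, 3, 2
As = [rng.standard_normal((m, m)) for _ in range(R)]
Bs = [rng.standard_normal((n, n)) for _ in range(R)]
X = rng.standard_normal((m, n))

vec = lambda M: M.flatten(order='F')                    # column-major vec

LX = sum(A @ X @ B.T for A, B in zip(As, Bs))           # L(X) = sum_k A_k X B_k^T
M_L = sum(np.kron(B, A) for A, B in zip(As, Bs))        # M_L = sum_k B_k (x) A_k
print(np.allclose(M_L @ vec(X), vec(LX)))               # True: M_L vec(X) = vec(L(X))
```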

  14. Linear matrix equations
  Two ways of turning L(X) = C into an optimization problem:
  1. If M_L is symmetric positive definite:
  min_X (1/2) ⟨L(X), X⟩ − ⟨X, C⟩.
  2. General L:
  min_X ‖L(X) − C‖_F^2.
  We will focus on s.p.d. M_L in the following.

  15. Linear matrix equations
  A low-rank approximation of the solution of L(X) = C is obtained by solving min_{U,V} f(U, V) for
  f(U, V) = (1/2) ⟨L(UV^T), UV^T⟩ − ⟨UV^T, C⟩.
  Let L have Kronecker rank R. Then
  ⟨L(UV^T), UV^T⟩ = Σ_{k=1}^{R} ⟨A_k UV^T B_k, UV^T⟩ = Σ_{k=1}^{R} ⟨A_k U V^T B_k V, U⟩.
  This shows that arg min_U f(U, V) is the solution of the linear matrix equation
  A_1 U (V^T B_1 V) + ··· + A_R U (V^T B_R V) = CV.
  EFY. Show that this linear matrix equation always has a unique solution under the assumption that L is symmetric positive definite.
  ◮ For R = 2, this can be reduced to r linear systems of size n × n.
  ◮ For R > 2, one needs to solve an rn × rn system.

  16. Linear matrix equations
  ALS for linear matrix equations:
  1: while not converged do
  2:   Compute economy-size QR factorization V = QR and update V ← Q.
  3:   Solve A_1 U (V^T B_1 V) + ··· + A_R U (V^T B_R V) = CV for U.
  4:   Compute economy-size QR factorization U = QR and update U ← Q.
  5:   Solve (U^T A_1 U) V^T B_1 + ··· + (U^T A_R U) V^T B_R = U^T C for V.
  6: end while
  Returns an approximation X ≈ UV^T.
  For R = 2, there are better alternatives: ADI, Krylov subspace methods, ... [Simoncini'2016].
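  A minimal NumPy sketch of this ALS (an illustrative sketch, assuming the s.p.d. case with symmetric A_k, B_k so that L(X) = Σ_k A_k X B_k); each subproblem is solved by forming the small Kronecker system, which is fine for moderate r and n. The Lyapunov-type test problem at the end is a placeholder.

```python
import numpy as np

def als_linear_matrix_equation(As, Bs, C, r, n_iter=20, seed=0):
    """ALS sketch for L(X) = sum_k A_k X B_k = C (symmetric A_k, B_k assumed)
    with a rank-r ansatz X ~ U V^T; subproblems solved via their Kronecker systems."""
    rng = np.random.default_rng(seed)
    m, n = C.shape
    vec = lambda M: M.flatten(order='F')
    V = rng.standard_normal((n, r))
    for _ in range(n_iter):
        V = np.linalg.qr(V)[0]                          # step 2: V <- Q
        # step 3: solve sum_k A_k U (V^T B_k V) = C V for U  (an rm x rm system)
        K = sum(np.kron((V.T @ B @ V).T, A) for A, B in zip(As, Bs))
        U = np.linalg.solve(K, vec(C @ V)).reshape(m, r, order='F')
        U = np.linalg.qr(U)[0]                          # step 4: U <- Q
        # step 5: solve sum_k (U^T A_k U) V^T B_k = U^T C for V^T  (an rn x rn system)
        K = sum(np.kron(B.T, U.T @ A @ U) for A, B in zip(As, Bs))
        V = np.linalg.solve(K, vec(U.T @ C)).reshape(r, n, order='F').T
    return U, V                                         # X ~ U V^T

# usage: Lyapunov-type test problem A X + X A = C with s.p.d. A and rank-1 C
n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1D Laplacian (s.p.d.)
c = np.ones((n, 1))
C = c @ c.T
U, V = als_linear_matrix_equation([A, np.eye(n)], [np.eye(n), A], C, r=8)
print(np.linalg.norm(A @ (U @ V.T) + (U @ V.T) @ A - C) / np.linalg.norm(C))  # small
```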

  17. 2D eigenvalue problem
  ◮ −Δu(x) + V(x) u(x) = λ u(x) in Ω = [0, 1] × [0, 1] with Dirichlet b.c. and Hénon-Heiles potential V
  ◮ Regular discretization
  ◮ Reshaped ground state into a matrix
  [Figures: ground state and its singular values.]
  Excellent rank-10 approximation possible.

  18. Rayleigh quotients wrt low-rank matrices
  Consider a symmetric n^2 × n^2 matrix A. Then
  λ_min(A) = min_{x ≠ 0} ⟨x, Ax⟩ / ⟨x, x⟩.
  We now...
  ◮ reshape the vector x into an n × n matrix X;
  ◮ reinterpret x ↦ Ax as a linear operator A : X ↦ A(X).

  19. Rayleigh quotients wrt low-rank matrices
  Consider a symmetric n^2 × n^2 matrix A. Then
  λ_min(A) = min_{X ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩
  with the matrix inner product ⟨·, ·⟩. We now...
  ◮ restrict X to low-rank matrices.

  20. Rayleigh quotients wrt low-rank matrices
  Consider a symmetric n^2 × n^2 matrix A. Then
  λ_min(A) ≈ min_{X = UV^T ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩.
  ◮ Approximation error governed by the low-rank approximability of X.
  ◮ Solved by Riemannian optimization techniques or ALS.
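  One way to set up the ALS subproblems (a sketch of my own, not the lecture's algorithm in detail): for fixed V with orthonormal columns, vec(U V^T) = (V ⊗ I) vec(U), so the restricted Rayleigh quotient is minimized by the smallest eigenpair of the projected matrix (V ⊗ I)^T A (V ⊗ I), and analogously for fixed U. The test matrix is the discrete 2D Laplacian, whose ground state is exactly of rank 1, so already r = 1 recovers λ_min.

```python
import numpy as np

def als_rayleigh(A_big, n, r, n_iter=20, seed=0):
    """ALS for min over X = U V^T of <X, A(X)> / <X, X>, where A(X) is the
    operator represented by the symmetric n^2 x n^2 matrix A_big (column-major vec)."""
    rng = np.random.default_rng(seed)
    I = np.eye(n)
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    for _ in range(n_iter):
        # fix V: vec(U V^T) = (V kron I) vec(U); smallest eigenpair of the projection
        P = np.kron(V, I)
        lam, W = np.linalg.eigh(P.T @ A_big @ P)
        U = np.linalg.qr(W[:, 0].reshape(n, r, order='F'))[0]
        # fix U: vec(U V^T) = (I kron U) vec(V^T); analogous projected eigenproblem
        P = np.kron(I, U)
        lam, W = np.linalg.eigh(P.T @ A_big @ P)
        V = W[:, 0].reshape(r, n, order='F').T
    return lam[0], U, V

# test: discrete 2D Laplacian (Kronecker sum); its ground state is a rank-1 matrix
n = 20
T = (n + 1)**2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
A_big = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))
lam, U, V = als_rayleigh(A_big, n, r=1)
print(lam, 2 * np.linalg.eigvalsh(T)[0])   # should agree
```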
