

  1. Low Rank Approximation, Lecture 4. Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL. daniel.kressner@epfl.ch

  2. Sampling based approximation
  Aim: Obtain rank-r approximation of m × n matrix A from selected entries of A. Two different situations:
  ◮ Unstructured sampling: Let Ω ⊂ {1, ..., m} × {1, ..., n}. Solve
      min ‖A − BC^T‖_Ω,   where ‖M‖_Ω^2 = Σ_{(i,j) ∈ Ω} m_ij^2.
  Matrix completion problem solved by general optimization techniques (ALS, Riemannian optimization, convex relaxation). Will discuss later. (A small sketch of evaluating the sampled objective follows this slide.)
  ◮ Column/row sampling: Focus of this lecture.
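The sampled objective above can be evaluated directly from the index set Ω. A minimal NumPy sketch, with all names (sampled_misfit, Omega) chosen here for illustration and not taken from the slides:

```python
import numpy as np

def sampled_misfit(A, B, C, Omega):
    """Evaluate ||A - B C^T||_Omega, the root sum of squares of the
    residual restricted to the sampled index set Omega."""
    total = 0.0
    for (i, j) in Omega:
        # Only the sampled entries (i, j) of the residual contribute.
        r_ij = A[i, j] - B[i, :] @ C[j, :]
        total += r_ij ** 2
    return np.sqrt(total)

# Tiny usage example with a random rank-2 factorization.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
B = rng.standard_normal((6, 2))
C = rng.standard_normal((5, 2))
Omega = [(0, 1), (2, 3), (4, 0), (5, 4)]
print(sampled_misfit(A, B, C, Omega))
```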

  3. Row selection from orthonormal basis
  Task: Given an orthonormal basis U ∈ R^{n×r}, find a "good" r × r submatrix of U. Classical problem already considered by Knuth.¹
  Quantification of "good": smallest singular value not too small.
  Some notation:
  ◮ Given an m × n matrix A and index sets
      I = {i_1, ..., i_k},  1 ≤ i_1 < i_2 < ··· < i_k ≤ m,
      J = {j_1, ..., j_ℓ},  1 ≤ j_1 < j_2 < ··· < j_ℓ ≤ n,
    we let A(I, J) ∈ R^{k×ℓ} denote the submatrix with entries a_{i_p, j_q}, p = 1, ..., k, q = 1, ..., ℓ. The full index set is denoted by a colon, e.g., A(I, :).
  ◮ |det A| denotes the volume of a square matrix A.
  ¹ Knuth, Donald E. Semioptimal bases for linear dependencies. Linear and Multilinear Algebra 17 (1985), no. 1, 1–4.
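In NumPy the submatrix A(I, J) and the volume of a square submatrix from the notation above can be formed directly; a minimal sketch with arbitrarily chosen index sets:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))

I = [0, 2, 4]   # row index set
J = [1, 3]      # column index set

# A(I, J): the submatrix with rows I and columns J.
A_IJ = A[np.ix_(I, J)]

# A(I, :): all columns are kept.
A_Irows = A[I, :]

# Volume of a square submatrix: |det A(I0, J)| with #I0 = #J.
I0 = [0, 2]
vol = abs(np.linalg.det(A[np.ix_(I0, J)]))
print(A_IJ.shape, A_Irows.shape, vol)
```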

  4. Row selection from orthonormal basis
  Lemma (maximal volume yields good submatrix). Let the index set I, #I = r, be chosen such that |det(U(I, :))| is maximal among all r × r submatrices. Then
      σ_min(U(I, :)) ≥ 1 / √(r(n − r) + 1).
  Proof.² W.l.o.g. I = {1, ..., r}. Consider
      Ũ = U U(I, :)^{-1} = [ I_r ; B ].
  Because det Ũ(J, :) = det U(J, :) / det U(I, :) for any index set J with #J = r, the submatrix Ũ(I, :) = I_r has maximal volume among all r × r submatrices of Ũ.
  ² Following Lemma 2.1 in [Goreinov, S. A.; Tyrtyshnikov, E. E.; Zamarashkin, N. L. A theory of pseudoskeleton approximations. Linear Algebra Appl. 261 (1997), 1–21].

  5. Maximality of Ũ(I, :) implies max |b_ij| ≤ 1.
  Argument: If there were b_ij with |b_ij| > 1, then interchanging rows r + i and j of Ũ would increase the volume of Ũ(I, :). We have
      ‖B‖_2 ≤ ‖B‖_F ≤ √((n − r) r) · max |b_ij| ≤ √((n − r) r).
  This yields the result:
      ‖U(I, :)^{-1}‖_2 = ‖U U(I, :)^{-1}‖_2 = ‖Ũ‖_2 = √(1 + ‖B‖_2^2) ≤ √(1 + (n − r) r).
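The lemma can be checked numerically on small examples by brute-force search over all r × r submatrices. A minimal sketch, assuming a random orthonormal U; the function name max_volume_rows is ad hoc:

```python
import itertools
import numpy as np

def max_volume_rows(U):
    """Brute-force the row index set I (#I = r) maximizing |det U(I, :)|."""
    n, r = U.shape
    best_I, best_vol = None, -1.0
    for I in itertools.combinations(range(n), r):
        vol = abs(np.linalg.det(U[list(I), :]))
        if vol > best_vol:
            best_I, best_vol = list(I), vol
    return best_I

rng = np.random.default_rng(2)
n, r = 8, 3
# Orthonormal basis from a thin QR of a random matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))

I = max_volume_rows(U)
sigma_min = np.linalg.svd(U[I, :], compute_uv=False)[-1]
bound = 1.0 / np.sqrt(r * (n - r) + 1)
print(sigma_min, bound, sigma_min >= bound)  # the lemma predicts True
```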

  6. Greedy row selection from orthonormal basis
  Finding a submatrix of maximal volume is NP-hard.³ Greedy algorithm (column-by-column):⁴
  ◮ First step is easy: Choose i such that |u_i1| is maximal.
  ◮ Now assume that k < r steps have been performed and the first k columns have been processed. Task: Choose the optimal index in column k + 1.
  There is a one-to-one connection between greedy row selection and Gaussian elimination with column pivoting!
  ³ Civril, A., Magdon-Ismail, M.: On selecting a maximum volume sub-matrix of a matrix and related problems. Theoret. Comput. Sci. 410(47–49), 4801–4811 (2009)
  ⁴ Reinvented multiple times in the literature.

  7. Greedy row selection from orthonormal basis
  Gaussian elimination without pivoting applied to U ∈ R^{n×r}:
      for k = 1, ..., r do
        L(:, k) ← U(:, k) / u_kk,   R(k, :) ← U(k, :)
        U ← U − L(:, k) R(k, :)
      end for
  Let Ũ denote the updated matrix U obtained after k steps. Then:
  ◮ Ũ = U − LR with
      L = [ L_11 ; L_21 ] ∈ R^{n×k},   R = [ R_11  R_12 ] ∈ R^{k×r},
    L_11 unit lower triangular, R_11 upper triangular.
  ◮ Ũ is zero in its first k rows and columns:
      Ũ = [ 0  0 ; 0  Ũ_22 ],   Ũ_22 ∈ R^{(n−k)×(r−k)}.

  8. Combining both relations gives
      U = LR + Ũ = [ L_11  0 ; L_21  I_{n−k} ] [ R_11  R_12 ; 0  Ũ_22 ].
  Back to the greedy algorithm: By a suitable permutation, suppose that the first k indices are given by I_k = {1, ..., k}. Then
      det( U(I_k ∪ {k + i}, I_k ∪ {k + 1}) ) = det( U(I_k, I_k) ) · Ũ_22(i, 1).
  ⇒ Greedily maximizing the determinant: Choose i such that |Ũ_22(i, 1)| is maximal. This is Gaussian elimination with column pivoting!
  r steps of Gaussian elimination with column pivoting yield a factorization of the form PU = LR, where
  ◮ P is a permutation matrix,
  ◮ L = [ L_11 ; L_21 ] with L_11 ∈ R^{r×r} unit lower triangular and max |l_ij| ≤ 1,
  ◮ R ∈ R^{r×r} is upper triangular.
  (A numerical illustration of this factorization follows this slide.)
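The structure of PU = LR can be observed numerically: SciPy's LU factorization with partial pivoting selects, in each column, the entry of maximal magnitude as pivot, which matches the column pivoting used here. A minimal sketch with a random orthonormal U; variable names are ad hoc:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(3)
n, r = 10, 4
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis

# scipy.linalg.lu returns P, L, R with U = P @ L @ R, i.e. P.T @ U = L @ R.
P, L, R = lu(U)

print(np.allclose(P.T @ U, L @ R))        # the factorization itself
print(np.allclose(np.diag(L[:r, :]), 1))  # L_11 has unit diagonal
print(np.max(np.abs(L)) <= 1 + 1e-14)     # all entries of L bounded by 1 in magnitude
print(np.allclose(R, np.triu(R)))         # R is upper triangular
```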

  9. Greedy row selection from orthonormal basis
  Simplified form of Gaussian elimination with column pivoting:
      Input: n × r matrix U
      Output: "good" index set I ⊂ {1, ..., n}, #I = r
      Set I = ∅.
      for k = 1, ..., r do
        Choose i* = argmax_{i = 1, ..., n} |u_ik|.
        Set I ← I ∪ {i*}.
        U ← U − U(:, k) U(i*, :) / u_{i*,k}
      end for
  Performance of the greedy algorithm in practice is often quite good, but there are counterexamples (see later). A runnable sketch of this algorithm follows.
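A direct NumPy transcription of the algorithm above might look as follows; this is a minimal sketch, and the function name greedy_row_selection is mine, not from the slides:

```python
import numpy as np

def greedy_row_selection(U):
    """Greedy row selection (Gaussian elimination with column pivoting)
    applied to an n x r matrix U; returns the selected row index set I."""
    W = U.astype(float).copy()
    n, r = W.shape
    I = []
    for k in range(r):
        # Pick the row with the largest entry in the current column k.
        i_star = int(np.argmax(np.abs(W[:, k])))
        I.append(i_star)
        # Rank-1 elimination step: afterwards column k and row i_star
        # of the working matrix are zero.
        W -= np.outer(W[:, k], W[i_star, :]) / W[i_star, k]
    return I

# Usage: orthonormal U, then check the quality of the selected submatrix.
rng = np.random.default_rng(4)
n, r = 50, 5
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
I = greedy_row_selection(U)
print(I, np.linalg.norm(np.linalg.inv(U[I, :]), 2))
```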

  10. Analysis of greedy row selection
  Lemma (Theorem 8.15 in [Higham'2002]). Let T ∈ R^{n×n} be an upper triangular matrix satisfying |t_ii| ≥ |t_ij| for j > i. Then
      1 ≤ min_i |t_ii| · ‖T^{-1}‖_2 ≤ (1/3) √(4^n + 6n − 1) ≤ 2^{n−1}.
  Proof. By diagonal scaling, we may assume without loss of generality that t_ii = 1. Let Z_n ∈ R^{n×n} be the unit upper triangular matrix with all entries above the diagonal equal to −1:
      Z_n = [ 1  −1  ···  −1 ; 0  1  ···  −1 ; ⋮   ⋱  ⋱  −1 ; 0  ···  0  1 ].
  By induction, one shows that |T^{-1}| ≤ Z_n^{-1} (where the absolute value and the inequality are understood elementwise).

  11. By the monotonicity of the spectral norm,
      ‖T^{-1}‖_2 ≤ ‖Z_n^{-1}‖_2 ≤ ‖Z_n^{-1}‖_F.
  Because (Z_n^{-1})_ij = 2^{j−i−1} for j > i (see exercises), we obtain
      ‖Z_n^{-1}‖_F^2 = Σ_{j=1}^n ( 1 + Σ_{i=1}^{j−1} 4^{j−i−1} ) = (1/3) Σ_{j=1}^n ( 4^{j−1} + 2 ) = (1/9) ( 4^n + 6n − 1 ),
  completing the proof.
  Theorem. For the index set I returned by the greedy algorithm applied to an orthonormal U ∈ R^{n×r}, it holds that
      ‖U(I, :)^{-1}‖_2 ≤ √(nr) · 2^{r−1}.
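Both the explicit inverse of Z_n and the Frobenius-norm identity used above are easy to verify numerically; a minimal sketch (the helper name Z is ad hoc):

```python
import numpy as np

def Z(n):
    """Unit upper triangular matrix with all entries above the diagonal equal to -1."""
    return np.eye(n) - np.triu(np.ones((n, n)), k=1)

n = 6
Zinv = np.linalg.inv(Z(n))

# Check (Z_n^{-1})_{ij} = 2^{j-i-1} for j > i (and 1 on the diagonal).
expected = np.eye(n)
for i in range(n):
    for j in range(i + 1, n):
        expected[i, j] = 2.0 ** (j - i - 1)
print(np.allclose(Zinv, expected))

# Check the identity ||Z_n^{-1}||_F^2 = (4^n + 6n - 1)/9.
print(np.isclose(np.linalg.norm(Zinv, 'fro') ** 2, (4.0 ** n + 6 * n - 1) / 9))
```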

  12. Proof. We start from
      PU = LR.   (1)
  Partitioning L = [ L_1 ; L_2 ] with L_1 ∈ R^{r×r}, factorization (1) implies U(I, :) = L_1 R. Because PU is orthonormal, (1) also implies ‖R^{-1}‖_2 = ‖L‖_2, and hence
      ‖U(I, :)^{-1}‖_2 ≤ ‖L_1^{-1}‖_2 ‖R^{-1}‖_2 = ‖L_1^{-1}‖_2 ‖L‖_2.
  Because the magnitudes of the entries of L are bounded by 1, we have
      ‖L‖_2 ≤ ‖L‖_F ≤ √(nr) · max |ℓ_ij| = √(nr).
  Applying the lemma to L_1^T (which is unit upper triangular with off-diagonal entries of magnitude at most 1) yields ‖L_1^{-1}‖_2 ≤ 2^{r−1}, which completes the proof.

  13. Vector approximation
  Goal: Want to approximate vector f in subspace range(U). For I = {i_1, ..., i_k} define the selection operator
      S_I = [ e_{i_1}  e_{i_2}  ···  e_{i_k} ].
  Minimal error attained by orthogonal projection UU^T. When replaced by the oblique projection U(S_I^T U)^{-1} S_I^T f, the increase of error is bounded by the result of the lemma below. (A small numerical illustration follows this slide.)
  Lemma.
      ‖f − U(S_I^T U)^{-1} S_I^T f‖_2 ≤ ‖(S_I^T U)^{-1}‖_2 · ‖f − UU^T f‖_2.
  Proof. Let Π = U(S_I^T U)^{-1} S_I^T. Then
      ‖(I − Π) f‖_2 = ‖(I − Π)(f − UU^T f)‖_2 ≤ ‖I − Π‖_2 ‖f − UU^T f‖_2.
  The proof is completed by noting (and using the exercises)
      ‖I − Π‖_2 = ‖Π‖_2 ≤ ‖(S_I^T U)^{-1} S_I^T‖_2 = ‖(S_I^T U)^{-1}‖_2.
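A small numerical illustration of the lemma: build an orthonormal U, pick an index set I, and compare the oblique-projection error with the bound. This is a minimal sketch with arbitrary random data (the index set I is chosen by hand, not by the greedy algorithm):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 100, 6
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
f = rng.standard_normal(n)

I = [3, 17, 25, 48, 60, 91]          # some index set with #I = r
# S_I^T U is simply the row selection U[I, :]; the oblique projection is
#   Pi f = U (S_I^T U)^{-1} S_I^T f = U @ solve(U[I, :], f[I]).
obl_err = np.linalg.norm(f - U @ np.linalg.solve(U[I, :], f[I]))
orth_err = np.linalg.norm(f - U @ (U.T @ f))
bound = np.linalg.norm(np.linalg.inv(U[I, :]), 2) * orth_err

print(orth_err <= obl_err <= bound)  # the lemma predicts True
```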

  14. Connection to interpolation
  We have S_I^T ( I − U(S_I^T U)^{-1} S_I^T ) = 0 and hence
      ‖S_I^T ( f − U(S_I^T U)^{-1} S_I^T f )‖_2 = 0.
  Interpretation: f is "interpolated" exactly at the selected indices.
  Example: Let f contain the discretization of exp(x) on [−1, 1] and let U contain an orthonormal basis of discretized monomials {1, x, x², ...}.
  [Figure: plot over the discretization grid (x-axis 0 to 200, y-axis −0.2 to 0.2).]

  15. Connection to interpolation
  [Figure: four panels on [−1, 1] showing the interpolant after each greedy step. Panel titles: Iteration 1, Err ≈ 14.8; Iteration 2, Err ≈ 5.7; Iteration 3, Err ≈ 0.7; Iteration 4, Err ≈ 0.14.]

  16. Connection to interpolation
  Comparison between the best approximation, the greedy approximation, and the approximation obtained by simply selecting the first r indices. (A sketch reproducing such a comparison is given below.)
  [Figure: semilogarithmic plot of the error versus r (x-axis 0 to 10, y-axis 10^{-10} to 10^0).]
  Terminology:
  ◮ Continuous setting: EIM (Empirical Interpolation Method), [M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera, An "empirical interpolation" method: application to efficient reduced-basis discretization of partial differential equations, C. R. Math. Acad. Sci. Paris, 339 (2004), pp. 667–672].
  ◮ Discrete setting: DEIM (Discrete EIM), [S. Chaturantabut and D. C. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM Journal on Scientific Computing, 32(5), 2737–2764, 2010].
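The experiment of the last three slides can be reproduced along the following lines. This is a minimal sketch using the setup from slide 14 (discretized exp(x), orthonormalized monomials) and the greedy selection from slide 9; the grid size and the choice of error norm are my assumptions, not taken from the slides:

```python
import numpy as np

def greedy_rows(U):
    """Greedy row selection (Gaussian elimination with column pivoting)."""
    W = U.copy()
    I = []
    for k in range(W.shape[1]):
        i = int(np.argmax(np.abs(W[:, k])))
        I.append(i)
        W -= np.outer(W[:, k], W[i, :]) / W[i, k]
    return I

m = 200                                   # grid size (assumed)
x = np.linspace(-1.0, 1.0, m)
f = np.exp(x)

for r in range(1, 11):
    # Orthonormal basis of the discretized monomials 1, x, ..., x^{r-1}.
    V = np.vander(x, r, increasing=True)
    U, _ = np.linalg.qr(V)

    best = np.linalg.norm(f - U @ (U.T @ f))                  # orthogonal projection
    I_g = greedy_rows(U)
    greedy = np.linalg.norm(f - U @ np.linalg.solve(U[I_g, :], f[I_g]))
    I_f = list(range(r))                                      # first r indices
    first = np.linalg.norm(f - U @ np.linalg.solve(U[I_f, :], f[I_f]))
    print(r, best, greedy, first)
```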
