

  1. Low Rank Approximation, Lecture 4. Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL. daniel.kressner@epfl.ch

  2. Sampling based approximation
  Aim: Obtain rank-r approximation of m × n matrix A from selected entries of A. Two different situations:
  ◮ Unstructured sampling: Let Ω ⊂ {1, ..., m} × {1, ..., n}. Solve
      min ‖A − BC^T‖_Ω,   where ‖M‖_Ω^2 = Σ_{(i,j) ∈ Ω} m_ij^2.
  Matrix completion problem solved by general optimization techniques (ALS, Riemannian optimization, convex relaxation). Will discuss later. (A small sketch of evaluating the sampled objective follows this slide.)
  ◮ Column/row sampling: Focus of this lecture.
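The sampled objective above can be evaluated directly from the index set Ω. A minimal NumPy sketch, with all names (sampled_misfit, Omega) chosen here for illustration and not taken from the slides:

```python
import numpy as np

def sampled_misfit(A, B, C, Omega):
    """Evaluate ||A - B C^T||_Omega, the root sum of squares of the
    residual restricted to the sampled index set Omega."""
    total = 0.0
    for (i, j) in Omega:
        # Only the sampled entries (i, j) of the residual contribute.
        r_ij = A[i, j] - B[i, :] @ C[j, :]
        total += r_ij ** 2
    return np.sqrt(total)

# Tiny usage example with a random rank-2 factorization.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
B = rng.standard_normal((6, 2))
C = rng.standard_normal((5, 2))
Omega = [(0, 1), (2, 3), (4, 0), (5, 4)]
print(sampled_misfit(A, B, C, Omega))
```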

  3. Row selection from orthonormal basis
  Task: Given an orthonormal basis U ∈ R^{n×r}, find a "good" r × r submatrix of U. Classical problem already considered by Knuth.¹
  Quantification of "good": smallest singular value not too small.
  Some notation:
  ◮ Given an m × n matrix A and index sets
      I = {i_1, ..., i_k},  1 ≤ i_1 < i_2 < ··· < i_k ≤ m,
      J = {j_1, ..., j_ℓ},  1 ≤ j_1 < j_2 < ··· < j_ℓ ≤ n,
    we let A(I, J) ∈ R^{k×ℓ} denote the submatrix with entries a_{i_p, j_q}, p = 1, ..., k, q = 1, ..., ℓ. The full index set is denoted by a colon, e.g., A(I, :).
  ◮ |det A| denotes the volume of a square matrix A.
  ¹ Knuth, Donald E. Semioptimal bases for linear dependencies. Linear and Multilinear Algebra 17 (1985), no. 1, 1–4.
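In NumPy the submatrix A(I, J) and the volume of a square submatrix from the notation above can be formed directly; a minimal sketch with arbitrarily chosen index sets:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))

I = [0, 2, 4]   # row index set
J = [1, 3]      # column index set

# A(I, J): the submatrix with rows I and columns J.
A_IJ = A[np.ix_(I, J)]

# A(I, :): all columns are kept.
A_Irows = A[I, :]

# Volume of a square submatrix: |det A(I0, J)| with #I0 = #J.
I0 = [0, 2]
vol = abs(np.linalg.det(A[np.ix_(I0, J)]))
print(A_IJ.shape, A_Irows.shape, vol)
```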

  4. Row selection from orthonormal basis
  Lemma (maximal volume yields good submatrix). Let the index set I, #I = r, be chosen such that |det(U(I, :))| is maximal among all r × r submatrices. Then
      σ_min(U(I, :)) ≥ 1 / √(r(n − r) + 1).
  Proof.² W.l.o.g. I = {1, ..., r}. Consider
      Ũ = U U(I, :)^{-1} = [ I_r ; B ].
  Because det Ũ(J, :) = det U(J, :) / det U(I, :) for any index set J with #J = r, the submatrix Ũ(I, :) = I_r has maximal volume among all r × r submatrices of Ũ.
  ² Following Lemma 2.1 in [Goreinov, S. A.; Tyrtyshnikov, E. E.; Zamarashkin, N. L. A theory of pseudoskeleton approximations. Linear Algebra Appl. 261 (1997), 1–21].

  5. Maximality of Ũ(I, :) implies max |b_ij| ≤ 1.
  Argument: If there were b_ij with |b_ij| > 1, then interchanging rows r + i and j of Ũ would increase the volume of Ũ(I, :). We have
      ‖B‖_2 ≤ ‖B‖_F ≤ √((n − r) r) · max |b_ij| ≤ √((n − r) r).
  This yields the result:
      ‖U(I, :)^{-1}‖_2 = ‖U U(I, :)^{-1}‖_2 = ‖Ũ‖_2 = √(1 + ‖B‖_2^2) ≤ √(1 + (n − r) r).
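The lemma can be checked numerically on small examples by brute-force search over all r × r submatrices. A minimal sketch, assuming a random orthonormal U; the function name max_volume_rows is ad hoc:

```python
import itertools
import numpy as np

def max_volume_rows(U):
    """Brute-force the row index set I (#I = r) maximizing |det U(I, :)|."""
    n, r = U.shape
    best_I, best_vol = None, -1.0
    for I in itertools.combinations(range(n), r):
        vol = abs(np.linalg.det(U[list(I), :]))
        if vol > best_vol:
            best_I, best_vol = list(I), vol
    return best_I

rng = np.random.default_rng(2)
n, r = 8, 3
# Orthonormal basis from a thin QR of a random matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))

I = max_volume_rows(U)
sigma_min = np.linalg.svd(U[I, :], compute_uv=False)[-1]
bound = 1.0 / np.sqrt(r * (n - r) + 1)
print(sigma_min, bound, sigma_min >= bound)  # the lemma predicts True
```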

  6. Greedy row selection from orthonormal basis
  Finding a submatrix of maximal volume is NP-hard.³ Greedy algorithm (column-by-column):⁴
  ◮ First step is easy: Choose i such that |u_i1| is maximal.
  ◮ Now assume that k < r steps have been performed and the first k columns have been processed. Task: Choose the optimal index in column k + 1.
  There is a one-to-one connection between greedy row selection and Gaussian elimination with column pivoting!
  ³ Civril, A., Magdon-Ismail, M.: On selecting a maximum volume sub-matrix of a matrix and related problems. Theoret. Comput. Sci. 410(47–49), 4801–4811 (2009)
  ⁴ Reinvented multiple times in the literature.

  7. Greedy row selection from orthonormal basis
  Gaussian elimination without pivoting applied to U ∈ R^{n×r}:
      for k = 1, ..., r do
        L(:, k) ← U(:, k) / u_kk,   R(k, :) ← U(k, :)
        U ← U − L(:, k) R(k, :)
      end for
  Let Ũ denote the updated matrix U obtained after k steps. Then:
  ◮ Ũ = U − LR with
      L = [ L_11 ; L_21 ] ∈ R^{n×k},   R = [ R_11  R_12 ] ∈ R^{k×r},
    L_11 unit lower triangular, R_11 upper triangular.
  ◮ Ũ is zero in its first k rows and columns:
      Ũ = [ 0  0 ; 0  Ũ_22 ],   Ũ_22 ∈ R^{(n−k)×(r−k)}.

  8. Combining both relations gives
      U = LR + Ũ = [ L_11  0 ; L_21  I_{n−k} ] [ R_11  R_12 ; 0  Ũ_22 ].
  Back to the greedy algorithm: By a suitable permutation, suppose that the first k indices are given by I_k = {1, ..., k}. Then
      det( U(I_k ∪ {k + i}, I_k ∪ {k + 1}) ) = det( U(I_k, I_k) ) · Ũ_22(i, 1).
  ⇒ Greedily maximizing the determinant: Choose i such that |Ũ_22(i, 1)| is maximal. This is Gaussian elimination with column pivoting!
  r steps of Gaussian elimination with column pivoting yield a factorization of the form PU = LR, where
  ◮ P is a permutation matrix,
  ◮ L = [ L_11 ; L_21 ] with L_11 ∈ R^{r×r} unit lower triangular and max |l_ij| ≤ 1,
  ◮ R ∈ R^{r×r} is upper triangular.
  (A numerical illustration of this factorization follows this slide.)
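The structure of PU = LR can be observed numerically: SciPy's LU factorization with partial pivoting selects, in each column, the entry of maximal magnitude as pivot, which matches the column pivoting used here. A minimal sketch with a random orthonormal U; variable names are ad hoc:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(3)
n, r = 10, 4
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis

# scipy.linalg.lu returns P, L, R with U = P @ L @ R, i.e. P.T @ U = L @ R.
P, L, R = lu(U)

print(np.allclose(P.T @ U, L @ R))        # the factorization itself
print(np.allclose(np.diag(L[:r, :]), 1))  # L_11 has unit diagonal
print(np.max(np.abs(L)) <= 1 + 1e-14)     # all entries of L bounded by 1 in magnitude
print(np.allclose(R, np.triu(R)))         # R is upper triangular
```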

  9. Greedy row selection from orthonormal basis
  Simplified form of Gaussian elimination with column pivoting:
      Input: n × r matrix U
      Output: "good" index set I ⊂ {1, ..., n}, #I = r
      Set I = ∅.
      for k = 1, ..., r do
        Choose i* = argmax_{i = 1, ..., n} |u_ik|.
        Set I ← I ∪ {i*}.
        U ← U − U(:, k) U(i*, :) / u_{i*,k}
      end for
  Performance of the greedy algorithm in practice is often quite good, but there are counterexamples (see later). A runnable sketch of this algorithm follows.
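A direct NumPy transcription of the algorithm above might look as follows; this is a minimal sketch, and the function name greedy_row_selection is mine, not from the slides:

```python
import numpy as np

def greedy_row_selection(U):
    """Greedy row selection (Gaussian elimination with column pivoting)
    applied to an n x r matrix U; returns the selected row index set I."""
    W = U.astype(float).copy()
    n, r = W.shape
    I = []
    for k in range(r):
        # Pick the row with the largest entry in the current column k.
        i_star = int(np.argmax(np.abs(W[:, k])))
        I.append(i_star)
        # Rank-1 elimination step: afterwards column k and row i_star
        # of the working matrix are zero.
        W -= np.outer(W[:, k], W[i_star, :]) / W[i_star, k]
    return I

# Usage: orthonormal U, then check the quality of the selected submatrix.
rng = np.random.default_rng(4)
n, r = 50, 5
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
I = greedy_row_selection(U)
print(I, np.linalg.norm(np.linalg.inv(U[I, :]), 2))
```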

  10. Analysis of greedy row selection
  Lemma (Theorem 8.15 in [Higham'2002]). Let T ∈ R^{n×n} be an upper triangular matrix satisfying |t_ii| ≥ |t_ij| for j > i. Then
      1 ≤ min_i |t_ii| · ‖T^{-1}‖_2 ≤ (1/3) √(4^n + 6n − 1) ≤ 2^{n−1}.
  Proof. By diagonal scaling, we may assume without loss of generality that t_ii = 1. Let Z_n ∈ R^{n×n} be the unit upper triangular matrix with all entries above the diagonal equal to −1:
      Z_n = [ 1  −1  ···  −1 ; 0  1  ···  −1 ; ⋮   ⋱  ⋱  −1 ; 0  ···  0  1 ].
  By induction, one shows that |T^{-1}| ≤ Z_n^{-1} (where the absolute value and the inequality are understood elementwise).

  11. By the monotonicity of the spectral norm,
      ‖T^{-1}‖_2 ≤ ‖Z_n^{-1}‖_2 ≤ ‖Z_n^{-1}‖_F.
  Because (Z_n^{-1})_ij = 2^{j−i−1} for j > i (see exercises), we obtain
      ‖Z_n^{-1}‖_F^2 = Σ_{j=1}^n ( 1 + Σ_{i=1}^{j−1} 4^{j−i−1} ) = (1/3) Σ_{j=1}^n ( 4^{j−1} + 2 ) = (1/9) ( 4^n + 6n − 1 ),
  completing the proof.
  Theorem. For the index set I returned by the greedy algorithm applied to an orthonormal U ∈ R^{n×r}, it holds that
      ‖U(I, :)^{-1}‖_2 ≤ √(nr) · 2^{r−1}.
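Both the explicit inverse of Z_n and the Frobenius-norm identity used above are easy to verify numerically; a minimal sketch (the helper name Z is ad hoc):

```python
import numpy as np

def Z(n):
    """Unit upper triangular matrix with all entries above the diagonal equal to -1."""
    return np.eye(n) - np.triu(np.ones((n, n)), k=1)

n = 6
Zinv = np.linalg.inv(Z(n))

# Check (Z_n^{-1})_{ij} = 2^{j-i-1} for j > i (and 1 on the diagonal).
expected = np.eye(n)
for i in range(n):
    for j in range(i + 1, n):
        expected[i, j] = 2.0 ** (j - i - 1)
print(np.allclose(Zinv, expected))

# Check the identity ||Z_n^{-1}||_F^2 = (4^n + 6n - 1)/9.
print(np.isclose(np.linalg.norm(Zinv, 'fro') ** 2, (4.0 ** n + 6 * n - 1) / 9))
```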

  12. Proof. We start from
      PU = LR.   (1)
  Partitioning L = [ L_1 ; L_2 ] with L_1 ∈ R^{r×r}, factorization (1) implies U(I, :) = L_1 R. Because PU is orthonormal, (1) also implies ‖R^{-1}‖_2 = ‖L‖_2, and hence
      ‖U(I, :)^{-1}‖_2 ≤ ‖L_1^{-1}‖_2 ‖R^{-1}‖_2 = ‖L_1^{-1}‖_2 ‖L‖_2.
  Because the magnitudes of the entries of L are bounded by 1, we have
      ‖L‖_2 ≤ ‖L‖_F ≤ √(nr) · max |ℓ_ij| = √(nr).
  Applying the lemma to L_1^T (which is unit upper triangular with off-diagonal entries of magnitude at most 1) yields ‖L_1^{-1}‖_2 ≤ 2^{r−1}, which completes the proof.

  13. Vector approximation
  Goal: Want to approximate vector f in subspace range(U). For I = {i_1, ..., i_k} define the selection operator
      S_I = [ e_{i_1}  e_{i_2}  ···  e_{i_k} ].
  Minimal error attained by orthogonal projection UU^T. When replaced by the oblique projection U(S_I^T U)^{-1} S_I^T f, the increase of error is bounded by the result of the lemma below. (A small numerical illustration follows this slide.)
  Lemma.
      ‖f − U(S_I^T U)^{-1} S_I^T f‖_2 ≤ ‖(S_I^T U)^{-1}‖_2 · ‖f − UU^T f‖_2.
  Proof. Let Π = U(S_I^T U)^{-1} S_I^T. Then
      ‖(I − Π) f‖_2 = ‖(I − Π)(f − UU^T f)‖_2 ≤ ‖I − Π‖_2 ‖f − UU^T f‖_2.
  The proof is completed by noting (and using the exercises)
      ‖I − Π‖_2 = ‖Π‖_2 ≤ ‖(S_I^T U)^{-1} S_I^T‖_2 = ‖(S_I^T U)^{-1}‖_2.
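A small numerical illustration of the lemma: build an orthonormal U, pick an index set I, and compare the oblique-projection error with the bound. This is a minimal sketch with arbitrary random data (the index set I is chosen by hand, not by the greedy algorithm):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 100, 6
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
f = rng.standard_normal(n)

I = [3, 17, 25, 48, 60, 91]          # some index set with #I = r
# S_I^T U is simply the row selection U[I, :]; the oblique projection is
#   Pi f = U (S_I^T U)^{-1} S_I^T f = U @ solve(U[I, :], f[I]).
obl_err = np.linalg.norm(f - U @ np.linalg.solve(U[I, :], f[I]))
orth_err = np.linalg.norm(f - U @ (U.T @ f))
bound = np.linalg.norm(np.linalg.inv(U[I, :]), 2) * orth_err

print(orth_err <= obl_err <= bound)  # the lemma predicts True
```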

  14. Connection to interpolation
  We have S_I^T ( I − U(S_I^T U)^{-1} S_I^T ) = 0 and hence
      ‖S_I^T ( f − U(S_I^T U)^{-1} S_I^T f )‖_2 = 0.
  Interpretation: f is "interpolated" exactly at the selected indices.
  Example: Let f contain the discretization of exp(x) on [−1, 1] and let U contain an orthonormal basis of discretized monomials {1, x, x², ...}.
  [Figure: plot over the discretization grid (x-axis 0 to 200, y-axis −0.2 to 0.2).]

  15. Connection to interpolation
  [Figure: four panels on [−1, 1] showing the interpolant after each greedy step. Panel titles: Iteration 1, Err ≈ 14.8; Iteration 2, Err ≈ 5.7; Iteration 3, Err ≈ 0.7; Iteration 4, Err ≈ 0.14.]

  16. Connection to interpolation
  Comparison between the best approximation, the greedy approximation, and the approximation obtained by simply selecting the first r indices. (A sketch reproducing such a comparison is given below.)
  [Figure: semilogarithmic plot of the error versus r (x-axis 0 to 10, y-axis 10^{-10} to 10^0).]
  Terminology:
  ◮ Continuous setting: EIM (Empirical Interpolation Method), [M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera, An "empirical interpolation" method: application to efficient reduced-basis discretization of partial differential equations, C. R. Math. Acad. Sci. Paris, 339 (2004), pp. 667–672].
  ◮ Discrete setting: DEIM (Discrete EIM), [S. Chaturantabut and D. C. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM Journal on Scientific Computing, 32(5), 2737–2764, 2010].
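The experiment of the last three slides can be reproduced along the following lines. This is a minimal sketch using the setup from slide 14 (discretized exp(x), orthonormalized monomials) and the greedy selection from slide 9; the grid size and the choice of error norm are my assumptions, not taken from the slides:

```python
import numpy as np

def greedy_rows(U):
    """Greedy row selection (Gaussian elimination with column pivoting)."""
    W = U.copy()
    I = []
    for k in range(W.shape[1]):
        i = int(np.argmax(np.abs(W[:, k])))
        I.append(i)
        W -= np.outer(W[:, k], W[i, :]) / W[i, k]
    return I

m = 200                                   # grid size (assumed)
x = np.linspace(-1.0, 1.0, m)
f = np.exp(x)

for r in range(1, 11):
    # Orthonormal basis of the discretized monomials 1, x, ..., x^{r-1}.
    V = np.vander(x, r, increasing=True)
    U, _ = np.linalg.qr(V)

    best = np.linalg.norm(f - U @ (U.T @ f))                  # orthogonal projection
    I_g = greedy_rows(U)
    greedy = np.linalg.norm(f - U @ np.linalg.solve(U[I_g, :], f[I_g]))
    I_f = list(range(r))                                      # first r indices
    first = np.linalg.norm(f - U @ np.linalg.solve(U[I_f, :], f[I_f]))
    print(r, best, greedy, first)
```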
