Low Rank Approximation
Lecture 5
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch

Randomized column/row sampling

Aim: Obtain rank-$r$ approximation from randomly selected rows and columns of $A$.

Popular sampling strategies:
◮ Uniform sampling.
◮ Sampling based on row/column norms.
◮ Sampling based on more complicated quantities.

Preliminaries on randomized sampling

Exponential function example from Lecture 4 (Slide 14). Comparison between best approximation, greedy approximation, and approximation obtained by randomly selecting rows.

[Figure: four semilogarithmic error plots; vertical axes from $10^{0}$ down to $10^{-10}$, horizontal axes from 0 to 10.]

Preliminaries on randomized sampling

A simple way to fool uniformly random row selection:
$$U = \begin{pmatrix} 0_{(n-r) \times r} \\ I_r \end{pmatrix}$$
for $n$ very large and $r \ll n$.
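The effect can be made concrete with a small experiment. The following is a minimal sketch, not part of the lecture; the sizes `n`, `r`, `k` and the seed are arbitrary assumptions. It builds $U$, draws `k` rows uniformly, and computes the exact probability that the sample misses every nonzero row.

```python
import numpy as np

n, r = 10_000, 5
# U = [0; I_r]: only the last r rows are nonzero.
U = np.vstack([np.zeros((n - r, r)), np.eye(r)])

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
k = 50
rows = rng.choice(n, size=k, replace=False)  # uniform row sample

# Number of sampled rows that carry any information about U.
hits = int(np.count_nonzero(np.linalg.norm(U[rows], axis=1)))

# Exact probability that k uniformly drawn rows (without replacement)
# miss all r nonzero rows.
p_miss = np.prod([(n - r - i) / (n - i) for i in range(k)])
print(hits, p_miss)
```

With these sizes the sample almost always consists of zero rows only, so no useful approximation of $U$ can be built from it.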

Column sampling

Basic algorithm aiming at rank-$r$ approximation:
1. Sample (and possibly rescale) $k > r$ columns of $A$ $\rightsquigarrow$ $m \times k$ matrix $C$.
2. Compute SVD $C = U \Sigma V^T$ and set $Q = U_r \in \mathbb{R}^{m \times r}$.
3. Return low-rank approximation $Q Q^T A$.

◮ Can be combined with streaming algorithm [Liberty’2007] to limit memory/cost of working with $C$.
◮ Quality of approximation crucially depends on sampling strategy.
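The three steps above can be sketched in NumPy as follows. This is an illustrative sketch that assumes uniform sampling without rescaling; the function name `column_sampling_approx` and the test matrix are hypothetical and not taken from the lecture.

```python
import numpy as np

def column_sampling_approx(A, r, k, rng):
    """Rank-r approximation Q Q^T A from k uniformly sampled columns."""
    n = A.shape[1]
    cols = rng.choice(n, size=k, replace=False)      # step 1: sample k > r columns
    C = A[:, cols]
    U, _, _ = np.linalg.svd(C, full_matrices=False)  # step 2: SVD of C
    Q = U[:, :r]                                     # r dominant left singular vectors
    return Q, Q @ (Q.T @ A)                          # step 3: Q Q^T A

rng = np.random.default_rng(0)
# Hypothetical test matrix of exact rank 3.
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
Q, A_r = column_sampling_approx(A, r=3, k=6, rng=rng)
err = np.linalg.norm(A - A_r, 2)
print(err)
```

For a matrix of exact rank $r$ with generic columns, any $k \ge r$ sampled columns already span the column space, so the error is at the level of rounding. The sampling strategy matters once the columns are far from generic.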

Column sampling

Lemma. For any matrix $C \in \mathbb{R}^{m \times k}$, let $Q$ be the matrix computed above. Then
$$\|A - QQ^T A\|_2^2 \le \sigma_{r+1}(A)^2 + 2 \|AA^T - CC^T\|_2.$$

Proof. We have
$$(A - QQ^T A)(A - QQ^T A)^T = (I - QQ^T) CC^T (I - QQ^T) + (I - QQ^T)(AA^T - CC^T)(I - QQ^T).$$
Hence,
$$\begin{aligned}
\|A - QQ^T A\|_2^2 &= \lambda_{\max}\big( (A - QQ^T A)(A - QQ^T A)^T \big) \\
&\le \lambda_{\max}\big( (I - QQ^T) CC^T (I - QQ^T) \big) + \|AA^T - CC^T\|_2 \\
&= \sigma_{r+1}(C)^2 + \|AA^T - CC^T\|_2.
\end{aligned}$$
The proof is completed by applying Weyl’s inequality:
$$\sigma_{r+1}(C)^2 = \lambda_{r+1}(CC^T) \le \lambda_{r+1}(AA^T) + \|AA^T - CC^T\|_2.$$
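Since the lemma holds for any $C$, it can be checked numerically on a single instance. The sketch below uses a hypothetical test matrix and, as one possible choice, uniformly sampled columns rescaled by $\sqrt{n/k}$; it compares both sides of the inequality.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, r = 30, 200, 20, 5
# Hypothetical test matrix with decaying column scales.
A = rng.standard_normal((m, n)) * np.linspace(1.0, 0.01, n)

# One admissible C: k uniformly chosen columns, rescaled by sqrt(n/k).
cols = rng.choice(n, size=k, replace=True)
C = A[:, cols] * np.sqrt(n / k)

U, _, _ = np.linalg.svd(C, full_matrices=False)
Q = U[:, :r]

lhs = np.linalg.norm(A - Q @ (Q.T @ A), 2) ** 2
sigma = np.linalg.svd(A, compute_uv=False)
# sigma[r] is sigma_{r+1}(A) because of zero-based indexing.
rhs = sigma[r] ** 2 + 2 * np.linalg.norm(A @ A.T - C @ C.T, 2)
print(lhs, rhs)
```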

Random column sampling

Using the lemma, the goal now becomes to approximate the matrix product $AA^T$ using column samples of $A$.

Notation:
$$A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}, \qquad C = \begin{pmatrix} c_1 & \cdots & c_k \end{pmatrix}.$$

General sampling method:
Input: $A \in \mathbb{R}^{m \times n}$, probabilities $p_1, \ldots, p_n \neq 0$, integer $k$.
Output: $C \in \mathbb{R}^{m \times k}$ containing selected columns of $A$.
1: for $t = 1, \ldots, k$ do
2:   Pick $j_t \in \{1, \ldots, n\}$ with $P[j_t = \ell] = p_\ell$, $\ell = 1, \ldots, n$, independently and with replacement.
3:   Set $c_t = a_{j_t} / \sqrt{k p_{j_t}}$.
4: end for
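The general sampling method translates almost line by line into NumPy. The following is a minimal sketch; the function name `sample_columns`, the input validation, and the example data are assumptions added for illustration.

```python
import numpy as np

def sample_columns(A, p, k, rng):
    """Draw k columns of A i.i.d. with probabilities p, rescaled by 1/sqrt(k p_j)."""
    p = np.asarray(p, dtype=float)
    if np.any(p <= 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be positive and sum to 1")
    # Indices j_1, ..., j_k, drawn independently and with replacement.
    j = rng.choice(A.shape[1], size=k, replace=True, p=p)
    # Column t of C is a_{j_t} / sqrt(k p_{j_t}).
    C = A[:, j] / np.sqrt(k * p[j])
    return C, j

rng = np.random.default_rng(0)
m, n, k = 6, 15, 4
A = rng.standard_normal((m, n))
p = np.full(n, 1.0 / n)  # uniform probabilities as one example
C, j = sample_columns(A, p, k, rng)
```

The loop over $t$ is replaced by one vectorized call to `rng.choice`, which draws all $k$ indices at once.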

Random column sampling

Lemma. For the matrix $C$ returned by the algorithm, it holds that
$$E[CC^T] = AA^T, \qquad \operatorname{Var}[(CC^T)_{ij}] = \frac{1}{k} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2 a_{j\ell}^2}{p_\ell} - \frac{1}{k} (AA^T)_{ij}^2.$$

Proof. For fixed $i, j$, consider $X_t = (c_t c_t^T)_{ij} = \frac{1}{k p_{j_t}} a_{i,j_t} a_{j,j_t}$, for which
$$E[X_t] = \sum_{\ell=1}^{n} p_\ell \frac{1}{k p_\ell} a_{i,\ell} a_{j,\ell} = \frac{1}{k} (AA^T)_{ij}.$$
Analogously,
$$\operatorname{Var}(X_t) = E[(X_t - E[X_t])^2] = E[X_t^2] - E[X_t]^2 = \frac{1}{k^2} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2 a_{j\ell}^2}{p_\ell} - \frac{1}{k^2} (AA^T)_{ij}^2.$$
Because of independence, it follows that $E[\sum_t X_t] = k \cdot E[X_t] = (AA^T)_{ij}$, and analogously for the variance.
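The unbiasedness $E[CC^T] = AA^T$ can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed parameters (matrix size, uniform probabilities, number of trials, seed), none of which come from the lecture; it averages $CC^T$ over many independent samples and measures the relative deviation from $AA^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, trials = 5, 20, 10, 2000
A = rng.standard_normal((m, n))
p = np.full(n, 1.0 / n)  # uniform probabilities, one admissible choice

G = A @ A.T
mean_CCt = np.zeros((m, m))
for _ in range(trials):
    j = rng.choice(n, size=k, replace=True, p=p)
    C = A[:, j] / np.sqrt(k * p[j])   # rescaled sampled columns
    mean_CCt += C @ C.T / trials      # running average of C C^T

# Relative deviation of the empirical mean from A A^T (Frobenius norm).
rel_dev = np.linalg.norm(mean_CCt - G) / np.linalg.norm(G)
print(rel_dev)
```

The deviation shrinks like $1/\sqrt{\text{trials}}$, in line with the variance formula of the lemma.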

Random column sampling

As a consequence of the lemma,
$$\begin{aligned}
E[\|AA^T - CC^T\|_F^2] &= \sum_{ij} E[(AA^T - CC^T)_{ij}^2] = \sum_{ij} \operatorname{Var}[(CC^T)_{ij}] \\
&= \frac{1}{k} \sum_{ij} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2 a_{j\ell}^2}{p_\ell} - \frac{1}{k} \sum_{ij} (AA^T)_{ij}^2 \\
&= \frac{1}{k} \Big[ \sum_{\ell=1}^{n} \frac{1}{p_\ell} \|a_\ell\|_2^4 - \|AA^T\|_F^2 \Big].
\end{aligned}$$

Lemma. The choice $p_\ell = \|a_\ell\|_2^2 / \|A\|_F^2$ minimizes $E[\|AA^T - CC^T\|_F^2]$ and yields
$$E[\|AA^T - CC^T\|_F^2] = \frac{1}{k} \big[ \|A\|_F^4 - \|AA^T\|_F^2 \big].$$

Proof. Established by showing that this choice of $p_\ell$ satisfies the first-order conditions of the constrained optimization problem.
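The expected error formula is deterministic in $A$, $p$ and $k$, so the optimality of norm-based probabilities can be checked directly without sampling. The sketch below uses a hypothetical helper `expected_sq_error` and a hypothetical test matrix with uneven column norms, and compares norm-based with uniform probabilities.

```python
import numpy as np

def expected_sq_error(A, p, k):
    """(1/k) * (sum_l ||a_l||^4 / p_l - ||A A^T||_F^2), as derived above."""
    col_norms_sq = np.sum(A**2, axis=0)
    return (np.sum(col_norms_sq**2 / p) - np.linalg.norm(A @ A.T, "fro") ** 2) / k

rng = np.random.default_rng(3)
# Hypothetical test matrix with strongly varying column norms.
A = rng.standard_normal((8, 50)) * rng.uniform(0.1, 10.0, size=50)
k = 25

p_norm = np.sum(A**2, axis=0) / np.linalg.norm(A, "fro") ** 2  # norm-based
p_unif = np.full(A.shape[1], 1.0 / A.shape[1])                 # uniform

e_norm = expected_sq_error(A, p_norm, k)
e_unif = expected_sq_error(A, p_unif, k)
closed_form = (np.linalg.norm(A, "fro") ** 4 - np.linalg.norm(A @ A.T, "fro") ** 2) / k
print(e_norm, e_unif, closed_form)
```

By the Cauchy-Schwarz inequality, $\sum_\ell \|a_\ell\|_2^4 / p_\ell \ge \|A\|_F^4$ for every probability vector, which is why `e_norm` can never exceed `e_unif`.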

Random column sampling

Norm based sampling:
Input: $A \in \mathbb{R}^{m \times n}$, integer $k$.
Output: Low-rank approximation $QQ^T A$ built from the matrix $C \in \mathbb{R}^{m \times k}$ of selected columns of $A$.
1: Set $p_\ell = \|a_\ell\|_2^2 / \|A\|_F^2$ for $\ell = 1, \ldots, n$.
2: for $t = 1, \ldots, k$ do
3:   Pick $j_t \in \{1, \ldots, n\}$ with $P[j_t = \ell] = p_\ell$, $\ell = 1, \ldots, n$, independently and with replacement.
4:   Set $c_t = a_{j_t} / \sqrt{k p_{j_t}}$.
5: end for
6: Compute SVD $C = U \Sigma V^T$ and set $Q = U_r \in \mathbb{R}^{m \times r}$.
7: Return low-rank approximation $QQ^T A$.
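The complete norm based sampling procedure can be sketched as follows. The function name `norm_sampling_approx` and the test matrix (rank-$r$ signal plus small noise) are assumptions for illustration, not part of the lecture.

```python
import numpy as np

def norm_sampling_approx(A, r, k, rng):
    """Norm-based column sampling followed by a truncated SVD of C."""
    col_norms_sq = np.sum(A**2, axis=0)
    p = col_norms_sq / col_norms_sq.sum()             # p_l = ||a_l||^2 / ||A||_F^2
    j = rng.choice(A.shape[1], size=k, replace=True, p=p)
    C = A[:, j] / np.sqrt(k * p[j])                   # rescaled sampled columns
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Q = U[:, :r]                                      # r dominant left singular vectors
    return Q, C

rng = np.random.default_rng(4)
m, n, r, k = 40, 300, 4, 60
# Hypothetical test matrix: rank-r signal plus small noise.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A += 1e-3 * rng.standard_normal((m, n))

Q, C = norm_sampling_approx(A, r, k, rng)
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
best = np.linalg.svd(A, compute_uv=False)[r]          # sigma_{r+1}(A)
print(err, best)
```

With norm-based probabilities every rescaled column has squared norm $\|A\|_F^2 / k$, so $\|C\|_F = \|A\|_F$ holds for every sample. Zero columns of $A$ receive probability zero and are never selected, so the division by $p_{j_t}$ is always well defined.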

Random column sampling

Lemma. For the matrix $C$ returned by the algorithm, it holds with probability $1 - \delta$ that
$$\|AA^T - CC^T\|_F \le \frac{\eta}{\sqrt{k}} \|A\|_F^2,$$
where $\eta = 1 + \sqrt{8 \cdot \log(1/\delta)}$.

Proof. Aim at applying the Azuma-Hoeffding inequality. Define
$$F(i_1, i_2, \ldots, i_k) = \|AA^T - CC^T\|_F,$$
where $C$ consists of the columns $a_{i_1}, \ldots, a_{i_k}$, rescaled as in the algorithm. Quantify the effect of varying an index (w.l.o.g. the first one) on $F$:
$$\begin{aligned}
|F(i_1, i_2, \ldots, i_k) - F(i_1', i_2, \ldots, i_k)| &= \big| \|AA^T - CC^T\|_F - \|AA^T - C'C'^T\|_F \big| \\
&\le \|CC^T - C'C'^T\|_F \\
&\le \frac{1}{k p_{i_1}} \|a_{i_1}\|_2^2 + \frac{1}{k p_{i_1'}} \|a_{i_1'}\|_2^2 \\
&\le \frac{2}{k} \|A\|_F^2 =: \Delta.
\end{aligned}$$

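Random column sampling

This implies that the Doob martingale $g_\ell = E[F(i_1, \ldots, i_k) \mid i_1, \ldots, i_\ell]$ for $0 \le \ell \le k$ satisfies $|g_{\ell+1} - g_\ell| \le \Delta$. Note that $g_0 = E[\|AA^T - CC^T\|_F]$ and $g_k = \|AA^T - CC^T\|_F$. By the lemma on the expected error and Jensen’s inequality we know that $g_0 \le \|A\|_F^2 / \sqrt{k}$. Applying the Azuma-Hoeffding inequality to $g_k - g_0$ yields
$$P\big[ \|AA^T - CC^T\|_F \ge \|A\|_F^2 / \sqrt{k} + \gamma \big] \le \exp\big( -\gamma^2 / (2 k \Delta^2) \big) =: \delta.$$
Since $2 k \Delta^2 = 8 \|A\|_F^4 / k$, solving for $\gamma$ gives $\gamma = \sqrt{8 \cdot \log(1/\delta)} \, \|A\|_F^2 / \sqrt{k}$, which completes the proof.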

Random column sampling

Theorem (Drineas/Kannan/Mahoney’2006). For the matrix $Q$ returned by the algorithm above it holds that
$$E\big[ \|A - QQ^T A\|_2^2 \big] \le \sigma_{r+1}^2(A) + \varepsilon \|A\|_F^2 \quad \text{for } k \ge 4/\varepsilon^2.$$
With probability at least $1 - \delta$,
$$\|A - QQ^T A\|_2^2 \le \sigma_{r+1}^2(A) + \varepsilon \|A\|_F^2 \quad \text{for } k \ge 4 \big( 1 + \sqrt{8 \cdot \log(1/\delta)} \big)^2 / \varepsilon^2.$$

Proof. Follows from combining the very first lemma with the last two lemmas.

Remarks:
◮ The dependence of $k$ on $\varepsilon$ is pretty bad. With sampling based on row norms only, it is unlikely that something significantly better can be achieved without assuming further properties of $A$ (e.g., incoherence of singular vectors).
◮ Simple “counter example”:
$$A = \begin{pmatrix} \frac{1}{\sqrt{n}} e_1 & \frac{1}{\sqrt{n}} e_1 & \cdots & \frac{1}{\sqrt{n}} e_1 & \frac{1}{\sqrt{n}} e_2 \end{pmatrix} \in \mathbb{R}^{n \times (n+1)}.$$
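To get a feeling for how quickly the number of samples grows, the two conditions on $k$ in the theorem can be evaluated directly. The helper names below are hypothetical; the formulas are the ones stated in the theorem.

```python
import math

def samples_expectation(eps):
    """Smallest integer k with k >= 4 / eps^2 (bound in expectation)."""
    return math.ceil(4 / eps**2)

def samples_high_prob(eps, delta):
    """Smallest integer k with k >= 4 (1 + sqrt(8 log(1/delta)))^2 / eps^2."""
    eta = 1 + math.sqrt(8 * math.log(1 / delta))
    return math.ceil(4 * eta**2 / eps**2)

k_exp = samples_expectation(0.5)
k_hp = samples_high_prob(0.5, 0.01)
print(k_exp, k_hp)
```

Halving $\varepsilon$ quadruples both sample counts, which is the unfavorable dependence noted in the remarks.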

Random column sampling

[Drineas/Mahoney/Muthukrishnan’2007]: Let $V_k$ contain the $k$ dominant right singular vectors of $A$. Setting
$$p_\ell = \|V_k(\ell, :)\|_2^2 / k, \qquad \ell = 1, \ldots, n,$$
and sampling $O(k^2 \log(1/\delta) / \varepsilon^2)$ columns [1] yields
$$\|A - QQ^T A\|_F \le (1 + \varepsilon) \|A - T_k(A)\|_F$$
with probability $1 - \delta$. Relative error bound!

A CUR decomposition can be obtained by applying these ideas to rows and columns (yielding $R$ and $C$, respectively) and choosing $U$ appropriately.

[1] There are variants that improve this to $O(k \log k \log(1/\delta) / \varepsilon^2)$.
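The probabilities $p_\ell = \|V_k(\ell,:)\|_2^2 / k$ can be computed from an SVD. The sketch below uses a hypothetical helper `leverage_probabilities` and applies it to the “counter example” matrix from the previous slide, with $n = 100$ and $k = 2$ chosen arbitrarily, to compare them with norm-based probabilities.

```python
import numpy as np

def leverage_probabilities(A, k):
    """p_l = ||V_k(l, :)||_2^2 / k from the k dominant right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                      # n_columns x k
    return np.sum(Vk**2, axis=1) / k

n = 100
e1, e2 = np.eye(n)[:, 0], np.eye(n)[:, 1]
# n copies of e_1 / sqrt(n) followed by one copy of e_2 / sqrt(n).
A = np.column_stack([e1 / np.sqrt(n)] * n + [e2 / np.sqrt(n)])

p_lev = leverage_probabilities(A, k=2)
p_norm = np.sum(A**2, axis=0) / np.linalg.norm(A, "fro") ** 2
print(p_lev[-1], p_norm[-1])
```

All columns of this matrix have the same norm, so norm-based sampling picks the last column with probability $1/(n+1)$, whereas the probabilities based on $V_k$ assign it probability $1/2$.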
