Communication avoiding low rank matrix approximation, a unified perspective on deterministic and randomized approaches L. Grigori and collaborators Alpines Inria Paris and LJLL, Sorbonne University Slides available at https://who.rocq.inria.fr/Laura.Grigori/Slides/ENLA20_Grigori.pdf July 8, 2020
Plan Motivation of our work Unified perspective on low rank matrix approximation Generalized LU decomposition Recent deterministic algorithms and bounds CA RRQR with 2D tournament pivoting CA LU with column/row tournament pivoting Randomized generalized LU and bounds Approximation of tensors Parallel HORRQR Conclusions 2 of 42
Motivation of our work The communication challenge � Cost of data movement dominates the cost of arithmetics: time and energy consumption � Per socket flop performance continues to increase: increase of number of cores per socket and/or number of flops per cycle 2008 Intel Nehalem 3.2GHz × 4 cores (51.2 GFlops/socket DP) 2020 A64FX 2.2GHz × 48 cores (3.37 TFlops/socket DP) 1 66x in 12 years � Interconnect latency: few µ s MPI latency Our focus: increasing scalability by reducing/minimizing coummunication while controlling the loss of information in low rank matrix (and tensor) approximation. 1 Fugaku supercomputer https://www.top500.org/system/179807/ 3 of 42
Unified perspective on low rank matrix approximation Low rank matrix approximation � Problem: given A ∈ R m × n , compute rank-k approximation ZW T , where Z ∈ R m × k and W T ∈ R k × n . � Problem ubiquitous in scientific computing and data analysis � column subset selection, linear dependency analysis, fast solvers for integral equations, H-matrices, � principal component analysis, image processing, data in high dimensions, ... 4 of 42
Unified perspective on low rank matrix approximation Low rank matrix approximation � Best rank-k approximation A opt , k = ˆ U k Σ k ˆ V T k is rank-k truncated SVD of A [Eckart and Young, 1936], with σ max ( A ) = σ 1 ( A ) ≥ . . . ≥ σ min ( A ) = σ min( m , n ) ( A ) || A − ˜ min A k || 2 = || A − A opt , k || 2 = σ k +1 ( A ) rank ( ˜ A k ) ≤ k � n � || A − ˜ � � σ 2 min A k || F = || A − A opt , k || F = j ( A ) � rank ( ˜ A k ) ≤ k j = k +1 Image, size 1190 × 1920 Rank-10 approximation, SVD Rank-50 approximation, SVD � Image source: https://pixabay.com/photos/billiards-ball-play-number-half-4345870/ 5 of 42
Unified perspective on low rank matrix approximation Low rank matrix approximation: trade-offs Communication optimal if computing a rank-k approximation on P processors requires # messages = Ω (log 2 P ) . 6 of 42
Unified perspective on low rank matrix approximation Low rank matrix approximation: trade-offs Communication optimal if computing a rank-k approximation on P processors requires # messages = Ω (log 2 P ) . 6 of 42
Unified perspective on low rank matrix approximation Idea underlying many algorithms A k = P A , where P = P o or P = P so is obtained as: Compute ˜ 1. Construct a low dimensional subspace X = range ( AV 1 ), V 1 ∈ R n × l that approximates well the range of A , e.g. � A − P o A � 2 ≤ γσ k +1 ( A ) , for some γ ≥ 1 , where Q 1 is orth. basis of ( AV 1 ) P o = AV 1 ( AV 1 ) + = Q 1 Q T 1 , or equiv P o a j := arg min x ∈ X � x − a j � 2 2. Select a semi-inner product � U 1 · , U 1 ·� 2 , U 1 ∈ R l ′ × m l ′ ≥ l , define P so = AV 1 ( U 1 AV 1 ) + U 1 , or equiv P so a j := arg min x ∈ X � U 1 ( x − a j ) � 2 with O. Balabanov, 2020 7 of 42
Unified perspective on low rank matrix approximation Idea underlying many algorithms A k = P A , where P = P o or P = P so is obtained as: Compute ˜ 1. Construct a low dimensional subspace X = range ( AV 1 ), V 1 ∈ R n × l that approximates well the range of A , e.g. � A − P o A � 2 ≤ γσ k +1 ( A ) , for some γ ≥ 1 , where Q 1 is orth. basis of ( AV 1 ) P o = AV 1 ( AV 1 ) + = Q 1 Q T 1 , or equiv P o a j := arg min x ∈ X � x − a j � 2 2. Select a semi-inner product � U 1 · , U 1 ·� 2 , U 1 ∈ R l ′ × m l ′ ≥ l , define P so = AV 1 ( U 1 AV 1 ) + U 1 , or equiv P so a j := arg min x ∈ X � U 1 ( x − a j ) � 2 with O. Balabanov, 2020 7 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Unified perspective: generalized LU factorization � U 1 � ∈ R m , m , V = ∈ R n , n , U , V Given A ∈ R m × n , U = � V 1 � V 2 U 2 invertible, U 1 ∈ R l ′ × m , V 1 ∈ R n × l , k ≤ l ≤ l ′ . � ¯ ¯ � A 11 A 12 ¯ UAV = A = ¯ ¯ A 21 A 22 � � ¯ ¯ � � I A 11 A 12 = A 21 ¯ ¯ A + S ( ¯ I A 11 ) 11 where ¯ A 11 ∈ R l ′ , l , ¯ A + 11 ¯ A 11 = I , S ( ¯ A 11 ) = ¯ A 22 − ¯ A 21 ¯ A + 11 ¯ A 12 . � Generalized LU computes the approximation � � ¯ � I ˜ U − 1 ¯ V − 1 � A glu = A 11 A 12 A 21 ¯ ¯ A + 11 [ U + 1 ( I − ( U 1 AV 1 )( U 1 AV 1 ) + ) + ( AV 1 )( U 1 AV 1 ) + ][ U 1 A ] = with J. Demmel and A. Rusciano, 2019 8 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Unified perspective: generalized LU factorization Given U 1 , A , V 1 , Q 1 orth. basis of ( AV 1 ), k ≤ l < l ′ , rank-k approximation, ˜ [ U + 1 ( I − ( U 1 AV 1 )( U 1 AV 1 ) + ) + ( AV 1 )( U 1 AV 1 ) + ][ U 1 A ] = A glu Unification for many existing algorithms: � If k ≤ l = l ′ and U 1 = Q T 1 , then ˜ A glu = Q 1 Q T 1 A = P o A � If k ≤ l = l ′ then ˜ A glu = AV 1 ( U 1 AV 1 ) − 1 U 1 A = P so A Approximation result: If k ≤ l < l ′ , F = � A − ˜ F + � ˜ � A − P so A � 2 A glu � 2 A glu − P so A � 2 F 9 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Unified perspective: generalized LU factorization Given U 1 , A , V 1 , Q 1 orth. basis of ( AV 1 ), k ≤ l < l ′ , rank-k approximation, ˜ [ U + 1 ( I − ( U 1 AV 1 )( U 1 AV 1 ) + ) + ( AV 1 )( U 1 AV 1 ) + ][ U 1 A ] = A glu Unification for many existing algorithms: � If k ≤ l = l ′ and U 1 = Q T 1 , then ˜ A glu = Q 1 Q T 1 A = P o A � If k ≤ l = l ′ then ˜ A glu = AV 1 ( U 1 AV 1 ) − 1 U 1 A = P so A Approximation result: If k ≤ l < l ′ , F = � A − ˜ F + � ˜ � A − P so A � 2 A glu � 2 A glu − P so A � 2 F 9 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Unified perspective: generalized LU factorization Given U 1 , A , V 1 , Q 1 orth. basis of ( AV 1 ), k ≤ l < l ′ , rank-k approximation, ˜ [ U + 1 ( I − ( U 1 AV 1 )( U 1 AV 1 ) + ) + ( AV 1 )( U 1 AV 1 ) + ][ U 1 A ] = A glu Unification for many existing algorithms: � If k ≤ l = l ′ and U 1 = Q T 1 , then ˜ A glu = Q 1 Q T 1 A = P o A � If k ≤ l = l ′ then ˜ A glu = AV 1 ( U 1 AV 1 ) − 1 U 1 A = P so A Approximation result: If k ≤ l < l ′ , F = � A − ˜ F + � ˜ � A − P so A � 2 A glu � 2 A glu − P so A � 2 F 9 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Desired properties of low rank matrix approximation 1. ˜ A k is ( k , γ ) low-rank approximation of A if it satisfies � A − ˜ A k � 2 ≤ γσ k +1 ( A ) , for some γ ≥ 1 . → Focus of both deterministic and randomized approaches 2. ˜ A k is ( k , γ ) spectrum preserving of A if 1 ≤ σ i ( A ) ≤ γ, for all i = 1 , . . . , k and some γ ≥ 1 σ i ( ˜ A k ) → Focus of deterministic approaches 3. ˜ A k is ( k , γ ) kernel approximation of A if 1 ≤ σ j ( A − ˜ A k ) ≤ γ, for all i = 1 , . . . , min( m , n ) − k and some γ ≥ 1 σ k + j ( A ) → Focus of deterministic approaches Goal γ is a low degree polynomial in k and the number of columns and/or rows of A for some of the most efficient algorithms. 10 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Desired properties of low rank matrix approximation 1. ˜ A k is ( k , γ ) low-rank approximation of A if it satisfies � A − ˜ A k � 2 ≤ γσ k +1 ( A ) , for some γ ≥ 1 . → Focus of both deterministic and randomized approaches 2. ˜ A k is ( k , γ ) spectrum preserving of A if 1 ≤ σ i ( A ) ≤ γ, for all i = 1 , . . . , k and some γ ≥ 1 σ i ( ˜ A k ) → Focus of deterministic approaches 3. ˜ A k is ( k , γ ) kernel approximation of A if 1 ≤ σ j ( A − ˜ A k ) ≤ γ, for all i = 1 , . . . , min( m , n ) − k and some γ ≥ 1 σ k + j ( A ) → Focus of deterministic approaches Goal γ is a low degree polynomial in k and the number of columns and/or rows of A for some of the most efficient algorithms. 10 of 42
Unified perspective on low rank matrix approximation Generalized LU decomposition Desired properties of low rank matrix approximation 1. ˜ A k is ( k , γ ) low-rank approximation of A if it satisfies � A − ˜ A k � 2 ≤ γσ k +1 ( A ) , for some γ ≥ 1 . → Focus of both deterministic and randomized approaches 2. ˜ A k is ( k , γ ) spectrum preserving of A if 1 ≤ σ i ( A ) ≤ γ, for all i = 1 , . . . , k and some γ ≥ 1 σ i ( ˜ A k ) → Focus of deterministic approaches 3. ˜ A k is ( k , γ ) kernel approximation of A if 1 ≤ σ j ( A − ˜ A k ) ≤ γ, for all i = 1 , . . . , min( m , n ) − k and some γ ≥ 1 σ k + j ( A ) → Focus of deterministic approaches Goal γ is a low degree polynomial in k and the number of columns and/or rows of A for some of the most efficient algorithms. 10 of 42
Recommend
More recommend