

  1. Low Rank Approximation, Lecture 7
     Daniel Kressner
     Chair for Numerical Algorithms and HPC
     Institute of Mathematics, EPFL
     daniel.kressner@epfl.ch

  2. Tensor Train (TT) decomposition
     A tensor X is in TT decomposition if it can be written as
         X(i_1, ..., i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).
     - The smallest possible tuple (r_1, ..., r_{d-1}) is called the TT rank of X.
     - The tensors U_\mu \in R^{r_{\mu-1} \times n_\mu \times r_\mu} for \mu = 1, ..., d (formally set r_0 = r_d = 1) are called TT cores.
     - If the TT ranks are not large ⇝ high compression ratio as d grows.
     - The TT decomposition is multilinear wrt its cores.
     - The TT decomposition connects to
       - matrix products ⇝ Matrix Product States (MPS) in physics (see [Grasedyck/Kressner/Tobler'2013] for references),
       - simultaneous matrix factorizations ⇝ SVD-based compression,
       - contractions and tensor network diagrams ⇝ design of efficient contraction-based algorithms.
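The compression claim in the third bullet is easy to quantify: full storage is n^d while TT storage is \sum_\mu r_{\mu-1} n_\mu r_\mu. A minimal sketch (the helper name `tt_storage` is my own, not from the slides):

```python
# Compare full storage n^d with TT storage sum_mu r_{mu-1} * n_mu * r_mu.
def tt_storage(dims, ranks):
    """Number of parameters in a TT decomposition with cores of size
    r_{mu-1} x n_mu x r_mu, where r_0 = r_d = 1."""
    r = [1] + list(ranks) + [1]
    return sum(r[mu] * n * r[mu + 1] for mu, n in enumerate(dims))

d, n, r = 10, 4, 3
full_storage = n ** d                          # exponential in d: 4^10 = 1048576
tt = tt_storage([n] * d, [r] * (d - 1))        # linear in d: 312
print(full_storage, tt)
```

For fixed ranks the TT parameter count grows only linearly in d, which is the point of the format.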

  3. TT decomposition and matrix products
         X(i_1, ..., i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).
     Let U_\mu(i_\mu) be the i_\mu-th slice of the \mu-th core: U_\mu(i_\mu) := U_\mu(:, i_\mu, :) \in R^{r_{\mu-1} \times r_\mu}. Then
         X(i_1, i_2, ..., i_d) = U_1(i_1) U_2(i_2) \cdots U_d(i_d).
     Remark: The error analysis of matrix multiplication [Higham'2002] shows that the TT decomposition may suffer from numerical instabilities if
         ||U_1(i_1)||_2 ||U_2(i_2)||_2 \cdots ||U_d(i_d)||_2 >> |X(i_1, i_2, ..., i_d)|.
     See [Bachmayr/Kazeev: arXiv:1802.09062] for more details.
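The slice formula makes entry evaluation a chain of small matrix products. A sketch with random cores (names of my own choosing), checked against the sum formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 3
# Random TT cores U_mu of size r_{mu-1} x n_mu x r_mu with r_0 = r_4 = 1.
cores = [rng.standard_normal((1, n, r)),
         rng.standard_normal((r, n, r)),
         rng.standard_normal((r, n, r)),
         rng.standard_normal((r, n, 1))]

def tt_entry(cores, idx):
    """Evaluate X(i_1,...,i_d) = U_1(i_1) U_2(i_2) ... U_d(i_d)
    as a product of the 1 x r_1, r_1 x r_2, ..., r_{d-1} x 1 slices."""
    V = cores[0][:, idx[0], :]                # slice U_1(i_1), size 1 x r_1
    for U, i in zip(cores[1:], idx[1:]):
        V = V @ U[:, i, :]                    # multiply by the next slice U_mu(i_mu)
    return V.item()                           # the final product is 1 x 1

print(tt_entry(cores, (1, 0, 4, 2)))
```

The cost per entry is O(d n r^2) at most, with no need to ever form the full tensor.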

  4. TT decomposition and matrix factorizations
         X(i_1, ..., i_d) = \sum_{k_1, k_2, ..., k_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).
     For any 1 <= \mu <= d-1, group the first \mu factors and the last d-\mu factors together:
         X(i_1, ..., i_\mu, i_{\mu+1}, ..., i_d)
           = \sum_{k_\mu=1}^{r_\mu} [ \sum_{k_1, ..., k_{\mu-1}} U_1(1, i_1, k_1) \cdots U_\mu(k_{\mu-1}, i_\mu, k_\mu) ]
             \cdot [ \sum_{k_{\mu+1}, ..., k_{d-1}} U_{\mu+1}(k_\mu, i_{\mu+1}, k_{\mu+1}) \cdots U_d(k_{d-1}, i_d, 1) ].
     This can be interpreted as a matrix-matrix product of two (large) matrices!

  5. TT decomposition and matrix factorizations
     The \mu-th unfolding of X \in R^{n_1 \times n_2 \times \cdots \times n_d} is obtained by arranging the entries in a matrix
         X^{<\mu>} \in R^{(n_1 n_2 \cdots n_\mu) \times (n_{\mu+1} \cdots n_d)},
     where the corresponding index map \iota(i_1, ..., i_d) = (i_row, i_col) is given by
         i_row = 1 + \sum_{\nu=1}^{\mu} (i_\nu - 1) \prod_{\tau=1}^{\nu-1} n_\tau,
         i_col = 1 + \sum_{\nu=\mu+1}^{d} (i_\nu - 1) \prod_{\tau=\mu+1}^{\nu-1} n_\tau.
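In NumPy this unfolding is a single reshape, provided one uses column-major (Fortran) ordering, which matches the index map above (i_1 varies fastest). A minimal sketch:

```python
import numpy as np

def unfolding(X, mu):
    """mu-th unfolding X^<mu>: modes 1..mu become rows, modes mu+1..d
    become columns; order='F' matches the index map in which i_1
    varies fastest."""
    n = X.shape
    rows = int(np.prod(n[:mu]))
    return X.reshape(rows, -1, order='F')

X = np.arange(24.0).reshape(2, 3, 4, order='F')
print(unfolding(X, 2).shape)  # (6, 4)
```

In zero-based indexing the map reads row = i_1 + n_1 i_2 + ... , col = i_{mu+1} + n_{mu+1} i_{mu+2} + ... .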

  6. TT decomposition and matrix factorizations
     Define interface matrices X_{<=\mu} \in R^{n_1 n_2 \cdots n_\mu \times r_\mu} and X_{>=\mu+1} \in R^{r_\mu \times n_{\mu+1} n_{\mu+2} \cdots n_d} as
         X_{<=\mu}(i_row, j) = \sum_{k_1, ..., k_{\mu-1}} U_1(1, i_1, k_1) \cdots U_{\mu-1}(k_{\mu-2}, i_{\mu-1}, k_{\mu-1}) U_\mu(k_{\mu-1}, i_\mu, j),
         X_{>=\mu+1}(j, i_col) = \sum_{k_{\mu+1}, ..., k_{d-1}} U_{\mu+1}(j, i_{\mu+1}, k_{\mu+1}) U_{\mu+2}(k_{\mu+1}, i_{\mu+2}, k_{\mu+2}) \cdots U_d(k_{d-1}, i_d, 1).
     This yields the matrix factorizations
         X^{<\mu>} = X_{<=\mu} X_{>=\mu+1},   \mu = 1, ..., d-1.

  7. TT decomposition and matrix factorizations
     Important: These matrix factorizations are nested!
         X_{<=\mu} = (I_{n_\mu} \otimes X_{<=\mu-1}) U_\mu^L   and   X_{>=\mu} = U_\mu^R (X_{>=\mu+1} \otimes I_{n_\mu}),
     where U_\mu^L = U_\mu^{<2>} and U_\mu^R = U_\mu^{(1)} = U_\mu^{<1>}.
     The relations
         X_{<=1} = U_1^L   and   X_{<=\mu} = (I_{n_\mu} \otimes X_{<=\mu-1}) U_\mu^L,   \mu = 2, ..., d,
     fully characterize the TT decomposition:
         vec(X) = X_{<=d}
                = (I \otimes X_{<=d-1}) U_d^L
                = (I \otimes I \otimes X_{<=d-2}) (I \otimes U_{d-1}^L) U_d^L
                = ...
                = (I \otimes \cdots \otimes I \otimes U_1^L) \cdots (I \otimes U_{d-1}^L) U_d^L.
     EFY: Perform an analogous calculation for X_{>=\mu}, that is, resolve the recursion X_{>=\mu} = U_\mu^R (X_{>=\mu+1} \otimes I_{n_\mu}).
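The recursion for X_{<=\mu} can be resolved directly in code. The following sketch (helper name `left_interface` is my own) builds vec(X) from the cores via repeated Kronecker products, exactly following the recursion, and uses Fortran-order reshapes so that the left unfolding U_\mu^L has rows indexed by (k_{\mu-1}, i_\mu) with k_{\mu-1} fastest:

```python
import numpy as np

rng = np.random.default_rng(1)
dims, ranks = [2, 3, 4], [1, 2, 2, 1]   # r_0 = r_3 = 1
cores = [rng.standard_normal((ranks[m], dims[m], ranks[m + 1]))
         for m in range(3)]

def left_interface(cores):
    """Resolve X_<=1 = U_1^L and X_<=mu = (I_{n_mu} kron X_<=mu-1) U_mu^L,
    where U_mu^L is the (r_{mu-1} n_mu) x r_mu left unfolding of the core."""
    X = cores[0].reshape(-1, cores[0].shape[2], order='F')       # U_1^L
    for U in cores[1:]:
        UL = U.reshape(U.shape[0] * U.shape[1], U.shape[2], order='F')
        X = np.kron(np.eye(U.shape[1]), X) @ UL
    return X

vecX = left_interface(cores)   # X_<=d equals vec(X), length n_1 n_2 n_3
print(vecX.shape)  # (24, 1)
```

Forming the explicit Kronecker products is of course only for illustration; practical algorithms exploit the structure instead of materializing these matrices.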

  8. TT decomposition and matrix factorizations
     Lemma. The TT rank of a tensor X is given by (rank X^{<1>}, ..., rank X^{<d-1>}).
     Proof. Because of the connection to matrix factorizations, the TT rank cannot be smaller than (rank X^{<1>}, ..., rank X^{<d-1>}). We need to exclude that it can be larger. For this purpose, we construct a TT decomposition with
         (r_1, ..., r_{d-1}) := (rank X^{<1>}, ..., rank X^{<d-1>}).
     Step 1: Factorize
         X^{<1>} = U_1 \tilde{X}^{<1>},   U_1 \in R^{n_1 \times r_1},   \tilde{X}^{<1>} \in R^{r_1 \times n_2 \cdots n_d},
     and hence
         \tilde{X}^{<1>} = U_1^\dagger X^{<1>},   U_1^\dagger = (U_1^T U_1)^{-1} U_1^T.
     In terms of tensors: X = U_1 \circ_1 \tilde{X}. The matrix U_1 (viewed as a 1 \times n_1 \times r_1 tensor) is the first TT core, and X_{>=2} := \tilde{X}^{<1>}.

  9. Relation for the second unfolding via a Kronecker product:
         X^{<2>} = (I_{n_2} \otimes U_1) \tilde{X}^{<2>}.
     Together with the full column rank of U_1, this implies rank(\tilde{X}^{<2>}) = rank(X^{<2>}) = r_2.
     Step 2: Factorize
         \tilde{X}^{<2>} = U_2^L \hat{X}^{<1>},   U_2^L \in R^{r_1 n_2 \times r_2},   \hat{X}^{<1>} \in R^{r_2 \times n_3 \cdots n_d},
     which gives the second TT core U_2 \in R^{r_1 \times n_2 \times r_2} and X_{>=3} := \hat{X}^{<1>}.
     Relation for the third unfolding via Kronecker products:
         X^{<3>} = (I_{n_3} \otimes I_{n_2} \otimes U_1) \tilde{X}^{<3>} = (I_{n_3} \otimes I_{n_2} \otimes U_1) (I_{n_3} \otimes U_2^L) \hat{X}^{<2>}.
     Together with the full column ranks of U_1 and U_2^L, this implies rank(\hat{X}^{<2>}) = rank(X^{<3>}) = r_3.
     Continuing in this manner gives cores U_\mu \in R^{r_{\mu-1} \times n_\mu \times r_\mu} such that
         vec(X) = (I \otimes \cdots \otimes I \otimes U_1) \cdots (I \otimes U_{d-1}^L) vec(U_d).
     This defines a TT decomposition.
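The Lemma is easy to check numerically on a small example: build a tensor from cores of known TT rank and compare with the ranks of its unfoldings (a sketch; for random Gaussian cores the unfolding ranks equal the target ranks with probability one):

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a tensor from cores of TT rank (2, 3).
G1 = rng.standard_normal((1, 4, 2))
G2 = rng.standard_normal((2, 5, 3))
G3 = rng.standard_normal((3, 6, 1))
X = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)    # shape (4, 5, 6)

# Lemma: TT rank = (rank X^<1>, rank X^<2>), using Fortran-order unfoldings.
r1 = np.linalg.matrix_rank(X.reshape(4, 30, order='F'))
r2 = np.linalg.matrix_rank(X.reshape(20, 6, order='F'))
print(r1, r2)
```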

  10. Truncation in TT format
     The proof of the Lemma can be turned into a practical algorithm (TT-SVD by [Oseledets'2011]) for approximating a given tensor X in TT format:
     Input: X \in R^{n_1 \times \cdots \times n_d}, target TT rank (r_1, ..., r_{d-1}).
     Output: TT cores U_\mu \in R^{r_{\mu-1} \times n_\mu \times r_\mu} that define a TT decomposition approximating X.
     1: Set r_0 = r_d = 1 (and formally add a leading singleton dimension to X, i.e., X \in R^{1 \times n_1 \times \cdots \times n_d}).
     2: for \mu = 1, ..., d-1 do
     3:   Reshape X into X^{<2>} \in R^{r_{\mu-1} n_\mu \times n_{\mu+1} \cdots n_d}.
     4:   Compute a rank-r_\mu approximation X^{<2>} \approx U \Sigma V^T (e.g., via the SVD).
     5:   Reshape U into U_\mu \in R^{r_{\mu-1} \times n_\mu \times r_\mu}.
     6:   Update X via X^{<2>} <- U^T X^{<2>} = \Sigma V^T.
     7: end for
     8: Set U_d = X.
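The algorithm translates almost line by line into NumPy. A sketch (the function name `tt_svd` is my own), using Fortran-order reshapes to match the unfolding convention of the earlier slides; the final check reconstructs a tensor of exact TT rank (2, 2) from the computed cores:

```python
import numpy as np

def tt_svd(X, ranks):
    """TT-SVD: sequentially unfold, truncate via the SVD, and pass the
    remainder Sigma V^T on to the next step.  ranks = (r_1, ..., r_{d-1})."""
    dims, d = X.shape, X.ndim
    r = [1] + list(ranks) + [1]
    cores = []
    C = X.reshape(1, -1, order='F')                  # leading singleton dimension
    for mu in range(d - 1):
        C = C.reshape(r[mu] * dims[mu], -1, order='F')
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        U = U[:, :r[mu + 1]]                         # rank-r_mu factor
        cores.append(U.reshape(r[mu], dims[mu], r[mu + 1], order='F'))
        C = s[:r[mu + 1], None] * Vt[:r[mu + 1]]     # remainder Sigma V^T
    cores.append(C.reshape(r[d - 1], dims[d - 1], 1, order='F'))
    return cores

# Sanity check: a tensor of exact TT rank (2, 2) is recovered exactly.
rng = np.random.default_rng(2)
G = [rng.standard_normal(s) for s in [(1, 3, 2), (2, 4, 2), (2, 5, 1)]]
X = np.einsum('aib,bjc,ckd->ijk', *G)
cores = tt_svd(X, [2, 2])
X2 = np.einsum('aib,bjc,ckd->ijk', *cores)
print(np.allclose(X, X2))
```

The cost is dominated by the first SVDs on tall unfoldings; in practice one truncates adaptively to a tolerance instead of to fixed ranks.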

  11. Truncation in TT format
     Theorem. Let X_SVD denote the tensor in TT decomposition obtained from TT-SVD. Then
         ||X - X_SVD|| <= \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_{d-1}^2},
     where
         \varepsilon_\mu^2 = ||X^{<\mu>} - T_{r_\mu}(X^{<\mu>})||_F^2 = \sigma_{r_\mu+1}(X^{<\mu>})^2 + \sigma_{r_\mu+2}(X^{<\mu>})^2 + \cdots
     and T_{r_\mu}(\cdot) denotes the best rank-r_\mu approximation.
     Proof. After \mu steps of the algorithm we have the following situation:
     - Core tensors U_1, ..., U_\mu have been computed, defining X_{<=\mu}.
     - The remaining tensor has size r_\mu \times n_{\mu+1} \times \cdots \times n_d. Reshape the remaining tensor into a matrix Y_{>=\mu+1} \in R^{r_\mu \times n_{\mu+1} \cdots n_d}.
     We will prove the relations
         X_{<=\mu}^T X_{<=\mu} = I,   Y_{>=\mu+1} = X_{<=\mu}^T X^{<\mu>},   and   ||X^{<\mu>} - X_{<=\mu} Y_{>=\mu+1}||_F <= \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_\mu^2}   (1)
     for \mu = 1, ..., d-1 by induction. For \mu = d-1, this shows the theorem.
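For d = 3 the bound can be verified directly. The following sketch performs one TT-SVD sweep (two truncated SVDs) on a random tensor, reconstructs vec(X_SVD) via the interface-matrix recursion from the earlier slides, and compares the error with sqrt(eps_1^2 + eps_2^2); all names here are of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, n3, r1, r2 = 4, 5, 6, 2, 3
X = rng.standard_normal((n1, n2, n3))

def tail2(A, r):
    """eps^2: sum of squared singular values discarded at rank r."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[r:] ** 2))

eps2 = [tail2(X.reshape(n1, n2 * n3, order='F'), r1),
        tail2(X.reshape(n1 * n2, n3, order='F'), r2)]

# One TT-SVD sweep for d = 3: two truncated SVDs.
U, s, Vt = np.linalg.svd(X.reshape(n1, -1, order='F'), full_matrices=False)
U1 = U[:, :r1]
C = (s[:r1, None] * Vt[:r1]).reshape(r1 * n2, n3, order='F')
U, s, Vt = np.linalg.svd(C, full_matrices=False)
U2 = U[:, :r2]
U3 = s[:r2, None] * Vt[:r2]                      # last core, r2 x n3

# vec(X_SVD) = (I kron (I kron U1) U2) vec(U3), as in the nested recursion.
vec_svd = (np.kron(np.eye(n3), np.kron(np.eye(n2), U1) @ U2)
           @ U3.reshape(-1, 1, order='F'))
err = np.linalg.norm(X.reshape(-1, order='F') - vec_svd.ravel())
print(err, np.sqrt(sum(eps2)))
```

The printed error never exceeds the printed bound; each single eps_mu is also a lower bound on the error, so the estimate is tight up to a factor sqrt(d-1).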

  12. Line 3 in the \mu-th step of the algorithm proceeds by reshaping the remaining tensor from step \mu-1 (corresponding to Y_{>=\mu}) into an array
         Y^{<2>} \in R^{r_{\mu-1} n_\mu \times n_{\mu+1} \cdots n_d}.
     By the induction assumption, Y_{>=\mu} = X_{<=\mu-1}^T X^{<\mu-1>}, and hence
         Y^{<2>} = (I_{n_\mu} \otimes X_{<=\mu-1})^T X^{<\mu>}.
     The matrix U_\mu^L \equiv U computed in Line 4 contains left singular vectors of Y^{<2>}; in particular, (U_\mu^L)^T U_\mu^L = I. Together with the induction assumption and the relation X_{<=\mu} = (I_{n_\mu} \otimes X_{<=\mu-1}) U_\mu^L, this implies
         X_{<=\mu}^T X_{<=\mu} = I   and   Y_{>=\mu+1} = X_{<=\mu}^T X^{<\mu>}.
     Moreover,
         ||(I - U_\mu^L (U_\mu^L)^T) Y^{<2>}||_F = ||Y^{<2>} - T_{r_\mu}(Y^{<2>})||_F <= ||X^{<\mu>} - T_{r_\mu}(X^{<\mu>})||_F = \varepsilon_\mu.

  13. Finally, we obtain:
         ||X^{<\mu>} - X_{<=\mu} X_{<=\mu}^T X^{<\mu>}||_F^2
           = ||X^{<\mu>} - (I \otimes X_{<=\mu-1}) U_\mu^L (U_\mu^L)^T (I \otimes X_{<=\mu-1})^T X^{<\mu>}||_F^2
           = ||X^{<\mu>} - (I \otimes X_{<=\mu-1}) (I \otimes X_{<=\mu-1})^T X^{<\mu>}||_F^2
             + ||(I \otimes X_{<=\mu-1}) (I \otimes X_{<=\mu-1})^T X^{<\mu>} - (I \otimes X_{<=\mu-1}) U_\mu^L (U_\mu^L)^T (I \otimes X_{<=\mu-1})^T X^{<\mu>}||_F^2
           = ||X^{<\mu-1>} - X_{<=\mu-1} X_{<=\mu-1}^T X^{<\mu-1>}||_F^2 + ||(I - U_\mu^L (U_\mu^L)^T) Y^{<2>}||_F^2
           <= \varepsilon_1^2 + \cdots + \varepsilon_{\mu-1}^2 + \varepsilon_\mu^2,
     where the second equality holds by orthogonality (Pythagoras): the first difference is orthogonal to the range of I \otimes X_{<=\mu-1}, while the second lies in it.
     This completes the proof.
