

  1. Computing the Best Rank-$(r_1, r_2, r_3)$ Approximation of a Tensor
  Lars Eldén, Department of Mathematics, Linköping University, Sweden
  Joint work with Berkant Savas
  Harrachov 2007

  2. Best rank-$k$ approximation of a matrix
  Assume $X_k^T X_k = I$ and $Y_k^T Y_k = I$:
  $$\min_{X_k, Y_k, S_k} \|A - X_k S_k Y_k^T\|_F =: \min_{X_k, Y_k, S_k} \|A - (X_k, Y_k) \cdot S_k\|_F$$
  (Almost) equivalent problem:
  $$\max_{X_k, Y_k} \|X_k^T A Y_k\|_F = \max_{X_k, Y_k} \|A \cdot (X_k, Y_k)\|_F$$
  Solution by the SVD:
  $$X_k S_k Y_k^T = U_k \Sigma_k V_k^T = (U_k, V_k) \cdot \Sigma_k$$
  (the Eckart–Young property)
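As a concrete illustration (a minimal NumPy sketch, not part of the original slides), the truncated SVD yields the best rank-$k$ factors directly:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation via the truncated SVD (Eckart-Young):
    returns X_k, S_k, Y_k with A ~ X_k @ S_k @ Y_k.T and
    orthonormal columns in X_k and Y_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

A = np.random.randn(8, 6)
Xk, Sk, Yk = best_rank_k(A, 2)
# The residual equals sqrt(sigma_3^2 + ... + sigma_6^2)
print(np.linalg.norm(A - Xk @ Sk @ Yk.T))
```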

  3. Sketch of “proof”:
  Determine $u_1$ and $v_1$ (the case $k = 1$). Put $u_1$ and $v_1$ first in orthogonal matrices $(u_1\; U)$ and $(v_1\; V)$:
  $$(u_1\; U)^T A\, (v_1\; V) = \begin{pmatrix} \sigma_1 & 0 \\ 0 & B \end{pmatrix}$$
  Optimality $\Rightarrow$ zeros $\Rightarrow$ deflation: continue with $B$.
  Orthogonality of the vectors comes automatically. The number of degrees of freedom in $U_k$ and $V_k$ equals the number of zeros produced.

  4. Best rank-$(k, k, k)$ approximation of a tensor
  Assume $X^T X = Y^T Y = Z^T Z = I_k$:
  $$\min_{X, Y, Z, \mathcal{S}} \|\mathcal{A} - (X, Y, Z) \cdot \mathcal{S}\|_F \iff \max_{X, Y, Z} \|\mathcal{A} \cdot (X, Y, Z)\|_F$$
  Why is this problem much more complicated? There are not enough degrees of freedom in $X$, $Y$, $Z$ to zero the many ($O(k^3) + O(kn^2)$) elements in $\mathcal{A}$
  $\Downarrow$
  Deflation is impossible in general
  $\Downarrow$
  Orthogonality constraints must be enforced
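For orthonormal factors the optimal core is $\mathcal{S} = \mathcal{A} \cdot (X, Y, Z)$, and the two formulations differ only by the constant $\|\mathcal{A}\|^2$, since $\|\mathcal{A} - (X, Y, Z) \cdot \mathcal{S}\|^2 = \|\mathcal{A}\|^2 - \|\mathcal{A} \cdot (X, Y, Z)\|^2$. A minimal NumPy check of this identity (illustrative; `rand_orth` is this sketch's helper, not from the slides):

```python
import numpy as np

def rand_orth(m, r):
    """Random matrix with orthonormal columns (thin QR)."""
    Q, _ = np.linalg.qr(np.random.randn(m, r))
    return Q

A = np.random.randn(5, 5, 5)
X, Y, Z = (rand_orth(5, 2) for _ in range(3))
S = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)       # core A . (X, Y, Z)
R = A - np.einsum('jkl,aj,bk,cl->abc', S, X, Y, Z)   # residual A - (X, Y, Z) . S
# ||A - (X,Y,Z).S||^2 = ||A||^2 - ||A.(X,Y,Z)||^2, so min <=> max
print(np.isclose(np.sum(R * R), np.sum(A * A) - np.sum(S * S)))  # True
```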

  5. Talk outline
  • Some basic tensor concepts (for simplicity: only tensors of order 3)
  • The best rank-$(r_1, r_2, r_3)$ approximation problem
  • Optimization on the Grassmann manifold
  • Newton–Grassmann for solving the best rank-$(r_1, r_2, r_3)$ approximation problem
  • Numerical examples
  • Ongoing work

  6. “Old and New” research area
  • Tensor methods have been used since the 1960s in psychometrics and chemometrics, but only recently in the numerical community.
  • The available mathematical theory deals very little with computational aspects. Many fundamental mathematical problems are open!
  • Applications in signal processing and various areas of data mining.

  7. Two aspects of the SVD
  Singular value decomposition: for $X \in \mathbb{R}^{m \times n}$,
  $$X = U \Sigma V^T, \qquad U \in \mathbb{R}^{m \times m}, \quad \Sigma \in \mathbb{R}^{m \times n}, \quad V \in \mathbb{R}^{n \times n}$$
  Singular value expansion: a sum of rank-1 matrices,
  $$X = \sum_{i=1}^{n} \sigma_i u_i v_i^T$$
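Both views are easy to verify numerically; a small NumPy check (not from the slides):

```python
import numpy as np

X = np.random.randn(5, 4)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Singular value expansion: X = sum_i sigma_i * u_i v_i^T
X_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(X, X_sum))  # True
```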

  8. Two approaches to tensor decomposition
  Tucker model:
  $$\mathcal{A} = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot \mathcal{S}$$
  • Tucker 1966; numerous papers in psychometrics and chemometrics
  • De Lathauwer, De Moor, Vandewalle, SIMAX 2000: notation, theory

  9. Expansion in rank-1 terms
  $$\mathcal{A} = \sum_i x_i \otimes y_i \otimes z_i$$
  • Parafac/Candecomp/Kruskal: Harshman, Carroll, Chang 1970
  • Numerous papers in psychometrics and chemometrics
  • Kolda, SIMAX 2001; Zhang, Golub, SIMAX 2001; De Silva and Lim 2006
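In coordinates, a rank-1 term has entries $(x \otimes y \otimes z)_{ijk} = x_i y_j z_k$. A short NumPy illustration (an assumption of this sketch, not from the slides):

```python
import numpy as np

# A single rank-1 term (x ⊗ y ⊗ z)(i, j, k) = x_i * y_j * z_k
x, y, z = np.random.randn(3), np.random.randn(4), np.random.randn(5)
term = np.einsum('i,j,k->ijk', x, y, z)

# A Parafac model is a sum of such terms
A = term + np.einsum('i,j,k->ijk', *[np.random.randn(n) for n in (3, 4, 5)])
print(A.shape)  # (3, 4, 5)
```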

  10. Parafac model: low-rank approximation
  $$\mathcal{A} \approx (X, Y, Z) \cdot \mathcal{S}$$
  The core tensor $\mathcal{S}$ is zero except along the superdiagonal.

  11. Parafac model: low-rank approximation (cont.)
  $$\mathcal{A} \approx (X, Y, Z) \cdot \mathcal{S}$$
  The core tensor is zero except along the superdiagonal. Why is it difficult to obtain this? Because we do not have enough degrees of freedom to zero the tensor elements: $O(k^2)$ degrees of freedom versus $O(k^3)$ elements.

  12. The Parafac approximation problem may be ill-posed!¹
  Theorem 1. There are tensors $\mathcal{A}$ for which the problem
  $$\min_{x_i, y_i, z_i} \|\mathcal{A} - x_1 \otimes y_1 \otimes z_1 - x_2 \otimes y_2 \otimes z_2\|_F$$
  does not have a solution. The set of tensors for which the approximation problem does not have a solution has positive volume.
  The problem is ill-posed (in exact arithmetic)! A well-posed problem (in floating point) near an ill-posed one is ill-conditioned $\Rightarrow$ unstable computations.
  Still, there are applications (e.g. in chemistry) where the Parafac model corresponds closely to the process that generates the tensor data.
  ¹ See De Silva and Lim (2006), Bini (1986)

  13. Mode-1 multiplication of a tensor by a matrix²
  Contravariant multiplication:
  $$\mathbb{R}^{n \times n \times n} \ni \mathcal{B} = (W)_{\{1\}} \cdot \mathcal{A}, \qquad \mathcal{B}(i, j, k) = \sum_{\nu=1}^{n} w_{i\nu}\, a_{\nu jk}$$
  All column vectors in the 3-tensor are multiplied by the matrix $W$.
  Covariant multiplication:
  $$\mathbb{R}^{n \times n \times n} \ni \mathcal{B} = \mathcal{A} \cdot (W)_{\{1\}}, \qquad \mathcal{B}(i, j, k) = \sum_{\nu=1}^{n} a_{\nu jk}\, w_{\nu i}$$
  ² Lim's notation
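`np.einsum` mirrors the two index formulas directly; a minimal sketch (not the authors' code):

```python
import numpy as np

def mode1_contra(W, A):
    """Contravariant mode-1 product B = (W)_{1} . A:
    B(i, j, k) = sum_nu W[i, nu] * A[nu, j, k]."""
    return np.einsum('in,njk->ijk', W, A)

def mode1_cov(A, W):
    """Covariant mode-1 product B = A . (W)_{1}:
    B(i, j, k) = sum_nu A[nu, j, k] * W[nu, i]."""
    return np.einsum('njk,ni->ijk', A, W)

A = np.random.randn(4, 4, 4)
W = np.random.randn(4, 4)
# Covariant multiplication by W equals contravariant multiplication by W^T
print(np.allclose(mode1_cov(A, W), mode1_contra(W.T, A)))  # True
```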

  14. Matrix–tensor multiplication performed in all modes in the same expression:
  $$(X, Y, Z) \cdot \mathcal{A} = \mathcal{A} \cdot (X^T, Y^T, Z^T)$$
  Standard matrix multiplication of three matrices:
  $$X A Y^T = (X, Y) \cdot A$$
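The all-modes product and its matrix special case, again as a NumPy sketch (illustrative, not from the slides):

```python
import numpy as np

def multi_mode(X, Y, Z, A):
    """(X, Y, Z) . A: contravariant multiplication in all three modes,
    B(i, j, k) = sum_{a,b,c} X[i,a] Y[j,b] Z[k,c] A[a,b,c]."""
    return np.einsum('ia,jb,kc,abc->ijk', X, Y, Z, A)

# Matrix sanity check: (X, Y) . A = X A Y^T
A2 = np.random.randn(4, 5)
X, Y = np.random.randn(3, 4), np.random.randn(2, 5)
print(np.allclose(np.einsum('ia,jb,ab->ij', X, Y, A2), X @ A2 @ Y.T))  # True
```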

  15. Inner product, orthogonality, and norm
  Inner product (a contraction $\mathbb{R}^{n \times n \times n} \to \mathbb{R}$):
  $$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i,j,k} a_{ijk} b_{ijk}$$
  The Frobenius norm of a tensor is $\|\mathcal{A}\| = \langle \mathcal{A}, \mathcal{A} \rangle^{1/2}$.
  Matrix case: $\langle A, B \rangle = \operatorname{tr}(A^T B)$
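A quick numerical check of the definitions (not from the slides):

```python
import numpy as np

A, B = np.random.randn(3, 3, 3), np.random.randn(3, 3, 3)
inner = np.sum(A * B)         # <A, B> = sum_ijk a_ijk b_ijk
fro = np.sqrt(np.sum(A * A))  # ||A|| = <A, A>^(1/2)

# Matrix case: <A, B> = tr(A^T B)
M, N = np.random.randn(4, 5), np.random.randn(4, 5)
print(np.isclose(np.sum(M * N), np.trace(M.T @ N)))  # True
```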

  16. Tensor SVD (HOSVD)³:
  $$\mathcal{A} = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot \mathcal{S}$$
  The “mass” of $\mathcal{S}$ is concentrated around the $(1, 1, 1)$ corner.
  Not optimal: it does not solve $\min_{\operatorname{rank}(\mathcal{B}) = (r_1, r_2, r_3)} \|\mathcal{A} - \mathcal{B}\|$.
  ³ De Lathauwer et al. (2000)
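A minimal HOSVD sketch via SVDs of the mode-$n$ unfoldings (after De Lathauwer et al.; the function name and unfolding convention are this sketch's assumptions):

```python
import numpy as np

def hosvd(A):
    """HOSVD sketch: U^(n) holds the left singular vectors of the
    mode-n unfolding; the core is S = A . (U1, U2, U3)."""
    Us = []
    for n in range(3):
        An = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)  # mode-n unfolding
        Us.append(np.linalg.svd(An, full_matrices=False)[0])
    U1, U2, U3 = Us
    S = np.einsum('ai,bj,ck,abc->ijk', U1, U2, U3, A)  # core tensor
    return Us, S

A = np.random.randn(5, 4, 3)
(U1, U2, U3), S = hosvd(A)
A_rec = np.einsum('ia,jb,kc,abc->ijk', U1, U2, U3, S)
print(np.allclose(A, A_rec))  # True; truncating the U's gives a good
# (but, as the slide notes, not the best) rank-(r1, r2, r3) approximation
```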

  17. Best rank-$(r_1, r_2, r_3)$ approximation
  $$\mathcal{A} \approx (X, Y, Z) \cdot \mathcal{S}, \qquad X^T X = I, \quad Y^T Y = I, \quad Z^T Z = I$$
  $$\min_{X, Y, Z, \mathcal{S}} \|\mathcal{A} - (X, Y, Z) \cdot \mathcal{S}\|$$
  The problem is over-parameterized!

  18. Best approximation: $\min_{\operatorname{rank}(\mathcal{B}) = (r_1, r_2, r_3)} \|\mathcal{A} - \mathcal{B}\|$
  Equivalent to
  $$\max_{X, Y, Z} \Phi(X, Y, Z) = \frac{1}{2} \|\mathcal{A} \cdot (X, Y, Z)\|^2 = \frac{1}{2} \sum_{j,k,l} \mathcal{A}_{jkl}^2, \qquad \mathcal{A}_{jkl} = \sum_{\lambda, \mu, \nu} a_{\lambda\mu\nu}\, x_{\lambda j}\, y_{\mu k}\, z_{\nu l},$$
  subject to $X^T X = I_{r_1}$, $Y^T Y = I_{r_2}$, $Z^T Z = I_{r_3}$.
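The objective translates almost verbatim into `np.einsum` (a sketch; `phi` is a hypothetical helper name):

```python
import numpy as np

def phi(A, X, Y, Z):
    """Phi(X, Y, Z) = 0.5 * ||A . (X, Y, Z)||_F^2, where
    (A . (X, Y, Z))_jkl = sum_{lam,mu,nu} a_{lam mu nu} x_{lam j} y_{mu k} z_{nu l}."""
    F = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)
    return 0.5 * np.sum(F * F)
```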

  19. Grassmann optimization
  The Frobenius norm is invariant under orthogonal transformations:
  $$\Phi(X, Y, Z) = \Phi(XU, YV, ZW) = \frac{1}{2} \|\mathcal{A} \cdot (XU, YV, ZW)\|^2$$
  for orthogonal $U \in \mathbb{R}^{r_1 \times r_1}$, $V \in \mathbb{R}^{r_2 \times r_2}$, and $W \in \mathbb{R}^{r_3 \times r_3}$.
  Maximize $\Phi$ over equivalence classes $[X] = \{XU \mid U \text{ orthogonal}\}$, i.e. over a product of Grassmann manifolds $\operatorname{Gr}^3 = \operatorname{Gr}(J, r_1) \times \operatorname{Gr}(K, r_2) \times \operatorname{Gr}(L, r_3)$:
  $$\max_{(X, Y, Z) \in \operatorname{Gr}^3} \Phi(X, Y, Z) = \max_{(X, Y, Z) \in \operatorname{Gr}^3} \frac{1}{2} \langle \mathcal{A} \cdot (X, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle$$
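The invariance is easy to confirm numerically, reusing the `phi` sketch from the previous slide (redefined here so the snippet is self-contained; `rand_orth` is again a hypothetical helper):

```python
import numpy as np

def rand_orth(m, r):
    Q, _ = np.linalg.qr(np.random.randn(m, r))
    return Q

def phi(A, X, Y, Z):  # as in the previous sketch
    F = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)
    return 0.5 * np.sum(F * F)

A = np.random.randn(6, 5, 4)
X, Y, Z = rand_orth(6, 2), rand_orth(5, 2), rand_orth(4, 2)
U, V, W = rand_orth(2, 2), rand_orth(2, 2), rand_orth(2, 2)  # square orthogonal

# Phi depends only on the subspaces [X], [Y], [Z], not on the chosen bases
print(np.isclose(phi(A, X, Y, Z), phi(A, X @ U, Y @ V, Z @ W)))  # True
```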

  20. Newton's method on one Grassmann manifold
  Taylor expansion + linear algebra on the tangent space⁴ at $X$:
  $$G(X(t)) \approx G(X(0)) + \langle \Delta, \nabla G \rangle + \frac{1}{2} \langle \Delta, H(\Delta) \rangle$$
  Grassmann gradient:
  $$\nabla G = \Pi_X G_x, \qquad \Pi_X = I - X X^T, \qquad (G_x)_{jk} = \frac{\partial G}{\partial x_{jk}}$$
  The Newton equation for determining $\Delta$:
  $$\Pi_X \langle G_{xx}, \Delta \rangle_{1:2} - \Delta \langle X, G_x \rangle_1 = -\nabla G, \qquad (G_{xx})_{jklm} = \frac{\partial^2 G}{\partial X_{jk}\, \partial X_{lm}}$$
  ⁴ Tangent space at $X$: all matrices $Z$ satisfying $Z^T X = 0$.
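The projection $\Pi_X = I - XX^T$ onto the tangent space is one line of NumPy (a sketch, not the authors' code):

```python
import numpy as np

def grassmann_grad(X, Gx):
    """Project the Euclidean gradient G_x onto the tangent space at X:
    grad G = Pi_X G_x, with Pi_X = I - X X^T."""
    return Gx - X @ (X.T @ Gx)

X, _ = np.linalg.qr(np.random.randn(6, 2))  # point on Gr(6, 2)
Gx = np.random.randn(6, 2)                  # some Euclidean gradient
D = grassmann_grad(X, Gx)
print(np.allclose(X.T @ D, 0))  # True: tangent vectors satisfy X^T D = 0
```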

  21. Newton–Grassmann algorithm on $\operatorname{Gr}^3$ (here: local coordinates)
  Given a tensor $\mathcal{A}$ and starting points $(X_0, Y_0, Z_0) \in \operatorname{Gr}^3$:
  repeat
    compute the Grassmann gradient $\nabla\hat{\Phi}$
    compute the Grassmann Hessian $\hat{H}$
    matricize $\hat{H}$ and vectorize $\nabla\hat{\Phi}$
    solve for $D = (D_x, D_y, D_z)$ from the Newton equation
    take a geodesic step along the direction $D$, giving new iterates $(X, Y, Z)$
  until $\|\nabla\hat{\Phi}\| / \Phi < \text{TOL}$
  Implementation using the TensorToolbox and object-oriented Grassmann classes in Matlab.
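The full Newton-Grassmann iteration (Hessian assembly, geodesic stepping) is beyond a short sketch. As a simpler baseline for the same rank-$(r_1, r_2, r_3)$ problem, here is higher-order orthogonal iteration (HOOI), an alternating scheme that converges only linearly where Newton-Grassmann converges quadratically; this is a swapped-in method, not the one on the slides:

```python
import numpy as np

def hooi(A, ranks, iters=50):
    """HOOI: alternating maximization of ||A . (X, Y, Z)||_F over
    orthonormal X, Y, Z, initialized with truncated HOSVD factors."""
    factors = []
    for n in range(3):
        An = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)  # mode-n unfolding
        U, _, _ = np.linalg.svd(An, full_matrices=False)
        factors.append(U[:, :ranks[n]])
    X, Y, Z = factors
    for _ in range(iters):
        # For each mode: contract A with the other two factors, then take
        # the leading left singular vectors of the resulting unfolding.
        T = np.einsum('abc,bk,cl->akl', A, Y, Z)
        X = np.linalg.svd(T.reshape(T.shape[0], -1), full_matrices=False)[0][:, :ranks[0]]
        T = np.einsum('abc,aj,cl->bjl', A, X, Z)
        Y = np.linalg.svd(T.reshape(T.shape[0], -1), full_matrices=False)[0][:, :ranks[1]]
        T = np.einsum('abc,aj,bk->cjk', A, X, Y)
        Z = np.linalg.svd(T.reshape(T.shape[0], -1), full_matrices=False)[0][:, :ranks[2]]
    S = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)  # core = A . (X, Y, Z)
    return X, Y, Z, S

A = np.random.randn(10, 10, 10)
X, Y, Z, S = hooi(A, (2, 2, 2))
print(np.sum(A * A) - np.sum(S * S))  # squared residual of the approximation
```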

  22. Newton's method on $\operatorname{Gr}^3$
  Differentiate $\Phi(X, Y, Z)$ along a geodesic curve $(X(t), Y(t), Z(t))$ in the direction $(\Delta_x, \Delta_y, \Delta_z)$:
  $$\frac{\partial x_{st}}{\partial t} = (\Delta_x)_{st}, \qquad \left( \frac{dX(t)}{dt}, \frac{dY(t)}{dt}, \frac{dZ(t)}{dt} \right) = (\Delta_x, \Delta_y, \Delta_z)$$
  Since $\mathcal{A} \cdot (X, Y, Z)$ is linear in $X$, $Y$, $Z$ separately:
  $$\frac{d}{dt} \big( \mathcal{A} \cdot (X, Y, Z) \big) = \mathcal{A} \cdot (\Delta_x, Y, Z) + \mathcal{A} \cdot (X, \Delta_y, Z) + \mathcal{A} \cdot (X, Y, \Delta_z)$$
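The product-rule formula can be checked with a finite difference (an illustrative sketch; the step size `t` and helper `cov` are this sketch's assumptions):

```python
import numpy as np

def cov(A, X, Y, Z):
    """Covariant product A . (X, Y, Z)."""
    return np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)

A = np.random.randn(4, 4, 4)
X, Y, Z = (np.random.randn(4, 2) for _ in range(3))
Dx, Dy, Dz = (np.random.randn(4, 2) for _ in range(3))

t = 1e-6
lhs = (cov(A, X + t * Dx, Y + t * Dy, Z + t * Dz) - cov(A, X, Y, Z)) / t
rhs = cov(A, Dx, Y, Z) + cov(A, X, Dy, Z) + cov(A, X, Y, Dz)
print(np.max(np.abs(lhs - rhs)))  # O(t): first-order agreement
```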

  23. First derivative
  $$\frac{d\Phi}{dt} = \frac{1}{2} \frac{d}{dt} \langle \mathcal{A} \cdot (X, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle = \langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle + \langle \mathcal{A} \cdot (X, \Delta_y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle + \langle \mathcal{A} \cdot (X, Y, \Delta_z), \mathcal{A} \cdot (X, Y, Z) \rangle$$
  We want to write $\langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle$ in the form $\langle \Delta_x, \Phi_x \rangle$. Define the tensor $\mathcal{F} = \mathcal{A} \cdot (X, Y, Z)$ and write
  $$\langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{F} \rangle =: \langle K_x(\Delta_x), \mathcal{F} \rangle = \langle \Delta_x, K_x^* \mathcal{F} \rangle,$$
  with the linear operator $\Delta_x \longmapsto K_x(\Delta_x) = \mathcal{A} \cdot (\Delta_x, Y, Z)$.

  24. Adjoint operator
  The linear operator $\Delta_x \longmapsto K_x(\Delta_x) = \mathcal{A} \cdot (\Delta_x, Y, Z)$ has the adjoint
  $$\langle K_x(\Delta_x), \mathcal{F} \rangle = \langle \Delta_x, K_x^* \mathcal{F} \rangle = \big\langle \Delta_x, \langle \mathcal{A} \cdot (I, Y, Z), \mathcal{F} \rangle_{-1} \big\rangle,$$
  where the partial contraction is defined by
  $$\langle \mathcal{B}, \mathcal{C} \rangle_{-1}(i_1, i_2) = \sum_{\mu, \nu} b_{i_1 \mu \nu}\, c_{i_2 \mu \nu}$$
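The adjoint identity is again one `np.einsum` away; a numerical check (hypothetical helper names, not from the slides):

```python
import numpy as np

def contract_minus1(B, C):
    """Partial contraction <B, C>_{-1}(i1, i2) = sum_{mu,nu} B[i1,mu,nu] * C[i2,mu,nu]."""
    return np.einsum('imn,jmn->ij', B, C)

A = np.random.randn(4, 4, 4)
Y, Z = np.random.randn(4, 2), np.random.randn(4, 2)
F = np.random.randn(2, 2, 2)
Dx = np.random.randn(4, 2)

AYZ = np.einsum('abc,bk,cl->akl', A, Y, Z)   # A . (I, Y, Z)
KDx = np.einsum('akl,aj->jkl', AYZ, Dx)      # K_x(Dx) = A . (Dx, Y, Z)
lhs = np.sum(KDx * F)                        # <K_x(Dx), F>
rhs = np.sum(Dx * contract_minus1(AYZ, F))   # <Dx, <A.(I,Y,Z), F>_{-1}>
print(np.isclose(lhs, rhs))  # True: the adjoint identity holds
```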
