structure tensors
Lek-Heng Lim
July 18, 2017
acknowledgments
• kind colleagues who nominated/supported me: ⋄ Shmuel Friedland ⋄ Sayan Mukherjee ⋄ Pierre Comon ⋄ Jiawang Nie ⋄ Ming Gu ⋄ Bernd Sturmfels ⋄ Jean Bernard Lasserre ⋄ Charles Van Loan
• postdocs: Ke Ye, Yang Qi, Jose Rodriguez, Anne Shiu
• students: Liwen Zhang, Ken Wong, Greg Naitzat, Kate Turner, many MS students as well
• brilliant collaborators: too many to list
• funding agencies: AFOSR, DARPA (special thanks to Fariba Fahroo), NSF
• thank you all from the bottom of my heart
motivation
goal: find fastest algorithms
• fast algorithms are rarely obvious algorithms
• want fast algorithms for bilinear operations β : U × V → W, e.g.
  (A, x) ↦ Ax,  (A, B) ↦ AB,  (A, B) ↦ AB − BA
• embed into an appropriate algebra A: find ι : U ⊗ V → A ⊗ A and π : A → W so that β factors as
  U ⊗ V —ι→ A ⊗ A —µ_A→ A —π→ W
• systematic way to discover new algorithms via structure tensors µ_β and µ_A
• fastest algorithms: rank of structure tensor
• stablest algorithms: nuclear norm of structure tensor
ubiquitous problems
• linear equations, least squares, eigenvalue problems, etc.:
  Ax = b,  min ∥Ax − b∥,  Ax = λx,  x = exp(A)b
• backbone of numerical computations
• almost always: A ∈ C^{n×n} has structure
• very often: A ∈ C^{n×n} prohibitively high-dimensional
• impossible to solve without exploiting structure
structured matrices
• sparse: "any matrix with enough zeros that it pays to take advantage of them" [Wilkinson, 1971]
• classical: circulant, Toeplitz, Hankel:
  T = (t_{i−j})_{i,j=1}^n (constant along diagonals, entries t_{1−n}, …, t_0, …, t_{n−1}),
  H = (h_{i+j−2})_{i,j=1}^n (constant along antidiagonals, entries h_0, …, h_{2n−2})
• many more: banded, triangular, f-circulant, symmetric, skew-symmetric, triangular Toeplitz, symmetric Toeplitz, Toeplitz-plus-Hankel, etc.
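To make the payoff concrete, here is a minimal numpy sketch (mine, not from the slides; the function name and test are illustrative) of the classical trick: a Toeplitz matrix embeds in a 2n × 2n circulant, which the FFT diagonalizes, so Tx costs O(n log n) instead of O(n²).

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """O(n log n) product T @ x for the Toeplitz matrix T with first
    column c and first row r (r[0] == c[0]): T sits in the top-left
    block of a 2n x 2n circulant, and the FFT diagonalizes circulants."""
    n = len(x)
    v = np.concatenate([c, [0], r[:0:-1]])   # circulant's first column
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n]

# sanity check against a densely built T = (t_{i-j})
rng = np.random.default_rng(0)
n = 8
c, r = rng.standard_normal(n), rng.standard_normal(n)
r[0] = c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x).real)
```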
multilevel
• 2-level: block-Toeplitz-Toeplitz-blocks (bttb):
  T = (T_{i−j})_{i,j=1}^n ∈ C^{mn×mn}, where each block T_i ∈ C^{m×m} is itself Toeplitz
• 3-level: block-Toeplitz with bttb blocks
• 4-level: block-bttb with bttb blocks
• and so on
• also multilevel versions of:
  • block-circulant-circulant-blocks (bccb)
  • block-Hankel-Hankel-blocks (bhhb)
  • block-Toeplitz-plus-Hankel-Toeplitz-plus-Hankel-blocks (bththb)
Krylov subspace methods
• easiest way to exploit structure in A
• basic idea: by Cayley–Hamilton, α_0 I + α_1 A + ⋯ + α_d A^d = 0 for some d ≤ n, so
  A^{−1} = −(α_1/α_0) I − (α_2/α_0) A − ⋯ − (α_d/α_0) A^{d−1},
  and so x = A^{−1}b ∈ span{b, Ab, …, A^{d−1}b}
• one advantage: d can be much smaller than n, e.g. d = number of distinct eigenvalues of A if A is diagonalizable
• another advantage: reduces to forming the matrix-vector product (A, x) ↦ Ax efficiently
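A toy illustration (mine, not the speaker's) of why this matters: once x is known to lie in a Krylov subspace, one can solve a tiny least-squares problem there, touching A only through matrix-vector products. Production methods (CG, GMRES) build the basis stably via Lanczos/Arnoldi; this naive sketch only shows the idea.

```python
import numpy as np

def krylov_solve(A, b, d):
    """Solve Ax = b by restricting x to span{b, Ab, ..., A^(d-1) b}.
    A enters only through matrix-vector products."""
    K = np.empty((len(b), d))
    K[:, 0] = b
    for j in range(1, d):
        K[:, j] = A @ K[:, j - 1]
    Q, _ = np.linalg.qr(K)                        # orthonormalize for sanity
    y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None) # tiny least-squares problem
    return Q @ y

# d = number of distinct eigenvalues suffices: 3 here, even though n = 5
A = np.diag([1.0, 1.0, 2.0, 2.0, 3.0])
b = np.ones(5)
assert np.allclose(A @ krylov_solve(A, b, 3), b)
```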
fastest algorithms
• bilinear complexity: counts only multiplications of variables; ignores addition, subtraction, scalar multiplication
• Gauss's method:
  (a + bi)(c + di) = (ac − bd) + i(bc + ad) = (ac − bd) + i[(a + b)(c + d) − ac − bd]
• usual: 4 ×'s and 2 ±'s; Gauss: 3 ×'s and 5 ±'s
• Strassen's algorithm (Winograd variant):
  [a_1 a_2; a_3 a_4][b_1 b_3; b_2 b_4] = [a_1b_1 + a_2b_2, β + γ + (a_1 + a_2 − a_3 − a_4)b_4; α + γ + a_4(b_2 + b_3 − b_1 − b_4), α + β + γ],
  where α = (a_3 − a_1)(b_3 − b_4), β = (a_3 + a_4)(b_3 − b_1), γ = a_1b_1 + (a_3 + a_4 − a_1)(b_1 + b_4 − b_3)
• usual: 8 ×'s and 8 ±'s; Strassen: 7 ×'s and 15 ±'s (checked in the sketch below)
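Both tricks are short enough to state as code; this sketch (variable names mine) checks the two formulas above, using 3 and 7 multiplications of variables respectively.

```python
def gauss_mult(a, b, c, d):
    """(a+bi)(c+di) with 3 multiplications instead of 4."""
    ac, bd = a * c, b * d
    return ac - bd, (a + b) * (c + d) - ac - bd      # (real, imag)

def strassen_2x2(a1, a2, a3, a4, b1, b2, b3, b4):
    """[[a1,a2],[a3,a4]] @ [[b1,b3],[b2,b4]] with the 7 multiplications
    of the variant displayed above (a1*b1 is computed once and reused)."""
    alpha = (a3 - a1) * (b3 - b4)
    beta = (a3 + a4) * (b3 - b1)
    p = a1 * b1
    gamma = p + (a3 + a4 - a1) * (b1 + b4 - b3)
    return (p + a2 * b2,
            beta + gamma + (a1 + a2 - a3 - a4) * b4,
            alpha + gamma + a4 * (b2 + b3 - b1 - b4),
            alpha + beta + gamma)

assert gauss_mult(1, 2, 3, 4) == (-5, 10)            # (1+2i)(3+4i) = -5+10i
# compare entrywise with the ordinary 8-multiplication product
a1, a2, a3, a4, b1, b2, b3, b4 = 1, 2, 3, 4, 5, 6, 7, 8
assert strassen_2x2(a1, a2, a3, a4, b1, b2, b3, b4) == \
    (a1*b1 + a2*b2, a1*b3 + a2*b4, a3*b1 + a4*b2, a3*b3 + a4*b4)
```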
why minimize multiplications?
• nowadays: latency of FMUL ≈ latency of FADD
• but a multiplier requires many more gates than an adder (e.g. 18-bit: 2200 vs 125) → more wires/transistors → more energy
• may want other measures of computational cost: e.g. energy consumption, number of gates, code space
• may not use a general purpose CPU: e.g. ASIC, DSP, FPGA, GPU, motion coprocessor, smart chip
• block operations: for A, B, C, D ∈ R^{n×n},
  (A + iB)(C + iD) = (AC − BD) + i[(A + B)(C + D) − AC − BD],
  and matrix multiplication is vastly more expensive than matrix addition (see the sketch below)
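The same Gauss trick lifts verbatim to blocks, where each saved multiplication is a full n × n matrix product; a quick numpy check (illustrative names):

```python
import numpy as np

def complex_matmul_3(A, B, C, D):
    """(A+iB)(C+iD) with 3 real matrix products instead of 4;
    the savings dwarf the cost of the extra matrix additions."""
    AC, BD = A @ C, B @ D
    return AC - BD, (A + B) @ (C + D) - AC - BD

rng = np.random.default_rng(1)
A, B, C, D = rng.standard_normal((4, 3, 3))
Re, Im = complex_matmul_3(A, B, C, D)
assert np.allclose(Re + 1j * Im, (A + 1j * B) @ (C + 1j * D))
```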
structure tensors
structure tensor
• bilinear operator β : U × V → W:
  β(a_1u_1 + a_2u_2, v) = a_1β(u_1, v) + a_2β(u_2, v),
  β(u, a_1v_1 + a_2v_2) = a_1β(u, v_1) + a_2β(u, v_2)
• there exists a unique 3-tensor µ_β ∈ U* ⊗ V* ⊗ W such that for any (u, v) ∈ U × V we have β(u, v) = µ_β(u, v, ·) ∈ W
• examples of β : U × V → W:
  (A, x) ↦ Ax,  (A, B) ↦ AB,  (A, B) ↦ AB − BA
• call µ_β the structure tensor of the bilinear map β
structure constants
• if we give µ_β coordinates, i.e., choose bases on U, V, W, we get a hypermatrix (µ_ijk) ∈ C^{m×n×p}, where m = dim U, n = dim V, p = dim W, and
  β(u_i, v_j) = Σ_{k=1}^p µ_ijk w_k,  i = 1, …, m,  j = 1, …, n
• a d-dimensional hypermatrix is a d-tensor in coordinates
• call µ_ijk the structure constants of β
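In code, the structure constants fall out of expressing each β(u_i, v_j) in the basis of W; a generic sketch (the function and all names are mine, assuming the basis elements are arrays whose coordinates can be read off by solving a linear system):

```python
import numpy as np

def structure_constants(beta, basis_U, basis_V, basis_W):
    """Hypermatrix (mu_ijk) with beta(u_i, v_j) = sum_k mu_ijk w_k."""
    W = np.column_stack([np.ravel(w) for w in basis_W])
    mu = np.empty((len(basis_U), len(basis_V), len(basis_W)))
    for i, u in enumerate(basis_U):
        for j, v in enumerate(basis_V):
            # coordinates of beta(u_i, v_j) in the basis of W
            mu[i, j] = np.linalg.lstsq(W, np.ravel(beta(u, v)), rcond=None)[0]
    return mu

# e.g. the cross product on R^3 has the Levi-Civita symbol as structure constants
I3 = list(np.eye(3))
mu = structure_constants(np.cross, I3, I3, I3)
assert np.isclose(mu[0, 1, 2], 1) and np.isclose(mu[1, 0, 2], -1)
```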
example: physics
• g Lie algebra with basis {e_i : i = 1, …, n}:
  [e_i, e_j] = Σ_{k=1}^n c_ijk e_k
• structure constants (c_ijk) ∈ C^{n×n×n} measure self-interaction
• structure tensor of g is
  µ_g = Σ_{i,j,k=1}^n c_ijk e*_i ⊗ e*_j ⊗ e_k ∈ g* ⊗ g* ⊗ g
• take g = so_3, real 3 × 3 skew-symmetric matrices, with basis
  e_1 = [0 0 0; 0 0 −1; 0 1 0],  e_2 = [0 0 1; 0 0 0; −1 0 0],  e_3 = [0 −1 0; 1 0 0; 0 0 0]
• structure tensor of so_3 is
  µ_{so_3} = Σ_{i,j,k=1}^3 ε_ijk e*_i ⊗ e*_j ⊗ e_k,
  where ε_ijk = (i − j)(j − k)(k − i)/2 is the Levi-Civita symbol
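Using the structure_constants sketch above, one can check numerically that the structure constants of so_3 in this basis are exactly the Levi-Civita symbol:

```python
import numpy as np

# the basis of so(3) from the slide
E = [np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]]),
     np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]]),
     np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]])]
bracket = lambda X, Y: X @ Y - Y @ X

mu = structure_constants(bracket, E, E, E)            # from the sketch above
eps = lambda i, j, k: (i - j) * (j - k) * (k - i) / 2  # 1-based Levi-Civita
assert all(np.isclose(mu[i, j, k], eps(i + 1, j + 1, k + 1))
           for i in range(3) for j in range(3) for k in range(3))
```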
example: numerical computations
• for A = (a_ij) ∈ C^{m×n}, B = (b_jk) ∈ C^{n×p},
  AB = Σ_{i,k,j} a_ik b_kj E_ij = Σ_{i,k,j} E*_ik(A) E*_kj(B) E_ij  (i = 1, …, m; k = 1, …, n; j = 1, …, p),
  where E_ij = e_i e_j^T and E*_ij is the coordinate functional X ↦ x_ij
• let µ_{m,n,p} = Σ_{i,k,j} E*_ik ⊗ E*_kj ⊗ E_ij; write µ_n = µ_{n,n,n}
• structure tensor of matrix-matrix product:
  µ_{m,n,p} ∈ (C^{m×n})* ⊗ (C^{n×p})* ⊗ C^{m×p} ≅ C^{mn×np×pm}
• later: rank gives minimal number of multiplications required to multiply two matrices [Strassen, 1973]
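A direct way to see µ_{m,n,p} is to build its hypermatrix of 0s and 1s and check that contracting it with vectorized A and B recovers AB; a small numpy sketch (names mine):

```python
import numpy as np

def mu_matmul(m, n, p):
    """Structure tensor of (A, B) -> AB as an mn x np x mp hypermatrix
    of 0s and 1s, with matrices vectorized row-major."""
    mu = np.zeros((m * n, n * p, m * p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                mu[i * n + k, k * p + j, i * p + j] = 1.0
    return mu

rng = np.random.default_rng(2)
m, n, p = 2, 3, 4
A, B = rng.standard_normal((m, n)), rng.standard_normal((n, p))
AB = np.einsum('a,b,abc->c', A.ravel(), B.ravel(), mu_matmul(m, n, p))
assert np.allclose(AB.reshape(m, p), A @ B)
```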
example: computer science
• Grothendieck inequality: for A ∈ R^{m×n}, there exists K_G > 0 such that
  max_{x_1,…,x_m, y_1,…,y_n ∈ S^{m+n−1}} Σ_{i=1}^m Σ_{j=1}^n a_ij ⟨x_i, y_j⟩ ≤ K_G max_{ε_1,…,ε_m, δ_1,…,δ_n ∈ {−1,+1}} Σ_{i=1}^m Σ_{j=1}^n a_ij ε_i δ_j
• remarkable: K_G independent of m and n [Grothendieck, 1953]
• important: unique games conjecture and SDP relaxations of NP-hard problems
• best known bounds: 1.676 ≤ K_G ≤ 1.782
• Grothendieck's constant is the injective norm of the structure tensor of matrix-matrix product [LHL, 2016]:
  ∥µ_{m,n,m+n}∥_{1,2,∞} := max_{A,X,Y ≠ 0} µ_{m,n,m+n}(A, X, Y) / (∥A∥_{∞,1} ∥X∥_{1,2} ∥Y∥_{2,∞})
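A numerical illustration (entirely mine, not from the slides): for a small random A, compute the right-hand side by brute force over signs and a lower bound on the left-hand side by alternating maximization over unit vectors; the inequality then holds with Krivine's bound π/(2 log(1+√2)) ≈ 1.7822 in place of K_G.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
m, n = 4, 4
A = rng.standard_normal((m, n))

# RHS: brute force over +-1 signs (fine for small m = n)
signs = [np.array(s) for s in product([-1.0, 1.0], repeat=m)]
rhs = max(e @ A @ d for e in signs for d in signs)

# LHS (lower bound): alternating maximization over unit vectors in R^(m+n);
# given the y_j, the optimal x_i is the normalized sum_j a_ij y_j, and vice versa
Y = rng.standard_normal((n, m + n))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
for _ in range(100):
    X = A @ Y
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y = A.T @ X
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
lhs = np.sum(A * (X @ Y.T))

assert lhs <= np.pi / (2 * np.log(1 + np.sqrt(2))) * rhs
```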
example: algebraic geometry
• quantum potential of quantum cohomology [Kontsevich–Manin, 1994]:
  Φ(x, y, z) = (1/2)(xy² + x²z) + Σ_{d=1}^∞ N(d) z^{3d−1}/(3d−1)! · e^{dy},
  where N(d) is the number of rational curves of degree d in the plane passing through 3d − 1 points in general position
• writing Φ(x, y, z) = (1/2)(xy² + x²z) + φ(y, z), φ satisfies
  φ_zzz = φ_yyz² − φ_yyy φ_yzz
• can be transformed into Painlevé VI
• equivalent to the third order derivatives of Φ being the structure tensor of an associative algebra
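The differential equation above encodes Kontsevich's recursion for N(d); a short sketch under that standard formula (the recursion itself is not stated on the slide, only its generating equation):

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def N(d):
    """Number of rational plane curves of degree d through 3d - 1 points
    in general position, via Kontsevich's recursion (from the WDVV
    equation satisfied by the quantum potential)."""
    if d == 1:
        return 1
    return sum(N(a) * N(b) * (a**2 * b**2 * comb(3*d - 4, 3*a - 2)
                              - a**3 * b * comb(3*d - 4, 3*a - 1))
               for a in range(1, d) for b in [d - a])

assert [N(d) for d in range(1, 6)] == [1, 1, 12, 620, 87304]
```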
bilinear complexity = tensor rank
• for A ∈ C^{m×n×p}, with u ⊗ v ⊗ w := (u_i v_j w_k) ∈ C^{m×n×p},
  rank(A) = min{r : A = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i}
• number of multiplications given by rank(µ_n)
• asymptotic growth:
  • usual: O(n³)
  • earliest: O(n^{log₂ 7}) [Strassen, 1969]
  • longest-standing: O(n^{2.375477}) [Coppersmith–Winograd, 1990]
  • recent: O(n^{2.3728642}) [Williams, 2011]
  • latest: O(n^{2.3728639}) [Le Gall, 2014]
• exact: O(n^ω) where ω := inf{α : rank(µ_n) = O(n^α)}
• see [Bürgisser–Clausen–Shokrollahi, 1997]
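For instance, rank(µ_2) ≤ 7 can be verified numerically by summing the 7 rank-1 terms of Strassen's original decomposition (the classical variant, not the Winograd form shown earlier); vectors below use row-major vectorization, and mu_matmul is the earlier sketch:

```python
import numpy as np

# Strassen's 7 rank-1 terms for mu_2, 2x2 matrices vectorized as (x11, x12, x21, x22)
U = np.array([[1,0,0,1], [0,0,1,1], [1,0,0,0], [0,0,0,1], [1,1,0,0], [-1,0,1,0], [0,1,0,-1]])
V = np.array([[1,0,0,1], [1,0,0,0], [0,1,0,-1], [-1,0,1,0], [0,0,0,1], [1,1,0,0], [0,0,1,1]])
W = np.array([[1,0,0,1], [0,0,1,-1], [0,1,0,1], [1,0,1,0], [-1,1,0,0], [0,0,0,1], [1,0,0,0]])

mu2 = sum(np.einsum('a,b,c->abc', u, v, w) for u, v, w in zip(U, V, W))
assert np.allclose(mu2, mu_matmul(2, 2, 2))   # hence rank(mu_2) <= 7 < 8
```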
rank, decomposition, nuclear norm
• tensor rank
  rank(µ_β) = min{r : µ_β = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i}
  gives the least number of multiplications needed to compute β
• tensor decomposition
  µ_β = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i
  gives an explicit algorithm for computing β
• tensor nuclear norm [Friedland–LHL, 2016]
  ∥µ_β∥_* = inf{Σ_{i=1}^r |λ_i| : µ_β = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i, r ∈ N}
  quantifies the optimal numerical stability of computing β