Numerical Linear Algebra Issues Underlying a Tensor Network Computation

Charles F. Van Loan, Department of Computer Science, Cornell University
Joint work with Stefan Ragnarsson, Center for Applied Mathematics, Cornell University

NSF Workshop on Future Directions in Tensor-Based Computation and Modeling, February 20-21, 2009
The Rayleigh Quotient

One way to find the smallest eigenvalue and eigenvector of a symmetric matrix $H$ is to minimize

$$ r(a) \;=\; \frac{a^T H a}{a^T a} $$

If $a = a_{\min}$ and $\lambda = r(a_{\min})$, then $Ha_{\min} = \lambda\, a_{\min}$.
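A small numerical check of this fact (not part of the slides), using NumPy on a random symmetric matrix:

```python
# Minimizing the Rayleigh quotient of a symmetric H gives its smallest
# eigenvalue, attained at the corresponding eigenvector.
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((50, 50))
H = (H + H.T) / 2                      # make H symmetric

def r(a, H):
    return (a @ H @ a) / (a @ a)

lam, V = np.linalg.eigh(H)             # eigenvalues in ascending order
a_min = V[:, 0]
print(r(a_min, H), lam[0])             # the two values agree
```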
Modeling Electron Interactions

Have $d$ "sites" (grid points) in physical space. The goal is to compute a wave function, an element of a $2^d$-dimensional Hilbert space. The Hilbert space is a product of $d$ 2-dimensional Hilbert spaces. (A site is either occupied or not occupied.) A (discretized) wavefunction is a $d$-tensor, 2-by-2-by-2-by-$\cdots$-by-2.
The H-Matrix

$$ H \;=\; \sum_{i,j=1}^{d} t_{ij}\, H_i^T H_j \;+\; \sum_{i,j,k,\ell=1}^{d} v_{ijk\ell}\, H_i^T H_j^T H_k H_\ell $$

where

$$ H_p \;=\; I_{2^{p-1}} \otimes A \otimes I_{2^{d-p}}, \qquad A \;=\; \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} $$

$O(d^4)$ parameters describe $4^d$ entries:

$$ T = (t_{ij}) = T(1{:}d,\,1{:}d), \qquad t_{ij} = t_{ji} $$
$$ V = (v_{ijk\ell}) = V(1{:}d,\,1{:}d,\,1{:}d,\,1{:}d), \qquad v_{ijk\ell} = v_{k\ell ij}, \quad v_{ijk\ell} = v_{jik\ell} = v_{ij\ell k} $$
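The following sketch is illustrative, not the physical model: it assembles $H$ for a tiny $d$ from random coefficients $t_{ij}$ and $v_{ijk\ell}$ that satisfy the stated symmetries, just to make concrete how $O(d^4)$ numbers determine the whole $2^d$-by-$2^d$ matrix.

```python
# Assemble H = sum t_ij Hi' Hj + sum v_ijkl Hi' Hj' Hk Hl for a tiny d,
# with random (non-physical) coefficients carrying the stated symmetries.
import numpy as np

d = 3
A = np.array([[0.0, 1.0], [0.0, 0.0]])
rng = np.random.default_rng(5)

def Hp(p):
    # H_p = I_{2^{p-1}} (x) A (x) I_{2^{d-p}}
    return np.kron(np.kron(np.eye(2 ** (p - 1)), A), np.eye(2 ** (d - p)))

t = rng.standard_normal((d, d))
t = (t + t.T) / 2                                          # t_ij = t_ji
v = rng.standard_normal((d, d, d, d))
v = (v + v.transpose(1, 0, 2, 3) + v.transpose(0, 1, 3, 2)
       + v.transpose(1, 0, 3, 2)) / 4                      # v_ijkl = v_jikl = v_ijlk
v = (v + v.transpose(2, 3, 0, 1)) / 2                      # v_ijkl = v_klij

Hs = [Hp(p + 1) for p in range(d)]
H = np.zeros((2 ** d, 2 ** d))
for i in range(d):
    for j in range(d):
        H += t[i, j] * Hs[i].T @ Hs[j]
        for k in range(d):
            for l in range(d):
                H += v[i, j, k, l] * Hs[i].T @ Hs[j].T @ Hs[k] @ Hs[l]
print(H.shape)   # (8, 8): 4^d entries from O(d^4) coefficients
```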
If $d = 40$ and

$$ H_p = I_{2^{p-1}} \otimes A \otimes I_{2^{d-p}}, \qquad A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} $$

then

$$ H_5^T\, H_9^T\, H_{20}\, H_{27} \;=\; I_{16} \otimes A^T \otimes I_8 \otimes A^T \otimes I_{1024} \otimes A \otimes I_{64} \otimes A \otimes I_{8192} $$

Huge $n$ from modest $d$ via Kronecker Products.
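The Kronecker structure is what keeps such matrices usable: $H_p$ can be applied to a vector with a single mode-wise multiplication, never forming the $2^d$-by-$2^d$ matrix. A small NumPy sketch ($d$, $p$, and the test vector are illustrative choices; the dense matrix is built only to verify the identity):

```python
# Apply H_p = I_{2^{p-1}} (x) A (x) I_{2^{d-p}} to a vector via reshape + einsum,
# and compare against the explicitly formed Kronecker product.
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

def apply_Hp(x, d, p):
    q, r = 2 ** (p - 1), 2 ** (d - p)
    X = x.reshape(q, 2, r)                       # split x into its three "modes"
    return np.einsum('st,qtr->qsr', A, X).reshape(-1)

d, p = 10, 4
x = np.random.default_rng(0).standard_normal(2 ** d)
Hp = np.kron(np.kron(np.eye(2 ** (p - 1)), A), np.eye(2 ** (d - p)))
print(np.allclose(Hp @ x, apply_Hp(x, d, p)))    # True
```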
Perspective

Compare with the Google Matrix $\Rightarrow$ here $H$ is $2^{40}$-by-$2^{40}$.
KP Methodologies

KP Singular Value Decomposition:
$$ H \;=\; \sigma_1 (B_1 \otimes C_1) + \cdots + \sigma_R (B_R \otimes C_R) $$

KP Approximation:
$$ H \;\approx\; B \otimes C $$

Flatten and Approximate:
$$ \begin{bmatrix} V_{11} & \cdots & V_{1d} \\ \vdots & \ddots & \vdots \\ V_{d1} & \cdots & V_{dd} \end{bmatrix} \;\approx\; \mathrm{Sym} \otimes \mathrm{Sym}, \qquad V_{ij} = (v_{ijk\ell}) $$
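One standard route to the KP approximation $H \approx B \otimes C$ is the Van Loan-Pitsianis construction: rearrange $H$ so that $B \otimes C$ becomes a rank-1 matrix and keep the dominant SVD term. A sketch (illustrative sizes and data, not taken from the slides):

```python
# Nearest Kronecker product via rearrangement + rank-1 SVD.
import numpy as np

def nearest_kron(H, m1, n1, m2, n2):
    # Rearrange H (m1*m2 x n1*n2) so that B (x) C maps to vec(B) vec(C)^T,
    # then take the dominant singular triple.
    R = H.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    C = np.sqrt(s[0]) * Vt[0, :].reshape(m2, n2)
    return B, C

rng = np.random.default_rng(0)
B0, C0 = rng.standard_normal((4, 4)), rng.standard_normal((8, 8))
H = np.kron(B0, C0) + 1e-3 * rng.standard_normal((32, 32))   # nearly a single KP
B, C = nearest_kron(H, 4, 4, 8, 8)
print(np.linalg.norm(H - np.kron(B, C)) / np.linalg.norm(H))  # small relative error
```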
The Saga Continues...

Wavefunction-related computations lead to

$$ \min_{a \in \mathbb{R}^N} \frac{a^T H a}{a^T a} $$

However $a$ is SO BIG that it cannot be stored explicitly. The Curse of Dimensionality.
Constrain for Tractability

Minimize
$$ r(a) \;=\; \frac{a^T H a}{a^T a} $$
subject to the constraint that $a$ is a tensor network.
What is a Tensor Network?

A tensor network is a tensor of high dimension that is built up from many sparsely connected tensors of low dimension.

[Figure: a graph whose nodes are tensors $A^{(1)}, A^{(2)}, \ldots, A^{(10)}$.]

Nodes are tensors and the edges are contractions.
A 5-Site Linear Tensor Network

$A^{(1)}$ — $A^{(2)}$ — $A^{(3)}$ — $A^{(4)}$ — $A^{(5)}$

$A^{(1)}$: $2 \times m$
$A^{(2)}$: $m \times m \times 2$
$A^{(3)}$: $m \times m \times 2$
$A^{(4)}$: $m \times m \times 2$
$A^{(5)}$: $m \times 2$

$m$ is a parameter, typically around 100.
If $a(1{:}2,\,1{:}2,\,1{:}2,\,1{:}2,\,1{:}2)$ is a 5-site LTN then...

[Figures: successive slides read the entries $a(1,1,1,1,1)$, $a(2,1,1,1,1)$, $a(1,2,1,1,1)$, $a(2,2,1,1,1)$, $a(1,1,2,1,1)$, ... off the network $A^{(1)}$ — $A^{(2)}$ — $A^{(3)}$ — $A^{(4)}$ — $A^{(5)}$.]

A length-$2^d$ vector that is represented by $O(dm^2)$ numbers.
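A quick tally of that compression, extending the 5-site shapes to a chain of $d$ sites ($d = 40$ and $m = 100$ are the values quoted earlier; this count is illustrative, not slide content):

```python
# Storage of a d-site LTN with bond dimension m versus the explicit 2^d vector.
import numpy as np

d, m = 40, 100
shapes = [(2, m)] + [(m, m, 2)] * (d - 2) + [(m, 2)]
stored = sum(int(np.prod(s)) for s in shapes)
print(stored, 2 ** d)   # 760,400 numbers versus 1,099,511,627,776
```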
LTN(5,m): Scalar Definition

$$ a(n_1, n_2, n_3, n_4, n_5) \;=\; \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \sum_{i_3=1}^{m} \sum_{i_4=1}^{m} A^{(1)}(n_1, i_1)\, A^{(2)}(i_1, i_2, n_2)\, A^{(3)}(i_2, i_3, n_3)\, A^{(4)}(i_3, i_4, n_4)\, A^{(5)}(i_4, n_5) $$
Since a contraction is a generalized matrix product, shouldn't it be described that way?
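It can. A small sketch (random cores, tiny $m$, zero-based indices) checking that the quadruple sum above equals the chain of matrix products $A^{(1)}(n_1,:)\,A^{(2)}(:,:,n_2)\,A^{(3)}(:,:,n_3)\,A^{(4)}(:,:,n_4)\,A^{(5)}(:,n_5)$:

```python
# One entry of a 5-site LTN: explicit quadruple sum versus matrix-product chain.
import numpy as np

rng = np.random.default_rng(1)
m = 4
A1 = rng.standard_normal((2, m))
A2, A3, A4 = (rng.standard_normal((m, m, 2)) for _ in range(3))
A5 = rng.standard_normal((m, 2))

def entry_by_sum(n):
    n1, n2, n3, n4, n5 = n
    s = 0.0
    for i1 in range(m):
        for i2 in range(m):
            for i3 in range(m):
                for i4 in range(m):
                    s += (A1[n1, i1] * A2[i1, i2, n2] * A3[i2, i3, n3]
                          * A4[i3, i4, n4] * A5[i4, n5])
    return s

def entry_by_matprod(n):
    n1, n2, n3, n4, n5 = n
    return A1[n1, :] @ A2[:, :, n2] @ A3[:, :, n3] @ A4[:, :, n4] @ A5[:, n5]

n = (1, 0, 1, 1, 0)                              # zero-based version of (2,1,2,2,1)
print(entry_by_sum(n), entry_by_matprod(n))      # the two values agree
```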
The Block Vec Product $\odot$

$$ \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \odot \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \;=\; \begin{bmatrix} F_1 G_1 \\ F_1 G_2 \\ F_1 G_3 \\ F_2 G_1 \\ F_2 G_2 \\ F_2 G_3 \end{bmatrix} $$
The Block Vec Product $\odot$

$$ \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \odot \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} \odot \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \begin{bmatrix} F_1 G_1 H_1 \\ F_1 G_1 H_2 \\ F_1 G_2 H_1 \\ F_1 G_2 H_2 \\ F_2 G_1 H_1 \\ F_2 G_1 H_2 \\ F_2 G_2 H_1 \\ F_2 G_2 H_2 \end{bmatrix} $$
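A minimal sketch of $\odot$ with the operands held as lists of stacked blocks, following the block ordering displayed above (block sizes and contents are illustrative):

```python
# Block vec product: every product F_i G_j, with the right operand's
# block index varying fastest, as in the displays above.
import numpy as np

def block_vec_product(F, G):
    # [F_1; ...; F_p] (.) [G_1; ...; G_q]  ->  [F_1G_1; F_1G_2; ...; F_pG_q]
    return [Fi @ Gj for Fi in F for Gj in G]

rng = np.random.default_rng(2)
F = [rng.standard_normal((2, 3)) for _ in range(2)]       # F_1, F_2
G = [rng.standard_normal((3, 3)) for _ in range(2)]       # G_1, G_2
H = [rng.standard_normal((3, 2)) for _ in range(2)]       # H_1, H_2

FGH = block_vec_product(block_vec_product(F, G), H)       # 8 blocks: F_1G_1H_1, ...
print(len(FGH), np.allclose(FGH[0], F[0] @ G[0] @ H[0]))  # 8 True
```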
LTN(5,m): $\odot$ Definition

$$ \begin{bmatrix} a(1,1,1,1,1) \\ a(2,1,1,1,1) \\ a(1,2,1,1,1) \\ \vdots \\ a(1,2,2,2,2) \\ a(2,2,2,2,2) \end{bmatrix} \;=\; \begin{bmatrix} A^{(1)}(1,:) \\ A^{(1)}(2,:) \end{bmatrix} \odot \begin{bmatrix} A^{(2)}(:,:,1) \\ A^{(2)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(3)}(:,:,1) \\ A^{(3)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(4)}(:,:,1) \\ A^{(4)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(5)}(:,1) \\ A^{(5)}(:,2) \end{bmatrix} $$
BlockVec of a Tensor

If $A = A(1{:}N_1,\,1{:}N_2,\,1{:}N_3,\,1{:}N_4)$ then (shown for $N_3 = 3$, $N_4 = 2$)

$$ \mathrm{blockVec}(A) \;=\; \begin{bmatrix} A(:,:,1,1) \\ A(:,:,2,1) \\ A(:,:,3,1) \\ A(:,:,1,2) \\ A(:,:,2,2) \\ A(:,:,3,2) \end{bmatrix} $$
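A small illustrative sketch of blockVec for a 4-tensor, with $N_3 = 3$ and $N_4 = 2$ as in the display above (the other mode sizes are arbitrary choices):

```python
# blockVec: stack the leading-mode slices A(:,:,n3,n4), with n3 varying fastest.
import numpy as np

N1, N2, N3, N4 = 4, 5, 3, 2
A = np.random.default_rng(6).standard_normal((N1, N2, N3, N4))

blocks = [A[:, :, n3, n4] for n4 in range(N4) for n3 in range(N3)]
blockVecA = np.vstack(blocks)      # six N1-by-N2 blocks stacked on top of each other
print(blockVecA.shape)             # (24, 5)
```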
A "Canonical" Contraction

$$ A(i, j, n_3, n_4, m_3, m_4, m_5) \;=\; \sum_{k} B(i, k, n_3, n_4)\, C(k, j, m_3, m_4, m_5) $$

$$ A(:,:,n_3,n_4,m_3,m_4,m_5) \;=\; B(:,:,n_3,n_4)\, C(:,:,m_3,m_4,m_5) $$

$$ \mathrm{BlockVec}(A) \;=\; \mathrm{BlockVec}(B) \odot \mathrm{BlockVec}(C) $$
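A NumPy sketch (illustrative mode sizes) confirming that in the canonical case every slice of $A$ is an ordinary matrix product of the corresponding slices of $B$ and $C$:

```python
# Canonical contraction: contracted index is mode 2 of B and mode 1 of C.
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 5, 2, 3))        # B(i, k, n3, n4)
C = rng.standard_normal((5, 6, 2, 3, 2))     # C(k, j, m3, m4, m5)
A = np.einsum('iknp,kjqrs->ijnpqrs', B, C)   # A(i, j, n3, n4, m3, m4, m5)

# Check one slice: A(:,:,n3,n4,m3,m4,m5) = B(:,:,n3,n4) C(:,:,m3,m4,m5)
n3, n4, m3, m4, m5 = 1, 2, 0, 1, 1
print(np.allclose(A[:, :, n3, n4, m3, m4, m5],
                  B[:, :, n3, n4] @ C[:, :, m3, m4, m5]))   # True
```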
General Contractions

$$ \sum_{k} B(n_1, n_2, k, n_4)\, C(m_1, m_2, m_3, k, m_5) $$

⇓ Transposition

$$ \sum_{k} \tilde B(n_1, k, n_2, n_4)\, \tilde C(k, m_2, m_3, m_1, m_5) $$

⇓ Canonical Contraction

$$ \tilde A(n_1, m_2, n_2, n_4, m_3, m_1, m_5) $$

⇓ Transposition

$$ A(n_1, n_2, n_4, m_2, m_3, m_1, m_5) $$
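A sketch of this transpose-then-multiply route: permute the modes so the contracted index sits where a matrix product expects it, do one GEMM on the flattened operands, and reshape into the 7-tensor $A(n_1,n_2,n_4,m_2,m_3,m_1,m_5)$. Mode sizes are illustrative; a direct einsum serves as the reference answer.

```python
# General contraction as transposition + one Level-3 BLAS call + reshape.
import numpy as np

rng = np.random.default_rng(4)
N1, N2, K, N4 = 3, 4, 5, 2
M1, M2, M3, M5 = 2, 3, 4, 2
B = rng.standard_normal((N1, N2, K, N4))          # B(n1, n2, k, n4)
C = rng.standard_normal((M1, M2, M3, K, M5))      # C(m1, m2, m3, k, m5)

# Transposition: move the contracted index k to the last mode of B, first of C.
Bmat = B.transpose(0, 1, 3, 2).reshape(N1 * N2 * N4, K)
Cmat = C.transpose(3, 1, 2, 0, 4).reshape(K, M2 * M3 * M1 * M5)

# One GEMM, then reshape to A(n1, n2, n4, m2, m3, m1, m5).
A = (Bmat @ Cmat).reshape(N1, N2, N4, M2, M3, M1, M5)

A_ref = np.einsum('abkc,defkg->abcefdg', B, C)
print(np.allclose(A, A_ref))                      # True
```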
The Path to High Performance?

Optimized Level-3 BLAS
Multidimensional Transpose
Block Tensor Data Structures
The Saga Continues...

Minimize
$$ r(a) \;=\; \frac{a^T H a}{a^T a} $$
subject to the constraint that

$$ a \;=\; \begin{bmatrix} A^{(1)}(1,:) \\ A^{(1)}(2,:) \end{bmatrix} \odot \begin{bmatrix} A^{(2)}(:,:,1) \\ A^{(2)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(3)}(:,:,1) \\ A^{(3)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(4)}(:,:,1) \\ A^{(4)}(:,:,2) \end{bmatrix} \odot \begin{bmatrix} A^{(5)}(:,1) \\ A^{(5)}(:,2) \end{bmatrix} $$
Rephrased

Minimize
$$ r(A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}, A^{(5)}) \;=\; \frac{a^T H a}{a^T a} $$
where $a$ is the 5-site LTN built from $A^{(1)}, \ldots, A^{(5)}$ as above.
The Sweep Algorithm

Minimize
$$ r(A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}, A^{(5)}) \;=\; \frac{a^T H a}{a^T a} $$
where $a$ is as above, holding all cores fixed except one: first only $A^{(1)}$ varies, then only $A^{(2)}$, and so on along the chain. (Each step is a small eigenproblem.)
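A minimal NumPy sketch (not the authors' code) of one such step for the 5-site network: with $A^{(2)}, \ldots, A^{(5)}$ held fixed, $a$ is linear in the $2m$ entries of $A^{(1)}$, so minimizing the Rayleigh quotient over $A^{(1)}$ is a $2m$-by-$2m$ generalized symmetric eigenproblem. The Hamiltonian here is a random symmetric stand-in, and $m$ is kept tiny so the full length-32 vector can be formed for checking.

```python
# One micro-step of the sweep: optimize A(1) with the other cores fixed.
import numpy as np

rng = np.random.default_rng(0)
m = 3
A1 = rng.standard_normal((2, m))
A2, A3, A4 = (rng.standard_normal((m, m, 2)) for _ in range(3))
A5 = rng.standard_normal((m, 2))
H = rng.standard_normal((32, 32))
H = (H + H.T) / 2                              # stand-in for the real Hamiltonian

def ltn_vector(A1):
    # Full length-2^5 vector of the network (feasible only because d is tiny).
    return np.einsum('ni,ijb,jkc,kld,le->nbcde', A1, A2, A3, A4, A5).reshape(-1)

def rayleigh(x, H):
    return (x @ H @ x) / (x @ x)

# Environment of site 1: everything except A(1) contracted together.
E = np.einsum('ijb,jkc,kld,le->ibcde', A2, A3, A4, A5)
# a = M @ vec(A1): every entry of a is linear in the 2m unknowns of A(1).
M = np.einsum('nq,ibcde->nbcdeqi', np.eye(2), E).reshape(32, 2 * m)

Heff = M.T @ H @ M                             # small 2m-by-2m problem
Seff = M.T @ M
# Solve Heff x = lambda Seff x via a Cholesky change of variables.
L = np.linalg.cholesky(Seff)
Linv = np.linalg.inv(L)
lam, W = np.linalg.eigh(Linv @ Heff @ Linv.T)
x = Linv.T @ W[:, 0]                           # eigenvector of the smallest eigenvalue

A1_new = x.reshape(2, m)
# The Rayleigh quotient does not increase after the update.
print(rayleigh(ltn_vector(A1), H), '->', rayleigh(ltn_vector(A1_new), H))
```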