
Design of a High-Performance GEMM-like Tensor-Tensor Multiplication

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication. Paul Springer and Paolo Bientinesi, Aachen Institute for Advanced Study in Computational Engineering Science (AICES). Austin, Sep. 20th 2016.


  1. Design of a High-Performance GEMM-like Tensor-Tensor Multiplication. Paul Springer and Paolo Bientinesi, Aachen Institute for Advanced Study in Computational Engineering Science (AICES). Austin, Sep. 20th 2016.

  2. Outline
       1. Introduction
       2. GEMM-like Tensor-Tensor Multiplication
       3. Tensor Contraction Code Generator
       4. Performance
       5. Conclusion and Future Work


  6. Introduction
       - Tensors can be thought of as higher-dimensional matrices.
       - Tensor contractions can be thought of as higher-dimensional GEMMs.
       - Essentially three approaches exist:
           - Nested loops
           - Transpose-Transpose-GEMM-Transpose (TTGT)
           - Loops over GEMM (LoG)
       - We propose a novel approach: GETT [1]
           - Akin to a high-performance GEMM implementation
           - Adopts the BLIS methodology: breaking through the BLAS layer
       - The Tensor Contraction Code Generator (TCCG) combines GETT, TTGT and LoG into a unified tool.

     [1] Paul Springer and Paolo Bientinesi. "Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review.

  7. Matrix-Matrix Multiplication
     Let $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$ be 2D tensors:
     $$C_{m,n} \leftarrow \sum_{k} A_{m,k} B_{k,n}$$

  8. Matrix-Matrix Multiplication (Einstein notation)
     Let $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$ be 2D tensors. The sum over the repeated index $k$ is implied:
     $$C_{m,n} \leftarrow A_{m,k} B_{k,n}$$

  9. Matrix-Matrix Multiplication (Einstein notation)
     $C_{m,n} \leftarrow A_{m,k} B_{k,n}$, with $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$.

     Naive GEMM:

         // N-Loop
         for j = 0 : N - 1
           // M-Loop
           for i = 0 : M - 1
             tmp = 0
             // K-Loop (contracted)
             for k = 0 : K - 1
               tmp += A_{i,k} B_{k,j}
             // update C
             C_{i,j} = α tmp + β C_{i,j}
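The naive GEMM loop nest above can be written as a minimal runnable sketch. It assumes matrices stored as nested Python lists (row-major), which is an illustration choice and not something the slides specify; the name `naive_gemm` is hypothetical.

```python
def naive_gemm(A, B, C, alpha=1.0, beta=0.0):
    """Update C in place: C[i][j] = alpha * sum_k A[i][k] * B[k][j] + beta * C[i][j].

    A is M x K, B is K x N, C is M x N, all as nested lists (an assumption
    made for this sketch).
    """
    M, K, N = len(A), len(B), len(B[0])
    for j in range(N):              # N-loop
        for i in range(M):          # M-loop
            tmp = 0.0
            for k in range(K):      # K-loop (contracted)
                tmp += A[i][k] * B[k][j]
            C[i][j] = alpha * tmp + beta * C[i][j]   # update C
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
naive_gemm(A, B, C, alpha=1.0, beta=2.0)
print(C)  # [[21.0, 24.0], [45.0, 52.0]]
```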

  10. Matrix-Matrix Multiplication (Einstein notation)
      $C_{m,n} \leftarrow A_{m,k} B_{k,n}$, with $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$.

      High-performance GEMM (shown next to the naive GEMM on the slide):

          // N-Loop
          for n = 0 : nc : N - 1
            // K-Loop (contracted)
            for k = 0 : kc : K - 1
              B̂ = identify_submatrix(B, n, k)
              // pack B̂ into B̃
              B̃ = packB(B̂)            // B̃ ∈ R^{kc × nc}
              // M-Loop
              for m = 0 : mc : M - 1
                Â = identify_submatrix(A, m, k)
                // pack Â into Ã
                Ã = packA(Â)          // Ã ∈ R^{mc × kc}
                Ĉ = identify_submatrix(C, m, n)
                // matrix-matrix product of the packed blocks
                macroKernel(Ã, B̃, Ĉ, α, β)
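The blocked structure above (loop over N, then K, then M, with packing before the macro-kernel) can be illustrated with a small pure-Python sketch. The helper names `pack`, `macro_kernel` and `blocked_gemm`, the default block sizes, and the nested-list storage are assumptions for illustration only. One further assumption: β is applied only for the first K-block, and later K-blocks accumulate into C, so that the result matches the naive formula.

```python
def pack(X, r0, c0, rows, cols):
    """Copy the block of X starting at (r0, c0) into a new nested list.

    Slicing truncates at the matrix border, so edge blocks may be smaller.
    """
    return [row[c0:c0 + cols] for row in X[r0:r0 + rows]]


def macro_kernel(A_p, B_p, C, m0, n0, alpha, beta):
    """Multiply the packed blocks and update the matching block of C in place."""
    kc = len(B_p)
    for i, a_row in enumerate(A_p):
        for j in range(len(B_p[0])):
            tmp = sum(a_row[k] * B_p[k][j] for k in range(kc))
            C[m0 + i][n0 + j] = alpha * tmp + beta * C[m0 + i][n0 + j]


def blocked_gemm(A, B, C, alpha=1.0, beta=0.0, mc=2, nc=2, kc=2):
    M, K, N = len(A), len(B), len(B[0])
    for n in range(0, N, nc):                 # N-loop
        for k in range(0, K, kc):             # K-loop (contracted)
            B_p = pack(B, k, n, kc, nc)       # packed block of B, at most kc x nc
            # Assumption of this sketch: scale C by beta once, then accumulate.
            b = beta if k == 0 else 1.0
            for m in range(0, M, mc):         # M-loop
                A_p = pack(A, m, k, mc, kc)   # packed block of A, at most mc x kc
                macro_kernel(A_p, B_p, C, m, n, alpha, b)
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
C = [[1.0] * 3 for _ in range(3)]
blocked_gemm(A, B, C, alpha=2.0, beta=3.0)
```

In a real implementation the packed buffers are laid out to match the micro-kernel's access pattern and the block sizes are tuned to the cache hierarchy; the sketch only mirrors the loop order.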


  17. Tensor Contractions
      Tensor contraction examples:
        $C_{m,n} \leftarrow A_{m,k} B_{k,n}$
        $C_{m_1,m_2,n} \leftarrow A_{m_1,m_2,k} B_{k,n}$
        $C_{m_1,n,m_2} \leftarrow A_{m_1,m_2,k} B_{k,n}$
        $C_{m_1,n_1,n_2,m_2} \leftarrow A_{m_1,m_2,k} B_{n_2,k,n_1}$
        $C_{m_1,n_1,n_2,m_2} \leftarrow A_{m_1,k_1,m_2,k_2} B_{k_2,n_2,k_1,n_1}$
        $C_{m_1,n_1,n_2,m_2,n_3} \leftarrow A_{m_1,k_1,m_2,k_2} B_{n_3,k_2,n_2,k_1,n_1}$
        ...
      ⇒ Quite similar to GEMM.
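To make the similarity to GEMM concrete, the second example, $C_{m_1,m_2,n} \leftarrow A_{m_1,m_2,k} B_{k,n}$, can be computed as a single matrix product by fusing $m_1$ and $m_2$ into one row index. The sketch below assumes nested-list storage where `A[m1][m2]` is a row of length K; the function name `contract_via_gemm` is hypothetical. Fusing works without any data movement here only because $m_1$ and $m_2$ are adjacent in both A and C; the other examples would first need a transposition, as in TTGT.

```python
def contract_via_gemm(A, B):
    """Compute C[m1][m2][n] = sum_k A[m1][m2][k] * B[k][n] as one matrix product."""
    M1, M2 = len(A), len(A[0])
    K, N = len(B), len(B[0])
    # Fuse (m1, m2) into a single row index m = m1 * M2 + m2.
    A_mat = [A[m1][m2] for m1 in range(M1) for m2 in range(M2)]
    # Ordinary (M1*M2) x K times K x N matrix product.
    C_mat = [[sum(row[k] * B[k][n] for k in range(K)) for n in range(N)]
             for row in A_mat]
    # Split the fused row index back into (m1, m2).
    return [[C_mat[m1 * M2 + m2] for m2 in range(M2)] for m1 in range(M1)]


A = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]       # shape M1 x M2 x K = 2 x 2 x 2
B = [[1.0], [10.0]]                  # shape K x N = 2 x 1
C = contract_via_gemm(A, B)
print(C)  # [[[21.0], [43.0]], [[65.0], [87.0]]]
```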


  19. GETT: Tensor-Tensor Multiplication (Einstein notation)
      Let the input tensors $A \in \mathbb{R}^{S^A_1 \times S^A_2 \times \dots \times S^A_{r^A}}$ and $B \in \mathbb{R}^{S^B_1 \times S^B_2 \times \dots \times S^B_{r^B}}$ update the output tensor $C \in \mathbb{R}^{S^C_1 \times S^C_2 \times \dots \times S^C_{r^C}}$:
      $$C_{\Pi^C(I_m \cup I_n)} \leftarrow \alpha \, A_{\Pi^A(I_m \cup I_k)} \, B_{\Pi^B(I_n \cup I_k)} + \beta \, C_{\Pi^C(I_m \cup I_n)}$$
      The index sets $I_m$, $I_n$ and $I_k$ are critical:
        - $I_m := \{m_1, m_2, \dots, m_\gamma\}$: free indices of $A$
        - $I_n := \{n_1, n_2, \dots, n_\zeta\}$: free indices of $B$
        - $I_k := \{k_1, k_2, \dots, k_\xi\}$: contracted indices
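The three index sets can be derived mechanically from the index labels of A, B and C. The following is a small sketch under the assumption that each tensor is described by a list of index names in storage order; `classify_indices` is a hypothetical helper, not part of any tool named in the slides.

```python
def classify_indices(idx_A, idx_B, idx_C):
    """Split the indices of a contraction into free and contracted sets.

    I_m: indices shared by A and C only (free indices of A)
    I_n: indices shared by B and C only (free indices of B)
    I_k: indices shared by A and B only (contracted indices)
    """
    I_m = [i for i in idx_A if i in idx_C and i not in idx_B]
    I_n = [i for i in idx_B if i in idx_C and i not in idx_A]
    I_k = [i for i in idx_A if i in idx_B and i not in idx_C]
    return I_m, I_n, I_k


# C_{m1,n1,n2,m2} <- A_{m1,k1,m2,k2} B_{k2,n2,k1,n1}
I_m, I_n, I_k = classify_indices(
    ["m1", "k1", "m2", "k2"],
    ["k2", "n2", "k1", "n1"],
    ["m1", "n1", "n2", "m2"],
)
print(I_m, I_n, I_k)  # ['m1', 'm2'] ['n2', 'n1'] ['k1', 'k2']
```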

  20. GETT
      Naive GETT:

          // N-Loops
          for n_1 = 1 : S_{n_1}
            // ... remaining N-loops omitted ...
            for n_ζ = 1 : S_{n_ζ}
              // M-Loops
              for m_1 = 1 : S_{m_1}
                // ... remaining M-loops omitted ...
                for m_γ = 1 : S_{m_γ}
                  tmp = 0
                  // K-Loops (contracted)
                  for k_1 = 1 : S_{k_1}
                    // ... remaining K-loops omitted ...
                    for k_ξ = 1 : S_{k_ξ}
                      tmp += A_{Π^A(m_1,...,m_γ,k_1,...,k_ξ)} B_{Π^B(k_1,...,k_ξ,n_1,...,n_ζ)}
                  // update C
                  C_{Π^C(m_1,...,m_γ,n_1,...,n_ζ)} = α tmp + β C_{Π^C(m_1,...,m_γ,n_1,...,n_ζ)}
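The naive GETT loop nest generalizes the naive GEMM to an arbitrary number of free and contracted indices. A minimal sketch of that idea follows. For brevity it stores each tensor as a dictionary keyed by index tuples and uses `itertools.product` in place of the explicitly nested loops; both are assumptions of this illustration, as are the name `naive_gett` and the `sizes` mapping.

```python
from itertools import product


def naive_gett(A, idx_A, B, idx_B, C, idx_C, sizes, alpha=1.0, beta=0.0):
    """Naive contraction of A and B into C.

    A, B, C: dicts mapping index tuples to values (assumed storage).
    idx_A, idx_B, idx_C: index names in the order used by each tensor's keys.
    sizes: dict mapping each index name to its extent.
    """
    I_m = [i for i in idx_C if i in idx_A]   # free indices of A
    I_n = [i for i in idx_C if i in idx_B]   # free indices of B
    I_k = [i for i in idx_A if i in idx_B]   # contracted indices
    free_names = I_n + I_m
    for free in product(*(range(sizes[i]) for i in free_names)):    # N- and M-loops
        val = dict(zip(free_names, free))
        tmp = 0.0
        for contracted in product(*(range(sizes[i]) for i in I_k)):  # K-loops
            val.update(zip(I_k, contracted))
            a = A[tuple(val[i] for i in idx_A)]
            b = B[tuple(val[i] for i in idx_B)]
            tmp += a * b
        key = tuple(val[i] for i in idx_C)
        C[key] = alpha * tmp + beta * C.get(key, 0.0)                # update C
    return C


# C_{m1,n,m2} <- A_{m1,m2,k} B_{k,n}
sizes = {"m1": 2, "m2": 2, "k": 2, "n": 1}
A = {idx: float(v) for v, idx in enumerate(product(range(2), range(2), range(2)), start=1)}
B = {(0, 0): 1.0, (1, 0): 10.0}
C = naive_gett(A, ["m1", "m2", "k"], B, ["k", "n"], {}, ["m1", "n", "m2"], sizes)
print(C[(0, 0, 1)])  # 43.0
```

The index order of each tensor plays the role of the permutations $\Pi^A$, $\Pi^B$ and $\Pi^C$: the same loop nest serves every layout, and only the key construction changes.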

