TC-CIM: Empowering Tensor Comprehensions for Computing In Memory



  1. TC-CIM: Empowering Tensor Comprehensions for Computing In Memory Andi Drebes 1 Lorenzo Chelini 2,3 Oleksandr Zinenko 4 Albert Cohen 4 Henk Corporaal 2 Tobias Grosser 5 Kanishkan Vadivel 2 Nicolas Vasilache 4. Affiliations: 1 Inria and École Normale Supérieure; 2 TU Eindhoven; 3 IBM Research Zurich; 4 Google; 5 ETH Zurich. 01/22/2020

  2. Detecting Operations for Accelerators High-Level Specification → Optimizing Compiler → Optimized Code for Accelerator ◮ Goal: Reliably detect operations for efficient offloading Drebes et al. – TC-CIM: Empowering Tensor Comprehensions for Computing In Memory

  3. Detecting Operations for Accelerators High-Level Specification → Optimizing Compiler → Optimized Code for Accelerator ◮ Goal: Reliably detect operations for efficient offloading ◮ At which stage? ◮ On which representation? ◮ Create reusable infrastructure

  4–8. Von-Neumann Bottleneck (incremental build) [Diagram: Host CPU ↔ Cache ↔ Main Memory, with an Accelerator, a grid of processing units (PU) with its own local memory, attached beside the host]

  9–10. Compute In Memory (CIM) [Figure: host (ARM CPU, L1, DMA) connected to main memory and a CIM accelerator; a CIM tile contains a control unit, context register, row and column buffers, a PCM crossbar with conductances G computing I = v·G, sample-and-hold (S&H) units, an ADC, shift-and-add logic, and output buffers] ◮ Interweave computation and storage ◮ Example: memristor-based architecture from the MNEMOSENE project ( https://www.mnemosene.eu ) ◮ High energy efficiency and throughput with fixed functions (e.g., matrix multiplication)

  11–12. Detecting Accelerated Operations for CIM Tensor Comprehensions → Optimizing Compiler → ISO C + optimized library ◮ Goal: Reliably detect operations for efficient offloading ◮ At which stage? ◮ On which representation? ◮ Create reusable infrastructure

  13–14. Tensor Comprehensions Math-like notation ◮ Expresses operations on tensors ◮ Only the information needed to define an operation unambiguously ◮ Compiler infers shapes and iteration domains Example: def mv(float(M,K) A, float(K) x) -> (C) { C(i) +=! A(i,k) * x(k) }

  15. Tensor Comprehensions: Compilation Pipeline: TC lang → Halide IR → isl Schedule Tree (polyhedral transformations) → isl AST → CUDA C / PTX (CUDA backend)

  16. Integration of Loop Tactics Pipeline: TC lang → Halide IR → isl Schedule Tree (polyhedral transformations + Loop Tactics pattern detection and marking) → isl AST → ISO C99 (Tactics backend)

  17–24. Matching Example: Matrix Multiplications (incremental build) [Diagram: the schedule tree to be matched has arbitrary ancestors, then a band node (iteration over i, n and k), then a sequence node with two filter nodes ending in leaves: one filter covers the 2D initialization of the output matrix, the other the 3D iteration over the input matrices. Matching inserts a mark node carrying GEMM info above the matched band; AST generation turns the marked tree into an AST mark node over for nodes and user nodes, which are finally printed]

  25. Loop Tactics: Tree Matchers A tree matcher defines a pattern for a subtree and captures nodes:
    schedule_node body;
    schedule_node initBody;
    schedule_node schedule;
    auto matcher =
        band(schedule,
             sequence(filter(initBody, hasGemmInitPattern, leaf()),
                      filter(body, hasGemmPattern, leaf())));
[Diagram: band → sequence → (filter → leaf, filter → leaf)]

  26. Loop Tactics: Access Relation Matchers An access relation matcher matches tensor accesses:
    auto hasGemmPattern = [&](schedule_node node) {
      auto _i = placeholder();
      auto _j = placeholder();
      auto _k = placeholder();
      auto _A = arrayPlaceholder();
      auto _B = arrayPlaceholder();
      auto _C = arrayPlaceholder();
      auto reads = /* get read accesses */;
      auto writes = /* get write accesses */;
      auto mRead = allOf(access(_C, _i, _j),
                         access(_A, _i, _k),
                         access(_B, _k, _j));
      auto mWrite = allOf(access(_C, _i, _j));
      return match(reads, mRead).size() == 1 &&
             match(writes, mWrite).size() == 1;
    };
