
MaPU: A Novel Mathematical Computing Architecture



  1. MaPU: A Novel Mathematical Computing Architecture. Shashank Kedia & Robert Macy III

  2. Why MaPU?
     ● High-performance CPUs and GPUs have good theoretical performance but low power efficiency relative to that performance
     ● Superscalar and GPGPU designs have been proven to be power inefficient
     ● Most systems operate at 60% of peak performance
     ● Supercomputers using thousands of processors have massive power and space requirements
     ● Goal: develop a chip that performs mathematical calculations at a good performance-to-power ratio relative to GPUs and CPUs

  3. Architecture overview. Three main components:
     ● Scalar Pipeline: communicates with the system on chip and controls the microcode pipeline.
     ● Microcode Pipeline: consists of functional units (FUs) that define the data flow.
     ● Multi-Granularity Parallel (MGP) Memory System: allows efficient custom data access patterns.

  4. Architecture Details: MGP Memory System. MGP allows efficient data access patterns. The memory system is partitioned according to three parameters that define how memory is accessed: W, the number of bytes that can be accessed in parallel; N, the total capacity in bytes; and G, the number of bytes available for reading/writing. Physical banks combine to form logic banks, and each logic bank consists of G physical banks.

  5. Architecture Details: MGP Memory System (matrix accesses). Matrices can be accessed in row or column order. Matrix access in MGP requires storing the i-th row in the (i mod W)-th logic bank. Rows are then accessed by setting G=W, and columns by setting G=1.
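The row and column access rule on this slide can be illustrated with a small toy model. This is only a sketch of the logical view under stated assumptions: `W = 4`, one-byte elements, a W x W matrix, and the helper names `store_matrix`, `read_row`, and `read_column` are all hypothetical and do not come from the slides or the paper. It does not model how physical banks are regrouped into logic banks when G changes.

```python
W = 4  # bytes accessible in parallel (toy value, an assumption)


def store_matrix(matrix):
    """Place row i of the matrix in logic bank (i mod W)."""
    banks = [[] for _ in range(W)]
    for i, row in enumerate(matrix):
        banks[i % W].extend(row)
    return banks


def read_row(banks, i):
    """Model of G = W: fetch W consecutive bytes from a single logic bank."""
    bank = banks[i % W]
    start = (i // W) * W  # skip rows stored earlier in the same bank
    return bank[start:start + W]


def read_column(banks, j):
    """Model of G = 1: fetch one byte from each of the W logic banks."""
    return [banks[b][j] for b in range(W)]


# A 4 x 4 matrix whose element (r, c) holds the value r * W + c.
matrix = [[r * W + c for c in range(W)] for r in range(W)]
banks = store_matrix(matrix)

row_2 = read_row(banks, 2)        # all W bytes come from logic bank 2
column_1 = read_column(banks, 1)  # one byte comes from every logic bank
```

In this model both access orders return W bytes per request, which is the property the slide attributes to choosing G=W or G=1.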

  6. Architecture Details: Cascading pipeline with state machine-based program model
     ● The dataflow can change to fit the desired algorithm. This is facilitated by customizing which FUs are used and how they interact via microcode.
     ● State machines can be used to describe each FU. This allows easier FU organization: the user specifies a state machine for each FU and a final state machine specifying delays that ensure the appropriate execution order.
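The idea of one state machine per FU plus a top-level machine that supplies start delays can be sketched as a toy cycle-by-cycle simulation. Everything here is an illustrative assumption: the FU names (`load`, `mac`, `store`), the delays, the three states, and the `FU` and `schedule` helpers are hypothetical and are not MaPU microcode or its actual programming interface.

```python
from dataclasses import dataclass


@dataclass
class FU:
    """Toy functional unit described by a three-state machine."""
    name: str
    start_delay: int  # cycle at which the top-level state machine starts this FU
    run_cycles: int   # number of cycles the FU stays in its RUN state

    def state(self, cycle):
        if cycle < self.start_delay:
            return "WAIT"
        if cycle < self.start_delay + self.run_cycles:
            return "RUN"
        return "DONE"


def schedule(fus, cycles):
    """List the names of the FUs that are in RUN during each cycle."""
    return [[fu.name for fu in fus if fu.state(c) == "RUN"] for c in range(cycles)]


# Hypothetical three-stage dataflow: the delays make each FU start only
# after its upstream FU has begun producing data.
pipeline = [FU("load", 0, 4), FU("mac", 2, 4), FU("store", 5, 4)]
timeline = schedule(pipeline, 9)

for cycle, active in enumerate(timeline):
    print(cycle, active)
```

Changing only the `start_delay` values reorders the overlap between units, which mirrors the slide's point that execution order is enforced by delays rather than by rewriting each FU's own state machine.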

  7. Architecture Details: SoC Architecture. Overview of the tape-out design implemented by the authors. APE (Algebraic Processing Engine) refers to the MaPU cores. CSU is a DMA controller.

  8. Results: Comparison with C66x. All comparisons shown here are from simulations. The APE runs at 1 GHz and the C66x at 1.25 GHz.

  9. Results: Power Usage

  10. Results: Power Usage. Figure 15 in the paper seems to be incorrect: it appears to be a copy of Figure 14.

  11. Results: Comparison with other processors. Source: M. H. Ionica and D. Gregg, "The Movidius Myriad architecture's potential for scientific computing," IEEE Micro, vol. 35, no. 1, pp. 6–14, 2015.

  12. Results: Microcode Statistics

  13. Conclusion
     ● Introduces a new architecture for fast and efficient matrix-related computations.
     ● Defines a process for molding the architecture to specific uses by defining state machines in the microcode pipeline.
     ● Demonstrates an improvement in power efficiency over CPUs/GPUs.
     ● Offers few points of comparison against competing architectures.

  14. Discussion
     1. Given the overhead of defining state machines and the compiler optimizations required, is it still better than an ASIC?
     2. Is this as generic an architecture as claimed?
     3. Are simulation results as useful, given that a physical chip tape-out exists?

  15. Thank You
