
Energy Estimation Methodology for Accelerator Designs Yannan Nellie - PowerPoint PPT Presentation

Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs. Yannan Nellie Wu (MIT), Joel S. Emer (MIT, NVIDIA), Vivienne Sze (MIT).


  1. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs
     Yannan Nellie Wu (1), Joel S. Emer (1,2), Vivienne Sze (1)
     (1) MIT, (2) NVIDIA

  2. Accelergy Overview
     • An architecture-level energy estimator
     • Flexibly characterizes various basic building blocks of different technologies
     • Succinctly models diverse and complicated designs
     • Improves estimation accuracy via fine-grained classification of operations
     • Achieves 95% accuracy in evaluating a deep neural network (DNN) accelerator – Eyeriss [ISSCC 2016]

  3. Energy Consumption Concerns
     Data- and computation-intensive applications are power hungry.
     [Figure: object detection paired with a DNN accelerator; database processing paired with a database accelerator]
     We must quickly evaluate the energy efficiency of arbitrary potential designs in the large design space.

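  4. Energy Estimation and Design Exploration
     [Figure: architecture description (Arch. Description) with a global buffer (GLB) and a PE* containing a buffer and a MAC; the legend distinguishes components from abstract hierarchies. *PE: processing element]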

  5. Energy Estimation and Design Exploration
     • Physical-Level Energy Estimator (Synopsys PrimePower, Cadence Joules)
     Flow: Arch. Description → RTL Model → Physical Layout → Energy
     Arch. Description → RTL Model: develop the register transfer level (RTL) details.
     RTL Model → Physical Layout: synthesize the design, place standard cells, and route the wires.
     [Figure: layout fragment with standard cells NOR3, OR4, OR2 connected by wire0 and wire1]
     Requires physical layout of the design

  6. Energy Estimation and Design Exploration
     • Physical-Level Energy Estimator (Synopsys PrimePower, Cadence Joules)
     Flow: Arch. Description → RTL Model → Physical Layout → Fabricated Chip, with energy obtained at the physical layout stage
     Requires physical layout of the design
     Slow design space exploration

  7. Accelergy Overview
     • An architecture-level energy estimator
     • Flexibly characterizes various basic building blocks in the design
     • Succinctly models diverse and complicated designs
     • Improves estimation accuracy via fine-grained classification of operations
     • Achieves 95% accuracy in evaluating a deep neural network (DNN) accelerator – Eyeriss [ISSCC 2016]

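  8. Energy Estimation and Design Exploration
     • Architecture-Level Energy Estimators
     Flow: Arch. Description → RTL Model → Physical Layout → Fabricated Chip, with energy obtained directly from the architecture description
     [Figure: architecture description with a global buffer (GLB) and a PE* containing a buffer and a MAC. *PE: processing element]
     Only requires architecture-level design
     Fast design space exploration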

  9. Existing Architecture-Level Energy Estimators
     • Design-Specific Accelerator Estimators: Aladdin [ISCA2014], fixed-cost [Asilomar2017]
     [Figure: architecture description (GLB; PE with buffer and MAC) feeding an energy estimator]
     Description with primitive components (basic building blocks)

  10. Existing Architecture-Level Energy Estimators
      • Design-Specific Accelerator Estimators: Aladdin [ISCA2014], fixed-cost [Asilomar2017]
      [Figure: architecture description (GLB; PE with buffer and MAC) feeding an energy estimator that holds an Energy Reference Table (ERT)]
      Energy Reference Table (ERT):
        Comp.    Action      Energy
        GLB      access()    100 pJ
        buffer   access()    10 pJ
        MAC      compute()   5 pJ

  11. Existing Architecture-Level Energy Estimators
      • Design-Specific Accelerator Estimators: Aladdin [ISCA2014], fixed-cost [Asilomar2017]
      [Figure: architecture description (GLB; PE with buffer and MAC) and action counts feeding an energy estimator that holds an ERT]
      Energy Reference Table (ERT):
        Comp.    Action      Energy
        GLB      access()    100 pJ
        buffer   access()    10 pJ
        MAC      compute()   5 pJ
      Action Counts:
        Comp.    Action      Counts
        GLB      access()    10
        buffer   access()    800
        MAC      compute()   400

  12. Existing Architecture-Level Energy Estimators
      • Design-Specific Accelerator Estimators: Aladdin [ISCA2014], fixed-cost [Asilomar2017]
      [Figure: inside the energy estimator, the ERT and the action counts feed an energy calculator, which outputs the energy estimations]
      Energy Reference Table (ERT):
        Comp.    Action      Energy
        GLB      access()    100 pJ
        buffer   access()    10 pJ
        MAC      compute()   5 pJ
      Action Counts:
        Comp.    Action      Counts
        GLB      access()    10
        buffer   access()    800
        MAC      compute()   400
      Energy Estimations:
        Name     Energy
        GLB      1000 pJ
        buffer   8000 pJ
        MAC      2000 pJ
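The energy calculator step on this slide amounts to multiplying each ERT entry by its action count. A minimal Python sketch of that arithmetic follows, using the slide's example numbers; the names `ERT_PJ`, `ACTION_COUNTS`, and `estimate_energy` are illustrative assumptions, not identifiers from any of the tools cited.

```python
# Sketch only: names are hypothetical; numbers are the slide's example values.
# Keys are (component, action); energies are in picojoules per action.
ERT_PJ = {
    ("GLB", "access"): 100,
    ("buffer", "access"): 10,
    ("MAC", "compute"): 5,
}

# How many times each action occurred in the workload (from the slide).
ACTION_COUNTS = {
    ("GLB", "access"): 10,
    ("buffer", "access"): 800,
    ("MAC", "compute"): 400,
}


def estimate_energy(ert, counts):
    """Return per-component energy: sum over actions of energy/action * count."""
    energy = {}
    for (component, action), n in counts.items():
        energy[component] = energy.get(component, 0) + ert[(component, action)] * n
    return energy


estimates = estimate_energy(ERT_PJ, ACTION_COUNTS)
total_pj = sum(estimates.values())
print(estimates, total_pj)
```

With the slide's values this reproduces the Energy Estimations table (GLB 1000 pJ, buffer 8000 pJ, MAC 2000 pJ).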

  13. Existing Architecture-Level Energy Estimators
      • Design-Specific Accelerator Estimators: Aladdin [ISCA2014], fixed-cost [Asilomar2017]
      [Figure: same estimator flow (ERT, action counts, energy calculator), with the GLB marked as changed to GLB']
      Energy Reference Table (ERT):
        Comp.    Action      Energy
        GLB      access()    100 pJ
        buffer   access()    10 pJ
        MAC      compute()   5 pJ
      Action Counts:
        Comp.    Action      Counts
        GLB      access()    10
        buffer   access()    800
        MAC      compute()   400
      Not generalizable to other designs

  14. Accelergy Overview
      • An architecture-level energy estimator
      • Flexibly characterizes various primitive components of different technologies
      • Succinctly models diverse and complicated designs
      • Improves estimation accuracy via fine-grained classification of operations
      • Achieves 95% accuracy in evaluating a deep neural network (DNN) accelerator – Eyeriss [ISSCC 2016]

  15. Accelergy: Flexibly Model Various Primitive Components
      [Figure: inside Accelergy, the architecture description (GLB; PE with buffer and MAC) feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …)]
      GLB is of SRAM type; in the primitive component library, the SRAM type has the associated action "access".

  16. Accelergy: Flexibly Model Various Primitive Components
      [Figure: inside Accelergy, the architecture description (GLB; PE with buffer and MAC) feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …)]
      GLB is of SRAM type; in the primitive component library, the SRAM type has the associated action "access".
      ERT (in progress):
        Comp.    Action      Energy
        GLB      access()    100 pJ

  17. Accelergy: Flexibly Model Various Primitive Components
      [Figure: inside Accelergy, the architecture description (GLB; PE with buffer and MAC) feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …)]
      ERT:
        Comp.    Action      Energy
        GLB      access()    100 pJ
        buffer   access()    10 pJ
        MAC      compute()   5 pJ

  18. Accelergy: Flexibly Model Various Primitive Components
      [Figure: inside Accelergy, the architecture description (GLB; PE with buffer and MAC) feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …); the resulting ERT feeds the energy calculator]

  19. Accelergy: Flexibly Model Various Primitive Components
      [Figure: inside Accelergy, the architecture description (GLB; PE with buffer and MAC) feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …); the ERT and the action counts feed the energy calculator, which outputs the energy estimates]
      Action Counts:
        Comp.    Action      Counts
        GLB      access()    10
        buffer   access()    800
        MAC      compute()   400
      Energy Estimates:
        Name     Energy
        GLB      1000 pJ
        buffer   8000 pJ
        MAC      2000 pJ
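One way to picture the ERT generator described on these slides is as a loop that looks up each component's actions in the primitive component library and asks the plug-ins for an energy value. The Python sketch below illustrates that idea only. The `estimate` method, the stub plug-in classes, and the `size` attribute are assumptions made for this example and are not the actual Accelergy plug-in interface; the stubs return the slide's example numbers rather than real CACTI or 40nm results.

```python
# Sketch only: every class, method, and attribute name here is hypothetical.

# Primitive component library: primitive class -> actions it supports.
# "SRAM has action access" is from the slide; the MAC entry follows the ERT example.
PRIMITIVE_LIBRARY = {"SRAM": ["access"], "MAC": ["compute"]}

# Architecture description: component name -> (primitive class, attributes).
# The "size" attribute and its values are invented for illustration.
ARCHITECTURE = {
    "GLB": ("SRAM", {"size": "large"}),
    "buffer": ("SRAM", {"size": "small"}),
    "MAC": ("MAC", {}),
}


class SramStubPlugin:
    """Stand-in for a memory estimator; returns fixed example numbers (pJ)."""

    def estimate(self, cls, attrs, action):
        if cls == "SRAM" and action == "access":
            return {"large": 100.0, "small": 10.0}[attrs["size"]]
        return None  # None means "this plug-in cannot estimate this action"


class MacStubPlugin:
    """Stand-in for a technology-specific arithmetic estimator (pJ)."""

    def estimate(self, cls, attrs, action):
        if cls == "MAC" and action == "compute":
            return 5.0
        return None


def generate_ert(architecture, library, plugins):
    """Build {(component, action): energy} using the first plug-in that answers."""
    ert = {}
    for name, (cls, attrs) in architecture.items():
        for action in library[cls]:
            for plugin in plugins:
                energy = plugin.estimate(cls, attrs, action)
                if energy is not None:
                    ert[(name, action)] = energy
                    break
            else:
                raise LookupError(f"no plug-in supports {cls}.{action}")
    return ert


ert = generate_ert(ARCHITECTURE, PRIMITIVE_LIBRARY, [SramStubPlugin(), MacStubPlugin()])
print(ert)
```

Keeping the plug-ins behind one small interface is what lets the same architecture description be re-characterized for a different technology by swapping the plug-in list.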

  20. Accelergy: Flexibly Model Various Primitive Components
      Use energy estimation plug-ins to characterize primitive components
      Plug-ins shown: CACTI estimation plug-in, 40nm estimation plug-in, NVSIM [TCAD 2012] estimation plug-in, proprietary plug-ins
      Categories labeled on the slide: traditional open-source plug-ins*, emerging technology plug-ins, proprietary plug-ins
      *available at http://accelergy.mit.edu

  21. Accelergy: Flexibly Model Various Primitive Components
      Use energy estimation plug-ins to characterize primitive components
      Plug-ins shown: CACTI estimation plug-in, 40nm estimation plug-in, NVSIM [TCAD 2012] estimation plug-in, proprietary plug-ins
      Categories labeled on the slide: traditional open-source plug-ins*, emerging technology plug-ins, proprietary plug-ins
      Detailed plug-in interface in open-source repo
      *available at http://accelergy.mit.edu

  22. Modeling Complicated Designs
      • Practical architecture designs involve many more details
        – Example: storage units with local address generators (AGs)
          o AG_SRAM is an abstract hierarchy
          o Buffer is of SRAM type
          o AGs are of counter type
      [Figure: AG_SRAM containing a buffer (SRAM) with data in and data out, AG[0] (counter) supplying the read address, and AG[1] (counter) supplying the write address]
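The AG_SRAM example can be read as a hierarchy whose energy is the sum of its parts. The Python sketch below illustrates one possible decomposition. The mapping of "read" and "write" onto sub-component actions is an assumption based on the figure (AG[0] drives the read address, AG[1] the write address), the 1 pJ counter energy is a made-up placeholder, and none of the names come from the Accelergy tool itself.

```python
# Sketch only: names, the read/write decomposition, and the counter energy
# are assumptions for illustration. The 10 pJ SRAM access reuses the
# buffer entry from the earlier ERT example.
PRIMITIVE_ERT_PJ = {
    ("SRAM", "access"): 10.0,
    ("counter", "count"): 1.0,  # hypothetical placeholder value
}

# AG_SRAM as a hierarchy: named sub-components with their primitive classes,
# plus the sub-component actions each top-level action is assumed to trigger.
AG_SRAM = {
    "subcomponents": {"buffer": "SRAM", "AG[0]": "counter", "AG[1]": "counter"},
    "actions": {
        "read": [("buffer", "access"), ("AG[0]", "count")],
        "write": [("buffer", "access"), ("AG[1]", "count")],
    },
}


def compound_action_energy(compound, action, primitive_ert):
    """Sum the primitive energies of every sub-action behind one compound action."""
    total = 0.0
    for sub_name, sub_action in compound["actions"][action]:
        primitive_class = compound["subcomponents"][sub_name]
        total += primitive_ert[(primitive_class, sub_action)]
    return total


read_pj = compound_action_energy(AG_SRAM, "read", PRIMITIVE_ERT_PJ)
write_pj = compound_action_energy(AG_SRAM, "write", PRIMITIVE_ERT_PJ)
print(read_pj, write_pj)
```

Describing the hierarchy once and deriving its per-action energy from primitives avoids listing every address generator separately in the architecture description and the action counts.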

  23. Modeling Complicated Designs
      • Practical architecture designs involve many more details
        – Example: storage units with local address generators (AGs)
      Let's construct a more practical design!
      [Figure: architecture with a GLB and a PE containing a buffer and a MAC]

  24. Modeling Complicated Designs
      • Practical architecture designs involve many more details
        – Example: storage units with local address generators (AGs)
      Let's construct a more practical design!
      [Figure: architecture with a GLB and a PE containing a buffer and a MAC]

  25. Modeling Complicated Designs
      • Architecture description is tedious
      • Hard to make modifications
      • Action counts are even more tedious
      • Small modification requires new action counts
      [Figure: inside Accelergy, the architecture description feeds the ERT generator, which consults the primitive component library and the estimation plug-ins (CACTI, 40nm, …); the action counts feed the energy calculator]
      Action Counts:
        Name          Action   Counts
        PE[0].AG[0]   count    50
        PE[0].AG[1]   count    50
        …

  26. Existing Work - Modeling Complicated Designs
      • Existing work that aims to succinctly model complicated architectures
        – Wattch [ISCA2000], McPAT [MICRO2009]
      [Figure: CPU-centric architecture model built from ROB, ALU, L1 $, L2 $, …]
      Use a fixed set of compound components to represent the architecture
      Compound components: components that can be decomposed into lower-level components

  27. Existing Work - Modeling Complicated Designs
      • Existing work that aims to succinctly model complicated architectures
        – Wattch [ISCA2000], McPAT [MICRO2009]
      [Figure: CPU-centric architecture model built from ROB, ALU, L1 $, L2 $, …]
      The fixed set of compound components is not sufficient to describe arbitrary accelerator designs

  28. Accelergy Overview
      • An architecture-level energy estimator
      • Flexibly characterizes various primitive components of different technologies
      • Succinctly models diverse and complicated designs
      • Improves estimation accuracy via fine-grained classification of operations
      • Achieves 95% accuracy in evaluating a deep neural network (DNN) accelerator – Eyeriss [ISSCC 2016]
