ASIC accelerators
To read more…
This day's papers:
Reagen et al., "Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators"
Shao et al., "The Aladdin Approach to Accelerator Design and Modeling" (Computer magazine version)
Supplementary reading:
Han et al., "EIE: Efficient Inference Engine on Compressed Deep Neural Network"
Shao et al., "Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures"
A Note on Quoting Papers
I didn't look closely enough at paper reviews earlier in the semester
Some paper reviews copy phrases from papers — you must make it obvious you are doing so; this will get you in tons of trouble later if you don't
Good habits: usually better off rewriting completely, even if your grammar is poor; consistent style — easier to read
Homework 3
Questions?
Part 1 — due tomorrow, 11:59PM
Part 2 — serial codes out
Accelerator motivation
end of transistor scaling — specialization as a way to further improve performance, especially performance per watt
key challenge: how do we design/test custom chips quickly?
Behavioral High-Level Synthesis
take C-like code, produce HW
problem (according to the Aladdin paper): requires lots of tuning…
to handle/eliminate dependencies
to make memory accesses/etc. efficient
Data Flow Graphs
int sum_ab = a + b;
int sum_cd = c + d;
int result = sum_ab + sum_cd;
[DFG figure: a and b feed one + node, c and d feed another + node, and their outputs feed a final + node producing result]
DFG scheduling
[figure: the same DFG scheduled two ways — with two add functional units and with one add functional unit]
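As a worked example of what the figure shows (my own sketch, assuming each add takes one cycle):

/* Hypothetical cycle-by-cycle schedules for the DFG of
 *   result = (a + b) + (c + d);
 *
 * two add functional units:
 *   cycle 1: sum_ab = a + b;   sum_cd = c + d;   (both adders busy)
 *   cycle 2: result = sum_ab + sum_cd;           (one adder busy)
 *
 * one add functional unit:
 *   cycle 1: sum_ab = a + b;
 *   cycle 2: sum_cd = c + d;
 *   cycle 3: result = sum_ab + sum_cd;
 */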
DFG realization — data path
[figure: registers a, b, c, d feeding two ADD units through MUXes, producing sum_ab, sum_cd, and result]
plus control logic: selectors for MUXes, write enable for regs
Dynamic DDG
Aladdin trick: assume someone will figure out scheduling HW — use dynamic (runtime) dependencies
full synthesis: actually need to make working control logic, need to figure out memory/register connections
Dynamic Data Dependency Graph
full synthesis: tuning
tuning: false dependencies
"the reason is that when striding over a partitioned array being read from and written to in the same cycle, though accessing different elements of the array, the HLS compiler conservatively adds loop-carried dependences."
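A minimal sketch of the kind of loop this describes (my own example, not taken from the paper): the read and the write touch different elements — and, after partitioning, different banks — but a conservative HLS compiler may still add a loop-carried dependence.

#define N 1024
int a[N];   /* partitioned into banks by the HLS tool */

void stride_update(void) {
    /* reads a[i] and writes a[i + 1]: never the same element,
     * yet a conservative compiler may refuse to pipeline this
     * loop because of an assumed loop-carried dependence */
    for (int i = 0; i < N - 1; i += 2)
        a[i + 1] = a[i] + 1;
}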
Aladdin area/power modeling
functional unit power/area + memory power/area
library of functional units, tested via microbenchmarks
memory model: select latency, number of ports (read/write units)
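A rough sketch of how this kind of bookkeeping could work (the struct names, fields, and formula here are my own illustration, not Aladdin's actual model): activity counts from the dynamic trace are multiplied by per-unit energies characterized from the microbenchmarked library, plus leakage over the run time.

/* hypothetical per-unit characterization data */
struct unit_model { double dynamic_energy_pj; double leakage_mw; double area_um2; };

struct activity { long adds, multiplies, sram_reads, sram_writes, cycles; };

/* total energy = sum(activity x per-op energy) + leakage x time */
double estimate_energy_pj(struct activity act,
                          struct unit_model add, struct unit_model mul,
                          struct unit_model sram, double cycle_ns) {
    double dynamic = act.adds * add.dynamic_energy_pj
                   + act.multiplies * mul.dynamic_energy_pj
                   + (act.sram_reads + act.sram_writes) * sram.dynamic_energy_pj;
    double leak_mw = add.leakage_mw + mul.leakage_mw + sram.leakage_mw;
    double leakage = leak_mw * act.cycles * cycle_ns;  /* mW * ns = pJ */
    return dynamic + leakage;
}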
Missing from area/power modeling
control logic accounting
wire lengths, etc., etc.
Pareto-optimum
Pareto-optimum: can't make anything better without making something worse
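Concretely, a design point is Pareto-optimal if no other point is at least as good in every dimension and strictly better in one. A minimal sketch of that filter over (power, latency) points, assuming lower is better for both (my own example, not part of Aladdin):

#include <stdbool.h>

struct design { double power; double latency; };

/* returns true if d is Pareto-optimal within points[0..n-1] */
bool is_pareto_optimal(struct design d, const struct design *points, int n) {
    for (int i = 0; i < n; i++) {
        bool no_worse = points[i].power <= d.power && points[i].latency <= d.latency;
        bool better   = points[i].power <  d.power || points[i].latency <  d.latency;
        if (no_worse && better)
            return false;   /* some other design dominates d */
    }
    return true;
}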
design space example (GEMM)
Neural Networks (1)
[figure: inputs I_1–I_4 feeding hidden nodes a_1–a_4, then b_1–b_3, then c_1, then out]
real world: out_real = F(I_1, I_2, I_3, I_4)
compute approximation out_pred ≈ F̂(I_1, I_2, I_3, I_4) using intermediate values a_i's, b_i's
Neural Networks (2)
[same network figure as before]
a_1 = K(w_{a1,1} I_1 + w_{a1,2} I_2 + ··· + w_{a1,4} I_4)
b_1 = K(w_{b1,1} a_1 + w_{b1,2} a_2 + w_{b1,3} a_3)
w's — weights, selected by training
Neural Networks (3)
neuron: a_1 = K(w_{a1,1} I_1 + w_{a1,2} I_2 + ··· + w_{a1,4} I_4)
K(x) — activation function, e.g. 1/(1 + e^{-x})
close to 0 as x approaches −∞, close to 1 as x approaches +∞
differentiable
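Putting the last two slides together, a single neuron is a weighted sum pushed through the activation function K. A minimal C sketch of that evaluation (my own illustration):

#include <math.h>

/* sigmoid activation: K(x) = 1 / (1 + e^-x) */
double K(double x) { return 1.0 / (1.0 + exp(-x)); }

/* a_1 = K(w[0]*I[0] + w[1]*I[1] + ... + w[n-1]*I[n-1]) */
double neuron(const double *w, const double *I, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += w[i] * I[i];
    return K(sum);
}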
Minerva's problem
evaluating neural networks: train model once, deploy in portable devices
example: handwriting recognizer
goal: low-power, low-cost (≈ area) ASIC
High-level design
Tradeoffs
mathematical — design of neural network; hardware — size of memory, number of calculations
mathematical — precision of calculations; hardware — size of memory, number of calculations
hardware — amount of inter-neuron parallelism (approx. cores)
hardware — amount of intra-neuron parallelism (i.e. pipeline depth)
Neural network parameters
“intrinsic inaccuracy”
intrinsic inaccuracy
assumption: we don't care about precision loss if the variation it adds is similar to the variation already present in training
sensible?
HW tradeoffs (1)
HW tradeoffs (2)
parameters varied
functional unit placement (in pipeline)
number of lanes
HW pipeline
Decreasing precision (1)
from another neural network ASIC accelerator paper:
Decreasing precision (2)
from another neural network ASIC accelerator paper:
Pruning
short-circuit calculations close to zero
statically — remove neurons with almost all zero weights
dynamically — compute 0 if input is near-zero, without checking weights
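A minimal sketch of the dynamic case (my own illustration of the idea, not Minerva's datapath): check the input activation once, and skip every weight fetch and multiply-accumulate that would use it when it is near zero.

#include <math.h>

/* accumulate one input activation's contributions into the next layer,
 * skipping all work when the activation is close to zero */
void accumulate_pruned(double input, const double *weights, double *partial,
                       int n_outputs, double threshold) {
    if (fabs(input) < threshold)
        return;                             /* dynamic pruning: treat as exactly 0 */
    for (int j = 0; j < n_outputs; j++)
        partial[j] += weights[j] * input;   /* normal multiply-accumulate */
}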
SRAM danger zone
Traditional reliability techniques
don't run at low voltage/etc.
redundancy — error correcting codes
Algorithmic fault handling
calculations are approximate anyways — "noise" from imprecise training data, rounding, etc.
physical faults can just be more noise
round-down on faults
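A sketch of the idea behind this mitigation (hypothetical code, not Minerva's circuit): when a low-voltage SRAM read of a weight is flagged as faulty, round the value down to zero instead of computing with corrupted bits — per the previous slide, a zeroed weight is just a little more noise.

#include <stdint.h>
#include <stdbool.h>

/* if the SRAM read of a weight is flagged as faulty, use zero
 * rather than the corrupted value */
int16_t read_weight(int16_t raw_value, bool fault_detected) {
    return fault_detected ? 0 : raw_value;
}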
design exploration
huge number of variations: amount of parallel computation, width of computations/storage, size of models
best power per accuracy
note: other papers on this topic
EIE — same conference
omitted zero weights in a more compact way
noted: lots of tricky branching on GPUs/CPUs
solved the general sparse matrix-vector multiply problem
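For reference, the "general sparse matrix-vector multiply problem" is usually stated over a compressed format like CSR; a generic C sketch (not EIE's compressed representation or hardware):

/* y = A * x, with A stored in compressed sparse row (CSR) format:
 *   values[]  -- nonzero entries, row by row
 *   col[]     -- column index of each nonzero
 *   row_ptr[] -- row_ptr[i]..row_ptr[i+1]-1 are row i's nonzeros */
void spmv_csr(int n_rows, const double *values, const int *col,
              const int *row_ptr, const double *x, double *y) {
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += values[k] * x[col[k]];
        y[i] = sum;
    }
}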
design tradeoffs in the huge
next time: Warehouse-Scale Computers, AKA datacenters — most common modern supercomputer
no paper review
reading on schedule: Barroso et al., The Datacenter as a Computer, chapters 1, 3, and 6
next week — security
general areas of HW security:
protect programs from each other — page tables, kernel mode, etc.
protect programs from adversaries — bounds checking, etc.
protect programs from people manipulating the hardware
next week's paper: the last category