"The Next Challenge: Energy Efficient Approach in ML Architecture", Professor Uri Weiser

  1. "The Next Challenge: Energy Efficient Approach in ML Architecture". Professor Uri Weiser, Viterbi Faculty of Electrical Engineering, The Technion, Israel. UPC, October 10th 2018, July 1st 2019. Contributors to the research: Leeor Peled, Daniel Raskin, Gil Shomron, Leonid Yavits, Moran Shkolnik, Avi Baum. The presentation is based on work by: Gil Shomron, Daniel Raskin, Loren Jammal, Avi Baum, Yoav Etsion.

  2. To Yale: 5 years have passed since Yale@75. OK, but why do you have to drag us with you? Interesting how you keep staying in the center …

  3. Beauty comes shining through not only when blooming

  4. Agenda: • Technology environment • Process is slowing down • Big Data • Funnel • Killer apps ➔ ML • Efficient ML BASICS • Energy: • Amdahl and MA (divide our limited resources effectively) • SMT – is this a biggy? • Pipeline – why? • Map applications to HW – Data Flow concept • Prediction – no validation is necessary • Conclusions

  5. Technology environment: Performance History. [Figure: relative performance vs. process feature size [um]. Process impact 100X, uArch impact 20X, total impact 2,000X.] We (the architects) did an “OK-” job. *ACM Queue, April 6, 2012, Processors, Volume 10, Issue 4: “CPU DB: Recording Microprocessor History”, Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz, Stanford University

  6. Big Data ➔ usage of DATA. Input: unstructured data. Funnel: beta = BW_out / BW_in ➔ Extract, Transform, Load ➔ Read Once ➔ Non-Temporal Memory Access

  7. Killer applications* • ML is one!? • Funnel (in most cases) • Input: huge amount of data; output: small amount • Many simple operations. *Applications you cannot effectively execute on current HW (Dr. Andy Grove)

  8. Energy in “Data Flow” architecture. [Figure: instruction energy breakdown. Labels as extracted: I-Cache Access, Control, Reg. File, OP., Access. Values as extracted: 25pJ, 6pJ, 45pJ, 0.5pJ.] [Figure: data access, “Data flow” energy breakdown.] Now Read Once counts!

  9. Efficient ML I: Accelerator • Energy ➔ Performance • Map applications to HW ➔ Graph mapping; data flow • Efficient mapping • Co-design HW structure and smart compiler in a specific application environment • Almost no flow control • Statistical results – no need to validate execution

  10. Efficient ML II: Balanced design and energy reduction • Energy ➔ Performance • System vs. accelerators: it is Amdahl again! • Energy reduction • Reduction in computing (MAC ops): pruning, prediction • Reduction in data access and movement: pipeline • Efficient usage of the hardware resources: Multi-Amdahl (divide your limited resources effectively), SMT
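
The "it is Amdahl again" remark on this slide refers to Amdahl's law: the overall gain from an accelerator is bounded by the part of the run time it does not touch. A minimal sketch of the standard formula follows; the function name and the example numbers are illustrative assumptions, not values from the talk.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time the accelerator covers (0..1).
    accelerator_speedup:  how much faster that share runs on the accelerator (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / accelerator_speedup)


# Hypothetical example: 90% of the time is offloaded to a 10x accelerator.
print(amdahl_speedup(0.9, 10.0))  # roughly 5.26, far below 10x
```

Even an arbitrarily fast accelerator cannot push this hypothetical system past 10x, which is why the slide frames the system, not the accelerator alone, as the thing to balance.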

  11. Efficient ML II: Reduction in Computing. [Figure: energy efficiency vs. throughput, with reference lines at 0.01pJ/OP and 0.1pJ/OP; annotations: 1 TOPS/W; drop due to inefficiency (e.g. data movement, repeated DRAM accesses …).] Energy efficiency α energy/OP. ISSCC Feb 17th 2019 preen announcement
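
The two units on this slide, pJ/OP and TOPS/W, are reciprocals of each other: 1 TOPS/W is 10^12 operations per joule, which is 1 pJ per operation. The helper below is a small sketch of that unit conversion only; the function name is an assumption and nothing here reproduces the data in the slide's figure.

```python
def tops_per_watt(picojoules_per_op: float) -> float:
    """Convert energy per operation (pJ/OP) into efficiency (TOPS/W).

    1 TOPS/W = 1e12 operations per joule = 1 pJ per operation,
    so the conversion is a plain reciprocal.
    """
    if picojoules_per_op <= 0.0:
        raise ValueError("energy per operation must be positive")
    return 1.0 / picojoules_per_op


# The two reference lines named on the slide:
print(tops_per_watt(0.1))   # about 10 TOPS/W
print(tops_per_watt(0.01))  # about 100 TOPS/W
```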

  12. Efficient ML II: Reduction in Computing (1) • Reduction in computing ➔ reduce # of operations via • Pruning: well-known techniques • Value data (prediction): ML is statistical ➔ no need to validate execution. G. Shomron, U. Weiser, “Spatial Correlation and Value Prediction in Convolutional Neural Networks”, IEEE Computer Architecture Letters (CAL), January 2019
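
To make the prediction idea concrete, here is a toy sketch of skipping work by predicting zero-valued activations from their neighbours. It is not the method of the cited paper: the checkerboard pattern, the "all computed neighbours are zero" rule, and the function name are assumptions chosen only to show how an unvalidated prediction removes operations.

```python
def predict_zero_activations(feature_map):
    """Toy zero-value prediction on a 2D post-ReLU feature map.

    feature_map holds the true activations and stands in for "actually
    computing" a position. Returns (output, computed_count).
    """
    height, width = len(feature_map), len(feature_map[0])
    output = [[0] * width for _ in range(height)]
    computed = 0

    # Pass 1: compute every position on a checkerboard exactly.
    for y in range(height):
        for x in range(width):
            if (y + x) % 2 == 0:
                output[y][x] = feature_map[y][x]
                computed += 1

    # Pass 2: for the remaining positions, look at the computed 4-neighbours.
    for y in range(height):
        for x in range(width):
            if (y + x) % 2 == 1:
                neighbours = [
                    output[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                ]
                if any(value != 0 for value in neighbours):
                    output[y][x] = feature_map[y][x]  # compute as usual
                    computed += 1
                # Otherwise predict zero and skip the computation; the
                # prediction is never validated.
    return output, computed


# Hypothetical 4x4 map with one active region in the corner.
fmap = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 5, 3],
        [0, 0, 2, 4]]
result, computed = predict_zero_activations(fmap)
print(computed, "of", 16, "positions computed")
```

A skipped position that was actually non-zero would simply be wrong in this sketch; the slide's argument is that statistical workloads can tolerate such errors without a validation step.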

  13. Efficient ML II: Reduction in Data Accesses (2) • Reduction in data access and movement • Pipeline execution ➔ data stays on die. [Figure: Memory (DRAM), Memory (SRAM), IN, Out, and a chain of MAC units for Layer 1, Layer 2, Layer 3, … Layer n.]
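
A rough way to see why pipelined execution reduces data movement is to count off-chip transfers. The sketch below is a simplified model under stated assumptions: weights are ignored, intermediate feature maps are assumed to fit in on-die SRAM when pipelined, and the function name and sizes are hypothetical.

```python
def dram_traffic(input_size, layer_output_sizes, pipelined):
    """Count DRAM transfers (in arbitrary units) for a chain of layers.

    Layer by layer: every intermediate output is written to DRAM and read
    back by the next layer. Pipelined: intermediates stay on die, so only
    the network input and the final output touch DRAM.
    """
    if not layer_output_sizes:
        raise ValueError("need at least one layer")
    final_output = layer_output_sizes[-1]
    if pipelined:
        return input_size + final_output
    intermediates = sum(layer_output_sizes[:-1])
    return input_size + 2 * intermediates + final_output


# Hypothetical three-layer network.
sizes = [80, 60, 10]
print(dram_traffic(100, sizes, pipelined=False))  # 100 + 2*(80+60) + 10
print(dram_traffic(100, sizes, pipelined=True))   # 100 + 10
```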

  14. Efficient ML II: Efficient usage of HW • Multi-Amdahl* (divide your limited resources effectively): execution segments t_1, t_2, t_3, …, t_n with accelerator functions F_1(a_1), F_2(a_2), …, F_n(a_n). Optimization using Lagrange multipliers; target under a constraint A. F' = derivative of the accelerator function: t_i F_i'(a_i) = t_j F_j'(a_j), e.g. efficient resource division (e.g. SRAM)* • SMT** • Resource needs are known ahead of time … *T. Zidenberg, Isaac Keslassy, U. Weiser, “Optimal Resource Allocation with MultiAmdahl”, IEEE MICRO, August 2013. **Technion EE, Advanced Microarchitecture course's exam (winter 2019). ***G. Shomron, T. Horowitz, U. Weiser, “SMT-SA: Simultaneous Multithreading in Systolic Arrays”, IEEE Computer Architecture Letters (CAL), July 2019
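
The optimality condition on this slide follows from a short Lagrange-multiplier argument. The derivation below is a sketch under assumed readings of the symbols: t_i is the time spent in segment i, a_i is the resource given to its accelerator, F_i(a_i) scales that time, and A is the total resource budget. The closing example with F_i(a) = 1/a is a hypothetical choice for illustration, not a function taken from the talk.

```latex
\begin{align*}
\min_{a_1,\dots,a_n}\; T &= \sum_{i=1}^{n} t_i\,F_i(a_i)
  \quad\text{subject to}\quad \sum_{i=1}^{n} a_i = A \\
\mathcal{L} &= \sum_{i=1}^{n} t_i\,F_i(a_i)
  + \lambda\Bigl(\sum_{i=1}^{n} a_i - A\Bigr) \\
\frac{\partial \mathcal{L}}{\partial a_i} &= t_i\,F_i'(a_i) + \lambda = 0
  \;\Longrightarrow\;
  t_i\,F_i'(a_i) = t_j\,F_j'(a_j) = -\lambda \quad \forall\, i, j \\[6pt]
\text{Example (assumed } F_i(a) = 1/a\text{):}\quad
  \frac{t_i}{a_i^{2}} &= \lambda
  \;\Longrightarrow\;
  a_i = A\,\frac{\sqrt{t_i}}{\sum_{j=1}^{n} \sqrt{t_j}}
\end{align*}
```

In this example a segment that takes four times as long receives only twice the resource, which illustrates that the optimal split is generally not proportional to run time.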

  15. Conclusions • Opportunities: map application to HW; reduce energy per operation?; reduce # of operations; reduce data movement and memory access; efficient usage of HW • We're gonna have fun: open field, lots of ideas, many researchers; new passionate energy in the community; back to the “big impact” era …

  16. Thank You
