FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network


1. FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network
Hiroki Nakahara, Youki Sada, Masayuki Shimoda, Akira Jinguji, Shimpei Sato
Tokyo Institute of Technology, JP
FPL2019 @Barcelona

2. Challenges in DL Training
TSUBAME-KFC (TSUBAME Kepler Fluid Cooling); training ResNet-50 on ImageNet:
• High speed: 1.2 min @ 2,048 GPUs
• Low power consumption

3. Sparse Weight Convolution
With a sparse kernel, multiply-accumulate operations for zero weights are skipped. In the slide's example, the output value is y = X(0,1)·W0 + X(1,0)·W1 + X(2,2)·W2, and every other kernel position is skipped; weights with magnitude below the threshold σ are treated as zero.
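As a rough illustration (not the authors' implementation), the skip logic can be sketched in Python/NumPy: only the kernel taps whose magnitude exceeds the threshold are enumerated, so zero weights never cost a multiply-accumulate. The function name, single-channel layout, and thresh parameter are assumptions.

    import numpy as np

    def sparse_conv2d(x, w, thresh=0.0):
        # Keep only significant taps as (row, col, value) triples;
        # weights at or below the threshold are skipped entirely.
        k = w.shape[0]
        taps = [(i, j, w[i, j])
                for i in range(k) for j in range(k)
                if abs(w[i, j]) > thresh]
        H, W = x.shape
        y = np.zeros((H - k + 1, W - k + 1))
        for oy in range(H - k + 1):
            for ox in range(W - k + 1):
                # e.g. y = X[0,1]*W0 + X[1,0]*W1 + X[2,2]*W2 when only
                # three weights survive, as in the slide's example
                y[oy, ox] = sum(x[oy + i, ox + j] * wv for i, j, wv in taps)
        return y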

4. Training of a Sparse CNN?
• Initial weights
• Lottery ticket hypothesis
• Special hardware

5. Fine-Tuning for a Sparse CNN
• Use a model pre-trained (with sparse weights) on ImageNet
• Retain strong connections to preserve recognition accuracy
Starting from the dense CNN, weak connections (below ρ_weak) are pruned while strong connections (above ρ_strong) are retained; fine-tuning is then performed on the FPGA.
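A minimal NumPy sketch of this scheme, assuming simple magnitude pruning with a single threshold ρ and a placeholder gradient: the mask fixed at pruning time is reapplied on every update, so weak connections stay at zero during fine-tuning. The 85% quantile mirrors slide 6; everything else is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.1, size=(64, 3, 3, 3))  # stand-in for pre-trained dense weights

    # Prune weak connections: |w| below rho is dropped (slide 6: ~85% prunable).
    rho = np.quantile(np.abs(w), 0.85)
    mask = np.abs(w) > rho          # True = strong connection, retained
    w *= mask

    # One fine-tuning step: only surviving weights are updated, so the
    # sparsity pattern chosen at pruning time is preserved.
    grad = rng.normal(0.0, 0.01, size=w.shape)    # placeholder gradient
    w -= 1e-3 * grad * mask
    print(f"sparse ratio: {1.0 - mask.mean():.1%}")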

6. Sparseness vs. Accuracy
• 85% of the weights can be pruned initially

  7. Universal Convolution (UC) Unit to Bias Base Address (x b ,y b ) Reg Sparse Weight Memory Stack (Buffer for a Feature Map) Reset + x n ReLU 0 1 0: Forward 1: Backward Address Generator 0: Forward 1: Backward (x b +x i ,y b +y i , p i ): Forward Counter 11…1 Idx w2 Non- zero weight Indirect Addres s 1 w1 x 1 ,y 1 ,p 1 2 x 2 ,y 2 ,p 2 : : : : Address X 00…0 x 1 00…1 x 2 : (x b -y i ,y b -x i , p i ): Backward
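In software terms, the UC unit's per-output loop might look like the sketch below. The two address formulas come from the slide; the entry format, the fmap[p][y][x] layout, and applying ReLU only on the forward path are assumptions.

    def uc_unit(fmap, weight_entries, xb, yb, bias=0.0, backward=False):
        acc = bias
        for w, (xi, yi, pi) in weight_entries:  # only non-zero weights are stored
            if backward:
                x, y = xb - yi, yb - xi         # backward address (slide 7)
            else:
                x, y = xb + xi, yb + yi         # forward address (slide 7)
            acc += fmap[pi][y][x] * w           # shared multiply-accumulate datapath
        return acc if backward else max(acc, 0.0)  # ReLU on the forward path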

8. Parallel MCSK Convolution
A line buffer (C × N × k) feeds M parallel MCSK convolution units, each applying a sparse filter across the C input channels.
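The slide gives only the block diagram, so the following Python sketch is a guess at the dataflow: the line buffer exposes one (C, k, k) window and the M units evaluate their sparse filters on it in parallel. The (c, i, j, w) tap format and all names are assumptions, not the authors' design.

    import numpy as np

    def mcsk_step(window, sparse_filters):
        # window: (C, k, k) slice served by the line buffer.
        # sparse_filters: M filters, each a list of (c, i, j, w) non-zero taps.
        out = np.zeros(len(sparse_filters))
        for m, taps in enumerate(sparse_filters):  # the M units run in parallel in hardware
            out[m] = sum(w * window[c, i, j] for c, i, j, w in taps)
        return out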

9. Overall Architecture
The host PC communicates with the FPGA over a bus. On the FPGA, the UC unit (fed by the weight memory, index, and bias memories), the MP unit, the GAP unit, and the LC unit exchange feature maps through stacks (on-chip buffers) and a line buffer, with DDR4 SDRAM as external memory.
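Read as a dataflow, the diagram suggests a conventional CNN pipeline. The sketch below strings the units together in NumPy, assuming the MP unit is 2×2 max pooling and the LC unit is the final linear classifier; the helper names are placeholders, not the authors' API.

    import numpy as np

    def mp_unit(x):
        # 2x2 max pooling over (C, H, W) (assumed pooling size).
        C, H, W = x.shape
        return x[:, :H - H % 2, :W - W % 2].reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

    def forward(image, conv_fns, fc_w):
        x = image
        for conv in conv_fns:
            x = mp_unit(np.maximum(conv(x), 0.0))  # UC unit (sparse conv + ReLU), then MP unit
        x = x.mean(axis=(1, 2))                    # GAP unit: one value per channel
        return fc_w @ x                            # LC unit: linear classifier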

10. Results
Setup: FPGA VCU1525, GPU RTX2080Ti; batch size 32, 100 epochs; AlexNet, VGG16, and MobileNetv1 trained on CIFAR-10, SVHN, VOC2017, and Linnaeus5.
Training time per configuration (Sparse Ratio [%] / GPU [sec] / FPGA [sec]), for example:
• AlexNet on CIFAR-10: 94.3 / 2,697 / 680
• AlexNet on Linnaeus5: 91.0 / 3,672 / 875
• VGG16 on CIFAR-10: 92.5 / 4,458 / 1,098
• VGG16 on SVHN: 93.3 / 2,430 / 612
• VGG16 on Linnaeus5: 95.4 / 6,121 / 1,435
• MobileNetv1 on Linnaeus5: 90.1 / 4,902 / 1,223
Across the twelve configurations, the sparse ratio ranges from 88.3% to 95.4%, and FPGA training is roughly 4× faster than the GPU (e.g., VGG16 on CIFAR-10: 4,458 s / 1,098 s ≈ 4.1×).
Resource consumption on the VCU1525 (total used / available): 370,299 / 1,182K LUTs; 934,381 / 2,364K FFs; 3,806 / 4,216 BRAMs; 960 / 960 URAMs; 1,106 / 6,840 DSPs.
