Supporting TVM on RISC-V Architectures with SIMD Computations

Jenq-Kuen Lee 1, Chun-Chieh Yang 1, Allen Lu 2, P. Chen 1, YM Chang 1,2, CH Chang 1, Yi-Ru Chen 1, HH Liao 1, Chao-Lin Lee 1,2, Ssu-Hsuan Lu 2, and Shao-Chung Wang 3

1 Department of Computer Science, National Tsing Hua University, Taiwan
2 Peakhills Group Corporation
3 Andes Technology Corporation

TVM and Deep Learning Compiler Conference, December 2019
RISC-V with Two Vector ISAs to Support the Fall-back Engine for AI Models

• P Extension: Packed Vector (Subword SIMD), with fixed-point and integer instructions.
• V Extension: Super Word Vector.

[Figure: packed add/sub/mul/compare operations on signed and unsigned subword elements; vector widths of 8, 16, 32, 64, 128, 256, 512, and 1024 bits]

Credits: RISC-V DSP (P) Extension Proposal, Chuan-Hua Chang, Andes Technology Corporation; Vector ISA courtesy of Roger Espasa, Esperanto Technologies.
[RFC] Fixed-Point Type Implementation Proposal #4446

• RISC-V P extension (Subword SIMD) with fixed-point instructions.
• We refer to Fxp as the fixed-point value, Fp as the floating-point value, and PP as the point position: Fxp = Fp * pow(2, PP) (see the sketch below).
• Support a fixed-point type in TVM.
• The binary point position of a variable is carried as compile-time type information.

[Figure: binary fixed-point example, 1.01 in binary = 1 + 1/4 = 1.25]

References for Fixed-Point Type:
(1) AC fixed-point datatypes by Mentor Graphics (https://www.mentor.com/hls-lp/downloads/ac-datatypes)
(2) Our early proposal to Khronos for the OpenCL fixed-point feature set (https://www.khronos.org/assets/uploads/developers/library/2018-khronos-group-opencl-embedded-outreach/Taipei-DSP-Profile-NTHU_Jan18.pdf)
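A minimal sketch of the Fxp = Fp * pow(2, PP) mapping above, assuming a 16-bit signed container with saturation; the helper names are illustrative and are not taken from RFC #4446:

```python
# Sketch of the fixed-point mapping Fxp = Fp * pow(2, PP).
# Helper names are illustrative only; they are not from RFC #4446.

def float_to_fixed(fp_value: float, point_position: int, bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer with `point_position`
    fractional bits, saturating to the representable range."""
    scaled = round(fp_value * (1 << point_position))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, scaled))

def fixed_to_float(fxp_value: int, point_position: int) -> float:
    """Recover the (approximate) float value from its fixed-point encoding."""
    return fxp_value / (1 << point_position)

# Example from the slide: 1.25 with 2 fractional bits encodes as
# 1.25 * 2^2 = 5, i.e. binary 1.01 = 1 + 1/4.
assert fixed_to_float(float_to_fixed(1.25, 2), 2) == 1.25
```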
Auto-FXP with TVM on RISC-V with P Extension

• Uses a machine learning model to auto-tune the binary point position.
• It finds the best binary point position for fixed-point expressions when running TVM on RISC-V with the P extension.
• The work extends AutoTVM and can improve accuracy while still enjoying the low-power numeric benefits of fixed point.
• The tuning is done with the Spike simulator incorporating the RISC-V P extension (Subword SIMD); a rough sketch of the idea follows below.

[Figure: accuracy comparison of Fxp16_12 (default) vs. Fxp16_13 (chosen by Auto-FXP)]
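The slide does not show the tuning loop itself. Below is a rough, self-contained sketch of the idea under a simplified assumption: instead of the ML-based, AutoTVM-style search used by Auto-FXP, a brute-force search simply picks the point position that minimizes quantization error on sample activations.

```python
import numpy as np

# Illustrative sketch only: a brute-force search for the binary point
# position (PP) that minimizes quantization error on sample activations.
# The actual Auto-FXP work extends AutoTVM and validates on the Spike
# simulator; the error metric and search here are simplified assumptions.

def quantize(x, pp, bits=16):
    """Round to `pp` fractional bits and saturate to a signed `bits`-wide type."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q = np.clip(np.round(x * (1 << pp)), lo, hi)
    return q / (1 << pp)

def search_point_position(activations, bits=16):
    """Return the PP in [0, bits-1) with the smallest mean squared error."""
    errors = {pp: np.mean((activations - quantize(activations, pp, bits)) ** 2)
              for pp in range(bits - 1)}
    return min(errors, key=errors.get)

# Example: activations mostly in [-4, 4) fit well with 12-13 fractional
# bits, in line with the Fxp16_12 / Fxp16_13 settings mentioned above.
acts = np.random.randn(1024) * 1.5
print(search_point_position(acts))
```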
TVM for RISC-V with V Extension (Superword SIMD)

• TVM Optimization
  • The TVM RISC-V codegen lowers SIMD computation with SIMD intrinsics into LLVM.
  • The LLVM backend then generates the corresponding SIMD instructions.
  • The schedule needs to be tuned to provide a large loop index space for vector parallelism (see the sketch after this slide).
• LLVM Optimization
  • VSETVL redundancy elimination
  • VMulAdd resource utilization
  • Fast vector initializer
• Experimental setup: Spike simulator, assuming a 512-bit vector register; V SIMD in <4 x float32>, <8 x float32>, <16 x float32>; V spec v0.7.0, TVM v0.6, LLVM 9.0.0; SIMD float32 compared with no-SIMD float32.

[Figure: speedup based on runtime executed instructions, "Only TVM Optimization" vs. "TVM + LLVM Optimization", over Densenet, Mobilenetv2, Lenet, AlexNet, Inceptionv3, Squeezenet1.0, Resnet18_v1, and their average]
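A minimal sketch, using the TVM v0.6-era schedule API, of how a loop can be split and vectorized so the codegen sees explicit vector parallelism; the split factor of 16 assumes <16 x float32> lanes on a 512-bit vector register, matching the setup above, and is our choice rather than the slide's.

```python
import tvm

# Sketch (TVM v0.6-era API): expose vector parallelism so the RISC-V
# codegen can lower the inner loop to SIMD intrinsics in LLVM.
n = 1024
A = tvm.placeholder((n,), name="A", dtype="float32")
B = tvm.placeholder((n,), name="B", dtype="float32")
C = tvm.compute((n,), lambda i: A[i] + B[i], name="C")

s = tvm.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=16)  # large outer index space
s[C].vectorize(inner)  # inner loop becomes a <16 x float32> vector operation

# Inspect the lowered IR; the vectorized ramp/broadcast nodes are what the
# LLVM backend turns into the target's SIMD instructions.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```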
Summary

We thank the AWS team for their help with the AI model validation flow. We look forward to contributing code to the TVM source trees. More details of our work can be found in the following:

• Experiments and AI Model Validations for Neo/TVM on RISC-V Architectures with SIMD, Allen Lu, et al., RISC-V Summit, San Jose, Dec 2019 (Poster).
• Enabling TVM on RISC-V Architectures with SIMD Instructions, Allen Lu, Chao-Lin Lee, Yuan-Ming Chang, Piyo Chen, Hsiang-Wei Sung, Heng Lin, Shao-Chung Wang, and Jenq-Kuen Lee, RISC-V Forum, March 2019 (Oral presentation).