DRAM Access Reduction by Node Fusion with TVM

Chia-Wei Chang, Jing-Jia Liou, Chih-Tsun Huang, Wei-Chung Hsu & Juin-Ming Lu
National Tsing Hua University & Industrial Technology Research Institute
Dec 5th, 2019

DRAM Access Consumes More Energy

• Energy efficiency is the key to DNN computation
• Hardware accelerators
• DRAM consumes 50-100x more energy per byte than SRAM
• Node fusion is used to save DRAM accesses

Memory     Energy
DRAM       250x
SRAM       4x
Register   1x
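As a rough illustration of why moving traffic out of DRAM matters, the sketch below weights access counts by the relative energy figures on this slide (DRAM 250x, SRAM 4x, Register 1x). It assumes those factors apply per access; the workload size `n` and the helper `relative_energy` are hypothetical, chosen only for the arithmetic.

```python
# Relative energy factors taken from the slide (Register = 1x baseline).
RELATIVE_ENERGY = {"DRAM": 250, "SRAM": 4, "Register": 1}


def relative_energy(accesses):
    """Sum access counts weighted by the relative energy of each memory level."""
    return sum(RELATIVE_ENERGY[level] * count for level, count in accesses.items())


# Hypothetical workload: n intermediate values, each written once and read once.
n = 1000
unfused = relative_energy({"DRAM": 2 * n})  # intermediate tensor goes through DRAM
fused = relative_energy({"SRAM": 2 * n})    # intermediate tensor stays in SRAM

print(unfused, fused, unfused / fused)  # 500000 8000 62.5
```

The ratio of 62.5 falls inside the 50-100x range quoted in the bullet, since 250 / 4 = 62.5.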

TVM Only Fuses Elementwise OPs

[Figure: graph of Conv, BatchNorm, and Relu nodes grouped into a TVMOP; labels: OutElementwiseFusable, Elementwise, TopLevel]

• Currently, TVM only supports fusion of elementwise OPs into Conv
• Each OP has an attribute that indicates whether it can be fused
• Fusion generates a TVMOP, which contains the fused nodes so that they share data in SRAM
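The fusion rule described on this slide can be sketched as a toy pass over a linear chain of operators. This is not TVM's API: the `fuse` helper, the string pattern names, and the assumption that BatchNorm and Relu both carry the Elementwise attribute are illustrative only.

```python
# Toy model of attribute-driven fusion. NOT TVM's API; all names are hypothetical.
OUT_ELEMWISE_FUSABLE = "OutElementwiseFusable"
ELEMWISE = "Elementwise"


def fuse(chain):
    """Group a linear chain of (name, pattern) ops into fused groups.

    An Elementwise op joins the group before it; any other op starts a new group.
    """
    groups = []
    for name, pattern in chain:
        if pattern == ELEMWISE and groups:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups


chain = [
    ("Conv", OUT_ELEMWISE_FUSABLE),
    ("BatchNorm", ELEMWISE),   # assumed Elementwise for this sketch
    ("Relu", ELEMWISE),
    ("Conv2", OUT_ELEMWISE_FUSABLE),
]
groups = fuse(chain)
print(groups)  # [['Conv', 'BatchNorm', 'Relu'], ['Conv2']]
```

Under this rule the two Convs always land in separate groups, so the tensor passed between them is not shared in SRAM.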

Our Node Fusion Merges Multiple Convs

[Figure: without fusion, the tensor data between the 1st and 2nd DNN operators is stored in DRAM; with fusion, it is kept in SRAM, and only the input and the final output stay in DRAM]

Without fusion:

    for (n = 0; n < N; n++)                          # 1st Conv
      for (k = 0; k < C1; k++)
        for (y = 0; y < H1; y++)
          for (x = 0; x < W1; x++)
            for (c = 0; c < C0; c++)
              for (r = 0; r < R1; r++)
                for (s = 0; s < S1; s++)
                  O1[n][k][y][x] += W1[k][c][r][s] * I[n][c][y+r][x+s]

    for (n = 0; n < N; n++)                          # 2nd Conv
      for (k = 0; k < C2; k++)
        for (y = 0; y < H2; y++)
          for (x = 0; x < W2; x++)
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  O2[n][k][y][x] += W2[k][c][r][s] * O1[n][c][y+r][x+s]

With fusion:

    for (n = 0; n < N; n++)
      for (k = 0; k < C2; k++)
        for (y = 0; y < H2; y++)
          for (x = 0; x < W2; x++)
            # Internal SRAM buffer (zero-initialized)
            int sram[C1][R2][S2]
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  for (c2 = 0; c2 < C0; c2++)
                    for (r2 = 0; r2 < R1; r2++)
                      for (s2 = 0; s2 < S1; s2++)
                        sram[c][r][s] += W1[c][c2][r2][s2] * I[n][c2][y+r+r2][x+s+s2]
            for (c = 0; c < C1; c++)
              for (r = 0; r < R2; r++)
                for (s = 0; s < S2; s++)
                  O[n][k][y][x] += W2[k][c][r][s] * sram[c][r][s]
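A minimal sketch, assuming stride 1 and no padding, that checks the fused loop nest against the two separate convolutions on small random integer tensors. The layer sizes are hypothetical, and the weight arrays are named `F1` and `F2` here (they are `W1` and `W2` on the slide) so that they do not collide with the width bounds `W1` and `W2`.

```python
import random

# Small, hypothetical layer sizes (stride 1, no padding).
N, C0, C1, C2 = 1, 2, 3, 2
R1 = S1 = 2                          # 1st Conv kernel size
R2 = S2 = 2                          # 2nd Conv kernel size
H2 = W2 = 2                          # final output size
H1, W1 = H2 + R2 - 1, W2 + S2 - 1    # intermediate (O1) size
H0, W0 = H1 + R1 - 1, W1 + S1 - 1    # input size

random.seed(0)


def tensor(*shape):
    """Nested lists of small random integers with the given shape."""
    if len(shape) == 1:
        return [random.randint(-3, 3) for _ in range(shape[0])]
    return [tensor(*shape[1:]) for _ in range(shape[0])]


def zeros(*shape):
    """Nested lists of zeros with the given shape."""
    if len(shape) == 1:
        return [0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


I = tensor(N, C0, H0, W0)
F1 = tensor(C1, C0, R1, S1)   # weights of the 1st Conv
F2 = tensor(C2, C1, R2, S2)   # weights of the 2nd Conv


def conv(inp, flt, K, C, H, W, R, S):
    """Direct convolution producing an N x K x H x W output."""
    out = zeros(N, K, H, W)
    for n in range(N):
        for k in range(K):
            for y in range(H):
                for x in range(W):
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                out[n][k][y][x] += flt[k][c][r][s] * inp[n][c][y + r][x + s]
    return out


def unfused():
    o1 = conv(I, F1, C1, C0, H1, W1, R1, S1)   # o1 stands for the tensor kept in DRAM
    return conv(o1, F2, C2, C1, H2, W2, R2, S2)


def fused():
    out = zeros(N, C2, H2, W2)
    for n in range(N):
        for k in range(C2):
            for y in range(H2):
                for x in range(W2):
                    sram = zeros(C1, R2, S2)   # stands for the on-chip buffer
                    for c in range(C1):
                        for r in range(R2):
                            for s in range(S2):
                                for c2 in range(C0):
                                    for r2 in range(R1):
                                        for s2 in range(S1):
                                            sram[c][r][s] += (
                                                F1[c][c2][r2][s2]
                                                * I[n][c2][y + r + r2][x + s + s2]
                                            )
                    for c in range(C1):
                        for r in range(R2):
                            for s in range(S2):
                                out[n][k][y][x] += F2[k][c][r][s] * sram[c][r][s]
    return out


same = unfused() == fused()
print(same)  # True
```

In this form the intermediate window is recomputed for every output element and output channel, so the sketch trades extra computation for never materializing the full intermediate tensor.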

Experiment Settings: Hardware

[Figure: block diagram with DRAM, Controller, Buffer (ifmap, weights, ipsum, opsum), and a PE array]

• Eyeriss-like architecture
• 256MB DRAM
• 108KB SRAM
• 12x14 PE
• Runs AlexNet
• Due to hardware limitation, only Conv is evaluated

Experimental Results

[Figure: three bar charts comparing "w/o Fusion" and "Fusion": Energy (mJ), Cycle (MCycle), and Energy*Cycle, i.e. Energy-Delay (KCycle.J). Reduction labels shown on the charts: 16%, 23%, and 40%.]
