Generating Fast Operators for Binarizable Networks
Meghan Cowan
Running Binarizable Networks?
- Training is done in frameworks with no binarizable operators.
- Speedup? Can't evaluate performance gains.
- Easy to introduce bugs.
- Need to generate binarizable operators ourselves!
Want operators that are fast
[Chart: speedup of Unoptimized vs. Baseline vs. Goal]
- Baselines are incredibly well optimized.
- Without optimizations, low precision can't compete.
- Need optimized operators for all workloads.
- Performance portability across different CPUs.
Generating Fast Operators for Binarizable Networks
[Diagram: the TVM stack — High-Level Differentiable IR, Tensor Expression IR with AutoTVM, code generation to LLVM/CUDA/Metal and to VTA with AutoVTA, targeting edge and cloud hardware (ASIC, FPGA, FPGA fleet)]
- Declare the bitserial computation and a CPU schedule describing an optimization space (first sketch below).
- Use AutoTVM to find schedule parameters for different operators and backends (second sketch below).
- Override LLVM code generation with a custom microkernel: use the tensorize primitive to replace the innermost loop of the computation (third sketch below).

Inner-loop ARM NEON microkernel (popcount accumulation):

```asm
vcnt.8    q8, q8        @ popcount each byte
vrev16.8  q5, q8        @ swap adjacent bytes within each 16-bit lane
vadd.i8   q8, q8, q5    @ pairwise-sum adjacent byte counts
vorr      q5, q8, q8    @ copy q8 into q5
vuzp.8    q8, q5        @ de-interleave bytes across q8/q5
vmovl.u8  q5, d16       @ widen u8 counts to u16
vrev32.16 q5, q5        @ swap adjacent 16-bit lanes within each 32-bit lane
vaddw.u8  q8, q5, d16   @ accumulate widened counts
vorr      q5, q8, q8    @ copy q8 into q5
vuzp.16   q8, q5        @ de-interleave 16-bit lanes across q8/q5
```
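The slides show only the generated assembly; as a minimal sketch, a bitserial dot product declared in TVM's tensor expression language might look like the following. Sizes and names here are illustrative (not the talk's actual code), and inputs are assumed already bit-packed into uint32 words:

```python
import tvm
from tvm import te

# Illustrative sizes: M output elements, K packed 32-bit words per row.
M, K = 64, 32
A = te.placeholder((M, K), dtype="uint32", name="A")  # bit-packed activations
W = te.placeholder((K,), dtype="uint32", name="W")    # bit-packed weights

k = te.reduce_axis((0, K), name="k")
# 1-bit dot product: AND the packed words, popcount, accumulate.
C = te.compute(
    (M,),
    lambda i: te.sum(tvm.tir.popcount(A[i, k] & W[k]).astype("int32"), axis=k),
    name="C",
)

# The schedule exposes an optimization space: tiling, parallelism, unrolling.
s = te.create_schedule(C.op)
io, ii = s[C].split(C.op.axis[0], factor=8)
s[C].parallel(io)
s[C].unroll(ii)
```

Higher-precision variants (e.g. W2A2) combine several such popcount terms, weighted by bit position.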
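A sketch of turning that schedule into an AutoTVM search space; the template name and tuning budget are placeholders, not the talk's actual setup:

```python
import tvm
from tvm import te, autotvm

@autotvm.template("bitserial_dot_example")  # hypothetical template name
def bitserial_dot(M, K):
    A = te.placeholder((M, K), dtype="uint32", name="A")
    W = te.placeholder((K,), dtype="uint32", name="W")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (M,),
        lambda i: te.sum(tvm.tir.popcount(A[i, k] & W[k]).astype("int32"), axis=k),
        name="C",
    )
    s = te.create_schedule(C.op)

    # Let the tuner choose the tile factor instead of hard-coding it.
    cfg = autotvm.get_config()
    cfg.define_split("tile_i", C.op.axis[0], num_outputs=2)
    io, ii = cfg["tile_i"].apply(s, C, C.op.axis[0])
    s[C].parallel(io)
    s[C].unroll(ii)
    return s, [A, W, C]

# Tune locally; in practice the target string selects the backend
# (e.g. an ARM CPU target for the Raspberry Pi results below).
task = autotvm.task.create("bitserial_dot_example", args=(64, 32), target="llvm")
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    n_trial=100,
    measure_option=autotvm.measure_option(
        builder=autotvm.LocalBuilder(), runner=autotvm.LocalRunner()
    ),
)
```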
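Finally, a minimal illustration of the tensorize mechanism itself, following the pattern of TVM's tensorize tutorial: the innermost tile is pattern-matched and replaced by a call to an external microkernel symbol. Here `popcount_dot_u32` is a hypothetical symbol standing in for NEON assembly like the listing above:

```python
import tvm
from tvm import te

def intrin_popcount_dot(mi, K):
    """Declare the inner-loop pattern that tensorize will replace."""
    a = te.placeholder((mi, K), dtype="uint32", name="a")
    w = te.placeholder((K,), dtype="uint32", name="w")
    k = te.reduce_axis((0, K), name="k")
    c = te.compute(
        (mi,),
        lambda i: te.sum(tvm.tir.popcount(a[i, k] & w[k]).astype("int32"), axis=k),
        name="c",
    )

    def intrin_func(ins, outs):
        aa, ww = ins
        cc = outs[0]
        ib = tvm.tir.ir_builder.create()
        # popcount_dot_u32 is a hypothetical external symbol; in practice it
        # would be the hand-written microkernel linked into the module.
        ib.emit(
            tvm.tir.call_extern(
                "int32", "popcount_dot_u32",
                cc.access_ptr("w"), aa.access_ptr("r"), ww.access_ptr("r"),
                mi, K,
            )
        )
        return ib.get()

    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="Ab", offset_factor=1)
    Wb = tvm.tir.decl_buffer(w.shape, w.dtype, name="Wb", offset_factor=1)
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="Cb", offset_factor=1)
    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, w: Wb, c: Cb})

# Usage with the earlier compute C and schedule s:
#   io, ii = s[C].split(C.op.axis[0], factor=8)
#   s[C].tensorize(ii, intrin_popcount_dot(8, K))
```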
Convolutions on Raspberry Pi
[Chart: relative speedup over 16-bit TVM for W1A1, W1A2, and W2A2 convolutions, per ResNet-18 layer (2 through 12) and in total]
Can generate low-precision convolutions 5.5x to 15.2x faster than optimized 16-bit integer.