The Embedded Learning Library (ELL)
Cross-compiler for AI pipelines, specialized for resource-constrained target platforms
https://github.com/Microsoft/ELL
AI Pipeline → ELL → Target Machine Code
- 3 years at Microsoft Research
- compiler toolchain, tutorials, model gallery
- focus: ARM CPUs and embedded GPUs; vision on ARM Cortex-A53, keyword spotting on ARM Cortex-M4F
Architecture
[Diagram: importers bring in pretrained models, and ELL trainers consume target datasets; both produce an ELL computation graph, which an optimizer specializes using target profiles; a platform abstraction layer feeds LLVM and OpenCL emitters, built on LLVM, OpenCL, and BLAS]
AI compiler vs. AI runtime
Why an AI compiler?
- model-specific optimization
- target-specific optimization
- small executable
Why an AI runtime?
- portability
- seamless migration from cloud to edge
Best of both worlds: a just-in-time AI compiler
Compression techniques:
- efficient architectures
- pruning
- low-precision math and quantization
- low-rank matrix approximation
Evaluation
small loss in accuracy, large gain in cost
Architecture search
- model Pareto frontier
[Plot sequence, January 2018 through April 2018: ILSVRC2012 top-1 accuracy (30-70) vs. ms/image on Raspberry Pi 3 @ 700 MHz (100-1000); the Pareto frontier of models improves month over month]
Lossless acceleration
- variety of convolution kernels
- scheduling
- engineering
[Plot sequence, January 2019 through March 2019: ILSVRC2012 top-1 accuracy vs. ms/image on Raspberry Pi 3 @ 700 MHz; models move to lower latency month over month at unchanged accuracy]
- mix and match compression techniques
- engineering/ML co-design
- during training vs. post-processing
Lossy acceleration
Quantization semantics
- binary: 1 bit; 0 → -1, 1 → +1
- ternary: 2 bits; 00 → 0, 01 → +1, 11 → -1 (10 unused)
- linear: k bits; values in [0 ... 2^k - 1], or signed [-(2^(b-1) - 1) ... 2^(b-1) - 1]
- exponential: k bits
- lookup/clustered: k bits; value taken from a lookup table
- iterative sum: k bits; value of the form a ± b ± c ± ... ± n
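To make the decode rules concrete, here is a minimal C++ sketch of how a k-bit code could map back to a value under several of these semantics; the function names, step parameter, and codebook are illustrative assumptions, not ELL's API.

    #include <cstdint>
    #include <vector>

    // Illustrative decoders mapping quantized codes back to values.
    inline int decode_binary(uint32_t code) {        // 1 bit: 0 -> -1, 1 -> +1
        return code ? +1 : -1;
    }

    inline int decode_ternary(uint32_t code) {       // 2 bits: 00->0, 01->+1, 11->-1
        static const int table[4] = {0, +1, 0, -1};  // index 2 (10) is unused
        return table[code & 3u];
    }

    inline float decode_linear(uint32_t code, float step) {
        return step * static_cast<float>(code);      // k bits: step * [0 .. 2^k - 1]
    }

    inline float decode_lookup(uint32_t code, const std::vector<float>& codebook) {
        return codebook[code];                       // k bits index 2^k learned centroids
    }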
Quantization representation
- bit-packed: each value's bits stay together: [b3 b2 b1 b0 a3 a2 a1 a0], [d3 d2 d1 d0 c3 c2 c1 c0]
- bit planes: bit i of every value is stored together: [d0 c0 b0 a0], [d1 c1 b1 a1], [d2 c2 b2 a2], [d3 c3 b3 a3]
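A short sketch of converting eight k-bit values into the bit-plane layout; the lane order is chosen MSB-first so it matches the bit strings in the example below (illustrative code, not ELL's):

    #include <array>
    #include <cstdint>

    // Pack eight K-bit values into K bit planes: plane i collects bit i of
    // every value, one lane per value, leftmost lane = first value.
    template <int K>
    std::array<uint8_t, K> to_bit_planes(const uint8_t (&vals)[8]) {
        std::array<uint8_t, K> planes{};
        for (int i = 0; i < K; ++i)
            for (int j = 0; j < 8; ++j)
                planes[i] |= static_cast<uint8_t>(((vals[j] >> i) & 1u) << (7 - j));
        return planes;
    }

    // For vals = {5,1,7,6,3,4,2,5}, to_bit_planes<3> yields
    // {0b11101001, 0b00111010, 0b10110101} -- the planes used below.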
Quantization example
ternary weights, 3-bit unsigned linear activations, bit-plane representation:
  activations: 5  1 7  6  3  4 2 5
  weights:     1 -1 0 -1 -1 -1 1 0
dot = 5*1 + 1*(-1) + 7*0 + 6*(-1) + 3*(-1) + 4*(-1) + 2*1 + 5*0 = -7
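For reference, the same dot product computed directly on the decoded values (a trivial check; the numbers are from the slide):

    #include <cstdio>

    int main() {
        const int a[8] = {5, 1, 7, 6, 3, 4, 2, 5};     // 3-bit unsigned activations
        const int w[8] = {1, -1, 0, -1, -1, -1, 1, 0}; // ternary weights
        int dot = 0;
        for (int j = 0; j < 8; ++j) dot += a[j] * w[j];
        std::printf("dot = %d\n", dot);                // prints dot = -7
        return 0;
    }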
The weights are split into two 8-bit masks, a magnitude mask m marking nonzero weights and a sign mask s marking negative weights, and the activations are stored as three bit planes (one lane per activation, leftmost lane first):
  m = 11011110
  s = 01011100
  plane 0 (LSB): 11101001
  plane 1:       00111010
  plane 2 (MSB): 10110101
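A sketch of how the two masks could be derived from the ternary weights (illustrative only; lane order matches the bit strings above):

    #include <cstdint>

    // m gets a 1 for every nonzero weight, s a 1 for every negative weight;
    // s is a subset of m because every negative weight is also nonzero.
    void ternary_masks(const int w[8], uint8_t& m, uint8_t& s) {
        m = 0; s = 0;
        for (int j = 0; j < 8; ++j) {
            if (w[j] != 0) m |= static_cast<uint8_t>(1u << (7 - j));
            if (w[j] < 0)  s |= static_cast<uint8_t>(1u << (7 - j));
        }
    }
    // For w = {1,-1,0,-1,-1,-1,1,0}: m = 0b11011110, s = 0b01011100.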
Plane 0:
  o = plane0 & m = 11101001 & 11011110 = 11001000; absSum += popcount(o) → absSum = 3
  o = o & s = 11001000 & 01011100 = 01001000; negSum += popcount(o) → negSum = 2
(general step for plane i: o = a & m; absSum += popcount(o) << i; o &= s; negSum += popcount(o) << i)
Plane 1 (contributions weighted by 2, i.e. << 1):
  o = plane1 & m = 00111010 & 11011110 = 00011010; absSum += popcount(o) << 1 → 3 + 2*3 = 9
  o = o & s = 00011010 & 01011100 = 00011000; negSum += popcount(o) << 1 → 2 + 2*2 = 6
Plane 2 (contributions weighted by 4, i.e. << 2):
  o = plane2 & m = 10110101 & 11011110 = 10010100; absSum += popcount(o) << 2 → 9 + 4*3 = 21
  o = o & s = 10010100 & 01011100 = 00010100; negSum += popcount(o) << 2 → 6 + 4*2 = 14
total = absSum - 2*negSum = 21 - 2*14 = -7
(each activation under a negative weight was added into absSum with a + sign; subtracting it twice via negSum flips its sign)
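Putting the three plane steps into a loop gives the whole quantized dot product. A minimal C++20 sketch (std::popcount, which compilers lower to a population-count instruction on ARM); this is an illustration of the technique, not ELL's emitted code:

    #include <bit>      // std::popcount (C++20)
    #include <cstdint>
    #include <cstdio>

    // Bit-plane dot product: ternary weights as masks (m = nonzero, s = negative),
    // activations as bit planes. absSum accumulates a_j over nonzero weights,
    // negSum over negative weights; plane i contributes with weight 2^i.
    int ternary_dot(const uint8_t planes[3], uint8_t m, uint8_t s) {
        int absSum = 0, negSum = 0;
        for (int i = 0; i < 3; ++i) {
            uint8_t o = planes[i] & m;
            absSum += std::popcount(o) << i;
            o &= s;
            negSum += std::popcount(o) << i;
        }
        // Negative-weight terms entered absSum with a + sign; subtracting
        // them twice flips their sign, as in the worked total above.
        return absSum - 2 * negSum;
    }

    int main() {
        const uint8_t planes[3] = {0b11101001, 0b00111010, 0b10110101};
        std::printf("%d\n", ternary_dot(planes, 0b11011110, 0b01011100));  // -7
        return 0;
    }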
Instruction-count estimate for the per-plane loop:
  8-bit words: 8 instructions × 3 bit planes = 24 instructions; 8 lanes per word → 24 / 8 = 3 instructions per element
  128-bit words (NEON): 8 instructions × 3 bit planes + ~0.3 reduce ops = 24.3 instructions; 128 lanes per word → 24.3 / 128 ≈ 0.19 instructions per element (about 5x faster than float)
Quantization performance
[Plot: speedup of quantized vs. full-precision inference on ARM1176, for 1-, 2-, 3-, and 8-bit quantization; speedups on the order of 5x to 25x]
Quantized weight accuracy
[Plot: accuracy relative to the original model (0.1-1.0) vs. proportion of zeros in ternary weights (0.1-0.7); one model with binary weights, several models with ternarized weights]
Quantized activation accuracy
[Plot: accuracy relative to real-valued activations (0.1-1.0) vs. quantized activation bit count (1-8), for ternary and binary weights]
Current focus areas
- post-training lossy compression (pruning and quantization)
- engineering/ML training co-design
- infrastructure:
  - beating BLAS on embedded platforms
  - extending the platform abstraction layer to embedded GPUs
  - global optimizer
Questions?
- Website: https://microsoft.github.io/ELL/
- Code: https://github.com/Microsoft/ELL
- Model Gallery: https://microsoft.github.io/ELL/gallery/