/** With TensorFlow Lite Support Library */

// 1. Load your model.
MyImageClassifier classifier = new MyImageClassifier(activity);
MyImageClassifier.Inputs inputs = classifier.createInputs();

// 2. Transform your data.
inputs.loadImage(rgbFrameBitmap);

// 3. Run inference.
MyImageClassifier.Outputs outputs = classifier.run(inputs);

// 4. Use the resulting output.
Map<String, Float> labeledProbabilities = outputs.getOutput();
Running Your Model: Op Kernels, Converter, Interpreter, Delegates
Language Bindings
● New language bindings (Swift, Obj-C, C# and C) for iOS, Android and Unity
● Community language bindings (Rust, Go, Flutter/Dart)
Running TensorFlow Lite on Microcontrollers
What are they?
● Small computer on a single circuit (MCU)
● No operating system
● Tens of KB of RAM & Flash
● Only CPU, memory & I/O peripherals
● Exist all around us
[Diagram: cascaded models. A small network on the MCU answers "Is there any sound?"; a deeper network on the application processor answers "Is that human speech?"]
TensorFlow Lite for microcontrollers
TensorFlow provides you with a single framework to deploy on microcontrollers as well as phones.
[Diagram: TensorFlow SavedModel → TensorFlow Lite FlatBuffer format → TensorFlow Lite Interpreter / TensorFlow Lite Micro Interpreter]
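A rough sketch of the conversion step in Python, assuming a SavedModel at a hypothetical path; the resulting FlatBuffer is what the Micro interpreter loads, typically embedded into the firmware as a C array:

import tensorflow as tf

# Convert a SavedModel to the TensorFlow Lite FlatBuffer format.
converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
tflite_model = converter.convert()
open("model.tflite", "wb").write(tflite_model)

# On a microcontroller the bytes are usually embedded as a C array, e.g.:
#   xxd -i model.tflite > model_data.cc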
Example
What can you do on an MCU?
● Simple speech recognition
● Person detection using a camera
● Gesture recognition using an accelerometer
● Predictive maintenance
Speech Detection on an MCU
● Recognizes “Yes” and “No”
● Retrainable for other words
● 20KB model
● 7 million ops per second
Person Detection on an MCU
● Recognizes if a person is visible in the camera feed
● Retrainable for other objects
● 250KB MobileNet model
● 60 million ops per inference
Gesture Detection on an MCU
● Spots wand gestures
● Retrainable for other gestures
● 20KB model
Improving your model performance
Incredible Performance
Enable your models to run as fast as possible on all hardware: CPU, GPU, DSP, NPU
Incredible Performance
● CPU, floating point: 37 ms
● CPU, quantized fixed-point: 13 ms (2.8x)
● GPU, OpenCL float16: 6 ms (6.2x)
● EdgeTPU, quantized fixed-point: 2 ms (18.5x)
MobileNet V1, Pixel 4, single-threaded CPU, October 2019
Common techniques to improve model performance
● Use quantization
● Use pruning
● Leverage hardware accelerators
● Use a mobile-optimized model architecture
● Per-op profiling
Utilizing quantization for CPU, DSP & NPU optimizations
Reduce the precision of static parameters (e.g. weights) and dynamic values (e.g. activations).
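A minimal sketch of post-training quantization with the Python converter; the SavedModel path and calibration inputs are placeholders for illustration:

import tensorflow as tf

# Dynamic-range quantization: weights are stored in 8 bits, activations stay float.
converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Optional: full integer quantization also needs representative inputs for calibration.
# def representative_data_gen():
#     for sample in calibration_inputs:  # hypothetical list of input arrays
#         yield [sample]
# converter.representative_dataset = representative_data_gen

tflite_quant_model = converter.convert()
open("model_quant.tflite", "wb").write(tflite_quant_model)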
Pruning
Remove connections during training in order to increase sparsity.
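One way to do this is magnitude-based pruning with the TensorFlow Model Optimization Toolkit; the toy model, random data, and sparsity schedule below are illustrative assumptions, not part of the original slides:

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy data and model purely for illustration.
x_train = np.random.rand(256, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(256,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Gradually zero out low-magnitude connections until 50% of weights are sparse.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
pruned_model.fit(x_train, y_train, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before converting to TensorFlow Lite.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)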
Running Your Model: Converter, Interpreter, Op Kernels, Delegates
● Op Kernels: highly optimized for the ARM Neon instruction set
● Delegates: accelerators like GPU, DSP and Edge TPU; integrate with the Android Neural Networks API
Utilizing Accelerators via Delegates
[Diagram: the interpreter core runs most ops on CPU operation kernels, while a delegate hands a subgraph of ops off to an accelerator]
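As a rough sketch of attaching a delegate from the Python API (the model path and delegate library name are placeholders; on-device apps typically use the Java examples shown later in this section):

import tensorflow as tf

# Load a delegate shared library and attach it to the interpreter;
# ops the delegate supports run on the accelerator, the rest stay on CPU.
delegate = tf.lite.experimental.load_delegate("libdelegate.so")
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()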
GPU delegation enables faster float execution
● 2–7x faster than the floating point CPU implementation
● Uses OpenGL & OpenCL on Android and Metal on iOS
● Accepts float models (float16 or float32)
DSP delegation through Qualcomm Hexagon DSP
● Use the Hexagon delegate on Android O & below
● Use NN API on Android P & beyond
● Accepts integer models (uint8)
● Launching soon!
Delegation through the Android Neural Networks API
● Enables graph acceleration on DSP, GPU and NPU
● Supports 30+ ops in Android P, 90+ ops in Android Q
● Accepts float (float16, float32) and integer models (uint8)
/** Initializes an {@code ImageClassifier}. */
ImageClassifier(Activity activity) throws IOException {
  tfliteModel = loadModelFile(activity);
  delegate = new GpuDelegate();
  tfliteOptions.addDelegate(delegate);
  tflite = new Interpreter(tfliteModel, tfliteOptions);
  ...
}
/** Initializes an {@code ImageClassifier}. */
ImageClassifier(Activity activity) throws IOException {
  tfliteModel = loadModelFile(activity);
  delegate = new NnApiDelegate();
  tfliteOptions.addDelegate(delegate);
  tflite = new Interpreter(tfliteModel, tfliteOptions);
  ...
}
Model Comparison (Inception v3 vs. MobileNet v1)
● Top-1 accuracy: 77.9% vs. 68.3% (-11%)
● Top-5 accuracy: 93.8% vs. 88.1% (-6%)
● Inference latency: 1433 ms vs. 95.7 ms (15x faster)
● Model size: 95.3 MB vs. 10.3 MB (9.3x smaller)
Per-op Profiling

bazel build -c opt \
  --config=android_arm64 --cxxopt='--std=c++11' \
  --copt=-DTFLITE_PROFILING_ENABLED \
  //tensorflow/lite/tools/benchmark:benchmark_model

adb push .../benchmark_model /data/local/tmp
adb shell taskset f0 /data/local/tmp/benchmark_model
Per-op Profiling

Number of nodes executed: 31
============================== Summary by node type ==============================
[node type]          [count]  [avg ms]  [avg %]    [cdf %]
CONV_2D              15       1.406     89.270%    89.270%
DEPTHWISE_CONV_2D    13       0.169     10.730%    100.000%
SOFTMAX              1        0.000     0.000%     100.000%
RESHAPE              1        0.000     0.000%     100.000%
AVERAGE_POOL_2D      1        0.000     0.000%     100.000%
Improving your operator coverage
Expand operators, reduce size
● Utilize TensorFlow ops if an op is not natively supported
● Only include required ops to reduce the runtime’s size
Using TensorFlow operators
● Enables hundreds more ops from TensorFlow on CPU
● Caveat: binary size increase (~6MB compressed)
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
Reduce overall runtime size
● Selectively include only the ops required by the model
● Pares down the size of the binary
/* my_inference.cc */

// Forward declaration for RegisterSelectedOps.
void RegisterSelectedOps(::tflite::MutableOpResolver* resolver);
…
::tflite::MutableOpResolver resolver;
RegisterSelectedOps(&resolver);
std::unique_ptr<::tflite::Interpreter> interpreter;
::tflite::InterpreterBuilder(*model, resolver)(&interpreter);
…
gen_selected_ops(
    name = "my_op_resolver",
    model = ":my_tflite_model",
)

cc_library(
    name = "my_inference",
    srcs = ["my_inference.cc", ":my_op_resolver"],
)
How to get started