S8286: Quick and Easy DL Workflow Proof of Concept
Alec Gunny, Ken Hester


  1. S8286: QUICK AND EASY DL WORKFLOW PROOF OF CONCEPT (Alec Gunny, Ken Hester)

  2. AGENDA
     Deep Learning in Production: Current Approaches, Deployment Challenges
     NVIDIA TensorRT: Programmable Inference Accelerator; Performance, Optimizations and Features
     Example: Import, Optimize and Deploy TensorFlow Models with TensorRT
     Additional Resources
     Q&A

  3. SINGLE GPU PLATFORM BOOSTS ALL ACCELERATED WORKLOADS
     HPC, AI Training, AI Inference, Data Analytics: 450+ applications, 10M users, 40 years of video/day
     NVIDIA Deep Learning SDK and CUDA libraries (cuDNN, cuBLAS, NCCL), DeepStream SDK
     DGX and Tesla V100: the universal GPU

  4. WHERE TO TRAIN: At Your Desk, In-the-Cloud, On-Prem

  5. CURRENT DEPLOYMENT WORKFLOW
     Training: Training Data + Data Management → Training → Model Assessment → Trained Neural Network
     Unoptimized deployment options:
     1) Deploy the training framework
     2) Deploy a custom application using the NVIDIA DL SDK
     3) Framework- or custom CPU-only application
     All built on CUDA and the NVIDIA Deep Learning SDK (cuDNN, cuBLAS, NCCL)

  6. CHALLENGES WITH CURRENT APPROACHES
     High Throughput: unable to process high-volume, high-velocity data. Impact: increased cost ($, time) per inference.
     Low Response Time: applications don't deliver real-time results. Impact: negatively affects user experience (voice recognition, personalized recommendations, real-time object detection).
     Power and Memory Efficiency: inefficient applications. Impact: increased cost (running and cooling); can make deployment infeasible.
     Deployment-Grade Solution: research frameworks are not designed for production. Impact: framework overhead and dependencies increase time-to-solution and affect productivity.
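The "increased cost per inference" impact of low throughput can be made concrete with a little arithmetic. This is an illustrative sketch: the hourly server prices are made up, while the throughput figures (140 and 5,700 images/sec) come from the ResNet50 benchmark slide later in the deck.

```python
# Illustrative sketch (hypothetical prices): how throughput drives
# dollar cost per inference on a rented server.

def cost_per_million_inferences(throughput_per_sec, server_cost_per_hour):
    """Dollar cost to serve one million inferences on a single server."""
    seconds_needed = 1_000_000 / throughput_per_sec
    return server_cost_per_hour * seconds_needed / 3600

# CPU server at 140 inferences/sec vs. GPU server at 5,700 inferences/sec
# (throughputs from the ResNet50 slide; the $/hour figures are invented).
cpu_cost = cost_per_million_inferences(140, server_cost_per_hour=1.00)
gpu_cost = cost_per_million_inferences(5700, server_cost_per_hour=3.00)
print(f"CPU: ${cpu_cost:.2f}, GPU: ${gpu_cost:.2f} per 1M inferences")
```

Even at a 3x higher hourly price, the higher-throughput server comes out far cheaper per inference, which is the cost argument the slide is making.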

  7. NVIDIA DEEP LEARNING SOFTWARE PLATFORM
     Training: Data Management → Training → Model Assessment → Trained Neural Network
     Inference: Data center (TensorRT), Embedded (JetPack SDK), Automotive (DriveWorks SDK)
     All built on the NVIDIA Deep Learning SDK and CUDA
     developer.nvidia.com/deep-learning-software

  8. NVIDIA TENSORRT: Programmable Inference Accelerator
     Frameworks → TensorRT (Optimizer, Runtime) → GPU platforms: Tesla P4, Jetson TX2, Drive PX 2, NVIDIA DLA, Tesla V100
     developer.nvidia.com/tensorrt

  9. NVIDIA TENSORRT PROGRAMMABLE INFERENCING PLATFORM
     Inputs: UFF, TRT Network API → TensorRT (Optimizer, Runtime) → Tesla P4, Jetson TX2, Drive PX 2, NVIDIA DLA, Tesla V100

  10. TENSORRT PERFORMANCE
     [Charts] 40x faster CNN inference (ResNet50) on V100 vs. CPU-only, under 7 ms latency; 140x faster language translation RNN inference (OpenNMT) on V100 vs. CPU-only.
     ResNet50 throughput (images/sec): CPU-only 140, V100 + TensorFlow 305, V100 + TensorRT 5,700 (6.83 ms / 6.67 ms latency for the V100 + TensorRT bars).
     OpenNMT throughput (sentences/sec): CPU-only + Torch 4, V100 + Torch 25, V100 + TensorRT 550.
     ResNet50 config: V100 + TensorRT (FP16, batch 39, Tesla V100-SXM2-16GB); V100 + TensorFlow (preview of Volta-optimized TensorFlow, FP16, batch 2, Tesla V100-PCIE-16GB); CPU-only (Intel Xeon-D 1587, Intel DL SDK; score doubled to reflect Intel's stated claim of 2x performance improvement on Skylake).
     OpenNMT 692M config: V100 + TensorRT (FP32, batch 64, Tesla V100-PCIE-16GB); V100 + Torch (FP32, batch 4, Tesla V100-PCIE-16GB); CPU-only Torch (FP32, batch 1, Intel E5-2690 v4 with AVX512).
     GPU host CPU in all cases: E5-2690 v4 @ 2.60 GHz, 3.5 GHz Turbo (Broadwell), HT on.
     developer.nvidia.com/tensorrt
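The headline multipliers can be sanity-checked against the per-bar throughput numbers on the slide (images/sec for ResNet50, sentences/sec for OpenNMT):

```python
# Recompute the headline speedups from the per-configuration throughputs
# shown on the performance slide.

resnet50 = {"CPU-Only": 140, "V100 + TensorFlow": 305, "V100 + TensorRT": 5700}
opennmt = {"CPU-Only + Torch": 4, "V100 + Torch": 25, "V100 + TensorRT": 550}

cnn_speedup = resnet50["V100 + TensorRT"] / resnet50["CPU-Only"]
rnn_speedup = opennmt["V100 + TensorRT"] / opennmt["CPU-Only + Torch"]
print(f"CNN: {cnn_speedup:.1f}x, RNN: {rnn_speedup:.1f}x")  # close to the quoted 40x and 140x
```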

  11. TENSORRT DEPLOYMENT WORKFLOW
     Step 1: Optimize trained model. Trained Neural Network → Import Model → TensorRT Optimizer → Serialize Engine → Optimized Plans (Plan 1, Plan 2, Plan 3).
     Step 2: Deploy optimized plans with runtime. Optimized Plans → De-serialize Engine → TensorRT Runtime Engine → Deploy Runtime to Data center, Automotive, Embedded.
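The two-step workflow above (build and serialize plans once, then deserialize and run at deploy time) can be sketched with a stand-in for the TensorRT engine. Everything below is hypothetical: real TensorRT plans are opaque binary engines produced by the TensorRT API, while this sketch uses JSON only to keep the build-once/deploy-many pattern runnable.

```python
import json

# Hypothetical stand-in for the two-step TensorRT workflow: an "optimizer"
# that turns a trained network into a serialized plan file, and a "runtime"
# step that deserializes the plan for inference.

def optimize(trained_network: dict, plan_path: str) -> None:
    # Step 1: import the model, optimize, and serialize the engine to a plan.
    plan = {"layers": trained_network["layers"], "precision": "fp16"}
    with open(plan_path, "w") as f:
        json.dump(plan, f)

def deploy(plan_path: str) -> dict:
    # Step 2: deserialize the plan and return a ready-to-run engine.
    with open(plan_path) as f:
        return json.load(f)

optimize({"layers": ["conv", "relu", "pool"]}, "model.plan")
engine = deploy("model.plan")
```

The point of the split is that the expensive optimization runs once, offline; deployment targets (data center, automotive, embedded) only pay the cheap deserialization cost.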


  13. MODEL IMPORTING
     Example: importing a TensorFlow model via the Model Importer (Python/C++ API).
     Other frameworks: Network Definition API (Python/C++).
     Runtime inference via the C++ or Python API.
     Audience: AI researchers, data scientists.
     developer.nvidia.com/tensorrt
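A model importer can be pictured as a translation table from framework op names to runtime layers, with unsupported ops falling back to the network-definition path. The op names and builder below are invented for illustration; this is not the TensorRT importer API.

```python
# Hypothetical sketch of a model importer: each framework op name maps to
# a runtime layer implementation; anything unknown must go through the
# network-definition API instead.

LAYER_BUILDERS = {
    "Relu":  lambda x: [max(v, 0.0) for v in x],
    "Scale": lambda x, s=2.0: [v * s for v in x],
}

def import_graph(ops):
    """Translate a list of framework op names into a runnable pipeline."""
    layers = []
    for op in ops:
        if op not in LAYER_BUILDERS:
            raise ValueError(f"Unsupported op {op!r}: use the network-definition API")
        layers.append(LAYER_BUILDERS[op])

    def run(x):
        for layer in layers:
            x = layer(x)
        return x

    return run

model = import_graph(["Scale", "Relu"])
```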

  14. TENSORRT OPTIMIZATIONS
     Layer & Tensor Fusion
     Weights & Activation Precision Calibration
     Kernel Auto-Tuning
     Dynamic Tensor Memory
     Optimizations are completely automatic, performed with a single function call.
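Layer fusion, the first optimization listed, can be illustrated with two elementwise layers (a scale followed by a shift) collapsed into one fused affine pass: one traversal of the tensor instead of two. This is a conceptual sketch of the idea, not how TensorRT implements fusion internally.

```python
# Conceptual sketch of layer fusion: two elementwise layers (scale then
# shift) become a single fused pass, halving traversals of the tensor.

def scale(x, a):                  # layer 1: y = a * x
    return [a * v for v in x]

def shift(x, b):                  # layer 2: y = x + b
    return [v + b for v in x]

def fused_scale_shift(x, a, b):   # fused: y = a * x + b in one traversal
    return [a * v + b for v in x]

x = [1.0, 2.0, 3.0]
unfused = shift(scale(x, 2.0), 1.0)   # two passes over x
fused = fused_scale_shift(x, 2.0, 1.0)  # one pass over x
assert unfused == fused  # same math, fewer kernel launches / memory trips
```

On a GPU the win is fewer kernel launches and fewer round trips through memory for intermediate tensors, which is why fusion helps inference latency.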


  16. NVIDIA TENSORRT 3 NOW AVAILABLE
     Volta TensorCore support: 3.7x faster inference on Tesla V100 vs. Tesla P100 under 7 ms real-time latency.
     TensorFlow importer: optimize and deploy TensorFlow models up to 18x faster vs. the TensorFlow framework.
     Python API: improved productivity with an easy-to-use Python API for data-science workflows.
     Free download for members of the NVIDIA Developer Program: developer.nvidia.com/tensorrt

  17. NVIDIA JETPACK 3.2: SDK for embedded AI computing
     Deep Learning: TensorRT, cuDNN, DIGITS workflow
     Computer Vision: VisionWorks, OpenCV
     GPU Compute: CUDA, CUDA libraries
     Multimedia: ISP support, camera imaging, video CODEC
     Also includes ROS compatibility, OpenGL, advanced developer tools, and much more

  18. DEMO: Jetson TX2, AI Computer on a Module
     Advanced tech for intelligent machines; unmatched performance under 10 W; smaller than a credit card

  19. LEARN MORE
     Jetson: developer.nvidia.com/embedded-computing
     Success Stories: developer.nvidia.com/embedded/learn/success-stories
     Partners and Ecosystem: developer.nvidia.com/embedded/community
     Deep Learning Institute: www.nvidia.com/object/deep-learning-institute.html
     Two Days To A Demo: developer.nvidia.com/embedded/twodaystoademo
     Inception Program: www.nvidia.com/inception
