IMAGE CLASSIFICATION WITH NVIDIA DIGITS Pedro Mario Cruz e Silva (pcruzesilva@nvidia.com) Solution Architect Manager Enterprise Latin America Global Oil & Gas Team
DEEP LEARNING WITH DIGITS Hands-On Lab “NVIDIA QwikLabs ” https://nvlabs.qwiklab.com “ Image Classification with DIGITS” 2
3
4
INTRODUCTION 5
LEARNING FROM DATA AND SOME BUZZ WORDS ARTIFICAL INTELLIGENCE MACHINE Knowledge & Reason LEARNING DEEP Learning LEARNING Learning from data Learning from data Planning Expert systems Neural networks Communicating Handcrafted features Computer learned Perceiving features 6
TRADITIONAL COMPUTING MODEL “Label” Input Output Algorithm 7
A NEW COMPUTING MODEL TRAINING Training Data Trained Neural Network Input “Label” Output INFERENCE Trained Neural Network “Label” Output Input 8
A NEW COMPUTING MODEL Outperform experts, facts, rules with software that writes software ImageNet 100% 90% 80% 70% 60% 50% 40% 30% Traditional CV 20% Deep Learning 10% 0% 2009 2010 2011 2012 2013 2014 2015 2016 Traditional Computer Vision Deep Learning Object Detection Deep Learning Achieves Experts + Time DNN + Data + GPU “Superhuman” Results 9
MNIST (MODIFIED NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY) 10
11
NVIDIA DIGITS 12
POWERING THE DEEP LEARNING ECOSYSTEM NVIDIA SDK accelerates every major framework COMPUTER VISION SPEECH & AUDIO NATURAL LANGUAGE PROCESSING OBJECT DETECTION IMAGE CLASSIFICATION VOICE RECOGNITION LANGUAGE TRANSLATION RECOMMENDATION ENGINES SENTIMENT ANALYSIS DEEP LEARNING FRAMEWORKS Mocha.jl NVIDIA DEEP LEARNING SDK 13 developer.nvidia.com/deep-learning-software
NVIDIA DIGITS Interactive Deep Learning GPU Training System Interactive deep neural network development environment for image classification and object detection Schedule, monitor, and manage neural network training jobs Analyze accuracy and loss in real time Track datasets, results, and trained neural networks Scale training jobs across multiple GPUs automatically developer.nvidia.com/digits 14
DEEP LEARNING WORKFLOWS New in DIGITS 5 IMAGE OBJECT IMAGE CLASSIFICATION DETECTION SEGMENTATION 98% Dog 2% Cat Classify images into Find instances of objects Partition image into classes or categories in an image multiple regions Object of interest could Objects are identified Regions are classified at be anywhere in the image with bounding boxes the pixel level 15
STEP 0 – RUN DIGITS 16
17
18
STEP 1 – CREATE DATASET 19
LOAD AND ORGANIZE DATA 20
LOAD AND ORGANIZE DATA 21
EXPLORE DB 22
23
24
25
26
STEP 2 – TRAINING A MODEL 27
TRAINING A MODEL 28
TRAINING A MODEL 29
TRAINING A MODEL 30
TRAINING A MODEL 31
32
STEP 3 – INFERENCE 33
INFERENCE 34
INFERENCE (TEST) 35
NVIDIA AI PLATFORM 36
TESLA P100 THE MOST ADVANCED HYPERSCALE DATACENTER GPU EVER BUILT 150B XTORS | 5.3TF FP64 | 10.6TF FP32 | 21.2TF FP16 | 14MB SM RF | 4MB L2 Cache 37
INTRODUCING TESLA P100 New GPU Architecture to Enable the World’s Fastest Compute Node Pascal Architecture NVLink CoWoS HBM2 Page Migration Engine T esla P100 CPU Unified Memory Highest Compute Performance GPU Interconnect for Unifying Compute & Memory in Simple Parallel Programming with Maximum Scalability Single Package Virtually Unlimited Memory Space NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 38
ANNOUNCING TESLA V100 GIANT LEAP FOR AI & HPC VOLTA WITH NEW TENSOR CORE 21B xtors | TSMC 12nm FFN | 815mm 2 5,120 CUDA cores 7.5 FP64 TFLOPS | 15 FP32 TFLOPS NEW 120 Tensor TFLOPS 20MB SM RF | 16MB Cache 16GB HBM2 @ 900 GB/s 300 GB/s NVLink 39
NEW TENSOR CORE New CUDA TensorOp instructions & data formats 4x4 matrix processing array D[FP32] = A[FP16] * B[FP16] + C[FP32] Optimized for deep learning Activation Inputs Weights Inputs Output Results 40
TENSOR CORE 4x4x4 matrix multiply and accumulate 41
Tesla P100 vs Tesla V100 Tesla P100 (Pascal) Tesla V100 (Volta) Memory 16 GB (HBM2) 16 GB (HMB2) Memory Bandwidth 720 GB/s 900 GB/s 3x NVLINK 160 GB/s 300 GB/s CUDA Cores (FP32) 3584 5120 CUDA Cores (FP64) 1792 2560 Tensor Cores (TC) NA 640 Peak TFLOPS/s (FP32) 10.6 15 Peak TFLOPS/s (FP64) 5.3 7.5 Peak TFLOPS/s (TC) NA 120 Power 300 W 300 W 42
Tesla P100 vs Tesla V100 Tesla P100 (Pascal) Tesla V100 (Volta) Memory 16 GB (HBM2) 16 GB (HMB2) Memory Bandwidth 720 GB/s 900 GB/s NVLINK 160 GB/s 300 GB/s CUDA Cores (FP32) 3584 5120 CUDA Cores (FP64) 1792 2560 Tensor Cores (TC) NA 640 50% Peak TFLOPS/s (FP32) 10.6 15 Peak TFLOPS/s (FP64) 5.3 7.5 Peak TFLOPS/s (TC) NA 120 Power 300 W 300 W 43
NVIDIA GPU CLOUD SIMPLIFYING AI & HPC DEEP LEARNING HPC APPS HPC VIZ 44
45
DRAMATICALLY MORE FOR YOUR MONEY 5X Better HPC TCO for Same Throughput SAME THROUGHPUT MIXED HPC MIXED HPC WORKLOAD: WORKLOAD: 1/5 Amber, CHROMA, Amber, CHROMA, THE COST GTC, LAMMPS, GTC, LAMMPS, MILC, NAMD, MILC, NAMD, 1/7 Quantum Expresso, Quantum Espresso, SPECFEM3D SPECFEM3D THE SPACE 1/7 THE POWER 8 Accelerated Servers w/4 V100 GPUs 160 Self-hosted Skylake CPU Servers 13 KWatts 96 KWatts 46
NVIDIA SUPPORT PROGRAMS 47
developer.nvidia.com 48
NVIDIA DEEP LEARNING INSTITUTE Hands-on self-paced and instructor-led training in deep learning and accelerated computing for developers Autonomous Deep Learning Medical Image Vehicles Fundamentals Analysis Request onsite instructor-led workshops at your organization: www.nvidia.com/requestdli Take self-paced labs online: www.nvidia.com/dlilabs Download the course catalog, view upcoming Genomics Finance Intelligent Video workshops, and learn about the University Analytics Ambassador Program: www.nvidia.com/dli More industry- specific training coming soon… Game Development Accelerated Computing & Digital Content Fundamentals 49
NVIDIA HW GRANT PROGRAM Jetson TX2 Titan X Pascal Quadro P6000 (Dev Kit) Scientific Computing • • Scientific Visualization Robotics • HPC • • Virtual Reality Autonomous Machines • • Deep Learning 50
INCEPTION PROGRAM http://www.nvidia.com/object/inception-program.html 51
Pedro Mario Cruz e Silva (pcruzesilva@nvidia.com) Solution Architect Manager Enterprise Latin America Global Oil & Gas Team LinkedIn
Recommend
More recommend