DEEP LEARNING INFRASTRUCTURE FOR AUTONOMOUS VEHICLES
Pradeep Gupta | Solutions Architecture, Autonomous Driving
Poonam Chitale | AI Infra Product Manager
Deep learning has changed the way we think about developing software.
NVIDIA DRIVE END-TO-END PLATFORM
COLLECT DATA → TRAIN MODELS → SIMULATE → DRIVE
Perception outputs: cars, pedestrians, lanes, path, signs, lights
INDUSTRY GRADE DEEP LEARNING
What does it take to get DNNs into production?
Data scale and management, training infrastructure (compute, storage), and DL deployment infrastructure for inference: how to build the compute, storage, and other infra that enables the progression along the path to production.
GENERIC DEEP LEARNING WORKFLOW FOR AUTONOMOUS VEHICLES
DL FOR AUTONOMOUS VEHICLES
PBs of data, large-scale labeling, large-scale training, etc.
Manually selected data → Datasets (POST /datasets/{id}) → Labeling → Labels → Train/test data → Deep Learning → Inference-optimized DNN (TensorRT) → Simulation, verification → Metrics
DL FOR AUTONOMOUS VEHICLES
Active learning strategies to meet business needs
Trained models are used to mine highly confused / most informative data → Intelligently selected data → Datasets (POST /datasets/{id}) → Labeling → Labels → Train/test data → Deep Learning → Inference-optimized DNN (TensorRT)
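One common way to realize the "mine highly confused data" step above is uncertainty sampling over the current model's predictions. The sketch below ranks unlabeled frames by softmax entropy and returns the most informative ones for labeling; the function names, frame ids, and labeling budget are illustrative assumptions, not part of any NVIDIA API.

```python
# Minimal active-learning sketch: rank unlabeled frames by prediction entropy
# and send the most "confused" ones to labeling.
import numpy as np

def prediction_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of per-frame class probabilities, shape (n_frames, n_classes)."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=1)

def select_most_informative(frame_ids, probs, budget=1000):
    """Return the `budget` frame ids the current model is least certain about."""
    scores = prediction_entropy(probs)
    ranked = np.argsort(scores)[::-1]          # highest entropy first
    return [frame_ids[i] for i in ranked[:budget]]
```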
"Autonomous vehicles need to be driven more than 11 billion miles to be 20% better than humans. With a fleet of 100 vehicles, 24 hours a day, 365 days a year, at 25 miles per hour, this would take 518 years."
Rand Corporation, Driving to Safety
DL FOR AUTONOMOUS VEHICLES
Assumptions regarding scale
Data collection fleet == 100 cars
2,000 h of data collected per car, per year
Assuming 5x 2MP cameras per car, radar data, etc. => 1 TB / h / car
Grand total of 200 PB collected per year!
Only ~1/1000 likely to be used for training (curated, labeled data)
12.1 years to train a ResNet50-like network on Pascal; 1.5 years on a DGX-1 with Volta
Today, with 8 DGX-1s and 1/10th of that training data, training takes about 1 week
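A quick back-of-the-envelope check of the data-volume figures above, using only the numbers stated on the slide:

```python
# Sanity check of the slide's scale assumptions.
cars = 100
hours_per_car_per_year = 2000
tb_per_hour = 1.0                        # ~5x 2MP cameras plus radar, etc.

raw_tb_per_year = cars * hours_per_car_per_year * tb_per_hour
print(raw_tb_per_year / 1000, "PB/year of raw data")        # 200.0 PB/year

curated_fraction = 1 / 1000              # portion curated and labeled for training
print(raw_tb_per_year * curated_fraction, "TB/year of curated training data")  # 200.0 TB/year
```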
Challenges for building DL workflows for Autonomous Vehicles
Scaling Workflows: optimal scheduling and automation of AI workflows
Tracking Experiments: reproducible research, performance tracking
Best Practices: collaborating on datasets, workflows, and experiments
Managing Datasets: tracking large, continuously evolving datasets
OVERALL WORKFLOW
Data Platform: ingest petabytes of recorded data; transcode and index raw data; label data and export for training
Application Platform: build training workflows; validate with re-simulation; deploy to TensorRT and run with NVIDIA DRIVE
Continuous Optimization: inspect recorded workflows; discover best model; generate metrics; guide selection of data
DATA PLATFORM
DL DATA PLATFORM
Continuously validate and repeat
Collect → Ingest / Process → Curate → Label / Annotate → Export → Analyze (metrics, dashboard)
Built on a storage cluster with data management and services
DATA – COLLECTION AND INGESTION
Collecting and processing data
➢ Continuously ingest data, at roughly 1 TB/hour/car
➢ Data ingestion grows linearly with the number of cars
➢ Diverse datasets produce better DNNs
➢ Dedicated systems for ingestion
➢ Transcoding of raw data into consumable formats
➢ Data compression and caching
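As a rough illustration of the transcode-and-index part of ingestion, the sketch below walks a drop directory of raw recordings, transcodes each one, and appends an index record for later curation. The paths, the ffmpeg codec choice, and the index layout are assumptions for illustration only, not the deck's actual pipeline.

```python
# Illustrative per-session ingestion step: transcode a raw recording and index it.
import json, pathlib, subprocess

RAW_DIR = pathlib.Path("/ingest/raw")            # where fleet uploads land (assumed)
OUT_DIR = pathlib.Path("/ingest/transcoded")
INDEX = pathlib.Path("/ingest/index.jsonl")

def ingest_session(raw_file: pathlib.Path) -> None:
    out_file = OUT_DIR / (raw_file.stem + ".mp4")
    # Transcode the raw camera stream; codec and flags depend on the recorder format.
    subprocess.run(["ffmpeg", "-y", "-i", str(raw_file), "-c:v", "libx265", str(out_file)],
                   check=True)
    record = {"source": raw_file.name, "output": out_file.name,
              "size_bytes": out_file.stat().st_size}
    with INDEX.open("a") as f:                   # one JSON record per ingested session
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    for raw in sorted(RAW_DIR.glob("*.raw")):
        ingest_session(raw)
```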
DATA COMPRESSION
A discussion
➢ A couple of factors:
➢ Where to compress – in the car and/or in the cloud
➢ Data environment – day vs. night, urban vs. highway
➢ Lossless vs. lossy compression
➢ NVIDIA's experience:
➢ DriveWorks (DW) exposes lossless compression today (LRAW, ~2x compression)
➢ Lossy compression is an active area of R&D: how well does AI work on compressed data?
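A simple way to evaluate what lossless compression buys on raw frame buffers is to measure the compression ratio directly. The snippet below uses zlib from the Python standard library on a synthetic frame purely as an illustration; real sensor data has far more spatial redundancy than random values, and the ~2x LRAW figure above refers to NVIDIA's own pipeline, not to this snippet.

```python
# Measure a lossless compression ratio on a raw frame buffer (illustrative only).
import zlib
import numpy as np

# Synthetic 2MP frame: 12-bit values stored in a uint16 container.
frame = np.random.default_rng(0).integers(0, 4096, size=(1080, 1920), dtype=np.uint16)
raw = frame.tobytes()
compressed = zlib.compress(raw, level=6)
print(f"compression ratio: {len(raw) / len(compressed):.2f}x")
```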
DATA – COLLECTION AND INGESTION
Useful data for AI and applying DNNs
Raw data (100s of PBs) → compressed data → useful data (10s of PBs; 20% to 50% of the data may not be useful) → labeled data (limited by labeling throughput) → DNNs
Data from test fleets of 10, 30, 50, and 100 cars
DATA – CURATION AND INDEXING
Selecting the most interesting data for labeling
Search from recorded sessions
Frame selection
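One way to combine "search from recorded sessions" with "frame selection" is to filter an index of recordings by metadata and subsample the matching frames at a fixed stride. The index schema, field names, and stride value below are assumptions for illustration.

```python
# Illustrative curation sketch: metadata search plus strided frame selection.
def select_frames(index, weather=None, time_of_day=None, stride=30):
    """index: iterable of dicts like {"frame_id": ..., "weather": ..., "time_of_day": ...}"""
    matching = [rec for rec in index
                if (weather is None or rec.get("weather") == weather)
                and (time_of_day is None or rec.get("time_of_day") == time_of_day)]
    return [rec["frame_id"] for rec in matching[::stride]]   # keep every Nth matching frame

# e.g. night-time rain, roughly one frame per second at 30 fps:
# frames = select_frames(session_index, weather="rain", time_of_day="night", stride=30)
```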
DATA – LABELING & EXPORT
Ensuring quality of labels
Unlabeled frame → labeled frame → dataset export
➢ Standard guidelines and processes are required to annotate frames correctly
➢ High-quality labeled data is produced and exported for model training
➢ QA and double labeling are important
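A minimal sketch of the double-labeling QA idea: compare the bounding boxes produced by two annotators for the same frame and flag the frame for review when they disagree. The box format and the 0.7 IoU threshold are illustrative assumptions.

```python
# Double-labeling QA sketch: flag frames where two annotators' boxes disagree.
def iou(a, b):
    """a, b: (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def needs_review(labels_a, labels_b, threshold=0.7):
    """True if any box from annotator A has no close match from annotator B."""
    return any(max((iou(a, b) for b in labels_b), default=0.0) < threshold
               for a in labels_a)
```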
APPLICATION/COMPUTE PLATFORM
DL BUILDING AUTONOMOUS VEHICLES
Steps
Build the Model – Goal: build a promising model. Iteration time: hours. Machines: 1. GPUs: 2-4 TitanX / Tesla P/V100.
Continuous Integration – Goal: make sure the code base remains bug free. Iteration time: hours. Machines: 10s. GPUs: 4-8 Tesla P/V100.
Train the Model on Real Data (hyperparameter tuning) – Goal: make the model work with real data and optimize it. Iteration time: days to weeks. Machines: 10s-100s. GPUs: 4-8 Tesla P/V100.
Optimize and Validate the Model – Goal: prepare the model for serving and validate it. Iteration time: hours to weeks. Machines: 10s. GPUs: 4-8 Tesla P/V100.
Deploy the Model – Goal: provide functionality using the model. Iteration time: milliseconds. Machines: hundreds (test fleet) to millions (live fleet). GPU: Xavier.
DL APPLICATION PLATFORM
Build workflow → use datasets → run training / experiments → analyze results → test, validate, repeat
Services: Model Store Service, Dataset Manager Service, Experiment Service
Training cluster (10s of thousands of GPUs)
AV CLUSTER
On-premises infrastructure
➢ Cluster using NVIDIA DGX-1 with Volta
➢ Every DGX-1 connected via InfiniBand for multi-node training
➢ Hierarchical storage: Level 0 storage is the local SSD in each DGX-1 (7 TB for the training data cache); Level 1 storage is hundreds of TBs of high-bandwidth storage
➢ Multiple levels of storage hierarchies
➢ Dedicated connection between the on-premises and cloud infrastructure for dedicated bandwidth
STORAGE REQUIREMENTS
Tiered planning for storage
• The storage architecture should consist of multiple tiers
• On premises:
• Level 0 storage: 7 TB SSD per DGX-1
• Level 1 storage: hundreds of TBs of high-bandwidth storage
• Private/public cloud, over a dedicated connection:
• Level 2 storage: highly available, replicated storage, 10s of PBs
• Level 3 storage: cold storage for archival, possibly 50s of PBs
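The four tiers above can be thought of as a placement policy. The sketch below expresses one such policy in code; the capacities and the age-based rule are illustrative assumptions, not a product specification.

```python
# Sketch of the storage tiers as a simple placement policy.
TIERS = [
    {"name": "level0_local_ssd",  "capacity_tb": 7,      "location": "DGX-1"},
    {"name": "level1_hot",        "capacity_tb": 500,    "location": "on-prem"},
    {"name": "level2_replicated", "capacity_tb": 20_000, "location": "cloud"},
    {"name": "level3_archive",    "capacity_tb": 50_000, "location": "cloud-cold"},
]

def tier_for(dataset_age_days: int, in_active_training: bool) -> str:
    """Rough placement rule: active data stays hot, old data ages out to archive."""
    if in_active_training:
        return "level0_local_ssd"
    if dataset_age_days < 30:
        return "level1_hot"
    if dataset_age_days < 365:
        return "level2_replicated"
    return "level3_archive"
```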
DL FOR AUTONOMOUS VEHICLES
Infrastructure
960 TFLOPS per DGX-1 (FP16)
7 TB SSD per DGX-1
High-speed external storage (multi-PB)
InfiniBand as interconnect
NCCL 2.0
Data + model management
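NCCL over InfiniBand is typically driven from a framework's data-parallel layer. Below is a minimal PyTorch DistributedDataParallel sketch using the NCCL backend, which is one common way to run multi-node training on DGX-1 clusters; the tiny model and random data are placeholders, and the script assumes a `torchrun` launch across nodes.

```python
# Minimal multi-node data-parallel training sketch (PyTorch DDP over NCCL).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL handles GPU/InfiniBand transport
    local_rank = int(os.environ["LOCAL_RANK"])
    device = f"cuda:{local_rank}"
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 10).to(device)      # stand-in for a real perception DNN
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                              # placeholder training loop
        x = torch.randn(64, 512, device=device)
        y = torch.randint(0, 10, (64,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                               # gradients all-reduced via NCCL
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```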
CONTINUOUS OPTIMIZATION
WORKFLOW AUTOMATION & OPTIMIZATION
Workflow Automation
• Self-documenting workflows (see the run-manifest sketch below)
  ▪ Traceability of data, models, experiment sets, and datasets
  ▪ Versioning
• Compute
  ▪ Automated scheduling
  ▪ Optimal GPU selection
• Collaboration
  ▪ Best practices
  ▪ Modular, flexible, extensible APIs
Continuous Optimization
• Ease of training models with new data
  ▪ Integration with the Data Platform
• Testing and validating
  ▪ Rigorous testing
  ▪ Simulation
• Metrics calculation
  ▪ Data diversity
  ▪ KPI tracking: accuracy, performance
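"Self-documenting workflows" usually means every training run records enough metadata to be reproduced later. The sketch below writes such a run manifest tying code, data, hyperparameters, and results together; the field names are illustrative, not the schema of a specific NVIDIA service.

```python
# Sketch of a per-run manifest for traceability and reproducibility.
import json, pathlib, subprocess, time

def write_run_manifest(run_dir, dataset_id, dataset_version, hyperparams, metrics):
    manifest = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Assumes the training code lives in a git checkout.
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
        "dataset": {"id": dataset_id, "version": dataset_version},
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    path = pathlib.Path(run_dir) / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```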
BEST MODEL DISCOVERY
Hyperparameters
• Training parameters
  ▪ Learning rate, batch size, optimizer, weight decay, regularization strength
• Model architecture
  ▪ Batch norm, activation functions, convolution stride, filter size
• Data augmentation
  ▪ Max translation, color augmentations, potentially shearing, flips, crops
• Post-processing
  ▪ Clustering
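A simple baseline for searching over these knobs is random search. The sketch below samples configurations from a small search space and keeps the best one; `train_and_evaluate` is a placeholder for the team's actual training job, and the candidate values are illustrative.

```python
# Illustrative random search over a hyperparameter space.
import random

SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128],
    "optimizer": ["sgd", "adam"],
    "weight_decay": [0.0, 1e-5, 1e-4],
    "flip_augmentation": [True, False],
}

def random_search(train_and_evaluate, trials=20, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {k: rng.choice(v) for k, v in SPACE.items()}
        score = train_and_evaluate(config)        # e.g. validation mAP
        if best is None or score > best[0]:
            best = (score, config)
    return best                                   # (best score, best config)
```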
EXAMPLE WORKFLOW
From data to training to deployment
Get data (dataset exported from labeling software) → train & test (trained model) → adjust / fine-tune (fine-tuned model) → export (exported model) → test & validate at the edge → continuously optimize and repeat
DEPLOYMENT - INFERENCE
TENSORRT DEPLOYMENT WORKFLOW
Step 1: Optimize the trained model
Trained neural network → TensorRT Optimizer (platform, batch size, precision) → optimized plans, serialized to disk (Plan 1, Plan 2, Plan 3)
Step 2: Deploy the optimized plans with the runtime
Plans → TensorRT Runtime Engine → embedded, automotive, and data center targets
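A hedged sketch of Step 1 using the TensorRT Python API with an ONNX export of the trained network is below. Exact calls vary across TensorRT versions (and the deck's original flow may have used other parsers), so treat this as an outline under those assumptions rather than the exact workflow.

```python
# Build and serialize a TensorRT plan from an ONNX model (illustrative outline).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:            # assumed ONNX export of the trained DNN
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)          # precision chosen per target platform

plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:            # serialized plan, loaded later by the runtime
    f.write(plan)
```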
NVIDIA'S END-TO-END PRODUCT FAMILY
TRAINING: fully integrated DL supercomputer (DGX-1), desk side (DGX Station), data center (Tesla V100, Tesla P100)
INFERENCE: data center (Tesla P4, Tesla V100/P40), automotive (DRIVE PX 2), embedded (Jetson TX1)
HOW GPU-BASED INFRASTRUCTURE IS HELPING
AI IS YOUR COMPETITIVE ADVANTAGE
Significant return on investment
• Reduced time to market (TTM)
• Competitive advantage
• Revenues
• Lower overall datacenter TCO
• Avoid fines and settlements
NEXT STEPS
Identify and enable the right scale and capabilities
Deep dive on your current and future use of AI for self-driving
Understand and discuss your goals and objectives, frame the approach, and size the scale
Develop a phased roadmap for AI computational scale
Leverage the NVIDIA Deep Learning Institute to train and develop your team
THANK YOU