Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge
Ming Yang (1), Shige Wang (2), Joshua Bakita (1), Thanh Vu (1), F. Donelson Smith (1), James H. Anderson (1), and Jan-Michael Frahm (1)
(1) The University of North Carolina at Chapel Hill
(2) General Motors Research
Ming Yang - RTAS 2019
https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/
1. Response time
2. Accuracy
3. Throughput
Icons made by Freepik from Flaticon, licensed under CC 3.0 BY.
Our focus
https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/
1. Response time
2. Accuracy
3. Throughput
Hardware resources are constrained and expensive.
CNN software underutilizes the hardware.
Current CNN frameworks
[Figure: CPU and GPU timelines. The gaps are cycles not utilized.]
A single CNN underutilizes the hardware.
Traditional Multiple-Camera Processing Setup
[Figure: cameras 0 … C-1, each with a private CNN.]
Parallelism through multiple processes isn’t helping.
Issues:
1. Memory requirements multiply, limiting the number of instances.
2. Context switches on the GPU cause overheads.
3. Fast synchronization between cameras becomes harder.
Proposed Solutions
Part I: Parallel Execution for CNN frameworks
Part II: Multi-camera Composite Images to provide high throughput for multiple cameras.
Let’s re-think the design of CNN frameworks
[Figure: a CNN drawn as a sequence of layers, Layer 1 … Layer n.]
• CNN models are graphs of layers.
• Processing of images can be independent, e.g., object detection.
We enable parallel execution for CNN frameworks and a shared CNN for multiple cameras.
Shared CNN
[Figure: cameras 0 … C-1 feed one shared CNN built from Stage 0, Stage 1, …, Stage N-1, each stage grouping consecutive layers (Layer 퓁 … Layer 퓁+퓀) and reading from its own queue (Queue 0, Queue 1, …, Queue N-1). The queues carry frame indices; bookkeeping data and the data for Frames 0 to 3 are stored separately. The output is detection box results.]
• Generalize the concept of layers into stages.
• Communicate data between stages using PGM RT (a processing graph management tool).
• Share the CNN among multiple cameras.
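The stage-and-queue structure above can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads and `queue.Queue` stand in for PGM RT, whose API the slides do not show, and the stage functions, the `run_stage` and `run_pipeline` names, and the frame tuples are hypothetical. Frames from two cameras are interleaved into one shared pipeline with one thread per stage.

```python
import queue
import threading

SENTINEL = None  # marks the end of the frame stream


def run_stage(fn, inbox, outbox):
    """Apply fn to every frame taken from inbox and forward it to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        camera_id, frame_id, data = item
        outbox.put((camera_id, frame_id, fn(data)))


def run_pipeline(stage_fns, frames):
    """Run frames through the stages, one thread per stage, and collect results."""
    # One input queue per stage, plus one final queue for the results.
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for groups of CNN layers.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
# Frames from two cameras share the same pipeline: (camera_id, frame_id, data).
frames = [(cam, fid, 10 * cam + fid) for fid in range(3) for cam in range(2)]
results = run_pipeline(stages, frames)
```

With a single thread per stage and FIFO queues, results come out in the order the frames went in, so the camera and frame identifiers carried with each item are enough to route every result back to its source.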
Different Execution Methods
SERIAL: private CNN in one process
PIPELINE: shared CNN that has one thread per stage
PARALLEL: shared CNN that has multiple threads per stage
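To make the difference between the methods concrete, here is a rough sketch that contrasts a SERIAL loop with a PARALLEL arrangement in which several threads serve each stage. It is not the authors' implementation: `start_stage`, `run_parallel`, `run_serial`, and the toy stage functions are hypothetical names, and Python threads only model the structure, not real CPU/GPU concurrency. Setting `threads_per_stage=1` gives the PIPELINE arrangement.

```python
import queue
import threading

STOP = None  # sentinel telling one worker thread to exit


def start_stage(fn, inbox, outbox, num_threads):
    """Start num_threads workers that all serve the same stage."""
    def worker():
        while True:
            item = inbox.get()
            if item is STOP:
                return
            frame_id, data = item
            # Keep the frame id with the data: workers may finish out of order.
            outbox.put((frame_id, fn(data)))

    pool = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in pool:
        t.start()
    return pool


def run_parallel(stage_fns, frames, threads_per_stage):
    """Push frames through all stages and return {frame_id: result}."""
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    pools = [
        start_stage(fn, queues[i], queues[i + 1], threads_per_stage)
        for i, fn in enumerate(stage_fns)
    ]
    for frame in frames:
        queues[0].put(frame)
    # Shut down stage by stage, so a stage stops only after its input is complete.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(STOP)
        for t in pool:
            t.join()
    results = {}
    while not queues[-1].empty():
        frame_id, data = queues[-1].get()
        results[frame_id] = data
    return results


def run_serial(stage_fns, frames):
    """SERIAL baseline: one thread runs every stage for each frame in turn."""
    results = {}
    for frame_id, data in frames:
        for fn in stage_fns:
            data = fn(data)
        results[frame_id] = data
    return results


stage_fns = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
frames = [(frame_id, frame_id) for frame_id in range(8)]
parallel_results = run_parallel(stage_fns, frames, threads_per_stage=3)
serial_results = run_serial(stage_fns, frames)
```

Because several workers per stage can overtake one another, the sketch keys results by frame id instead of relying on arrival order, which is one plausible reason for keeping per-frame bookkeeping data alongside the queues.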
[Figure: CPU and GPU timelines for SERIAL, PIPELINE, and PARALLEL execution, with an annotation marking concurrency.]