Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge

  1. Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge. Ming Yang¹, Shige Wang², Joshua Bakita¹, Thanh Vu¹, F. Donelson Smith¹, James H. Anderson¹, and Jan-Michael Frahm¹. ¹The University of North Carolina at Chapel Hill; ²General Motors Research

  3. Ming Yang - RTAS 2019

  11. Image: https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/ (icons made by Freepik from Flaticon, licensed under CC 3.0 BY)
      1. Response time
      2. Accuracy
      3. Throughput

  14. Our focus (image: https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/)
      1. Response time
      2. Accuracy
      3. Throughput
      Hardware resources are constrained and expensive. CNN software underutilizes the hardware.

  20. Current CNN frameworks [Figure: CPU and GPU timelines with gaps marked "Cycles not utilized"]. A single CNN underutilizes the hardware.

  26. Traditional Multiple-Camera Processing Setup [Figure: cameras 0 … C-1, each with a private CNN]. Parallelism through multiple processes isn't helping. Issues:
      1. Memory requirements multiply, limiting the number of instances.
      2. Context switches on GPU cause overheads.
      3. Fast synchronization between cameras becomes harder.

  27. Proposed Solutions
      Part I: Parallel Execution for CNN frameworks.
      Part II: Multi-camera Composite Images to provide high throughput for multiple cameras.

  32. Let's re-think the design of CNN frameworks [Figure: Layer 1 … Layer n]
      • CNN models are graphs of layers.
      • Processing of images can be independent, e.g., object detection.
      We enable parallel execution for CNN frameworks and shared CNN for multiple cameras.

  35. Shared CNN [Figure: cameras 0 … C-1 feeding a shared CNN of Stage 0, Stage 1, …, Stage N-1 with Queue 0, Queue 1, …, Queue N-1, producing detection box results; a table of bookkeeping data and frames (Data for Frame 1, Data for Frame 2, Data for Frame 3)]
      • Generalize the concept of layers into stages.
      • Communicate data between stages using PGM RT (a processing graph management tool).
      • Share the CNN among multiple cameras.
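The stage-and-queue structure on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides use PGM RT for inter-stage communication, whereas this sketch swaps in Python's standard `queue.Queue`, and the stage functions, the `(camera_id, frame_id, data)` tuple layout, and the names `run_stage` and `run_pipeline` are all hypothetical. Python threads also do not give true CPU parallelism, so the sketch shows structure only.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream (an assumption of this sketch)

def run_stage(stage_fn, in_q, out_q):
    """One worker thread per stage: pull an item, process it, pass it on."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate shutdown to the next stage
            return
        camera_id, frame_id, data = item
        out_q.put((camera_id, frame_id, stage_fn(data)))

def run_pipeline(stage_fns, frames):
    """Push (camera_id, frame_id, data) tuples through all stages in order."""
    # Queue i feeds stage i; the last queue collects the final results.
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Hypothetical stand-ins for groups of CNN layers.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
# Frames from two cameras are interleaved into the same shared pipeline.
frames = [(cam, f, 10 * cam + f) for f in range(3) for cam in range(2)]
results = run_pipeline(stages, frames)
```

Because every frame carries its camera and frame identifiers through the queues, a single set of stages can serve all cameras without a private copy of the model per camera.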

  39. Different Execution Methods
      • SERIAL: private CNN in one process
      • PIPELINE: shared CNN that has one thread per stage
      • PARALLEL: shared CNN that has multiple threads per stage
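The difference between the three execution methods can be sketched as follows. This is an illustration under assumptions, not the framework described in the talk: `run_serial`, `run_shared_cnn`, and the `threads_per_stage` parameter are hypothetical names, stages are plain Python functions standing in for groups of layers, and results are keyed by frame ID because frames may finish out of order once a stage has more than one thread.

```python
import queue
import threading

STOP = object()  # tells a worker thread to exit (an assumption of this sketch)

def worker(stage_fn, in_q, out_q):
    while True:
        item = in_q.get()
        if item is STOP:
            return
        frame_id, data = item
        out_q.put((frame_id, stage_fn(data)))
        in_q.task_done()  # output is already queued when the input is marked done

def run_shared_cnn(stage_fns, frames, threads_per_stage):
    """threads_per_stage=1 mimics PIPELINE; a larger value mimics PARALLEL."""
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    pools = []
    for i, fn in enumerate(stage_fns):
        pool = [threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
                for _ in range(threads_per_stage)]
        for t in pool:
            t.start()
        pools.append(pool)
    for frame in frames:
        queues[0].put(frame)
    # Shut down stage by stage once each input queue has been fully processed.
    for in_q, pool in zip(queues, pools):
        in_q.join()
        for _ in pool:
            in_q.put(STOP)
        for t in pool:
            t.join()
    out = {}
    while not queues[-1].empty():
        frame_id, data = queues[-1].get()
        out[frame_id] = data
    return out

def run_serial(stage_fns, frames):
    """SERIAL: one thread runs every stage of a frame before starting the next frame."""
    out = {}
    for frame_id, data in frames:
        for fn in stage_fns:
            data = fn(data)
        out[frame_id] = data
    return out

stages = [lambda x: x + 1, lambda x: x * 2]
frames = [(i, i) for i in range(8)]
serial = run_serial(stages, frames)
pipeline = run_shared_cnn(stages, frames, threads_per_stage=1)
parallel = run_shared_cnn(stages, frames, threads_per_stage=3)
```

All three variants compute the same per-frame results; they differ only in how many frames can be in flight at once, which is what the CPU and GPU timelines on the following slide compare.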

  49. [Figure: CPU and GPU timelines for SERIAL, PIPELINE, and PARALLEL execution, labeled "Concurrency"]
