
GPU-Based Deep Learning in Cloud and Embedded Systems

  1. GPU-Based Deep Learning in Cloud and Embedded Systems. Frederick Soo, CTO. April 4, 2016. Confidential and Proprietary.

  2. Nauto is launching a connected camera for professional drivers
     • Drive more than most consumers
     • Exposed to passenger and driver liability
     • Driver quality unknown - small number of very bad drivers

  3. Massive shift in transportation due to synergistic technologies
     • Autonomous: 90% reduction in accidents
     • Connected: fleet optimization
     • Shared: 50-70% utilization
     • Electric: 85% efficient drivetrain
     • Combined: $0.08 / mile

  4. Why use deep learning?
     • Good at visual tasks
     • Scalable
     • Deployable
     (Slide annotation: "Most important for NAUTO")

  5. Small brains have a lot of functionality
     • 1 million neurons: 1 mW
     • 10 million neurons: 10 mW
     • 100 million neurons: 100 mW
     • 26 billion neurons: 20 watts

  6. Required performance depends on use case

  7. Small changes in F1 with size
     • Order of magnitude improvements in speed with basic exploration
     • Always worth measuring performance/size tradeoff
     • Large networks can be used in later stages of cascade

  8. Test your chipsets - algorithm speed important but not entire story
     • Chipsets released in 2014, 2015 and 2016
     • Pricing varying from $25 to $60+
     • Varying degrees of HW/SW support
     [Chart: Nauto CNN forward pass (msec), axis 0 to 150, for embedded SoCs A through E]

  9. Algorithm is not the bottleneck
     Pipeline stages (in slide order): image processing, conversion to CNN space, CNN forward pass, other steps
     Stage times (in slide order): … msec, 15 msec, 30 msec, 30 msec

  10. Entire system must be optimized
      Stages (in slide order): collect data, train, label, deploy
      Pre-GPU: months/years, months, years, months

  11. Entire system must be optimized
      Stages (in slide order): collect data, train, label, deploy
      Pre-GPU: months/years, months, years, months
      Post-GPU: months/years, months, weeks, months

  12. Entire system must be optimized
      Stages (in slide order): collect data, train, label, deploy
      Pre-GPU: months/years, months, years, months
      Post-GPU: months/years, months, weeks, months
      Nauto prototype: weeks, weeks, days, weeks

  13. Entire system must be optimized
      Stages (in slide order): collect data, train, label, deploy
      Pre-GPU: months/years, months, years, months
      Post-GPU: months/years, months, weeks, months
      Nauto prototype: weeks, weeks, days, weeks
      Nauto at-scale: ?, ?, ?, ?

  14. Easy to think of optimization; hard to think of system
      "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." (Donald Knuth)

  15. Lessons
      • Match algorithm performance to use case
      • Embedded pipeline is as important as raw CNN performance
      • Overall system performance (data acquisition, labeling, training) is where big progress is to be made

  16. The future is in distributed awareness: real-world search

  17. Team: Ludmila Levkova, Nikhil Deshmukh, Joe Virzi, Jonathan Soo
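
Slides 8, 9 and 15 argue that the CNN forward pass is only one part of the embedded pipeline and that every stage should be measured. The following is a minimal sketch of how such a per-stage measurement could look. It is not Nauto's tooling: the function names (`profile_pipeline`, `bottleneck`, `cnn_share`) and the stage names in the usage note are hypothetical, and each stage is assumed to be a plain Python callable that takes the previous stage's output.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; names here are illustrative only.
Stage = Tuple[str, Callable[[Any], Any]]


def profile_pipeline(stages: List[Stage], frame: Any, repeats: int = 20) -> Dict[str, float]:
    """Run the stages in order `repeats` times; return mean milliseconds per stage."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)  # output of one stage feeds the next
            totals[name] += (time.perf_counter() - start) * 1000.0
    return {name: total / repeats for name, total in totals.items()}


def bottleneck(timings: Dict[str, float]) -> str:
    """Name of the stage with the largest mean time."""
    return max(timings, key=timings.get)


def cnn_share(timings: Dict[str, float], cnn_stage: str) -> float:
    """Fraction of total pipeline time spent in the named CNN stage."""
    return timings[cnn_stage] / sum(timings.values())
```

To use it, wrap each real step (for example image processing, conversion to the CNN input format, the forward pass, and post-processing) in a callable, pass them as a list such as `[("image_processing", f1), ("to_cnn_space", f2), ("cnn_forward", f3), ("other", f4)]`, and compare `cnn_share(...)` against the whole. On a real device, wall-clock timing like this should be repeated under realistic load, since thermal throttling and memory contention can change the picture.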
