GPU-Based Deep Learning in Cloud and Embedded Systems
Frederick Soo, CTO
April 4, 2016
Confidential and Proprietary
Nauto is launching a connected camera for professional drivers
• Drive more than most consumers
• Exposed to passenger and driver liability
• Driver quality unknown - small number of very bad drivers
Massive shift in transportation due to synergistic technologies
• Autonomous: 90% reduction in accidents
• Connected: fleet optimization
• Shared: 50-70% utilization
• Electric: 85% efficient drivetrain
• $0.08 / mile
Why use deep learning?
• Good at visual tasks
• Scalable
• Deployable
Most important for NAUTO
Small brains have a lot of functionality
• 1 million neurons: 1 mW
• 10 million neurons: 10 mW
• 100 million neurons: 100 mW
• 26 billion neurons: 20 watts
Required performance depends on use case
Small changes in F1 with size
• Order-of-magnitude improvements in speed with basic exploration
• Always worth measuring performance/size tradeoff
• Large networks can be used in later stages of cascade
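The cascade idea in the last bullet can be sketched as follows: a small, fast network screens every frame, and only the frames it is unsure about are passed to a larger network. This is a minimal sketch under stated assumptions, not Nauto's implementation: `small_net`, `large_net`, the toy stand-ins, and the thresholds are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # hypothetical stand-in for an image


def cascade_predict(
    frame: Frame,
    small_net: Callable[[Frame], float],
    large_net: Callable[[Frame], float],
    low: float = 0.2,   # illustrative threshold, not from the talk
    high: float = 0.8,  # illustrative threshold, not from the talk
) -> Tuple[bool, str]:
    """Return (decision, stage that made the decision) for one frame."""
    p = small_net(frame)
    if p <= low:
        return False, "small"  # confidently negative: stop early
    if p >= high:
        return True, "small"   # confidently positive: stop early
    # Uncertain band: pay for the larger network only here.
    return large_net(frame) >= 0.5, "large"


# Toy stand-ins so the sketch runs without any deep learning framework.
def toy_small(frame: Frame) -> float:
    return sum(frame) / len(frame)


def toy_large(frame: Frame) -> float:
    return max(frame)


frames: List[Frame] = [[0.1, 0.1], [0.9, 0.9], [0.3, 0.7]]
results = [cascade_predict(f, toy_small, toy_large) for f in frames]
print(results)
```

Counting how often the `"large"` stage fires on real data is one way to measure the performance/size tradeoff the slide recommends.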
Test your chipsets - algorithm speed important but not entire story
• Chipsets released in 2014, 2015 and 2016
• Pricing varying from $25 to $60+
• Varying degrees of HW/SW support
[Bar chart: Nauto CNN forward pass (msec), axis 0 to 150, for embedded SoCs A through E]
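A latency comparison like the one on this slide could be collected with a small timing harness run on each candidate chipset. The sketch below is an assumption-laden illustration: `fake_forward_pass` is a hypothetical placeholder for the real CNN forward pass, and the warmup and run counts are arbitrary.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time fn() repeatedly and report median and worst-case latency in msec."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "worst_ms": max(samples)}


def fake_forward_pass() -> int:
    # Hypothetical workload standing in for a CNN forward pass.
    return sum(i * i for i in range(10_000))


stats = benchmark_ms(fake_forward_pass)
print(stats)
```

Reporting the worst case alongside the median matters on embedded hardware, where thermal throttling or background tasks can make occasional frames much slower than the typical one.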
Algorithm is not the bottleneck
• Image processing: … msec
• Conversion to CNN space: 15 msec
• CNN forward pass: 30 msec
• Other steps: 30 msec
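The stage timings on this slide can be turned into shares of the frame budget. The sketch below uses only the three stage times that are legible on the slide; the image processing time is elided in the source, so it is left out and the computed CNN share is an upper bound. The dictionary keys are illustrative labels.

```python
# Stage times in msec as shown on the slide; image processing is omitted
# because its value is elided in the source.
known_stage_ms = {
    "conversion to CNN space": 15.0,
    "CNN forward pass": 30.0,
    "other steps": 30.0,
}

known_total_ms = sum(known_stage_ms.values())
share = {stage: ms / known_total_ms for stage, ms in known_stage_ms.items()}
cnn_share = share["CNN forward pass"]

for stage, fraction in share.items():
    print(f"{stage}: {fraction:.0%} of known pipeline time")
```

Even before adding image processing, the forward pass accounts for less than half of the measured time, which is the point of the slide: speeding up the network alone cannot fix the pipeline.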
Entire system must be optimized
                 Collect data   Train    Label   Deploy
Pre-GPU          months/years   months   years   months
Post-GPU         months/years   months   weeks   months
Nauto prototype  weeks          weeks    days    weeks
Nauto at scale   ?              ?        ?       ?
Easy to think of optimization; hard to think of system
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
Donald Knuth
Lessons
• Match algorithm performance to use case
• Embedded pipeline is as important as raw CNN performance
• Overall system performance (data acquisition, labeling, training) is where big progress is to be made
The future is in distributed awareness
Real world search
Team
Ludmila Levkova
Nikhil Deshmukh
Joe Virzi
Jonathan Soo