A SINGLE NEURAL NETWORK FORWARD PROPAGATION DETECTOR
Minyoung Kim (minyoung.kim@us.panasonic.com)
Panasonic Silicon Valley Laboratory
PANASONIC SILICON VALLEY LABORATORY
- Location: Cupertino
- Lab size: ~100 people
- Team: ~20 people
  - Deep Learning
  - Robotics
  - ADAS (Advanced Driver Assistance Systems)
  - Drones
- Collaboration with universities worldwide
OBJECT DETECTION WITH DEEP LEARNING
Pros
- High performance
  - Beats state-of-the-art records in many tasks, including image classification and detection 1)
Cons
- Requires a large dataset
- High computational power
  - Deep neural networks with millions of parameters
  - Slower running time than most conventional algorithms
OBJECT DETECTION SYSTEM
[Diagram: Proposal Generation -> Recognition Network (run recognizer; classification: Pedestrian / Background) -> Merge boxes]
Building an object detection system
- Train a deep neural network for classification
  - Pedestrian detection: binary classification
- Generate object proposals at different scales
  - Generate box proposals (1000 ~ 2000 boxes)
  - Selective Search 2), Edge Boxes 3), etc.
- Merge largely overlapping boxes
  - Non-Maximum Suppression
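The box-merging step can be illustrated with a minimal greedy Non-Maximum Suppression sketch. This is not the presenter's implementation: the function names, the (x1, y1, x2, y2) box layout, the 0.5 overlap threshold, and the sample boxes are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold.
    The 0.5 default is an assumed value, not one stated on the slides."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: the first two overlap heavily, the third is separate.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (200, 40, 240, 120)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

With these sample boxes the second proposal is suppressed by the first, and the returned list holds the indices of the surviving boxes.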
OBJECT DETECTION SYSTEM
Proposal Generation & Scaling
[Diagram labels: Proposal Generation and Scaling marked "Time Consuming!"]
- Region proposal
  - Selective Search: 2 seconds per image (CPU)
    - an order of magnitude slower
  - Edge Boxes: 0.2 seconds per image
- Scaling
  - multiple forward propagations
- Bottleneck
  - a forward propagation of an image
    - less than 0.1 seconds (GPU)
OUR PEDESTRIAN DETECTION SYSTEM
Purpose
- Speed up
  - Remove the proposal generation step to make the system faster
  - Speed is one of the most important elements in ADAS (Advanced Driver Assistance Systems) applications (practical applications)
- Build a scale-invariant system
  - No need to process multiple scaled images to detect pedestrians of different sizes in an image
[Diagram: PSVL Pedestrian Detection System: INPUT -> PSVL Neural Detector (a single forward propagation) -> OUTPUT]
OUR PEDESTRIAN DETECTION SYSTEM
Fully Convolutional Network as Detector
[Diagram: Recognition Network -> add regression layer and fine-tune -> detection by a single forward propagation]
RECOGNITION NETWORK
Train DNN for recognition
- GPU
  - NVIDIA Titan X, NVIDIA Tesla K80
- Framework
  - Caffe 5) (deep learning framework by the BVLC 6))
- Network architecture
  - Layers
    - Modified GoogLeNet 7)
    - 25~30 convolutional layers
  - Input
    - Patches of pedestrians and backgrounds (80x32)
  - Output
    - Sigmoid or Softmax
RECOGNITION NETWORK
Train DNN for recognition (Cont'd.)
- Dataset: Caltech Pedestrian Detection Benchmark 4)
  - Approximately 10 hours of 640(w) x 480(h) 30 Hz video
  - Regular traffic in an urban environment
  - About 250,000 frames with a total of 350,000 bounding boxes
We choose …
- 80(h) x 32(w) pixels
- 0.4 aspect ratio
Training set
- Mean height: 64 pixels
- Mean width: 24 pixels
Testing set
- Mean height: 52 pixels
- Mean width: 22 pixels
FULLY CONVOLUTIONAL NETWORK
Convert the recognition network to a fully convolutional network
[Diagram: Base Network + fully connected layer: limited input size. Base Network + convolutional layer (kernel sliding): input size not limited]
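The conversion can be sketched in NumPy as follows. Reshaping the fully connected weights into convolution kernels gives the same output on a patch-sized input and a score map on a larger input. The function name, layer sizes, and random data are hypothetical; the actual network was trained in Caffe and is not reproduced here.

```python
import numpy as np


def fc_as_conv(feature_map, fc_weights, fc_bias, kh, kw):
    """Apply a fully connected layer as a sliding (valid) convolution.

    feature_map: array of shape (C, H, W)
    fc_weights:  array of shape (K, C * kh * kw), as trained on kh x kw patches
    Returns an array of shape (K, H - kh + 1, W - kw + 1).
    """
    c, h, w = feature_map.shape
    k = fc_weights.shape[0]
    # Same numbers, viewed as K kernels of shape (C, kh, kw).
    kernels = fc_weights.reshape(k, c, kh, kw)
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            window = feature_map[:, y:y + kh, x:x + kw]
            out[:, y, x] = np.tensordot(kernels, window, axes=3) + fc_bias
    return out


rng = np.random.default_rng(0)
C, KH, KW, K = 3, 4, 2, 2                  # hypothetical sizes
weights = rng.standard_normal((K, C * KH * KW))
bias = rng.standard_normal(K)

# On a patch-sized input, the convolution reproduces the fully connected output.
patch = rng.standard_normal((C, KH, KW))
fc_out = weights @ patch.reshape(-1) + bias
conv_out = fc_as_conv(patch, weights, bias, KH, KW)

# On a larger input, the same weights slide and yield a score map.
large = rng.standard_normal((C, 10, 8))
score_map = fc_as_conv(large, weights, bias, KH, KW)
```

The explicit loops are for readability only; a framework convolution layer computes the same map far more efficiently.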
FULLY CONVOLUTIONAL NETWORK
Regression Layer
- Ground truth data
  - N x 4 box coordinates data
  - N: feature map resolution (N_X x N_Y)
  - Original GT box: B = [x_1, y_1, x_2, y_2]
  - New GT box: B' = B / m
  - m: multiplier of window size (120 x 120)
[Diagram: feature map of size N_X x N_Y with an N x 4 output; window sizes 120 and 240, m = 2]
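A small worked example of the ground-truth rescaling B' = B / m. It assumes that m is the ratio of the window size to the 120-pixel base window, which matches the slide's 240 / 120 = 2 example; the function name and box coordinates are hypothetical.

```python
def scale_gt_box(box, window_size, base_window=120):
    """Return B' = B / m for a box [x1, y1, x2, y2].

    Assumption: m = window_size / base_window, e.g. 240 / 120 = 2.
    """
    m = window_size / base_window
    return [coord / m for coord in box]


# Hypothetical ground-truth box, rescaled for a 240-pixel window (m = 2).
scaled = scale_gt_box([100, 60, 160, 220], window_size=240)
```

Here every coordinate is halved, giving [50.0, 30.0, 80.0, 110.0].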
FULLY CONVOLUTIONAL NETWORK
Training detector network
- Network architecture
  [Diagram: 640x480 original images -> Fully Convolutional Network + Regression Layer -> Feature Map + Box Coordinates]
- Custom loss functions
  - Feature map: cross-entropy loss with boosting
    - Boosting
      - Ped: correct results (TPs) + ground truths (FNs)
        - True positive if IOU > 0.5
        - False negative if a ground truth is not detected
      - NonPed: FPs
        - False positive if IOU < 0.5
  - Regression: Euclidean loss with feature map data incorporated
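The TP / FP / FN rules used for boosting can be sketched as below. This is an illustration under assumptions, not the presenter's training code: the function names, box layout, and sample boxes are hypothetical, and an IOU of exactly 0.5 (which the slide leaves unspecified) is treated here as a false positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truths, iou_threshold=0.5):
    """Split detections into TPs and FPs and list undetected ground truths (FNs)."""
    true_pos, false_pos, matched = [], [], set()
    for det in detections:
        # Index of the ground truth that overlaps this detection the most.
        best = max(range(len(ground_truths)),
                   key=lambda g: iou(det, ground_truths[g]),
                   default=None)
        if best is not None and iou(det, ground_truths[best]) > iou_threshold:
            true_pos.append(det)
            matched.add(best)
        else:
            false_pos.append(det)
    false_neg = [gt for g, gt in enumerate(ground_truths) if g not in matched]
    return true_pos, false_pos, false_neg


# Hypothetical example: one good detection, one spurious one, one missed pedestrian.
ground_truths = [(10, 10, 50, 90), (300, 100, 340, 180)]
detections = [(12, 12, 52, 92), (100, 100, 140, 180)]
tp, fp, fn = match_detections(detections, ground_truths)

ped_samples = tp + fn      # "Ped" set: TPs plus undetected ground truths
nonped_samples = fp        # "NonPed" set: FPs
```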
PERFORMANCE – VERY FAST WITH COMPETITIVE ACCURACY
[Figure: Performance of pedestrian detection methods (accuracy vs. speed), from the DeepCascade paper 8); axis directions marked "Faster (*)" and "More accurate"; PSVL ND marked on the plot]
- DeepCascade: NVIDIA K20
  - 15 fps
- Ours: NVIDIA GTX770
  - 34 fps
  - Speed adjustment
    - 34 * 0.9699 9) = 33 fps
- Ours: NVIDIA Titan X
  - 51.422 fps w/o cuDNN
  - 85.565 fps with cuDNN4
(*): Left-hand side for methods with unknown fps or less than 0.2 fps
(**): DeepCascade without extra data
(***): SpatialPooling+/Katamari methods use additional motion information
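The speed figures above can be checked with a few lines of arithmetic. The factor 0.9699 is taken from the slide, which cites the Caffe hardware performance page 9); reading it as a relative-speed ratio between the two GPUs is an assumption, and the variable names are hypothetical.

```python
# Values as stated on the slide.
measured_fps_gtx770 = 34
adjustment_factor = 0.9699        # from 9); assumed to be a GPU relative-speed ratio
deepcascade_fps = 15
titanx_fps_plain = 51.422
titanx_fps_cudnn = 85.565

adjusted_fps = measured_fps_gtx770 * adjustment_factor   # 32.9766, about 33 fps
speedup_vs_deepcascade = adjusted_fps / deepcascade_fps  # about 2.2x
cudnn_speedup = titanx_fps_cudnn / titanx_fps_plain      # about 1.66x
```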
ND ON PORTABLE DEVICE
Deploy PSVL ND on Google Nexus 9
- Hardware specification
  - 8MP rear camera, 1.6MP front camera
  - Processor
    - NVIDIA Tegra K1
    - GPU: NVIDIA Kepler with 192 CUDA cores
- Software
  - Android application
  - Adjustable threshold bars
    - Probability and NMS threshold bars
- Speed
  - base resolution (600x390): 5 fps
  - lower resolution (280x240): 16 fps
ND APPLICATION
[Screenshot callouts: toggle for threshold bar; detection box with probability; probability and NMS threshold information; threshold bar]
ND APPLICATION DEMO (NEXUS 9)
ND APPLICATION DEMO (LAPTOP WITH GTX970M)
ND APPLICATION DEMO (CLUSTER WITH TITAN X)
SUMMARY & CONCLUSION
PSVL Neural Detector supports …
- end-to-end pedestrian detection with a single forward propagation of the neural network
- very high speeds with competitive accuracy
- real-time operation, even when deployed on embedded systems
PSVL Neural Detector can be used for …
- an extended system for multiple-object detection in road conditions
  - Pedestrian, Car, Bus, Truck, Bicycle, Traffic Sign, etc.
- Scalable
  - with a bit of extra computational power needed
THANK YOU!
REFERENCES
1) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541–551, Dec. 1989.
2) J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. IJCV, 2013.
3) C. Lawrence Zitnick and Piotr Dollár. Microsoft Research.
4) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
5) http://caffe.berkeleyvision.org/
6) Berkeley Vision and Learning Center (http://bvlc.eecs.berkeley.edu/)
7) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
8) A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, and D. Ferguson. Real-Time Pedestrian Detection With Deep Network Cascades.
9) http://caffe.berkeleyvision.org/performance_hardware.html