Real-Time Face Recognition on Jetson TX2 using TensorRT
Tamas Grobler
11.10.2017 GTC

Table of Contents
The Problem: Real Time Video Analysis, Face Detection, Video Face Recognition
The solution: Recognition Framework, Training Method
Results: Testing Method, Recognition results, Inference time comparison
Conclusion: Real Time Considerations

Face Detection

Face Recognition: WHO?

Recognition Pipeline

Head detection
Region-based CNN methodology
Backbone: PVANet with HyperNet-inspired hyper-features
Region Proposal Network from Faster R-CNN
Network scales with image size
Non Maximum Suppression
Training: Ultinous film head corpus + subset of HollywoodHeads dataset
Backbone pre-trained on ImageNet
Hard Negative Mining
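The Non Maximum Suppression step named on this slide can be sketched as follows. This is a minimal greedy version in plain Python, not the implementation used in the talk; the box format (x1, y1, x2, y2), the function names, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold.
    Returns the indices of the kept boxes. The threshold is an assumed value."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping head candidates and one separate candidate.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82, so only the first and third candidates survive.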

Face Recognition Training
Base Model: GoogLeNet
Multiple crops of each image (160x160 -> 144x144)
Frontalization using Spatial Transformer Network (144x144 -> 128x128)
MsCeleb dataset: ~10M images from ~100k classes
Augmentation: random crops, mirroring
Trained for both Classification and Discrimination (Triplet Loss) with Hard Negative Mining
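The Triplet Loss with Hard Negative Mining mentioned on this slide can be illustrated with a small sketch. The slide gives neither the distance measure nor the margin, so the squared Euclidean distance, the margin values, and the helper names below are assumptions in the style of common triplet-loss formulations, not the training code used in the talk.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the anchor should be closer to the positive
    than to the negative by at least `margin` (assumed value)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def hardest_negative(anchor, negatives):
    """Hard negative mining in its simplest form: pick the negative embedding
    that lies closest to the anchor."""
    return min(negatives, key=lambda n: squared_distance(anchor, n))


# Toy 2-D embeddings; real face embeddings have many more dimensions.
anchor = (0.0, 0.0)
positive = (0.0, 1.0)                    # same identity as the anchor
negatives = [(3.0, 0.0), (1.0, 0.5)]     # other identities

hard = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, hard, margin=0.5)
loss_easy = triplet_loss(anchor, positive, (3.0, 0.0), margin=0.5)
```

The distant negative already satisfies the margin and contributes zero loss, while the mined hard negative still produces a gradient signal, which is the motivation for mining.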

Results: Recognition
Test Data: NIST IJB-A Dataset
Average over 10 splits
1000 positive / 10000 negative pairs per split

Measurement Setup
NVIDIA Titan Xp: NVIDIA Pascal architecture; 3840 NVIDIA CUDA Cores; 12 GB memory; 547.7 GB/s
NVIDIA Jetson TX2: NVIDIA Pascal architecture; 256 NVIDIA CUDA Cores; 8 GB memory; 58.3 GB/s
Caffe: NVIDIA CUDA 8; NVIDIA cuDNN 7
NVIDIA TensorRT: 3.0 Release Candidate; 32b, 16b, 8b arithmetic

Inference Time Comparison: Caffe vs. TensorRT

Inference Time Comparison: Titan Xp vs. Jetson TX2

Conclusion
Real-Time Considerations for Jetson TX2 with TensorRT (16-bit arithmetic)
Frame rate: 10 fps -> 100 ms for detection + recognition
Head detection time per 1536x864 frame (speculative for 16 bit): < 30 ms
Face recognition can handle 50 images in ~66 ms
5 crops per face
This allows for 10 simultaneous recognitions per frame (30 ms + 66 ms)
Future work: test TensorRT with Int8 arithmetic for both accuracy and inference time
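The per-frame budget on this slide can be checked with a few lines of arithmetic. The numbers come from the slide itself (the 30 ms detection figure is its stated upper bound, marked there as speculative for 16 bit); the variable names are only illustrative.

```python
# Figures taken from the conclusion slide.
FRAME_RATE_FPS = 10          # target frame rate
DETECTION_MS = 30            # head detection upper bound per 1536x864 frame
RECOGNITION_BATCH_MS = 66    # approximate time for one batch of face images
BATCH_IMAGES = 50            # images the recognizer handles in that time
CROPS_PER_FACE = 5           # crops evaluated per detected face

frame_budget_ms = 1000 / FRAME_RATE_FPS            # time available per frame
faces_per_frame = BATCH_IMAGES // CROPS_PER_FACE   # simultaneous recognitions
total_ms = DETECTION_MS + RECOGNITION_BATCH_MS     # detection + recognition
headroom_ms = frame_budget_ms - total_ms           # slack left in the budget

print(f"budget {frame_budget_ms:.0f} ms, used {total_ms} ms, "
      f"headroom {headroom_ms:.0f} ms, faces per frame {faces_per_frame}")
```

With these figures, detection plus recognition takes about 96 ms of the 100 ms budget, which leaves roughly 4 ms of slack for 10 faces per frame.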

Contributors: György Balogh, János Locki, József Németh, Attila Szabó
Contact: tamas.grobler@ultinous.com, ultinous.com

References
[1] arXiv:1611.08588. PVANet: Lightweight Deep Neural Networks for Real-time Object Detection. Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park
[2] arXiv:1604.00600. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun
[3] arXiv:1506.01497. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
[4] arXiv:1409.4842. Going deeper with convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
[5] arXiv:1701.07174. Towards End-to-End Face Recognition through Alignment Learning. Yuanyi Zhong, Jiansheng Chen, Bo Huang
[6] "This product contains or makes use of the following data made available by the Intelligence Advanced Research Projects Activity (IARPA): IARPA Janus Benchmark A (IJB-A) data detailed at Face Challenges homepage."
[7] arXiv:1408.5093. Caffe: Convolutional Architecture for Fast Feature Embedding. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell