Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking
Minyoung Kim (minyoung.kim@us.panasonic.com)
Panasonic Silicon Valley Lab
TOWARDS AUTONOMOUS DRIVING
• VISUAL PERCEPTION: Object Detection, Object Tracking
• APPLICATIONS: Risk Prediction, Safety Control
MULTI-OBJECT TRACKING
ISSUE
• Large number of hyperparameters
• High complexity (low speed)
→ feasibility as a real-world product ↓
PROPOSAL
• Enhanced Siamese Network
• Appearance + temporal information
• Efficient matching algorithm
[Figure: objects from Frame t - 1 and Frame t pass through ESNN-based Similarity Mapping (feature map, Euclidean distance) and then the Matching Algorithm, which outputs match pairs and parameters. Example pairwise distances shown: 0.033, 25.328, 23.242, 0.0223.]
SIMILARITY MAPPING
Base Network Architecture
Siamese Network with Contrastive Loss (R. Hadsell et al (2005)):

$$ L_c = \frac{1}{2N} \sum_{n=1}^{N} \left( y_n E_n^2 + (1 - y_n) \max(m - E_n, 0)^2 \right) $$

[Figure: Siamese Network built from two weight-sharing copies of the Base Network N_B. One branch has layers data, conv1 to conv5, tanh1 to tanh5, pool1, pool2, fc1, fc2, relu, feat; the paired branch has the same layers with suffix _p and is fed by pair data P. Both feat outputs feed the Contrastive Loss.]
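The contrastive loss above can be sketched in a few lines. This is a minimal illustration, not the training code behind the slides. It assumes the usual reading of the symbols: E_n is the Euclidean distance between the two feature vectors of pair n, y_n is 1 for a matching pair and 0 otherwise, and m is the margin. The function names and the default margin are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance E between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(pairs, labels, margin=1.0):
    """L_c = 1/(2N) * sum(y * E^2 + (1 - y) * max(m - E, 0)^2).

    pairs  : list of (feature_a, feature_b) tuples
    labels : 1 for a matching pair, 0 for a non-matching pair (assumed convention)
    margin : m in the formula; the default here is only a placeholder
    """
    total = 0.0
    for (u, v), y in zip(pairs, labels):
        e = euclidean(u, v)
        # Matching pairs are pulled together; non-matching pairs are
        # pushed apart until they are at least `margin` away.
        total += y * e ** 2 + (1 - y) * max(margin - e, 0.0) ** 2
    return total / (2 * len(labels))

if __name__ == "__main__":
    pairs = [([0.0, 0.0], [3.0, 4.0]), ([0.0, 0.0], [0.6, 0.8])]
    print(contrastive_loss(pairs, [1, 0], margin=2.0))
```

Non-matching pairs that are already farther apart than the margin contribute nothing, which is why the margin appears as the decision boundary in the later similarity plots.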
SIMILARITY MAPPING
Dataset
• Market-1501 *
  • 1,501 identities, 32,668 bounding boxes, 6 camera views
• MOT16 **
  • 7 training and 7 testing video sequences
  • Training sets split into train/val
* L. Zheng et al (2015)
** A. Milan et al (2016)
SIMILARITY MAPPING
Similarity with N_B
  Precision   0.9145
  Recall      0.9966
  F-score     0.9538
[Figure: distance for each data pair, with matching pairs, non-matching pairs, and the margin marked.]
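The metrics above can be computed from pairwise distances as sketched below. The decision rule (a pair counts as matching when its distance is below the margin) is an assumption; the slide does not state the exact threshold. The function names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def evaluate_pairs(distances, labels, margin):
    """Classify each pair by thresholding its distance at `margin`.

    labels: 1 for a true matching pair, 0 for a non-matching pair.
    Returns (precision, recall, f_score).
    """
    predicted = [d < margin for d in distances]  # assumed decision rule
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)

if __name__ == "__main__":
    # Consistency check on the reported numbers for N_B.
    print(round(f_score(0.9145, 0.9966), 4))
```

The reported F-score is consistent with the reported precision and recall under this definition.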
SIMILARITY MAPPING
Temporal Information
• Intersection over Union (IoU)
• Area Variant Ratio

$$ [D_{IoU}, D_{arat}](b_i, b_j) = \left[ \frac{area(b_i \cap b_j)}{area(b_i \cup b_j)},\ \frac{\min(area(b_i), area(b_j))}{\max(area(b_i), area(b_j))} \right] $$

Enhanced Architecture: Enhanced Siamese Neural Network (ESNN)
[Figure: ESNN diagram. Frame t and Frame t + k are fed as pair data P into the weight-sharing Base Network N_B (data, feat; data_p, feat_p). Additional layers: deconv_I and relu_I for D_IoU, deconv_A and relu_A for D_arat; concat and concat_p; Contrastive Loss. Example frames from https://motchallenge.net/vis/PETS09-S2L1/gt/]
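The two temporal features can be computed directly from a pair of bounding boxes. The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates, which is a convention chosen for illustration; the slides do not specify a box format, and the function names are hypothetical.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0.0) * max(y2 - y1, 0.0)

def iou(box_a, box_b):
    """D_IoU: area of the intersection divided by area of the union."""
    intersection = area((
        max(box_a[0], box_b[0]),
        max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]),
    ))
    union = area(box_a) + area(box_b) - intersection
    return intersection / union if union > 0 else 0.0

def area_ratio(box_a, box_b):
    """D_arat: smaller box area divided by larger box area."""
    larger = max(area(box_a), area(box_b))
    return min(area(box_a), area(box_b)) / larger if larger > 0 else 0.0

if __name__ == "__main__":
    previous_box = (0, 0, 2, 2)   # object in frame t
    current_box = (1, 1, 3, 3)    # candidate in frame t + k
    print(iou(previous_box, current_box), area_ratio(previous_box, current_box))
```

Both values lie in [0, 1], so they can be combined with the appearance features without further scaling.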
SIMILARITY MAPPING
IS IoU ENOUGH? (on a sample sequence from MOT16)
Similarity with N_B (left) and ESNN (right)
[Figure: two scatter plots, x-axis: IoU, y-axis: Euclidean distance, with matching pairs, non-matching pairs, and the margin marked. Annotation: Precision ↑]
SIMILARITY MAPPING
Similarity on MOT16

              N_B       ESNN
  Precision   0.8187    0.9964
  Recall      0.9529    0.9931
  F-score     0.8807    0.9947

[Figure: scatter plots, x-axis: IoU, y-axis: Euclidean distance, with matching pairs, non-matching pairs, and the margin marked.]
MATCHING ALGORITHM
A Simple Matching Algorithm
• Heuristic
• Computationally efficient
• Two-step greedy algorithm
vs. Hungarian Algorithm (Kuhn, H. W. (1955))
• Time, e.g.
  • MOT16-05 (not crowded): 1.03x ↑
  • MOT16-04 (crowded): 2.69x ↑
• Better MOTA

                               Ours      Hungarian
  Complexity (# of objects)    O(n^2)    O(n^3)
  MOTA                         35.3      27.7

(MOTA: Multi-Object Tracking Accuracy)
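To illustrate why a greedy matcher can run in O(n^2), here is a generic row-wise greedy assignment over a distance matrix. It is not the two-step algorithm from the talk, whose details the slides do not give; the function name, the matrix layout (rows: objects in frame t - 1, columns: objects in frame t), and the use of the margin as a cutoff are all assumptions.

```python
def greedy_match(distances, margin):
    """Greedily pair each previous-frame object with its nearest free detection.

    distances[i][j] is the similarity distance between previous object i
    and current detection j. A pair is accepted only if its distance is
    below `margin`. Runs in O(rows * cols), unlike the O(n^3) Hungarian
    algorithm, but does not guarantee a globally optimal assignment.
    """
    matches = []
    used_columns = set()
    for i, row in enumerate(distances):
        best_j, best_distance = None, margin
        for j, distance in enumerate(row):
            if j not in used_columns and distance < best_distance:
                best_j, best_distance = j, distance
        if best_j is not None:
            used_columns.add(best_j)
            matches.append((i, best_j))
    return matches

if __name__ == "__main__":
    # Example distances taken from the overview slide; arranging them
    # as a 2x2 matrix is an assumption made for this illustration.
    distances = [[0.033, 25.328],
                 [23.242, 0.0223]]
    print(greedy_match(distances, margin=1.0))
```

Objects left unmatched (no distance below the margin) would start new tracklets or stay pending, depending on the tracker's policy.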
EVALUATION
• Online method
  • "solution available immediately with each incoming frame and cannot be changed at any later time" *
• Fast ** Speed: 2.68~45.10 fps *** on each video sequence ****
• Later online methods

  Method                                     MOTA    Hz
  MDPNN16 (A. Sadeghian et al, 2017)         47.2    1.0
  CDA_DDAL (S. Bae et al, 2017)              43.9    0.5
  EAMTT (R. Sanchez-Matilla et al, 2016)     38.8    11.8
  OVBT (Y. Ban et al, 2016)                  38.4    0.3

(MOTA: Multi-Object Tracking Accuracy)
* Choi, W. (2015)
** Tang, S., Andres, B., Andriluka, M., Schiele, B. (2015)
*** Stiller, C., Urtasun, R., Wojek, C., Lauer, M., Geiger, A. (2014)
**** Milan, A., Roth, S., Schindler, K. (2014)
MODEL COMPRESSION
• Inspired by SqueezeNet (arXiv: 1602.07360)
• Tested on NVIDIA GTX 1080 (https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/)

  ESNN          ORIGINAL           SQN
  SUBSET      MOTA     FPS      MOTA     FPS
  MOT16-02    17.1    21.27     15.7    26.59
  MOT16-04    34.7     7.20     34.0     9.02
  MOT16-05    31.0    42.01     29.6    60.55
  MOT16-09    48.4    16.51     45.6    25.65
  MOT16-10    31.4    23.60     31.1    28.42
  MOT16-11    48.2    18.99     47.5    24.96
  MOT16-13     6.8    38.04      6.6    47.02
  TOTAL       30.2    16.58     29.3    21.41

• FPS: 20~55% ↑ (avg. 29%)
• MOTA: 0.9~8.2% ↓ (avg. 3%)
• Memory usage: 70+% down (1+GB → 350+MB)
• Model size: 90+% down (100+MB → 3.6MB)
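The summary percentages can be checked against the table as relative changes from the original model to the SQN model. This is a small arithmetic sketch; the helper name and the data layout are hypothetical, and the values are copied from the table above. Because the table values are rounded, the recomputed figures match the slide only approximately (the smallest MOTA drop comes out near 0.96% rather than 0.9%).

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

# subset: (original MOTA, original FPS, SQN MOTA, SQN FPS)
RESULTS = {
    "MOT16-02": (17.1, 21.27, 15.7, 26.59),
    "MOT16-04": (34.7, 7.20, 34.0, 9.02),
    "MOT16-05": (31.0, 42.01, 29.6, 60.55),
    "MOT16-09": (48.4, 16.51, 45.6, 25.65),
    "MOT16-10": (31.4, 23.60, 31.1, 28.42),
    "MOT16-11": (48.2, 18.99, 47.5, 24.96),
    "MOT16-13": (6.8, 38.04, 6.6, 47.02),
    "TOTAL": (30.2, 16.58, 29.3, 21.41),
}

def summarize(results):
    """Return {subset: (MOTA change %, FPS change %)}."""
    return {
        name: (percent_change(mota_a, mota_b), percent_change(fps_a, fps_b))
        for name, (mota_a, fps_a, mota_b, fps_b) in results.items()
    }

if __name__ == "__main__":
    for name, (mota_delta, fps_delta) in summarize(RESULTS).items():
        print(f"{name}: MOTA {mota_delta:+.1f}%  FPS {fps_delta:+.1f}%")
```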
MULTI-OBJECT DETECTION & TRACKING
Object Detection & Object Tracking
PSVL MOD
PSVL MOD + MOT
SIMILARITY MAPPING COMPARISON AGAIN
[Figure: N_B vs. ESNN, with Car #5 labeled in both.]
SIMILARITY MAPPING COMPARISON AGAIN
[Figure: N_B]
SIMILARITY MAPPING COMPARISON AGAIN
[Figure: ESNN]
LIMITATION
• Speed
  • Dependent on # of objects
• Hyperparameter
  • Lifetime of tracklet
    • # of frames for keeping each tracklet data
    • The longer kept, the higher the chance to be recovered when occluded
    • More ID switches with short lifetime
[Figure: TDB(F_k, O_i), TDB(F_{k+j}, O_i), TDB(F_{k+2j}, O_i), TDB(F_{k+3j}, O_i): for how long?]
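The lifetime hyperparameter can be pictured as an age limit on stored tracklets. The sketch below is a hypothetical bookkeeping class, not the tracker's actual data structure: it keeps the last frame in which each track ID was matched and drops IDs that have gone unmatched for more than `lifetime` frames.

```python
class TrackletStore:
    """Keeps tracklets alive for a limited number of unmatched frames."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # max frames a tracklet may stay unmatched
        self.last_seen = {}        # track id -> last frame index it was matched

    def update(self, frame, matched_ids):
        """Record matches for `frame` and return the IDs that expired."""
        for track_id in matched_ids:
            self.last_seen[track_id] = frame
        expired = sorted(
            track_id
            for track_id, seen in self.last_seen.items()
            if frame - seen > self.lifetime
        )
        for track_id in expired:
            del self.last_seen[track_id]
        return expired

    def active_ids(self):
        """IDs still eligible for matching (visible or possibly occluded)."""
        return sorted(self.last_seen)

if __name__ == "__main__":
    store = TrackletStore(lifetime=2)
    store.update(0, [1, 2])
    for frame in (1, 2, 3):
        print(frame, store.update(frame, [1]), store.active_ids())
```

A larger `lifetime` keeps occluded objects recoverable for longer but grows the set of candidates compared in every frame, which ties back to the speed limitation.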
CONCLUSION
• VISUAL PERCEPTION: Object Detection, Object Tracking, Unsupervised Learning
• APPLICATIONS: Risk Prediction, Safety Control
Thank you! Panasonic Silicon Valley Lab