Boosting Visual Object Tracking Using Deep Features and GPU Implementations
Michael Felsberg (michael.felsberg@liu.se), Martin Danelljan, Fahad Khan
Computer Vision Laboratory, Department of Electrical Engineering, Linköping University
Definition of Visual Tasks
• Classification task: Is there a dog in the image?
• Detection task: Where is a dog in the image?
• Tracking task: Where is the dog from the first frame in all subsequent frames of the image sequence?
Visual Object Tracking (VOT)
• Use-cases: autonomous driving, surveillance, sports, …
• Problems: occlusion, clutter, changes in viewpoint, scale, illumination, motion, articulation
Why Visual Object Tracking?
• cue for behavior
• human-centered sensing
• adaptation to environment
• interaction
• visualization
Why Generic?
• high inter-class variability
• constantly new classes
VOT: Problem Definition
• Input: image sequence (video); object bounding box in frame #1
• Output: bounding boxes for frames t > 1, determined from frames < t (causality)
• Assumptions: the object is visible in all frames (at least partially); the camera might be moving (no background model)
• Challenge: build a model from one annotated training sample of an unknown object class; update the model from previous estimates (bootstrapping process); a minimal protocol sketch follows below
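A minimal sketch of this causal protocol in Python; the `Tracker` interface and the `run_tracker` helper are illustrative assumptions, not the implementation discussed later in the talk.

```python
# Hypothetical interface, for illustration of the problem definition only.
class Tracker:
    def init(self, frame, bbox):
        """Build the initial model from the single annotated bounding box."""
        raise NotImplementedError

    def update(self, frame):
        """Estimate the bounding box in the current frame and update the model."""
        raise NotImplementedError


def run_tracker(tracker, frames, init_bbox):
    tracker.init(frames[0], init_bbox)        # one annotated training sample (frame #1)
    boxes = [init_bbox]
    for frame in frames[1:]:                  # causality: no access to future frames
        boxes.append(tracker.update(frame))   # bootstrapping: model updated from its own estimates
    return boxes
```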
“Tracking is a solved problem” [Jianbo Shi & Carlo Tomasi, CVPR 1994]
• 23 years on: > 40 tracking papers per year at the major conferences (CVPR, ICCV, ECCV) and in the major journals (PAMI, IJCV, TIP)
• Several benchmarks in the last four years:
  • OTB [Wu et al., CVPR 2013, PAMI 2015]
  • VOT [Kristan et al., VOT workshops at ICCV 2013/15/17 and ECCV 2014/16]
  • ALOV+ [Smeulders et al., PAMI 2014]
  • UAV123 [Mueller et al., ECCV 2016]
• Example: SRDCF (2015) vs. the state of the art of 2014
VOT Results: DCF-Based Approaches
(figure: timeline relating DCF trackers to the VOT2014–VOT2016 benchmarks)
• ACT (CVPR 2014) • DSST (PAMI 2016, VOT2014) • SRDCF (ICCV 2015) • DeepSRDCF (VOT 2015 workshop) • C-COT (ECCV 2016, VOT2016)
DCF: Discriminative Correlation Filters
(figure: feature map → learned filters → output scores)
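As a toy illustration of this pipeline, a single-channel, MOSSE-style correlation filter can be trained and applied in closed form in the Fourier domain. This is a simplified sketch, not the multi-channel formulation used in the trackers discussed here; the function names and the scalar regularizer `lam` are illustrative assumptions.

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Closed-form (conjugate) correlation filter in the Fourier domain."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)   # regularized least squares

def detect(H_conj, z):
    """Correlation scores for a new patch z; the peak indicates the target position."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * H_conj))

# Toy usage: x is a (feature) patch, y a Gaussian label centred on the target.
x = np.random.rand(64, 64)
yy, xx = np.mgrid[:64, :64]
y = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
scores = detect(train_filter(x, y), x)   # peak near (32, 32)
```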
Standard DCF formulation
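The standard (discrete, multi-channel) DCF objective is commonly written along the following lines; this is a generic form with sample weights α_k and regularization strength λ, not necessarily the exact equation shown on the slide.

```latex
\varepsilon(f) \;=\; \sum_{k=1}^{m} \alpha_k \Big\| \sum_{d=1}^{D} x_k^d \ast f^d - y_k \Big\|^2
\;+\; \lambda \sum_{d=1}^{D} \big\| f^d \big\|^2
```

Here x_k^d is feature channel d of training sample k, f^d the corresponding filter, and y_k the desired (typically Gaussian) response; the circular convolutions make the problem separable and efficiently solvable in the Fourier domain.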
Continuous Approach: Overview
(figure: multi-resolution features → continuous filters → continuous output)
Convolution Operator
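In the continuous formulation [Danelljan et al., ECCV 2016], each feature channel is first transferred to the continuous domain by an interpolation operator and then convolved with a continuous filter; roughly:

```latex
J_d\{x^d\}(t) \;=\; \sum_{n=0}^{N_d-1} x^d[n]\, b_d\!\Big(t - \tfrac{T}{N_d}\, n\Big),
\qquad
S_f\{x\} \;=\; \sum_{d=1}^{D} f^d \ast J_d\{x^d\}
```

b_d is an interpolation kernel, T the period of the continuous domain, and N_d the resolution of channel d, so channels of different resolutions can be fused into a single confidence function.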
Training Loss [Danelljan et al., CVPR 2016] [Danelljan et al., ICCV 2015]
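Combining the sample re-weighting of [Danelljan et al., CVPR 2016] with the spatial regularization of [Danelljan et al., ICCV 2015], the training loss for the continuous operator takes roughly the form:

```latex
E(f) \;=\; \sum_{j=1}^{m} \alpha_j \,\big\| S_f\{x_j\} - y_j \big\|_{L^2}^2
\;+\; \sum_{d=1}^{D} \big\| w\, f^d \big\|_{L^2}^2
```

α_j are the (adaptive) sample weights and w is a spatial penalty that suppresses filter energy outside the target region.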
Spatially Regularized DCF [Danelljan et al., ICCV 2015]: circular convolution ⟺ periodic extension
Decontamination of Training Set [Danelljan et al., CVPR 2016]
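The decontamination approach [Danelljan et al., CVPR 2016] learns the model and the sample weights jointly; as a rough sketch (exact prior and constraints per the paper):

```latex
\min_{f,\;\alpha}\;\; \sum_{k=1}^{m} \alpha_k\, L(f;\, x_k, y_k)
\;+\; \frac{1}{\mu} \sum_{k=1}^{m} \frac{\alpha_k^2}{\rho_k}
\qquad \text{s.t.}\;\; \alpha_k \ge 0,\;\; \sum_{k} \alpha_k = 1
```

L is the per-sample loss, ρ_k a prior weight (e.g. emphasizing recent samples), and μ controls how far the weights may deviate from the prior; corrupted samples receive small α_k and are effectively removed from the training set.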
Deep Learning Revolution: SOTA 2012, Learning on ImageNet
• ImageNet Large Scale Visual Recognition Challenge [Deng et al., 2009]
• Today: more than 14 million images; more than 10 million images annotated; more than 1 million images with bounding boxes
• Classification error rate, 2012 vs. 2011:

  Year   Method                  Top-5 error   Method description
  2012   SuperVision (Toronto)   0.16422       CNN
  2011   ISI (Tokyo)             0.26172       hand-crafted features: SIFT, HOG and LBP
  2011   OXFORD                  0.26979       DPM + hand-crafted features
  2011   XRCE/INRIA              0.27058       hand-crafted features
Deep Features for Object Tracking
• deep features from the imagenet-vgg-2048 network (five layers): shallow layers are relevant [Danelljan & Häger et al., VOT 2015]
• imagenet-vgg-verydeep-16 network (layers 4 + 13)
• deep motion features (layer 5) [Gladh et al., ICPR 2016, best paper]
(an illustrative feature-extraction sketch follows below)
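The talk's pipeline uses MatConvNet models; as an illustrative stand-in, the idea of tapping a shallow and a deep convolutional activation can be sketched with torchvision's VGG-16. The layer indices and the weight identifier below are assumptions for illustration, not the exact layers used in the papers.

```python
import torch
import torchvision.models as models

# VGG-16 convolutional trunk with ImageNet weights (stand-in for the MatConvNet models).
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()

SHALLOW_IDX, DEEP_IDX = 4, 28   # illustrative choices of a shallow and a deep layer

def extract_features(patch):
    """Return shallow and deep activations for an image patch (3xHxW tensor)."""
    x = patch.unsqueeze(0)
    feats = {}
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i == SHALLOW_IDX:
                feats["shallow"] = x.clone()
            if i == DEEP_IDX:
                feats["deep"] = x.clone()
                break
    return feats

feats = extract_features(torch.rand(3, 224, 224))   # toy input patch
```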
C-COT Results [Danelljan et al., ECCV 2016]
EAO-EFO trade-off? (expected average overlap vs. equivalent filter operations, i.e., accuracy vs. speed)
ECO: Efficient Convolution Operators [Danelljan et al., CVPR 2017]
• Over-fitting and complexity: model size
  • C-COT: high-dimensional features; ca. 800,000 parameters in online learning; the number of parameters exceeds the dimensionality of the input; scarcity of training data in tracking
  • ECO: discriminatively learn a lower-dimensional feature space by jointly minimizing the classification error; 80% reduction in the number of model parameters (see the factorized operator sketched below)
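The factorized convolution in ECO replaces the D filters by C ≪ D filters and a learned D×C projection matrix P; roughly:

```latex
S_{Pf}\{x\} \;=\; \sum_{c=1}^{C} \sum_{d=1}^{D} p_{d,c}\, f^c \ast J_d\{x^d\}
\;=\; f \ast P^{\mathsf T} J\{x\}
```

P and the filters f^1, …, f^C are learned jointly by minimizing the classification error, which is what yields the reduction in model parameters.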
ECO: Efficient Convolution Operators [Danelljan et al., CVPR 2017]
• Over-fitting and complexity: training set size
  • C-COT: a large training sample set causes a significant computational burden; memory is limited given the large feature set; discarding old samples leads to over-fitting to the recent appearance
  • ECO: model the training data as a mixture of Gaussian components, giving a compact and diverse representation of the training data (see the sketch below)
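A minimal sketch of the underlying idea: keep a fixed budget of sample components and merge the two closest ones when the budget is exceeded. This is a simplification (means only, plain Euclidean distance) and not the authors' implementation.

```python
import numpy as np

class SampleModel:
    """Fixed-budget set of weighted sample components (simplified)."""

    def __init__(self, budget):
        self.budget = budget
        self.means = []      # component means (feature maps as numpy arrays)
        self.weights = []    # component weights (how much data each represents)

    def _closest_pair(self):
        best, pair = np.inf, None
        for i in range(len(self.means)):
            for j in range(i + 1, len(self.means)):
                d = np.sum((self.means[i] - self.means[j]) ** 2)
                if d < best:
                    best, pair = d, (i, j)
        return pair

    def add(self, sample, weight=1.0):
        self.means.append(np.asarray(sample, dtype=np.float64))
        self.weights.append(float(weight))
        if len(self.means) > self.budget:
            i, j = self._closest_pair()
            wi, wj = self.weights[i], self.weights[j]
            # merge the two closest components into their weighted average
            self.means[i] = (wi * self.means[i] + wj * self.means[j]) / (wi + wj)
            self.weights[i] = wi + wj
            del self.means[j], self.weights[j]
```

In use, one component per incoming frame would be added (e.g. `model.add(feature_map)`), so the stored set stays small while still covering diverse appearances of the target.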
ECO: Efficient Convolution Operators [Danelljan et al., CVPR 2017]
• UAV123 dataset: 123 aerial videos with 110K frames

  Tracker (HC features)   Accuracy   Speed
  C-COT                   50.8       < 10 FPS
  ECO                     52.9       60 FPS

• qualitative examples: partial occlusion (the guitar), deformations, out-of-plane rotations
ECO: Efficient Convolution Operators • VOT2016 dataset: 13.3% relative gain in performance [Danelljan et al., CVPR 2017]
Computational Considerations
• Matlab implementation using MatConvNet
• CPU measurements (ECO-HC) on a 4-core i7-6700 @ 3.4 GHz
• GPU measurements (ECO) on a Tesla K40 GPU (donated by NVIDIA)
• Fine-tuning of networks on Kebnekaise (Umeå):
  • 32 nodes with 2 K80 cards (4992 cores each)
  • 4 nodes with 4 K80 cards
  • Intel Xeon E5-2690v4 (14 cores), 128 GB
• Each batch: 80 videos, 16 frames each, processed on 4 GPUs
System Implementation
Acknowledgements • Wallenberg Autonomous Systems and Software Program (WASP) • Swedish Research Council (EMC2, ELLIIT) • SSF (CUAS, SymbiCloud) • NVidia • CVL ( Martin , Fahad, Gustav, Andreas, Susanna, Goutam)
References (selection)
• “Adaptive Color Attributes for Real-Time Visual Tracking” (CVPR 2014)
• “Learning Spatially Regularized Correlation Filters for Visual Tracking” (ICCV 2015, 1st rank UAV123)
• “Convolutional Features for Correlation Filter Based Visual Tracking” (ICCV workshops, VOT 2015, 2nd rank VOT2015)
• “Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking” (CVPR 2016)
• “Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking” (ECCV 2016, 1st rank VOT2016)
• “Deep Motion Features for Visual Tracking” (ICPR 2016, best paper award)
• “Discriminative Scale Space Tracking” (IEEE TPAMI 2016, 1st rank VOT2014, 1st rank OpenCV challenge)
• “ECO: Efficient Convolution Operators for Tracking” (CVPR 2017, accepted)
CAIP 2017: 17th International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22-24 Aug 2017
• General Chair: Michael Felsberg • Program Chairs: Anders Heyden, Norbert Krüger • Industrial Liaison: Zhibo Pang
• The conference invites novel contributions to the automatic analysis of images and patterns, encompassing both new challenging application areas and substantial new theoretical developments in the field.
• Important dates: paper submission 3 Apr 2017; author notification 26 May 2017; camera-ready paper 31 May 2017; early registration 16 Jun 2017; main conference 22-24 Aug 2017
• Topics: 2D-to-3D, 3D vision, biomedical image and pattern analysis, biometrics, brain-inspired methods, document analysis, face and gestures, feature extraction, graph-based methods, high-dimensional topology methods, human pose estimation, image/video indexing & retrieval, image restoration, keypoint detection, machine learning for image and pattern analysis, mobile multimedia, model-based vision, motion and tracking, object recognition, segmentation, shape representation and analysis, static and dynamic scene analysis, statistical models, surveillance, vision for robotics
• Invited speakers: Alan Bovik, Markus Vincze, Christian Igel
• REACTS Workshop: George Azzopardi • Pose estimation tutorial: Anders G. Buch
Questions? • michael.felsberg@liu.se • http://users.isy.liu.se/cvl/mfe/ • https://liu.se/en/employee/micfe03 • http://www.cvl.isy.liu.se/ • https://liu.se/en/organisation/liu/isy/cvl