Siamese Network: Architecture and Applications in Visual Object Tracking (Yuanwei Wu, 10-21-2016)


  1. Week 42: Siamese Network: Architecture and Applications in Visual Object Tracking. Yuanwei Wu, 10-21-2016

  2. Outline • Siamese Architecture • Siamese Applications in Computer Vision • Paper review: Visual Object Tracking using Siamese CNN • Future Work

  3. What does “Siamese” mean? Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  4. Siamese Architecture. Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

  5. Siamese Architecture and loss function. Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf
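
The slide content itself did not survive in this transcript, so the following is only a minimal sketch of the general idea: two inputs pass through one embedding with shared weights, and a loss acts on the distance between the two embeddings. The contrastive loss used here is an assumption (a common choice for Siamese networks), and `embed`, `siamese_distance`, `contrastive_loss`, and `margin` are hypothetical names, not taken from the slides.

```python
import math


def embed(x, weights):
    """Toy shared embedding: one linear map applied to either input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def siamese_distance(x1, x2, weights):
    """Embed both inputs with the SAME weights and return their Euclidean distance."""
    e1, e2 = embed(x1, weights), embed(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def contrastive_loss(distance, same, margin=1.0):
    """Assumed contrastive loss: pull matching pairs together,
    push non-matching pairs apart until they are at least `margin` away."""
    if same:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


identity = [[1.0, 0.0], [0.0, 1.0]]
d = siamese_distance([0.0, 0.0], [3.0, 4.0], identity)
print(d, contrastive_loss(d, same=False))
```

Because both branches call `embed` with the same `weights`, any update to the weights changes both embeddings at once, which is the defining property of the architecture.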

  6. Siamese Applications in Computer Vision: 1. Signature Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  7. Siamese Applications in Computer Vision: 2. Dimensionality Reduction. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  8. Siamese Applications in Computer Vision: 3.1 Learning Image Descriptors: CNN Model. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  9. Siamese Applications in Computer Vision: 3.2 Learning Image Descriptors. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  10. Siamese Applications in Computer Vision: 4.1 Face Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  11. Siamese Applications in Computer Vision: 4.2 Face Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  12. Siamese Applications in Computer Vision: 4.3 Face Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  13. Siamese Applications in Computer Vision: 4.4 Face Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  14. Siamese Applications in Computer Vision: 4.5 Face Verification. Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

  15. Paper Review: Fully-Convolutional Siamese Networks for Object Tracking. Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr. arXiv preprint arXiv:1606.09549, 2016.

  16. Architecture of Siamese CNN. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  17. Details of the Architecture of Siamese CNN: 1. (figure not captured in this transcript) Source: [1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

  18. Details of the Architecture of Siamese CNN: 1. (figure not captured in this transcript) 2. Cross-correlation layer. Source: [1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. [2] Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
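
To make the cross-correlation layer concrete, here is a minimal single-channel sketch in plain Python. It slides the exemplar feature map over the search feature map and records one similarity score per offset. The function name `cross_correlate` and the toy inputs are hypothetical; the real layer operates on multi-channel feature maps produced by the shared conv-net, which this sketch does not model.

```python
def cross_correlate(search, exemplar):
    """Valid-mode 2-D cross-correlation of two single-channel maps (nested lists)."""
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    scores = []
    for i in range(sh - eh + 1):          # vertical offset of the exemplar
        row = []
        for j in range(sw - ew + 1):      # horizontal offset of the exemplar
            total = 0.0
            for di in range(eh):
                for dj in range(ew):
                    total += search[i + di][j + dj] * exemplar[di][dj]
            row.append(total)
        scores.append(row)
    return scores


# Toy example: the exemplar pattern is embedded in the search map at row 1, column 2.
exemplar = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]
score_map = cross_correlate(search, exemplar)
print(score_map)  # the largest entry sits at row 1, column 2
```

With this valid-mode definition, an exemplar map of side 6 and a search map of side 22 produce a score map of side 22 - 6 + 1 = 17.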

  19. Training: dataset • ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  20. Training: preprocessing on the images • Preprocessing: 2820 videos; exemplar image: 127 × 127; search image: 255 × 255. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  21. Training: recap the steps • ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames • Preprocessing: 2820 videos; exemplar image: 127 × 127; search image: 255 × 255 • Training with a standard Stochastic Gradient Descent (SGD) solver using MatConvNet. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
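
The slides give only the target crop sizes (127 × 127 and 255 × 255). As a hedged illustration of how such crops can be derived from a bounding box, the sketch below follows the context-margin rule described in the cited paper: pad a w × h box by p = (w + h) / 4 on each side and scale so the padded area equals 127². The function names are hypothetical, and applying the same scale to the search crop is an assumption of this sketch.

```python
import math


def context_side(w, h):
    """Side of the square context region around a w x h box, with margin p = (w + h) / 4."""
    p = (w + h) / 4.0
    return math.sqrt((w + 2 * p) * (h + 2 * p))


def crop_sides(w, h, exemplar_size=127, search_size=255):
    """Return (exemplar crop side, search crop side, scale factor), sides in original pixels."""
    side_z = context_side(w, h)
    scale = exemplar_size / side_z   # resize factor mapping the context region to 127 px
    side_x = search_size / scale     # region that maps to 255 px at the same scale
    return side_z, side_x, scale


side_z, side_x, scale = crop_sides(50, 50)
print(side_z, side_x, scale)
```

For a 50 × 50 box the margin is 25 pixels, so the exemplar crop covers a 100 × 100 region before it is resized to 127 × 127.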

  22. Training: loss function • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss: $\ell(y, v) = \log(1 + \exp(-yv))$, where $v$ is the real-valued score of an exemplar-candidate pair and $y \in \{+1, -1\}$ is its ground-truth label. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  23. Training: loss function • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss: $\ell(y, v) = \log(1 + \exp(-yv))$ • The loss of a score map is the mean of the individual losses: $L(y, v) = \frac{1}{|D|} \sum_{u \in D} \ell(y[u], v[u])$, where $D$ is the set of positions in the score map. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  24. Training: loss function • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss: $\ell(y, v) = \log(1 + \exp(-yv))$ • The loss of a score map is the mean of the individual losses: $L(y, v) = \frac{1}{|D|} \sum_{u \in D} \ell(y[u], v[u])$ • Applying SGD to find the conv-net parameters $\theta$ using $\arg\min_{\theta} \mathbb{E}_{(z, x, y)} L(y, f(z, x; \theta))$, where $z$ is the exemplar image, $x$ the search image, and $f$ the network output. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
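
As a small sketch of the two loss definitions above, the code below evaluates the logistic loss log(1 + exp(-y·v)) for one score and averages it over a score map. The function names and the toy label and score maps are hypothetical; the rewritten form of the logarithm is only a numerical-stability choice made for this sketch.

```python
import math


def logistic_loss(y, v):
    """log(1 + exp(-y * v)) for label y in {+1, -1} and real-valued score v."""
    m = -y * v
    # Equivalent form that avoids overflow when m is large and positive.
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def score_map_loss(labels, scores):
    """Mean of the per-position logistic losses over a 2-D score map."""
    total, count = 0.0, 0
    for label_row, score_row in zip(labels, scores):
        for y, v in zip(label_row, score_row):
            total += logistic_loss(y, v)
            count += 1
    return total / count


labels = [[-1, -1, -1],
          [-1, +1, -1],
          [-1, -1, -1]]
scores = [[-2.0, -1.0, -2.0],
          [-1.0, 3.0, -1.0],
          [-2.0, -1.0, -2.0]]
print(score_map_loss(labels, scores))
```

A confident correct score (large v with matching sign of y) drives its term toward zero, while a score of 0 contributes log 2 regardless of the label.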

  25. Tracking algorithm • Use a search image centered at the previous position of the target. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  26. Tracking algorithm • Use a search image centered at the previous position of the target. • Only search for the object within a region of approximately four times its previous size. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  27. Tracking algorithm • Use a search image centered at the previous position of the target. • Only search for the object within a region of approximately four times its previous size. • A cosine window is added to the score map to penalize large displacements. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  28. Tracking algorithm • Use a search image centered at the previous position of the target. • Only search for the object within a region of approximately four times its previous size. • A cosine window is added to the score map to penalize large displacements. • The position of the maximum score relative to the center of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
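
The last two steps above can be sketched as follows: add a cosine (Hann) window to a square score map, take the position of the maximum, and convert its offset from the map center into a pixel displacement using the network stride. This is only an illustration under stated assumptions: `locate_target`, `window_weight`, and its value 0.3 are hypothetical, the stride of 8 is assumed here as a default, and any upsampling of the score map is ignored.

```python
import math


def hann_window(n):
    """1-D cosine (Hann) window of length n, peaking at the center."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def locate_target(score_map, stride=8, window_weight=0.3):
    """Return the (dx, dy) displacement in pixels implied by a square score map."""
    n = len(score_map)
    hann = hann_window(n)
    best, best_i, best_j = -math.inf, 0, 0
    for i in range(n):
        for j in range(n):
            # Additive cosine window: positions far from the center get less bonus.
            s = score_map[i][j] + window_weight * hann[i] * hann[j]
            if s > best:
                best, best_i, best_j = s, i, j
    center = (n - 1) / 2.0
    return (best_j - center) * stride, (best_i - center) * stride


# Two equally strong raw peaks: one next to the center, one in the corner.
score_map = [[0.0] * 5 for _ in range(5)]
score_map[2][3] = 1.0
score_map[0][0] = 1.0
print(locate_target(score_map))
```

In the toy map the two raw peaks tie, and the window breaks the tie in favor of the peak closer to the center, which is the intended effect of penalizing large displacements.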

  29. Experiments: training dataset size • Accuracy: calculated as the average Intersection-over-Union (IoU) • Robustness: measured as the total number of failures. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  30. Experiments: training dataset size • Accuracy: calculated as the average Intersection-over-Union (IoU) • Robustness: measured as the total number of failures • Using a larger video dataset could increase the performance even further. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
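
For reference, the accuracy measure named above can be sketched as the Intersection-over-Union of a predicted and a ground-truth box, averaged over frames. The `(x, y, w, h)` box convention and the function names are assumptions made for this example, not the benchmark's actual toolkit code.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_iou(predicted, ground_truth):
    """Mean IoU over paired per-frame boxes."""
    pairs = list(zip(predicted, ground_truth))
    return sum(iou(p, g) for p, g in pairs) / len(pairs)


predicted = [(0, 0, 2, 2), (5, 5, 2, 2)]
ground_truth = [(0, 0, 2, 2), (6, 6, 2, 2)]
print(average_iou(predicted, ground_truth))
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap; the robustness measure (number of failures) is counted separately and is not modeled here.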

  31. Experiments: OTB13 benchmark results. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  32. Experiments: VOT15 benchmark results. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  33. Experiments: VOT15 benchmark results. Source: Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., Torr, P. H. S., Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
