Week 42: Siamese Network: Architecture and Applications in Visual Object Tracking. Yuanwei Wu, 10-21-2016
Outline • Siamese Architecture • Siamese Applications in Computer Vision • Paper review: Visual Object Tracking using Siamese CNN • Future Work
What does “Siamese” mean? Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf
Siamese Architecture. Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf
Siamese Architecture and loss function. Source: Learning Hierarchies of Invariant Features. Yann LeCun.
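The figure on this slide is not recoverable from the text, so the following is only a rough sketch of the general idea: two inputs pass through the same embedding (shared weights), and a loss acts on the distance between the two embeddings. It assumes a contrastive loss of the commonly used margin form, which may differ from the exact formula shown on the slide; the names `embed`, `contrastive_loss`, `W` and `margin` are hypothetical.

```python
import numpy as np

def embed(x, W):
    """Shared embedding: one linear layer followed by ReLU (illustrative only)."""
    return np.maximum(W @ x, 0.0)

def contrastive_loss(x1, x2, same, W, margin=1.0):
    """Contrastive loss for one pair; `same` is 1 for a matching pair, 0 otherwise.

    Assumed form: 0.5 * d^2 for matching pairs,
    0.5 * max(0, margin - d)^2 for non-matching pairs.
    """
    # Both inputs go through the SAME weights W: this is the "Siamese" part.
    d = np.linalg.norm(embed(x1, W) - embed(x2, W))
    if same:
        return 0.5 * d ** 2                     # pull matching pairs together
    return 0.5 * max(0.0, margin - d) ** 2      # push non-matching pairs apart, up to the margin

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
a = rng.standard_normal(8)
b = rng.standard_normal(8)

loss_identical = contrastive_loss(a, a, same=1, W=W)   # distance is zero
loss_different = contrastive_loss(a, b, same=0, W=W)
```

Because the weights are shared, the two branches cannot drift apart during training, and the embedding of an input does not depend on which branch it was fed to.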
Siamese Applications in Computer Vision: 1. Signature Verification
Siamese Applications in Computer Vision: 2. Dimensionality Reduction
Siamese Applications in Computer Vision: 3.1 Learning Image Descriptors: CNN Model
Siamese Applications in Computer Vision: 3.2 Learning Image Descriptors
Siamese Applications in Computer Vision: 4.1 to 4.5 Face Verification
Source (all application slides): http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf
Paper Review: Fully-Convolutional Siamese Networks for Object Tracking
@article{bertinetto2016fully,
  title={Fully-Convolutional Siamese Networks for Object Tracking},
  author={Bertinetto, Luca and Valmadre, Jack and Henriques, Jo{\~a}o F and Vedaldi, Andrea and Torr, Philip HS},
  journal={arXiv preprint arXiv:1606.09549},
  year={2016}
}
Architecture of Siamese CNN. Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint arXiv:1606.09549, 2016.
Details of the Architecture of Siamese CNN: 1. [figure, cites source 1] 2. Cross-correlation layer. Sources: 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. 2: Bertinetto et al., 2016.
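As an illustration of what a cross-correlation layer computes, here is a minimal NumPy sketch: the exemplar's feature map is slid over the search image's feature map, and each position yields one similarity score. The feature-map sizes (6 x 6 x 128 and 22 x 22 x 128) are assumptions chosen to match the sizes I recall from the paper; the function name `cross_correlate` and the random features are hypothetical stand-ins for the outputs of the two shared-weight branches.

```python
import numpy as np

def cross_correlate(exemplar_feat, search_feat, bias=0.0):
    """Slide the exemplar feature map over the search feature map (valid mode).

    exemplar_feat: (h, w, c) array; search_feat: (H, W, c) array.
    Returns a (H - h + 1, W - w + 1) score map.
    """
    h, w, c = exemplar_feat.shape
    H, W, c2 = search_feat.shape
    assert c == c2, "both branches must produce the same number of channels"
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[i:i + h, j:j + w, :]
            # Inner product over all spatial positions and channels.
            scores[i, j] = np.sum(window * exemplar_feat) + bias
    return scores

rng = np.random.default_rng(0)
z_feat = rng.standard_normal((6, 6, 128))     # exemplar features (assumed size)
x_feat = rng.standard_normal((22, 22, 128))   # search features (assumed size)
score_map = cross_correlate(z_feat, x_feat)
print(score_map.shape)  # (17, 17)
```

The explicit loops are for readability only; in a real network the same operation is a convolution that uses the exemplar features as the filter.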
Training: dataset • ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames. Source: Bertinetto et al., 2016.
Training: preprocessing on the images • Preprocessing: 2820 videos; exemplar image: 127 x 127; search image: 255 x 255. Source: Bertinetto et al., 2016.
Training: recap the steps • ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames • Preprocessing: 2820 videos; exemplar image: 127 x 127; search image: 255 x 255 • Training with a standard Stochastic Gradient Descent (SGD) solver using MatConvNet. Source: Bertinetto et al., 2016.
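Training: loss function • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss: $\ell(y, v) = \log(1 + \exp(-yv))$, where $v$ is the real-valued score of a single exemplar-candidate pair and $y \in \{+1, -1\}$ is its ground-truth label. • The loss of a score map is the mean of the individual losses: $L(y, v) = \frac{1}{|\mathcal{D}|} \sum_{u \in \mathcal{D}} \ell(y[u], v[u])$, where $\mathcal{D}$ is the set of positions in the score map. • Applying SGD to find the conv-net parameters $\theta$ using $\arg\min_{\theta} \mathbb{E}_{(z, x, y)} L(y, f(z, x; \theta))$, where $z$ is the exemplar image, $x$ is the search image and $f$ is the network that outputs the score map. Source: Bertinetto et al., 2016.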
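The logistic loss and its mean over a score map can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' training code: the labelling rule in `make_labels` (positive within a fixed radius of the centre) and all function names are hypothetical, and the 17 x 17 map size is assumed.

```python
import numpy as np

def logistic_loss(y, v):
    """Elementwise logistic loss log(1 + exp(-y * v)), with labels y in {+1, -1}."""
    # np.logaddexp(0, t) evaluates log(1 + exp(t)) without overflow for large |t|.
    return np.logaddexp(0.0, -y * v)

def score_map_loss(labels, scores):
    """Mean of the individual losses over all positions of the score map."""
    return float(np.mean(logistic_loss(labels, scores)))

def make_labels(size, radius):
    """+1 within `radius` cells of the centre, -1 elsewhere (hypothetical rule)."""
    c = (size - 1) / 2.0
    ii, jj = np.mgrid[0:size, 0:size]
    dist = np.sqrt((ii - c) ** 2 + (jj - c) ** 2)
    return np.where(dist <= radius, 1.0, -1.0)

labels = make_labels(size=17, radius=2)
loss_zero = score_map_loss(labels, np.zeros((17, 17)))   # uninformative scores
loss_good = score_map_loss(labels, 5.0 * labels)         # confident and correct everywhere
```

With all scores at zero every position contributes log(2), so `loss_zero` equals log(2); scores that agree with the labels drive the loss toward zero.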
Tracking algorithm • Use a search image centered at the previous position of the target. • Only search for the object within a region of approximately four times its previous size. • A cosine window is added to the score map to penalize large displacements. • The position of the maximum score relative to the center of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame. Source: Bertinetto et al., 2016.
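The last two steps above (cosine window, then peak position times stride) can be sketched as follows. This is a simplified illustration, not the authors' tracker: the stride of 8, the blending weight `window_influence`, the score normalisation and the function name `locate_target` are all assumptions made for the example.

```python
import numpy as np

def locate_target(score_map, stride=8, window_influence=0.2):
    """Return the (dy, dx) displacement in pixels implied by a score map."""
    n_rows, n_cols = score_map.shape
    # Normalise scores to [0, 1] so they are on the same scale as the window.
    s = score_map - score_map.min()
    if s.max() > 0:
        s = s / s.max()
    # 2-D cosine (Hanning) window, peaked at the centre of the map.
    window = np.outer(np.hanning(n_rows), np.hanning(n_cols))
    window = window / window.max()
    # Additive blend: positions far from the centre are penalised.
    blended = (1.0 - window_influence) * s + window_influence * window
    row, col = np.unravel_index(np.argmax(blended), blended.shape)
    centre_row, centre_col = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    # Offset from the centre, in score-map cells, scaled by the network stride.
    return (row - centre_row) * stride, (col - centre_col) * stride

scores = np.zeros((17, 17))
scores[8, 11] = 1.0          # peak three cells to the right of the centre
dy, dx = locate_target(scores)
print(dy, dx)                # 0.0 24.0
```

The new target position is the previous position plus this displacement, and the next search image is then centred there.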
Experiments: training dataset size • Accuracy: calculated as the average Intersection-over-Union (IoU) • Robustness: measured as the total number of failures • Using a larger video dataset could increase the performance even further. Source: Bertinetto et al., 2016.
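To make the two measures concrete, here is a small sketch that computes IoU for axis-aligned boxes, averages it over frames, and counts failures. It is a simplification: a failure is taken to be any frame with zero overlap, and the re-initialisation that benchmark toolkits perform after a failure is not modelled. The `(x, y, width, height)` box format and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def accuracy_and_failures(predicted, ground_truth):
    """Average IoU over frames, and the number of frames with zero overlap."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    accuracy = sum(overlaps) / len(overlaps)
    failures = sum(1 for o in overlaps if o == 0.0)
    return accuracy, failures

# Three frames: perfect overlap, half-shifted box, and a complete miss.
pred = [(0, 0, 10, 10), (5, 0, 10, 10), (100, 100, 10, 10)]
truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
acc, fails = accuracy_and_failures(pred, truth)
```

Here the per-frame overlaps are 1, 1/3 and 0, so the average IoU is 4/9 and one failure is counted.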
Experiments: OTB13 benchmark results. Source: Bertinetto et al., 2016.
Experiments: VOT15 benchmark results. Source: Bertinetto et al., 2016.