Advances in Visual Tracking Machine Learning Study Group Presented by Yaochen Xie Dec 7, 2017
Contents ❖ Visual Tracking Overview ❖ Dataset & Evaluation Methodology ❖ Traditional Approach (before 2010) ➢ Mean-Shift, Particle Filter, Optical Flow ❖ The State-of-the-Art (after 2010) ➢ Correlation Filter, Deep Learning ❖ A Summary: Generative models and Discriminative models
What is tracking in Computer Vision? ✴ Understanding geometric correspondences over time ✴ A fundamental problem in computer vision ✴ A challenging and difficult task ✴ Numerous applications
Applications Surveillance Autonomous Robots Motion Analysis Image Guided Surgery Biomedical Image Analysis Human Computer Interaction
Challenges Deformation Illumination variation Background Clutter Blur & Fast Motion
Challenges Out-of-plane rotation In-plane rotation Scale Variation Occlusion Out-of-view
Dataset OTB (Object Tracking Benchmark) http://cvlab.hanyang.ac.kr/tracker_benchmark/index.html The full benchmark contains 100 sequences from the recent literature. • The sequence names are in CamelCase without any blanks or underscores. • When a sequence contains multiple targets, each target is identified by a dot plus an id number (e.g. Jogging.1 and Jogging.2). • Each row in the ground-truth files represents the bounding box of the target in that frame, (x, y, box-width, box-height).
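Not on the original slide: a minimal Python sketch of reading one of these ground-truth files, assuming one comma- or whitespace-separated (x, y, box-width, box-height) row per frame; the function name and file path are illustrative only.

```python
def load_otb_groundtruth(path):
    """Load an OTB-style ground-truth file into a list of (x, y, w, h) boxes."""
    boxes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Rows may be comma- or whitespace-separated depending on the sequence.
            parts = line.replace(",", " ").split()
            x, y, w, h = (float(v) for v in parts[:4])
            boxes.append((x, y, w, h))
    return boxes

# Hypothetical usage:
# gt = load_otb_groundtruth("OTB100/Jogging/groundtruth_rect.1.txt")
```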
Dataset OTB (Object Tracking Benchmark) http://cvlab.hanyang.ac.kr/tracker_benchmark/index.html
Dataset VOT Challenge (Visual Object Tracking) http://www.votchallenge.net/ VOT 2015 • 60 short sequences • Chosen from a large pool of sequences including the ALOV dataset, the OTB2 dataset, non-tracking datasets, etc. • Ground truth is annotated with rotated bounding boxes in order to provide highly accurate values for comparing results
Evaluation ✴ Precision plot : center location error (average Euclidean distance between the tracked and ground-truth center locations) / percentage of frames whose error is within a threshold ✴ Success plot : bounding-box overlap (intersection over union between the tracked and ground-truth boxes) / percentage of frames whose overlap exceeds a threshold
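Not on the original slides: a short Python sketch of how the two metrics are computed per frame, assuming boxes in the (x, y, w, h) format above; the thresholds (20 px, 0.5 IoU) are the commonly used defaults, not values fixed by the slide.

```python
import numpy as np

def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ca = np.array([box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0])
    cb = np.array([box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0])
    return np.linalg.norm(ca - cb)

def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_at(results, gts, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    errs = [center_error(r, g) for r, g in zip(results, gts)]
    return float(np.mean(np.array(errs) <= threshold))

def success_at(results, gts, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds `threshold`."""
    ious = [overlap_ratio(r, g) for r, g in zip(results, gts)]
    return float(np.mean(np.array(ious) > threshold))
```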
Evaluation ✴ Temporal Robustness Evaluation (TRE): restart the tracker from different starting frames of each sequence ✴ Spatial Robustness Evaluation (SRE): initialize the tracker with shifted or scaled versions of the ground-truth bounding box
Traditional Approaches - Mean-shift Intuitive Description: (animated over several slides: the search window repeatedly shifts toward the mean of the samples it contains until it converges on the region of highest density)
Traditional Approaches - Mean-shift Assumption: The data points are sampled from an underlying PDF (figure: assumed underlying PDF vs. real data samples)
Traditional Approaches - Mean-shift Histogram and Back Projection (figure panels: raw image, histogram of ROI, back projection; other representations are possible)
Traditional Approaches - Mean-shift Steps of tracking an object 1. Select the Region of Interest (ROI) in frame t_0 2. Calculate the histogram of the ROI 3. Generate the back projection of the ROI histogram in frame t_1 4. Iterate with mean-shift Alternatively, introduce a similarity function to select the target candidate ... (see the OpenCV sketch below)
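A minimal OpenCV sketch of these four steps, assuming an HSV hue histogram as the ROI representation; the video path, the hard-coded initial window, and the thresholds are illustrative only.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video.mp4")          # hypothetical input video
ok, frame = cap.read()

# 1. Select the Region of Interest in frame t_0 (hard-coded here for illustration).
x, y, w, h = 200, 150, 60, 80
roi = frame[y:y + h, x:x + w]

# 2. Calculate the histogram of the ROI (hue channel, ignoring dark/unsaturated pixels).
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0, 60, 32)), np.array((180, 255, 255)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # 3. Back-project the ROI histogram onto the next frame.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # 4. Iterate with mean-shift to find the new window location.
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
```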
Traditional Approaches - Mean-shift Advantages: 1. Low computational complexity 2. Robust to partial occlusion, deformation, rotation and background motion Shortcomings: 1. Unable to handle scale variation 2. Low performance when the object moves fast 3. A color histogram is a limited appearance descriptor (it discards spatial layout)
Traditional Approaches - Particle Filtering
Traditional Approaches - Particle Filtering • Initialization • Sampling • Decision • Resampling
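Not on the original slide: a generic particle-filter loop following these four steps, assuming the state is just a 2-D position and that a user-supplied likelihood(frame, state) measures how well a candidate matches the target appearance (e.g., histogram similarity). This is an illustrative sketch, not a specific published tracker.

```python
import numpy as np

def particle_filter_step(frame, particles, weights, likelihood, motion_std=5.0):
    """One tracking step: sampling (predict), decision (weight/estimate), resampling."""
    n = len(particles)
    # Sampling: propagate each particle with a simple random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Decision: weight each particle by how well it explains the new frame,
    # then estimate the target state as the weighted mean of the particles.
    weights = np.array([likelihood(frame, p) for p in particles])
    weights = weights / (weights.sum() + 1e-12)
    estimate = (weights[:, None] * particles).sum(axis=0)
    # Resampling: redraw particles proportionally to their weights to avoid degeneracy.
    idx = np.random.choice(n, size=n, p=weights)
    particles = particles[idx]
    weights = np.full(n, 1.0 / n)
    return particles, weights, estimate

# Initialization: spread particles around the initial target position (x0, y0).
# x0, y0 = 200.0, 150.0
# particles = np.random.normal([x0, y0], 10.0, size=(300, 2))
# weights = np.full(300, 1.0 / 300)
```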
Traditional Approaches - Particle Filtering Tracking (illustrated frame by frame over several slides)
Traditional Approaches - Particle Filtering Strengths: 1. The Markov assumption reduces the complexity of the computation 2. A flexible probabilistic description of the target state 3. Able to handle scale variation Weaknesses: 1. Low performance under occlusion 2. A color histogram is a limited appearance descriptor
Traditional Approaches - Optical Flow Optical Flow: The pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene.
Traditional Approaches - Optical Flow Assumptions: • Brightness constancy across frames • Small motion between frames • Frames are sampled consecutively in the temporal domain • Spatial consistency (neighboring pixels move in a similar way)
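A minimal OpenCV sketch of sparse (Lucas-Kanade) optical flow, which relies on exactly these assumptions; the video path and parameter values are illustrative only.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video.mp4")            # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Pick good corner points to track inside (or around) the target region.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7)

lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: estimate where each point moved in the new frame.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
    good_new = new_pts[status.flatten() == 1]
    good_old = pts[status.flatten() == 1]
    # The median displacement of the tracked points gives a simple motion estimate.
    if len(good_new) > 0:
        shift = np.median(good_new - good_old, axis=0)
    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
```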
The State-of-the-Art Correlation Filter based: • MOSSE (Minimum Output Sum of Squared Error) • CSK (Circulant Structure of Tracking-by-detection with Kernels) • CN (Adaptive Color Attributes) Deep ConvNet based: • GOTURN (Generic Object Tracking Using Regression Networks) • MDNet (Multi-Domain Convolutional Neural Networks) • TCNN (Modeling and Propagating CNNs in a Tree Structure)
Correlation Filter based
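Not on the original slide: a NumPy sketch of the core MOSSE idea behind correlation-filter trackers, i.e. learn a filter in the Fourier domain that maps training patches F_i to a desired Gaussian response G via H* = sum(G ⊙ conj(F_i)) / sum(F_i ⊙ conj(F_i)). Single-channel patches are assumed, and the usual preprocessing (log transform, cosine window) and online update are omitted for brevity.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak centered on the target."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_mosse(patches, sigma=2.0, eps=1e-5):
    """Closed-form MOSSE filter: H* = sum(G * conj(F_i)) / sum(F_i * conj(F_i))."""
    G = np.fft.fft2(gaussian_response(patches[0].shape, sigma))
    A = np.zeros_like(G)
    B = np.zeros_like(G)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += F * np.conj(F)
    return A / (B + eps)

def correlate(H_conj, patch):
    """Apply the filter; the location of the response peak is the new target position."""
    response = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```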
GOTURN Generic Object Tracking Using Regression Networks Strengths • Offline Training • Generic Object Tracking • Avoids Online Fine-tuning • Regression-based Approach
GOTURN Generic Object Tracking Using Regression Networks
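Not on the original slide: a schematic PyTorch sketch of the GOTURN idea, i.e. crop the previous frame around the known target and a search region in the current frame, run both crops through convolutional branches, and regress the box coordinates with fully connected layers, trained entirely offline. The layer sizes below are illustrative, not the published CaffeNet-based architecture.

```python
import torch
import torch.nn as nn

class GoturnLikeTracker(nn.Module):
    """Two-crop regression tracker in the spirit of GOTURN (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        # One conv branch; in this sketch the same weights process both crops.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        # Fully connected layers regress the box corners within the search crop.
        self.regressor = nn.Sequential(
            nn.Linear(2 * 64 * 6 * 6, 512), nn.ReLU(),
            nn.Linear(512, 4),
        )

    def forward(self, prev_crop, curr_crop):
        f_prev = self.features(prev_crop).flatten(1)
        f_curr = self.features(curr_crop).flatten(1)
        return self.regressor(torch.cat([f_prev, f_curr], dim=1))

# Usage sketch: train offline on pairs of crops with a regression loss on the box,
# then run purely feed-forward at test time (no online fine-tuning).
# model = GoturnLikeTracker()
# box = model(prev_crop_batch, curr_crop_batch)   # shape (N, 4)
```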
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • The width of a black arrow indicates the weight of a CNN for target state estimation, while the width of a red edge denotes the affinity between two CNNs. • The width of a box outline indicates the reliability of the CNN associated with that box.
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • CNN Architecture
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • Tree Construction • Tree structure T = (V, E), where a vertex corresponds to a CNN and a directed edge defines the relationship between two CNNs. • The score of an edge is the affinity between its two end vertices (roughly, how confidently one CNN scores the target states estimated in the frames associated with the other).
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • Target state estimation • Candidate generation: sample candidates from a normal distribution in (x, y, s) space, centered at the target location in the previous frame • Target score: a weighted combination of the scores from the CNNs in the tree, where each CNN's weight reflects its reliability • Target selection: the candidate with the maximum combined score
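A LaTeX sketch of this estimation step under notation assumed here (not taken from the paper): V_t is the set of CNNs in the tree at time t, w_v is the reliability-based weight of the CNN at vertex v, and f_v^+(x) is its positive (target) score for a candidate x.

```latex
\[
  \mathbf{x}_t^{i} \sim \mathcal{N}\!\left(\mathbf{x}_{t-1}^{*}, \Sigma\right),
  \qquad
  H(\mathbf{x}) = \sum_{v \in V_t} w_v \, f_v^{+}(\mathbf{x}),
  \qquad
  \mathbf{x}_t^{*} = \arg\max_{i} \, H\!\left(\mathbf{x}_t^{i}\right)
\]
```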
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • Bounding Box regression
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking • Update Model • A new CNN (a new vertex z) is created every 10 consecutive frames and attached to a parent vertex in the tree. • The CNN at vertex z is fine-tuned from the CNN at its parent, using training samples collected from the frames associated with the parent and the new frames assigned to z.
TCNN Modeling and Propagating CNNs in a Tree Structure for Visual Tracking
Generative models and Discriminative models ✴ Generative models: build an appearance model of the target and search for the candidate most similar to it (e.g., mean-shift, particle filtering) ✴ Discriminative models - Tracking-by-detection: learn a classifier or regressor that separates the target from the background (e.g., correlation filters, deep trackers)