Beyond Detection: Towards Multi-Object Tracking and Segmentation

Andreas Geiger
Autonomous Vision Group
University of Tübingen / MPI for Intelligent Systems
June 17, 2018

MOTS: Multi-Object Tracking and Segmentation [Voigtlaender, Krause, Osep, Luiten, Sekar, Geiger & Leibe, CVPR 2019]

Motivation
◮ Datasets for multi-object tracking
  ◮ MOTChallenges
    ◮ MOT15 [Leal-Taixe et al., 2015]
    ◮ MOT16, MOT17 [Milan et al., 2016]
    ◮ CVPR19 [Dendorfer et al., 2019]
  ◮ KITTI Tracking [Geiger et al., 2012]
  ◮ VisDrone2018 [Zhu et al., 2018]
  ◮ DukeMTMC [Ristani et al., 2016]
  ◮ UA-DETRAC [Wen et al., 2015]
  ◮ ...
◮ Led to great progress in the community
◮ But annotations are only on the bounding box level

Are bounding boxes enough?

Object Tracking vs. Segmentation
◮ In difficult cases, bounding boxes are a very coarse approximation
◮ Most pixels of the bounding box belong to other objects

Two Communities
◮ Object Tracking
◮ Semantic Segmentation / Instance Segmentation

Can we unite the two?

MOTS: Multi-Object Tracking and Segmentation
◮ Dense pixel-wise annotations are tedious, hard work ... but we did it!
[Figures: example annotations from KITTI MOTS and MOTSChallenge]

MOTS: Multi-Object Tracking and Segmentation
◮ How? 4 student assistants & semi-automatic annotation procedure

                                KITTI MOTS          MOTSChallenge
                                train      val      train
# Sequences                     12         9        4
# Frames                        5,027      2,981    2,862
# Tracks Pedestrian             99         68       228
# Masks Pedestrian (total)      8,073      3,347    26,894
# Masks Pedestrian (annot.)     1,312      647      3,930
# Tracks Car                    431        151      -
# Masks Car (total)             18,831     8,068    -
# Masks Car (annot.)            1,509      593      -

Data Annotation

Data Annotation
◮ Starting point: existing box-level tracking annotations
◮ Fully convolutional network (FCN) converts bounding boxes to segmentation masks
◮ First, 2 instances per track are manually annotated
◮ However, the trained segmentation model will not be perfect
◮ Repeat until annotations are good:
  1. Annotators fix worst errors with polygon annotations
  2. Add new annotations to training set of FCN
  3. Re-train FCN (pre-train on all, fine-tune per object)
     ⇒ Allows for adaptation to appearance and context of each object
  4. Re-generate masks using FCN
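The refinement loop can be sketched roughly as follows. This is only an illustration of the control flow: every callable (`train_fcn`, `predict_masks`, `find_worst_errors`, `annotate_by_hand`) is a hypothetical placeholder for the network and the human annotators, the two-stage pre-train/fine-tune schedule is folded into a single `train_fcn` call, and letting hand-drawn masks override predictions at the end is an assumption.

```python
def refine_annotations(boxes, seed_masks, train_fcn, predict_masks,
                       find_worst_errors, annotate_by_hand, max_rounds=5):
    """Iteratively grow the hand-annotated set until no bad masks are reported.

    All callables are hypothetical stand-ins, not part of any real library.
    """
    training_set = dict(seed_masks)              # box id -> hand-drawn mask
    model = train_fcn(training_set)
    masks = predict_masks(model, boxes)          # box id -> generated mask
    for _ in range(max_rounds):
        worst = find_worst_errors(masks, training_set)
        if not worst:                            # annotations judged good enough
            break
        training_set.update(annotate_by_hand(worst))  # steps 1 and 2
        model = train_fcn(training_set)               # step 3
        masks = predict_masks(model, boxes)           # step 4
    masks.update(training_set)                   # assumption: manual masks win
    return masks, training_set
```

The loop stops either when the error finder returns nothing or after `max_rounds` passes, so the amount of manual work stays bounded.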

Data Annotation
◮ Manual corrections ensure consistency and high quality
◮ Large savings in annotation time
  ◮ KITTI MOTS: only 8% of car boxes / 17% of pedestrian boxes manually annotated
  ◮ MOTSChallenge: 15% of pedestrian boxes manually annotated
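The manually annotated shares can be recomputed from the mask counts in the dataset statistics. The sketch below pools the KITTI MOTS train and val splits, which is an assumption about how the percentages were derived.

```python
# Mask counts taken from the dataset statistics (KITTI MOTS: train + val pooled).
counts = {
    ("KITTI MOTS", "car"): {"total": 18_831 + 8_068, "annotated": 1_509 + 593},
    ("KITTI MOTS", "pedestrian"): {"total": 8_073 + 3_347, "annotated": 1_312 + 647},
    ("MOTSChallenge", "pedestrian"): {"total": 26_894, "annotated": 3_930},
}


def annotated_fraction(total, annotated):
    """Share of masks drawn by hand rather than generated by the network."""
    return annotated / total


for (dataset, cls), c in counts.items():
    print(f"{dataset:14s} {cls:10s} {100 * annotated_fraction(**c):.1f}%")
```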

Evaluation Metrics

Evaluation Metrics
◮ We consider mask-based variants of the CLEAR MOT metrics [Bernardin and Stiefelhagen, 2008]
◮ Need to associate predictions to ground truth instances
◮ Box-based tracking: boxes might overlap
  ◮ Requires bi-partite matching
◮ Mask-based tracking: masks are disjoint
  ◮ Establishing correspondences is greatly simplified
  ◮ Hypothesized and ground truth masks are matched iff mask IoU > 0.5
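A minimal sketch of the matching rule, assuming masks are stored as sets of (row, col) pixel coordinates; the names `mask_iou`, `match_masks` and `rect` are illustrative only. Because the masks within one frame are disjoint, an IoU above 0.5 means the ground truth covers more than half of the hypothesis, so at most one ground-truth mask can pass the threshold and no bi-partite matching is needed.

```python
def rect(r0, c0, r1, c1):
    """Helper for the example: a filled rectangle as a set of pixels."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mask_iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_masks(hypotheses, ground_truth, threshold=0.5):
    """Map each hypothesis id to the ground-truth id with IoU > threshold."""
    matches = {}
    for h_id, h_mask in hypotheses.items():
        for g_id, g_mask in ground_truth.items():
            if mask_iou(h_mask, g_mask) > threshold:
                matches[h_id] = g_id
                break  # disjoint masks: no second ground truth can also pass
    return matches


ground_truth = {"g1": rect(0, 0, 4, 4), "g2": rect(0, 10, 4, 14)}
hypotheses = {"h1": rect(0, 1, 4, 5), "h2": rect(10, 10, 12, 12)}
print(match_masks(hypotheses, ground_truth))
```

Here `h1` overlaps `g1` in 12 of 20 union pixels and is matched, while `h2` overlaps nothing and would count as a false positive.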

Evaluation Metrics
(Soft) Multi-Object Tracking and Segmentation Accuracy / Precision:
\[ \mathrm{MOTSA} = 1 - \frac{|FN| + |FP| + |IDS|}{|M|} = \frac{|TP| - |FP| - |IDS|}{|M|} \]
\[ \mathrm{MOTSP} = \frac{\widetilde{TP}}{|TP|} \qquad \mathrm{sMOTSA} = \frac{\widetilde{TP} - |FP| - |IDS|}{|M|} \qquad \widetilde{TP} = \sum_{h \in TP} \mathrm{IoU}(h, c(h)) \]
◮ c: mapping from hypotheses to ground truth
◮ TP: true positives, $\widetilde{TP}$: soft number of true positives
◮ FN: false negatives, FP: false positives, IDS: ID switches
◮ M: set of ground truth segmentation masks
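As a small worked example, the three scores can be computed from the IoUs of the matched pairs plus the error counts. The function name and argument names are illustrative, and returning 0 for MOTSP when there are no true positives is an assumption.

```python
def mots_metrics(tp_ious, num_fp, num_ids, num_gt_masks):
    """Return (MOTSA, MOTSP, sMOTSA).

    tp_ious: IoU(h, c(h)) for every true-positive hypothesis h.
    num_gt_masks: |M|, the number of ground-truth masks (assumed > 0).
    """
    num_tp = len(tp_ious)
    soft_tp = sum(tp_ious)  # soft number of true positives
    motsa = (num_tp - num_fp - num_ids) / num_gt_masks
    motsp = soft_tp / num_tp if num_tp else 0.0
    smotsa = (soft_tp - num_fp - num_ids) / num_gt_masks
    return motsa, motsp, smotsa


# 4 ground-truth masks, 3 matched (so 1 false negative), 1 false positive, no ID switch.
motsa, motsp, smotsa = mots_metrics([0.9, 0.8, 0.7], num_fp=1, num_ids=0, num_gt_masks=4)
print(f"MOTSA={motsa:.2f} MOTSP={motsp:.2f} sMOTSA={smotsa:.2f}")
```

sMOTSA can never exceed MOTSA, since each true positive contributes its IoU (at most 1) instead of a full count.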

TrackR-CNN Baseline

TrackR-CNN
[Figure: architecture overview. Frames t-1, t and t+1 pass through feature extraction with shared weights; 2x 3D convolutions produce temporally enhanced image features; a region proposal network feeds heads for classification, bounding box regression, mask generation and association embedding (128-D association vectors). During training, losses are computed against the ground truth; during evaluation, detections are linked to previously tracked objects by online track association.]
Key Idea:
◮ Detection, segmentation, and data association with a single ConvNet
◮ Extend Mask R-CNN by 3D convolutions and association head

TrackR-CNN
Association Head:
◮ Predict association vector for each detection
◮ Detections of same instance should be close in embedding space
◮ Detections of distinct instances should be distant from each other

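TrackR-CNN
Training:
◮ Learned using batch-hard triplet loss [Hermans et al., 2017]:
  \[ \frac{1}{|\mathcal{D}|} \sum_{d \in \mathcal{D}} \max\Big( \max_{e \in \mathcal{D}:\, id_e = id_d} \lVert a_e - a_d \rVert_2 \;-\; \min_{e \in \mathcal{D}:\, id_e \neq id_d} \lVert a_e - a_d \rVert_2 \;+\; \alpha,\; 0 \Big) \]
  where $\mathcal{D}$ is the set of detections in the mini-batch, $a_d$ the association vector of detection $d$, and $id_d$ its ground truth track ID
◮ Mini-batch: 8 consecutive frames
◮ Mine furthest detection of same instance and closest detection of other instance
◮ Require separation by at least margin α
Inference:
◮ Associate detections over time based on Euclidean distance in embedding space and bi-partite graph matching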
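Both steps can be illustrated in plain Python. This is a sketch under stated assumptions, not the TrackR-CNN implementation: the margin and `max_distance` values are arbitrary, detections without any negative in the batch contribute zero, and the exhaustive search in `associate` is a simple stand-in for a proper bi-partite matching solver that is only practical for a handful of objects.

```python
import math
from itertools import permutations


def batch_hard_triplet_loss(vectors, ids, margin=0.2):
    """Mean over detections of max(hardest positive - hardest negative + margin, 0)."""
    total = 0.0
    for a_d, id_d in zip(vectors, ids):
        pos = [math.dist(a_d, a_e) for a_e, id_e in zip(vectors, ids) if id_e == id_d]
        neg = [math.dist(a_d, a_e) for a_e, id_e in zip(vectors, ids) if id_e != id_d]
        if neg:  # assumption: no negative in the batch means zero loss for d
            total += max(max(pos) - min(neg) + margin, 0.0)
    return total / len(vectors) if vectors else 0.0


def associate(track_vectors, detection_vectors, max_distance=1.0):
    """Minimum-cost one-to-one (track, detection) pairs by exhaustive search."""
    n_t, n_d = len(track_vectors), len(detection_vectors)
    cost = [[math.dist(t, d) for d in detection_vectors] for t in track_vectors]
    best_pairs, best_cost = [], math.inf
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    for pairs in candidates:
        c = sum(cost[t][d] for t, d in pairs)
        if c < best_cost:
            best_pairs, best_cost = pairs, c
    # Pairs that are too far apart stay unmatched (new track or lost track).
    return [(t, d) for t, d in best_pairs if cost[t][d] <= max_distance]


vectors = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
ids = [1, 1, 2]
print(batch_hard_triplet_loss(vectors, ids, margin=2.0))
print(associate([(0.0, 0.0), (5.0, 5.0)], [(5.1, 5.0), (0.2, 0.0), (9.0, 9.0)]))
```

In the association example the third detection is left unmatched, so a tracker would start a new track for it.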

Experimental Evaluation

Results of TrackR-CNN on MOTSChallenge
◮ Crowded scenes can lead to missing detections and ID switches

Results of TrackR-CNN on KITTI MOTS
◮ Most objects distinguished well but some erroneous detections remain (red)
◮ Continuation of track with same ID after missing detection (red)
