3D Multi-Object Tracking for Autonomous Driving
Xinshuo Weng, Robotics Institute, Carnegie Mellon University
RI PhD Speaking Qualifier, September 24, 2020
Committee Members: Kris Kitani (advisor), Martial Hebert, David Held, Peiyun Hu
3D multi-object tracking is an important perception task for autonomous driving.
Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation
Standard 3D MOT Pipeline: Sensor Data (LiDAR, RGB) → 3D Object Detection → Data Association → Evaluation
Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection (outputs detection results) → Data Association → Evaluation
Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association (outputs tracking results) → Evaluation
Standard 3D MOT Pipeline: Evaluation (also important!)
1. MOTA: MOT accuracy
2. MOTP: MOT precision
3. IDS: # of identity switches
4. FRAG: # of trajectory fragments
5. ...
What is the state of the art?
State of the Art (3D MOT): Sensor Data
Better models from better (bigger) data: a 150x increase* in dataset size!
* Mined trajectory data not counted for the Argo dataset
State of the Art (3D MOT): 3D Object Detection
Monocular 3D detection (KITTI): 15x AP increase in 3 years
Image credit: Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e
State of the Art (3D MOT): 3D Object Detection
LiDAR-based 3D detection (KITTI): 27% increase in 2 years
State of the Art (3D MOT): Data Association
2D MOT (KITTI): 18% increase in 5 years
* 3D methods compared using 2D evaluation on KITTI
State of the Art (3D MOT): Data Association
Recent trend: jointly optimized feature extraction and optimization
D. Frossard, R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.
State of the Art (3D MOT): What are open problems in 3D MOT?
Some Open Problems (3D MOT)
• Sensor Data: many large-scale datasets, but sensor suites and annotations are not unified
• 3D Object Detection: 3D detection performance is improving, but doesn't take into account sensor physics; should also take into account sensor optimization and redundancy
• Feature Extraction (this talk): detection and tracking should be coupled more tightly; the representation doesn't take into account the context of other objects and the scene
• Optimization: doesn't take into account the context of the multi-level optimization problem (sensors, forecasting, control)
• Evaluation (this talk): weak 3D MOT evaluation datasets and metrics
Recent Work on Evaluation
What are the Issues of 3D MOT Evaluation?
• Matching criterion: IoU (intersection over union)
• For the pioneering 3D MOT dataset KITTI, evaluation is performed in 2D space
  • IoU is computed on the 2D image plane (not in 3D)
• The common practice for evaluating 3D MOT methods is:
  • Project the 3D trajectories onto the image plane
  • Run the 2D evaluation code provided by KITTI
(Figure: B_p: the predicted box; B_g: the ground-truth box; B_c: the smallest enclosing box; I_2D, I_3D: the intersection; IoU in 2D space vs. IoU in 3D space. Image credit: Xu et al., 3D-GIoU)
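To make the 2D-vs-3D distinction concrete, here is a minimal sketch of 3D IoU for axis-aligned boxes. Real KITTI/nuScenes boxes carry a heading (yaw) angle, whose handling needs a polygon-intersection step omitted here; the function name and box encoding are illustrative, not the official evaluation code.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU of two axis-aligned boxes, each encoded as
    (x_min, y_min, z_min, x_max, y_max, z_max).
    Simplification: rotated boxes would require clipping the
    two BEV polygons against each other before this volume step."""
    # Overlap extent along each of the three axes (0 if disjoint).
    inter_dims = [
        max(0.0, min(box_a[i + 3], box_b[i + 3]) - max(box_a[i], box_b[i]))
        for i in range(3)
    ]
    inter = inter_dims[0] * inter_dims[1] * inter_dims[2]

    def vol(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    union = vol(box_a) + vol(box_b) - inter
    return inter / union if union > 0 else 0.0
```

Unlike the image-plane 2D IoU, this score drops as soon as the predicted depth or dimensions are wrong, even when the 2D projection still lines up.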
What are the Issues of 3D MOT Evaluation?
• Why is it not good to evaluate 3D MOT methods in 2D space?
  • It cannot measure the strengths of 3D MOT methods
    • Estimated 3D information: depth, object dimensions (length, height and width), heading orientation
  • It cannot fairly compare 3D MOT methods. Why?
    • A method is not penalized for wrong predicted depth, length, or heading as long as the 2D projection is accurate
    • Which predicted box is better, blue or green?
• Conclusion: we should not evaluate 3D MOT methods in 2D space
(Figure: blue: predicted box 1; green: predicted box 2; red: the ground-truth box)
X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
Our Solution: Upgrade the Matching Criterion to 3D
• Replace the matching criterion (2D IoU) in the KITTI evaluation code with 3D IoU
  • https://github.com/xinshuoweng/AB3DMOT (900 stars)
• Worked with nuTonomy collaborators to adopt our 3D MOT evaluation metrics in the nuScenes evaluation, with center distance as the matching criterion
(Figures: nuScenes 3D MOT evaluation with our metrics; our released new evaluation code)
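As a sketch of the center-distance criterion mentioned above: predictions and ground truth are matched when their 3D box centers fall within a distance threshold. The greedy strategy, function name, and 2 m default below are illustrative assumptions, not the official nuScenes evaluation code.

```python
def match_by_center_distance(tracks, detections, max_dist=2.0):
    """Greedily match track centers to detection centers by 3D
    Euclidean distance, closest pairs first. `tracks` and
    `detections` are lists of (x, y, z) box centers; `max_dist`
    (meters) plays the role that an IoU threshold plays elsewhere."""
    pairs = sorted(
        (((tx - dx) ** 2 + (ty - dy) ** 2 + (tz - dz) ** 2) ** 0.5, ti, di)
        for ti, (tx, ty, tz) in enumerate(tracks)
        for di, (dx, dy, dz) in enumerate(detections)
    )
    matched_t, matched_d, matches = set(), set(), []
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if ti not in matched_t and di not in matched_d:
            matched_t.add(ti)
            matched_d.add(di)
            matches.append((ti, di))
    return matches
```

A distance threshold is more forgiving than 3D IoU for small objects (pedestrians, cyclists), where a small localization error can zero out the IoU entirely.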
What are the Issues of Evaluation?
• Are we done with the evaluation? Can we further improve the current metrics?
• E.g., MOTA (multi-object tracking accuracy):
  MOTA = 1 − (FP + FN + IDS) / #GT
• Performance is measured at a single recall point
(Figure: MOTA over recall curve)
X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
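The MOTA formula can be sketched directly; the per-frame tuple layout and variable names below are illustrative, not the KITTI evaluation code.

```python
def mota(frame_errors, num_gt):
    """MOTA = 1 - (FP + FN + IDS) / #GT, evaluated at a single
    confidence (hence recall) operating point.
    `frame_errors`: list of per-frame (fp, fn, ids) counts;
    `num_gt`: total number of ground-truth boxes in the sequence."""
    fp = sum(e[0] for e in frame_errors)
    fn = sum(e[1] for e in frame_errors)
    ids = sum(e[2] for e in frame_errors)
    return 1.0 - (fp + fn + ids) / num_gt
```

Note that the score depends on where the confidence threshold lands on the recall axis, which is exactly the weakness discussed next.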
What are the Issues of Evaluation?
• Why is it not good to evaluate at a single recall point?
• Consequences:
  • The confidence threshold needs to be carefully tuned, requiring non-trivial effort
  • The result is sensitive to the detector, the dataset, and the object category
  • We cannot understand the full spectrum of accuracy of a MOT system
• Which MOT system is better, blue or orange?
  • The orange one has higher MOTA at its best recall point (r = 0.9)
  • The blue one has higher MOTA at many other recall points
• Ideally, we want performance to be as high as possible at all recall points
(Figure: MOTA over recall curves for 3D MOT systems 1 and 2)
Our Solution: Integral Metrics
• MOTA is measured at a single point on the MOTA-over-recall curve
• What can we do to improve the evaluation metrics?
  • Compute integral metrics as the area under the curve, e.g., average MOTA (AMOTA)
  • Analogous to average precision (AP) in object detection
  • Can measure the full spectrum of MOT accuracy
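The area under the curve can be approximated by averaging MOTA sampled at evenly spaced recall thresholds, just as AP averages precision. This is a simplified sketch: the paper's metric additionally rescales MOTA at each recall point, a refinement omitted here.

```python
def amota(motas):
    """AMOTA approximated as the mean of MOTA values sampled at
    evenly spaced recall thresholds (e.g. 0.1, 0.2, ..., 1.0).
    Negative MOTA values (possible when errors exceed #GT) are
    clipped to 0 before averaging."""
    return sum(max(0.0, m) for m in motas) / len(motas)
```

Under this metric, the blue system from the previous comparison, with high MOTA across many recall points, can outrank the orange one that peaks at a single operating point.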
Recent Work on Improving Feature Learning for 3D MOT
What are the Issues of Feature Learning?
• Goal: learn discriminative features for different objects
• Issues in the feature learning:
  • Feature extraction for each object is independent of the other objects
    • Why is this not good? There is no communication between objects, ignoring the context information
  • Features are employed from only one or two modalities
    • E.g., 2D appearance, 2D motion, 3D motion, or 3D appearance
    • Why is this not good? It does not utilize all the complementary information
(Pipeline from prior work: objects in frame t and frame t+1 → 2D (or 3D) feature extractor → affinity matrix → Hungarian algorithm)
X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
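The final stage of the pipeline above solves an assignment problem on the affinity matrix. A brute-force sketch of that optimum is below; the Hungarian algorithm finds the same answer in O(n^3) (in practice one would call `scipy.optimize.linear_sum_assignment`), and the function name and square-matrix assumption are illustrative.

```python
from itertools import permutations

def best_assignment(affinity):
    """Exhaustively find the one-to-one matching that maximizes
    total affinity. `affinity[i][j]` scores matching object i in
    frame t to object j in frame t+1; assumes a square matrix,
    i.e. equal object counts in both frames (real trackers also
    handle births/deaths via unmatched rows and columns)."""
    n = len(affinity)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(affinity[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = list(enumerate(perm)), score
    return best, best_score
```

Whatever solver is used, the matching quality is bounded by the affinity matrix itself, which is why the talk focuses on learning better features to fill it.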
Improve Feature Learning for 3D MOT
• How can we address these two issues?
• Shouldn't features depend on the context of other objects?
  • We propose a novel feature interaction mechanism
• How can we utilize the information from all the modalities?
  • Extract multi-modal features that are complementary to each other
  • i.e., 2D motion + 2D appearance + 3D motion + 3D appearance
(Pipeline from our work: objects in frame t and frame t+1 → 2D + 3D feature extractor → iterative feature interaction → affinity matrix → Hungarian algorithm)
Improve Feature Learning for 3D MOT
• How do we do it?
• (a) Obtain the appearance / motion features from the 3D point cloud
  • LSTM for 3D motion, from 3D box trajectories
  • PointNet for 3D appearance, from the point cloud
Improve Feature Learning for 3D MOT
• How do we do it?
• (b) Obtain the appearance / motion features from the 2D image
  • LSTM for 2D motion, from 2D box trajectories
  • CNN for 2D appearance, from 2D image patches