Object Detection and Tracking in 3D World Xinshuo Weng
3D Object Detection
Goal
● Inputs:
  ○ LiDAR point cloud
  ○ Monocular images
  ○ Stereo images (left / right)
  ○ Or a fusion of these
● Outputs:
  ○ Eight corners
  ○ Four corners + height
  ○ Size (l, w, h) + center (x, y, z) + heading (θ)
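The three output parameterizations are interchangeable; for instance, the eight corners can be recovered from the size + center + heading form. A minimal sketch, assuming a z-up frame with the heading as a rotation about the vertical axis (conventions vary; e.g., KITTI uses a y-down camera frame):

```python
import numpy as np

def box3d_corners(center, size, heading):
    """Convert (center, size, heading) to the eight box corners.
    Assumes a z-up frame; heading rotates about the vertical axis."""
    x, y, z = center
    l, w, h = size
    c, s = np.cos(heading), np.sin(heading)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    dx, dy, dz = l / 2, w / 2, h / 2
    # local corner offsets: all +/- half-size combinations
    local = np.array([[sx * dx, sy * dy, sz * dz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return local @ R.T + np.array([x, y, z])  # shape (8, 3)
```

Going the other way (corners to parameters) is also possible, which is why detectors are free to regress whichever form trains best.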
3D Object Detection from LiDAR Point Cloud Shi et al, “PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud”, CVPR, 2019.
3D Object Detection from Monocular Images
Goal: estimate the 7 DoF parameters
● Leverage the 2D-3D bounding box consistency constraint
  ○ The 2D box provides 4 constraints
  ○ Need at least another three (e.g., regressed size and orientation)
Mousavian et al, “3D Bounding Box Estimation Using Deep Learning and Geometry”, CVPR, 2017.
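The consistency constraint says the projection of the estimated 3D box should fit the detected 2D box tightly: each of the four 2D box sides must touch a projected 3D corner, giving four equations. An illustrative sketch (not the paper's exact formulation), assuming corners in camera coordinates and a pinhole intrinsics matrix K:

```python
import numpy as np

def project_corners(corners, K):
    """Project 3D corners (8, 3), given in camera coordinates,
    through pinhole intrinsics K (3, 3) into pixel coordinates."""
    uvw = corners @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def consistency_residual(corners, K, box2d):
    """Residual of the 2D-3D consistency constraint: the tight 2D box
    around the projected corners should match the detected 2D box
    (x_min, y_min, x_max, y_max) -- four equations in total."""
    uv = project_corners(corners, K)
    tight = np.array([uv[:, 0].min(), uv[:, 1].min(),
                      uv[:, 0].max(), uv[:, 1].max()])
    return tight - np.asarray(box2d)
```

A zero residual means the 3D hypothesis is consistent with the 2D detection; with only four equations, the remaining degrees of freedom must come from other cues.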
3D Object Detection from Stereo Images Li et al, “Stereo R-CNN based 3D Object Detection for Autonomous Driving”, CVPR, 2019.
3D Object Detection from Stereo Images
● 2D bounding box
● Center and heading (x, y, z, θ)
● Size (l, w, h)
Li et al, “Stereo R-CNN based 3D Object Detection for Autonomous Driving”, CVPR, 2019.
3D Object Detection from Stereo Images Matching loss Li et al, “Stereo R-CNN based 3D Object Detection for Autonomous Driving”, CVPR, 2019.
3D Object Detection from Images and LiDAR Qi et al, “Frustum PointNets for 3D Object Detection from RGB-D Data”, CVPR, 2018.
Our Recent Work on Monocular 3D Object Detection
● Accepted to the autonomous driving workshop at ICCV 2019
● Motivation: bridge the performance gap between LiDAR-based and camera-based 3D object detection
● KITTI dataset leaderboard: LiDAR-based 3D detection vs. monocular 3D detection
X. Weng and K. Kitani, “Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud”, ICCVW, 2019.
Our Recent Work on Monocular 3D Object Detection
Contributions:
● Pseudo-LiDAR framework
● Two observations:
  ○ Long tail – addressed by an instance mask proposal
  ○ Local misalignment – addressed by a bounding box consistency loss (BBCL) and optimization (BBCO)
X. Weng and K. Kitani, “Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud”, ICCVW, 2019.
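The pseudo-LiDAR idea: predict a depth map from the monocular image, then back-project every pixel into a 3D point through the pinhole camera model, so a LiDAR-style 3D detector can consume the result. A minimal back-projection sketch (the function name and interface are hypothetical):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a predicted depth map (H, W) into an (H*W, 3)
    pseudo-LiDAR point cloud using pinhole intrinsics
    (focal lengths fx, fy and principal point cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids
    z = depth
    x = (u - cx) * z / fx  # invert the pinhole projection per pixel
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Because every pixel becomes a point, depth errors translate directly into misplaced points, which is exactly where the long-tail and local-misalignment observations above come in.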
Our Recent Work on Monocular 3D Object Detection
● Inputs are monocular images only
● Currently 1st among monocular methods on both the KITTI 3D detection and bird’s eye view detection leaderboards
X. Weng and K. Kitani, “Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud”, ICCVW, 2019.
Our Recent Work on Monocular 3D Object Detection
[6] R. Urtasun et al. (University of Toronto). Monocular 3D Object Detection for Autonomous Driving. CVPR 2016.
[30] J. Kosecka (George Mason University). 3D Bounding Box Estimation Using Deep Learning and Geometry. CVPR 2017.
[58] Z. Chen et al. (Wuhan University). Multi-Level Fusion based 3D Object Detection from Monocular Images. CVPR 2018.
X. Weng and K. Kitani, “Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud”, ICCVW, 2019.
3D Multi-Object Tracking
Goal
● Inputs:
  ○ LiDAR point cloud
  ○ Monocular images
  ○ Stereo images, plus video
  ○ Or a fusion of these
● Outputs:
  ○ Eight corners
  ○ Four corners + height
  ○ Size + center + orientation
  ○ Identity – an association problem
Typical Multi-Object Tracking (MOT) Solver
● Tracking-by-detection pipeline
● Detector + appearance model + motion model + data association (e.g., Hungarian algorithm)
● Each component can be learned: a deep appearance network, a deep motion network, a deep association network
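The data-association step is a minimum-cost bipartite matching between existing tracks and new detections. The toy version below brute-forces the optimal assignment (the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, computes the same result in O(n³)); the cost could be 1 − IoU or an appearance distance, and the gating threshold is an assumption:

```python
import itertools

def min_cost_assignment(cost):
    """Brute-force optimal one-to-one assignment -- the same result the
    Hungarian algorithm computes efficiently; fine for toy-sized matrices."""
    n_trk, n_det = len(cost), len(cost[0])
    if n_trk <= n_det:
        cols = min(itertools.permutations(range(n_det), n_trk),
                   key=lambda cs: sum(cost[r][c] for r, c in enumerate(cs)))
        return list(enumerate(cols))
    # transpose and swap the pair order when tracks outnumber detections
    flipped = min_cost_assignment([list(col) for col in zip(*cost)])
    return sorted((r, c) for c, r in flipped)

def associate(cost, max_cost=0.7):
    """Match tracks (rows) to detections (cols); pairs costlier than
    max_cost (an assumed gating threshold) stay unmatched."""
    matches = [(r, c) for r, c in min_cost_assignment(cost)
               if cost[r][c] <= max_cost]
    matched_trk = {r for r, _ in matches}
    matched_det = {c for _, c in matches}
    unmatched_tracks = [r for r in range(len(cost)) if r not in matched_trk]
    unmatched_dets = [c for c in range(len(cost[0])) if c not in matched_det]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections typically spawn new tracks and unmatched tracks are aged out, which is the glue between the detector and the motion model in the pipeline above.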
3D MOT from LiDAR Point Cloud Luo et al, “Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net”, CVPR, 2018.
3D MOT from LiDAR Point Cloud SimNet AssocNet Baser et al, “FANTrack: 3D Multi-Object Tracking with Feature Association Network”, arXiv, 2019.
3D MOT from LiDAR Point Cloud Frossard et al, “End-to-end Learning of Multi-sensor 3D Tracking by Detection”, ICRA, 2018.
Our Recent Work on 3D Multi-Object Tracking
Tracking by detection:
● Detection: a state-of-the-art 3D object detector – PointRCNN
● Tracking: Kalman filter with a 3D constant-velocity model + Hungarian algorithm; no appearance model
X. Weng and K. Kitani, “Simple Baseline and New Evaluation Tool for 3D Multi-Object Tracking”, arXiv, 2019.
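A constant-velocity Kalman filter over a 3D box might look as follows. This is a hedged sketch: the 10-dimensional state layout [x, y, z, l, w, h, θ, vx, vy, vz] and the noise covariances are assumptions for illustration, not necessarily the paper's exact design.

```python
import numpy as np

class KalmanBox3D:
    """Constant-velocity Kalman filter over a 3D box state
    [x, y, z, l, w, h, theta, vx, vy, vz] (layout assumed).
    Only the center moves; size and heading follow a random walk."""
    def __init__(self, box):
        self.x = np.zeros(10)
        self.x[:7] = box
        self.P = np.eye(10)
        self.F = np.eye(10)
        self.F[0, 7] = self.F[1, 8] = self.F[2, 9] = 1.0  # x += vx, etc.
        self.H = np.eye(7, 10)       # we observe the 7 box parameters
        self.Q = np.eye(10) * 1e-2   # process noise (assumed)
        self.R = np.eye(7) * 1e-1    # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:7]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(10) - K @ self.H) @ self.P
```

Per frame, every track is predicted forward, predictions are associated to detections with the Hungarian algorithm, and matched tracks are updated with their detection; no appearance features are needed.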
Our Recent Work on 3D Multi-Object Tracking
● Inputs are LiDAR point clouds only
● Currently 1st on the KITTI 3D tracking leaderboard and 2nd on the KITTI 2D tracking leaderboard among published works
X. Weng and K. Kitani, “Simple Baseline and New Evaluation Tool for 3D Multi-Object Tracking”, arXiv, 2019.
Our Recent Work on 3D Multi-Object Tracking
2D tracking results on the KITTI test set; 3D tracking results on the KITTI validation set
[1] R. Urtasun et al. (University of Toronto). End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
[2] K. Czarnecki et al. (University of Waterloo). FANTrack: 3D Multi-Object Tracking with Feature Association Network. arXiv 2019.
[3] K. Granstrom et al. (Chalmers University of Technology). Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering. ITSC 2018.
[5] K. Madhava Krishna et al. (IIIT Hyderabad, India). Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking. ICRA 2018.
X. Weng and K. Kitani, “Simple Baseline and New Evaluation Tool for 3D Multi-Object Tracking”, arXiv, 2019.
Takeaway Message
● With proper use, a conceptually simple idea can achieve an unprecedented performance improvement in practice