
MOTS: Multi-Object Tracking and Segmentation Paul Voigtlaender - PowerPoint PPT Presentation



  1. Visual Computing Institute, Computer Vision. MOTS: Multi-Object Tracking and Segmentation. Paul Voigtlaender, RWTH Aachen University. Joint work with M. Krause, A. Ošep, J. Luiten, B. B. G. Sekar, A. Geiger, and B. Leibe. CVPR 2019 main conference, poster #122, Wednesday 15:20.

  2. Motivation
[Figure: the same video sequence shown twice, once with bounding boxes and once with pixel-level segmentation masks.]
◮ Many datasets for multi-object tracking are now available:
  ◮ MOTChallenges: MOT15 [Leal-Taixé et al., 2015], MOT16 and MOT17 [Milan et al., 2016], CVPR19 [Dendorfer et al., 2019]
  ◮ KITTI Tracking [Geiger et al., 2012]
  ◮ VisDrone2018 [Zhu et al., 2018]
  ◮ DukeMTMC [Ristani et al., 2016]
  ◮ UA-DETRAC [Wen et al., 2015]
◮ But annotations are only on the bounding box level
Paul Voigtlaender, voigtlaender@vision.rwth-aachen.de

  3. Motivation
◮ In difficult cases, bounding boxes are a very coarse approximation
◮ Most of the pixels inside a bounding box can belong to other objects


  5. So let there be Annotations
◮ Dense pixel-wise annotations are super expensive...
◮ But we did it!
◮ How? Semi-automatic annotation procedure

                               KITTI MOTS           MOTSChallenge
                               train      val
  # Sequences                  12         9         4
  # Frames                     5,027      2,981     2,862
  # Tracks Pedestrian          99         68        228
  # Masks Pedestrian, total    8,073      3,347     26,894
    manually annotated         1,312      647       3,930
  # Tracks Car                 431        151       -
  # Masks Car, total           18,831     8,068     -
    manually annotated         1,509      593       -

  6. Outline
◮ Semi-automatic Annotation Procedure
◮ Evaluation Measures
◮ TrackR-CNN Baseline Method
◮ Results

  7. Semi-automatic Annotation Procedure
◮ Starting point: existing box-level tracking annotations
◮ A fully convolutional network (Box2Seg) converts bounding boxes to segmentation masks

  8. Semi-automatic Annotation Procedure
◮ Starting point: dataset with existing box-level tracking annotations
[Pipeline figure: Box2Seg is trained on polygon annotations and segments the bounding boxes of each track into pixel-level object masks; a quality-assurance step picks erroneous masks, annotators manually add polygons for them, Box2Seg is fine-tuned on these, and the loop repeats until quality standards are reached.]

  9. Semi-automatic Annotation Procedure
◮ Manual corrections ensure consistent and high quality
◮ Large savings in annotation time
  ◮ KITTI MOTS: only 13% of car boxes / 17% of pedestrian boxes manually annotated
  ◮ MOTSChallenge: 15% of pedestrian boxes manually annotated

  10. Evaluation Measures
◮ We consider mask-based variants of the CLEAR MOT metrics [Bernardin and Stiefelhagen, 2008]
◮ Need to establish correspondences between hypothesized and ground truth objects
  ◮ Box-based tracking: non-trivial, since boxes are allowed to overlap; Hungarian matching is needed
  ◮ Mask-based: we require disjoint masks, so correspondences are unique and straightforward
◮ Hypothesized and ground truth masks are matched iff mask IoU > 0.5
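The matching rule on this slide can be sketched as follows (a minimal illustration, not the authors' evaluation code): because both the predicted masks and the ground-truth masks within a frame are required to be disjoint, any pair with IoU > 0.5 is a unique match and no Hungarian assignment is needed.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union > 0 else 0.0

def match_masks(hypotheses, ground_truths, thresh=0.5):
    """Map hypothesis index -> ground-truth index for pairs with IoU > thresh."""
    matches = {}
    for i, h in enumerate(hypotheses):
        for j, g in enumerate(ground_truths):
            if mask_iou(h, g) > thresh:
                matches[i] = j
                break  # masks are disjoint, so at most one GT can exceed 0.5
    return matches
```

With disjoint masks, at most one ground-truth mask can overlap a hypothesis by more than 50%, which is why the inner loop can stop at the first match.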

  11. Evaluation Measures
◮ MOTSA: Multi-Object Tracking and Segmentation Accuracy

    MOTSA = 1 - (|FN| + |FP| + |IDS|) / |M| = (|TP| - |FP| - |IDS|) / |M|

◮ Like MOTA, but with mask-based IoU instead of box IoU
◮ TP: true positives, FN: false negatives, FP: false positives, IDS: ID switches, M: set of ground truth segmentation masks


  13. Evaluation Measures
◮ TP~: soft number of true positives

    TP~ = Σ_{h ∈ TP} IoU(h, c(h))

◮ c: unique mapping from hypotheses to ground truth
◮ MOTSP: Multi-Object Tracking and Segmentation Precision

    MOTSP = TP~ / |TP|

  14. Evaluation Measures
◮ sMOTSA: Soft Multi-Object Tracking and Segmentation Accuracy

    sMOTSA = (TP~ - |FP| - |IDS|) / |M|

◮ Combines tracking and segmentation quality into a single measure
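The three measures defined on these slides reduce to simple arithmetic once the per-sequence counts are known. A hedged sketch (illustrative helper functions, not the official evaluation script), where soft_tp is the sum of IoU(h, c(h)) over matched masks:

```python
def motsa(tp: int, fp: int, ids: int, num_gt_masks: int) -> float:
    """MOTSA = (|TP| - |FP| - |IDS|) / |M|."""
    return (tp - fp - ids) / num_gt_masks

def motsp(soft_tp: float, tp: int) -> float:
    """MOTSP = TP~ / |TP|, i.e. the mean IoU over matched masks."""
    return soft_tp / tp

def smotsa(soft_tp: float, fp: int, ids: int, num_gt_masks: int) -> float:
    """sMOTSA = (TP~ - |FP| - |IDS|) / |M|."""
    return (soft_tp - fp - ids) / num_gt_masks
```

For example, with 100 ground-truth masks, 90 matches of average IoU 0.8 (soft TP = 72), 5 false positives, and 2 ID switches, MOTSA is 0.83 while sMOTSA drops to 0.65, reflecting the imperfect mask quality.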

  15. Baseline Method: TrackR-CNN
◮ Idea: detection, segmentation, and data association with a single convolutional network
◮ Extend Mask R-CNN by 3D convolutions and an association head
◮ ResNet-101 backbone; Mask R-CNN pre-trained on Mapillary
◮ Speed: ~2 fps
[Architecture figure: per-frame features from frames t-1, t, and t+1 are extracted with shared weights and combined by 3D convolutions into temporally enhanced image features; a region proposal network feeds heads for classification, bounding box regression, instance segmentation, and a 128-D association embedding. During training, the heads are supervised with ground truth; during evaluation, the association vectors are matched against previously tracked objects for online track generation, followed by mask generation and track scoring.]
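The association head outputs a 128-D embedding per detection; linking detections across frames then amounts to comparing embeddings. The toy sketch below (an assumption for illustration, not the released TrackR-CNN code, which combines embedding distances with further cues and thresholds) greedily assigns each new detection to the nearest existing track in embedding space, or starts a new track when no track is close enough:

```python
import numpy as np

def associate(new_embs, track_embs, max_dist=1.0):
    """Greedily link detections to tracks by embedding distance.

    Returns one entry per new detection: the index of the matched track,
    or -1 to start a new track.
    """
    assignments = []
    for e in new_embs:
        if len(track_embs) == 0:
            assignments.append(-1)  # no existing tracks to match against
            continue
        dists = [np.linalg.norm(e - t) for t in track_embs]
        j = int(np.argmin(dists))
        assignments.append(j if dists[j] <= max_dist else -1)
    return assignments
```

The embedding is trained so that detections of the same object land close together and those of different objects far apart, which is what makes a simple distance threshold like max_dist meaningful.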

  16. TrackR-CNN
[Figure detail: features are extracted from frames t-1, t, and t+1 with shared weights, and two 3D convolution layers fuse them into temporally enhanced image features that feed the region proposal network.]

  17. TrackR-CNN
[Figure detail: the network heads. During training, the instance segmentation, bounding box regression, and classification heads are supervised with ground truth; during evaluation, the 128-D association vectors from the association embedding are matched to previously tracked objects for online track generation, followed by mask generation and track scoring.]
