Beyond Object Recognition in 2D
Georgia Gkioxari
Object Recognition in 2D
The World is 3D Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018
Motion is Important for Recognition Johansson, Biological Motion Perception
Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)
2D: Mask R-CNN He et al., Mask R-CNN, ICCV 2017
2D: Mask R-CNN • Object Localization • Instance Segmentation • Pose Estimation from a Single Image He et al., Mask R-CNN, ICCV 2017
2D + t: Object & Pose Tracking Challenges • Multiple Objects • Occlusions • Variations in Poses
2D + t: 3D Mask R-CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
2D + t: 3D Mask R-CNN 3D inflated CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
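One common way to build a "3D inflated" CNN is to initialize each 3D kernel from a pretrained 2D kernel by repeating it along the time axis and dividing by the temporal extent, so that a static clip yields the same response as the 2D filter on a single frame. The sketch below illustrates that idea in plain Python. The function names, the toy kernel and patch, and the mean-style initialization are illustrative assumptions, not necessarily the exact scheme used in Detect-and-Track.

```python
def inflate_kernel(kernel_2d, t):
    """Repeat a 2D kernel t times along time, scaling each copy by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]


def response_2d(patch, kernel_2d):
    """Dot product of one image patch with a 2D kernel (one conv output)."""
    return sum(p * w
               for patch_row, kernel_row in zip(patch, kernel_2d)
               for p, w in zip(patch_row, kernel_row))


def response_3d(clip, kernel_3d):
    """Dot product of a (time, height, width) clip with a 3D kernel."""
    return sum(response_2d(patch, k) for patch, k in zip(clip, kernel_3d))


# Toy 3 x 3 kernel and image patch (made-up values).
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0],
         [4.0, 1.0, 2.0],
         [5.0, 9.0, 2.0]]

kernel_3d = inflate_kernel(kernel_2d, t=3)
static_clip = [patch, patch, patch]  # the same frame repeated three times

out_2d = response_2d(patch, kernel_2d)
out_3d = response_3d(static_clip, kernel_3d)
```

On the static clip, `out_3d` matches `out_2d` up to floating-point rounding, which is the property that lets the inflated network start from image-pretrained weights.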
2D + t: 3D Mask R-CNN Predicts 3D tubes instead of 2D RoIs Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
2D + t: 3D Mask R-CNN RoIAlign in (x, y, t) Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
2D + t: 3D Mask R-CNN Tube object classification Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
2D + t: 3D Mask R-CNN Pose estimation for each tube for each time step Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
2D + t: 3D Mask R-CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
The Challenges When Learning from Video
• 3D CNNs are time and memory consuming
  • Small batch sizes
  • Prone to overfitting
• Redundant computations
  • Consecutive frames look similar
  • 3D convolutions allocate the same amount of computation across time and pixels
• 3D extensions of image-based CNNs might be suboptimal
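To make the cost claim concrete, here is a back-of-the-envelope sketch comparing multiply-accumulate counts for a 2D and a 3D convolution layer. The layer sizes are made-up illustrative values, and the formulas assume stride 1 with "same" padding.

```python
def conv2d_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k conv on one h x w frame."""
    return h * w * c_in * c_out * k * k


def conv3d_macs(t, h, w, c_in, c_out, k, k_t):
    """Multiply-accumulates for a k_t x k x k conv on a t-frame clip."""
    return t * h * w * c_in * c_out * k * k * k_t


# Hypothetical layer: 56 x 56 feature map, 64 -> 64 channels, 3 x 3 kernel.
per_frame = conv2d_macs(56, 56, 64, 64, 3)
# The same layer inflated to 3 x 3 x 3 and applied to an 8-frame clip.
per_clip = conv3d_macs(8, 56, 56, 64, 64, 3, 3)

ratio_vs_one_frame = per_clip / per_frame        # cost relative to one image
ratio_vs_framewise = per_clip / (8 * per_frame)  # relative to 2D conv per frame
```

With these numbers the 3D layer costs 24 times a single-image pass and 3 times a frame-by-frame 2D pass, and it spends that compute uniformly over all frames even when neighbouring frames are nearly identical.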
Slow-Fast Networks for Video Recognition Feichtenhofer et al., arXiv 2018
Slow-Fast Networks for Video Recognition [Diagram: Slow pathway with T frames and C channels; Fast pathway with αT frames and βC channels; spatial size H, W; outputs feed a prediction] Feichtenhofer et al., arXiv 2018
Slow-Fast Networks for Video Recognition [Diagram: as above, with Fast pathway features concatenated into the Slow pathway] Feichtenhofer et al., arXiv 2018
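The T, C, αT and βC labels in the diagram can be read as tensor sizes: the Slow pathway sees few frames with many channels, the Fast pathway sees α times more frames with a fraction β of the channels. The sketch below computes those sizes for one illustrative setting. The clip length, the values of τ, α and β, and the choice of fusing by keeping every α-th Fast frame before channel concatenation are assumptions for illustration, not a statement of the configuration used in the paper.

```python
def sample_indices(num_frames, stride):
    """Indices of the frames kept when sampling every `stride`-th frame."""
    return list(range(0, num_frames, stride))


def pathway_shapes(clip_len, tau, alpha, channels, beta):
    """Return (frames, channels) for the Slow and Fast pathways."""
    slow_idx = sample_indices(clip_len, tau)           # T frames
    fast_idx = sample_indices(clip_len, tau // alpha)  # alpha * T frames
    slow = (len(slow_idx), channels)                   # (T, C)
    fast = (len(fast_idx), int(beta * channels))       # (alpha*T, beta*C)
    return slow, fast, slow_idx, fast_idx


def lateral_concat_shape(slow, fast, alpha):
    """Shape after keeping every alpha-th Fast frame and concatenating
    Fast channels onto the Slow features (one possible fusion choice)."""
    t_slow, c_slow = slow
    t_fast, c_fast = fast
    if t_fast // alpha != t_slow:
        raise ValueError("pathway lengths do not line up for fusion")
    return (t_slow, c_slow + c_fast)


# Illustrative values: 64-frame clip, tau = 16, alpha = 8, C = 64, beta = 1/8.
slow, fast, slow_idx, fast_idx = pathway_shapes(64, 16, 8, 64, 1 / 8)
fused = lateral_concat_shape(slow, fast, alpha=8)
```

Here the Slow pathway has shape (4, 64), the Fast pathway (32, 8), and the fused Slow features (4, 72), so the Fast pathway adds temporal resolution at a small channel cost.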
Slow-Fast Networks for Video Recognition • Kinetics 400
Slow-Fast Networks for Video Recognition • AVA
Can Motion Also Help 2D?
• Motion is important for video understanding
  • Object Tracking
  • Action Recognition
• Can motion help single image understanding?
  • Humans learn to recognize using motion cues
  • Can motion help us recognize better or with less data?
DensePose input image DensePose surface of 3D model Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
DensePose: Annotations keypoints full annotations limited dense annotations sparse annotations Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
DensePose: Performance wrt #Annotations Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
DensePose: Annotation Propagation with Optical Flow Transfer a given label to a new frame Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
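A minimal sketch of the propagation idea: each annotated pixel in one frame is moved by the optical-flow vector at its location to obtain its position in the next frame, carrying its label with it. The flow field below is a hand-made toy array; in practice it would come from an optical flow estimator. The function name `propagate_labels`, the string labels, and the rule of dropping points that leave the image are assumptions for illustration.

```python
def propagate_labels(points, flow, height, width):
    """Move labeled points to the next frame using a dense flow field.

    points: list of (x, y, label) with integer pixel coordinates.
    flow:   flow[y][x] = (dx, dy), displacement of pixel (x, y).
    Points that land outside the image are dropped.
    """
    moved = []
    for x, y, label in points:
        dx, dy = flow[y][x]
        new_x, new_y = x + dx, y + dy
        if 0 <= new_x <= width - 1 and 0 <= new_y <= height - 1:
            moved.append((new_x, new_y, label))
    return moved


height, width = 4, 5
# Toy flow: every pixel shifts 1.5 pixels right and 0.5 pixels down.
flow = [[(1.5, 0.5) for _ in range(width)] for _ in range(height)]
annotations = [(0, 0, "left_hand"), (2, 1, "torso"), (4, 3, "right_foot")]

propagated = propagate_labels(annotations, flow, height, width)
```

In this toy case the first two points are transferred and the third is dropped because it moves out of the frame. A real pipeline would also need to handle occlusions and flow errors, which this sketch ignores.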
DensePose: Annotation Propagation with Optical Flow [Bar chart: gains in performance (axis 0 to 2) for ground truth, propagation, equivariance, all] Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)
Mesh R-CNN: Objects and Shapes Gkioxari et al., Mesh R-CNN, arXiv 2019
Mesh R-CNN: Objects and Shapes (figure labels: sofa, chair) Gkioxari et al., Mesh R-CNN, arXiv 2019
Mesh R-CNN: Objects and Shapes
Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)
Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018
Thank you