
Beyond Object Recognition in 2D, by Georgia Gkioxari



  1. Beyond Object Recognition in 2D Georgia Gkioxari

  2. Object Recognition in 2D

  3. The World is 3D Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

  4. Motion is Important for Recognition Johansson, Biological Motion Perception

  5. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  6. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  7. 2D: Mask R-CNN He et al., Mask R-CNN, ICCV 2017

  8. 2D: Mask R-CNN • Object Localization • Instance Segmentation • Pose Estimation from a Single Image He et al., Mask R-CNN, ICCV 2017

  9. 2D + t: Object & Pose Tracking Challenges • Multiple Objects • Occlusions • Variations in Poses

  10. 2D + t: 3D Mask R-CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  11. 2D + t: 3D Mask R-CNN 3D inflated CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018
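
To make "3D inflated CNN" concrete, here is a minimal sketch of one common way to inflate a 2D convolution kernel into a 3D one: repeat the 2D weights along a new time axis and divide by the temporal size, so that a clip of identical frames produces the same response as the original 2D kernel on a single frame. Whether the talk uses exactly this initialization is an assumption; the function names `inflate_kernel` and `conv_response_static` are hypothetical, and plain nested lists stand in for tensors.

```python
def inflate_kernel(kernel_2d, time_size):
    """Inflate a 2D kernel (rows x cols) into a 3D kernel (time x rows x cols).

    Assumed scheme: repeat the 2D weights `time_size` times along time and
    divide by `time_size`, which preserves the response on a static clip.
    """
    if time_size < 1:
        raise ValueError("time_size must be >= 1")
    scaled = [[w / time_size for w in row] for row in kernel_2d]
    # Copy the rows for every time step so the slices are independent objects.
    return [[row[:] for row in scaled] for _ in range(time_size)]


def conv_response_static(kernel_3d, patch_2d):
    """Response of a 3D kernel on a clip that shows `patch_2d` in every frame."""
    return sum(
        w * x
        for kernel_t in kernel_3d
        for k_row, p_row in zip(kernel_t, patch_2d)
        for w, x in zip(k_row, p_row)
    )


# Illustrative values only: a 3x3 edge filter and a 3x3 image patch.
kernel_2d = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 1.0], [5.0, 2.0, 2.0]]

kernel_3d = inflate_kernel(kernel_2d, time_size=3)
response_2d = conv_response_static([kernel_2d], patch)  # single-frame response
response_3d = conv_response_static(kernel_3d, patch)    # static 3-frame clip
```

The practical appeal of this kind of initialization is that a network pretrained on images can be reused as a starting point for video, instead of training the 3D weights from scratch.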

  12. 2D + t: 3D Mask R-CNN Predicts 3D tubes instead of 2D RoIs Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  13. 2D + t: 3D Mask R-CNN RoIAlign in (x, y, t) Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  14. 2D + t: 3D Mask R-CNN Tube object classification Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  15. 2D + t: 3D Mask R-CNN Pose estimation for each tube for each time step Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  16. 2D + t: 3D Mask R-CNN Girdhar et al., Detect-And-Track: Efficient Pose Estimation in Videos, CVPR 2018

  17. The Challenges When Learning from Video • 3D CNNs are time and memory consuming • Small batch sizes • Prone to overfitting • Redundant Computations • Consecutive frames look similar • 3D convolutions allocate the same amount of computation across time and pixels • 3D extensions of Image-based CNNs might be suboptimal
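
The cost point can be illustrated with simple arithmetic. The sketch below counts multiply-accumulate operations (MACs) for one convolution layer applied per frame in 2D versus across a clip in 3D, using the standard count of input channels × output channels × kernel volume × output positions. The layer sizes are made-up illustrative values, not taken from the talk, and the helper names are hypothetical.

```python
def conv2d_macs(c_in, c_out, k_h, k_w, out_h, out_w):
    """Multiply-accumulates for one 2D convolution layer on one frame."""
    return c_in * c_out * k_h * k_w * out_h * out_w


def conv3d_macs(c_in, c_out, k_t, k_h, k_w, out_t, out_h, out_w):
    """Multiply-accumulates for one 3D convolution layer on a whole clip."""
    return c_in * c_out * k_t * k_h * k_w * out_t * out_h * out_w


# Illustrative layer: 64 -> 64 channels, 3x3 kernel, 56x56 output, 8-frame clip.
frames = 8
per_frame_2d = conv2d_macs(64, 64, 3, 3, 56, 56)
clip_2d = frames * per_frame_2d                       # 2D layer run on every frame
clip_3d = conv3d_macs(64, 64, 3, 3, 3, frames, 56, 56)  # 3x3x3 kernel over the clip
ratio = clip_3d / clip_2d
```

Under these assumptions the 3D layer costs as many times more than the per-frame 2D layer as the kernel is deep in time (here 3), and its activations must be held for all frames at once, which is one reason batch sizes shrink.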

  18. Slow-Fast Networks for Video Recognition Feichtenhofer et al., arXiv 2018

  19. Slow-Fast Networks for Video Recognition [Diagram: a Slow pathway (T frames, C channels) and a Fast pathway (αT frames, βC channels) over H,W feature maps, leading to a prediction] Feichtenhofer et al., arXiv 2018

  20. Slow-Fast Networks for Video Recognition [Diagram: a Slow pathway (T frames, C channels) and a Fast pathway (αT frames, βC channels) over H,W feature maps, with a concat connection between the pathways] Feichtenhofer et al., arXiv 2018
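
The diagram's labels (T frames and C channels for the Slow pathway, αT frames and βC channels for the Fast pathway) can be made concrete with a small sketch of how the two pathways might sample frames from the same clip. The function names and the specific numbers (a 64-frame clip, a Slow stride of 16, α = 8, β = 1/8) are illustrative assumptions, not values stated on the slides.

```python
def slowfast_frame_indices(num_frames, slow_stride, alpha):
    """Return (slow, fast) frame indices for one clip.

    The Slow pathway keeps every `slow_stride`-th frame (T frames in total);
    the Fast pathway samples `alpha` times more densely (alpha * T frames).
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fast_channels(slow_channels, beta):
    """Channel width of the Fast pathway: beta * C, with beta < 1."""
    return int(slow_channels * beta)


# Illustrative settings (assumed, not from the slides).
slow_idx, fast_idx = slowfast_frame_indices(num_frames=64, slow_stride=16, alpha=8)
fast_width = fast_channels(slow_channels=64, beta=1 / 8)
```

The design idea the labels suggest is a trade: the Fast pathway sees more frames but is kept thin in channels, so the extra temporal resolution does not multiply the total computation.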

  21. Slow-Fast Networks for Video Recognition • Kinetics 400

  22. Slow-Fast Networks for Video Recognition • AVA

  23. Can Motion Also Help 2D? • Motion is important for video understanding • Object Tracking • Action Recognition • Can motion help single image understanding? • Humans learn to recognize using motion cues • Can motion help us recognize better or with less data?

  24. DensePose [Figure labels: input image, DensePose, surface of 3D model] Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

  25. DensePose: Annotations [Figure labels: keypoints, full annotations, limited dense annotations, sparse annotations] Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

  26. DensePose: Performance wrt #Annotations Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

  27. DensePose: Annotation Propagation with Optical Flow Transfer a given label to a new frame Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019
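
As a rough illustration of "transfer a given label to a new frame", the sketch below moves sparse point annotations from one frame to the next by adding the optical-flow displacement at each annotated pixel and dropping points that leave the image. It assumes a dense flow field is already available as a grid of (dx, dy) vectors; the function name, the string labels, and the toy flow are hypothetical and much simpler than real DensePose annotations.

```python
def propagate_annotations(points, flow, width, height):
    """Move (x, y, label) annotations from frame t to frame t+1.

    `flow[y][x]` is the assumed (dx, dy) displacement of pixel (x, y) between
    the two frames. Points that land outside the image are discarded.
    """
    propagated = []
    for x, y, label in points:
        dx, dy = flow[y][x]
        new_x, new_y = round(x + dx), round(y + dy)
        if 0 <= new_x < width and 0 <= new_y < height:
            propagated.append((new_x, new_y, label))
    return propagated


# Toy example: a 4x3 image where every pixel moves one pixel to the right.
width, height = 4, 3
flow = [[(1.0, 0.0) for _ in range(width)] for _ in range(height)]
points = [(0, 1, "torso"), (3, 2, "left_hand")]

moved = propagate_annotations(points, flow, width, height)
```

In this toy case the first point shifts right by one pixel and the second is dropped because it moves past the image border. Propagated labels are only as reliable as the flow estimate, so they are best treated as extra, noisier supervision alongside the original annotations.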

  28. DensePose: Annotation Propagation with Optical Flow [Bar chart: gains in performance (axis from 0 to 2) for ground truth, propagation, equivariance, and all] Neverova et al., Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues, CVPR 2019

  29. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  30. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  31. Mesh R-CNN: Objects and Shapes Gkioxari et al., Mesh R-CNN, arXiv 2019

  32. Mesh R-CNN: Objects and Shapes Gkioxari et al., Mesh R-CNN, arXiv 2019

  33. Mesh R-CNN: Objects and Shapes [Figure labels: sofa, chair] Gkioxari et al., Mesh R-CNN, arXiv 2019

  34. Mesh R-CNN: Objects and Shapes [Figure labels: sofa, chair] Gkioxari et al., Mesh R-CNN, arXiv 2019

  35. Mesh R-CNN: Objects and Shapes [Figure labels: sofa, chair] Gkioxari et al., Mesh R-CNN, arXiv 2019

  36. Mesh R-CNN: Objects and Shapes

  37. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  38. Appearance (x, y) • Motion (x, y, t) • Shape (x, y, z)

  39. Whelan et al., Reconstructing Scenes with Mirror and Glass Surfaces, SIGGRAPH 2018

  40. Thank you
